A key question in operating any plant is this: Are we doing the right amount of maintenance? Are we doing the right type of maintenance?
How do you know? Most plants we visit have developed preventive maintenance programs over the years for a hodge-podge of reasons. In some cases equipment PM’s have been based on OEM recommendations. In some cases, PM’s have been developed in response to major failures. But we almost never find that a systematic approach, based on manufacturing value, has been deployed to develop the care program for the asset.
This article gives a step-by-step method to systematically develop an asset healthcare program, resulting in the necessary reliability to meet your business plan, at the lowest cost. It discusses the concepts of asset healthcare, gives an overall closed-loop process to develop your program, and identifies how to measure success and make required adjustments.
For many years, as I have given public and industry presentations, I have asked the question: “How many of you (the audience) believe you have a good or excellent Preventive Maintenance program?” Without exception, no hands are raised in the audience.
What makes developing such a program so difficult? Other difficult things are accomplished in maintenance improvement: sometimes planning and scheduling are implemented plant-wide with good results. Frequently a storeroom offers good service, while minimizing total inventory cost. So why is preventive maintenance so difficult?
The elements of PM’s are well known. You need a set of tasks performed at a certain frequency, and these tasks are scheduled and performed thoroughly by qualified craftsmen or operators. Some of the problem, of course, is simply trying to implement prevention in a reactive environment. Work isn’t planned, parts aren’t available, or the equipment isn’t made available due to missing production schedules. That isn’t the problem, though, where good planning and scheduling exist. So the issue comes down to identifying the right tasks, and the proper frequencies.
Reliability Centered Maintenance (RCM) is often selected as the tool of choice for plants advanced enough to understand that the prevention tasks must be aimed at correcting specific defects or failure causes. This fails, too, because there is no plant in my experience that has the resources or fortitude to perform RCM studies on every piece of equipment or aspect of the facility. Risk-based RCM comes closer to the mark as a tool, but still tends to look at specific equipment. It is not used to develop the plant-wide prevention plan.
REPLACING “PREVENTIVE MAINTENANCE” WITH “ASSET HEALTHCARE” AS THE OPERATIVE CONCEPT
We think the first part of the issue is semantic, or definitional: the term preventive maintenance, or even the more encompassing “preventive-predictive” maintenance, fails as a concept. For most people it connotes activities more than intent. For that reason we prefer the term Asset Healthcare.
When we examine the concept of healthcare as it applies to people, we understand it to mean maintaining function, or the condition of the body to perform certain activities. Likewise, we understand that our objective in maintenance is to assure the likelihood (probability) that equipment can perform a certain function when required. We understand, too, that reactive maintenance cannot assure that probability, but only minimize the impact of failure. For these reasons, we encourage a new concept (not of our invention, but not commonly used) of equipment or asset healthcare. Our preference is to use the word “asset” because it applies to the facility as well as the production equipment. In most cases, failure of the facility degrades our production capability in a similar manner to equipment problems. Thus we encourage clients to start with the concept of assuring asset healthcare.
INTRODUCING “PROBABILITY” AS A NECESSARY CONCEPT IN DEVELOPING THE ASSET HEALTHCARE PROGRAM
Another concept we need to introduce is that of probability. We know that decreasing the frequency of a failure mode increases the probability of performing the intended function. However, without an intimate understanding of molecular strength of every aspect of every component, and the forces to which it will be subjected, we are left with uncertainty about the timing of a given failure mode. Thus our goal is to manage the probability of equipment performing its intended function.
Why is this distinction important? Because as we approach the ultimate (100% assured availability), costs for maintenance go up exponentially. Our goal is to be able to answer the important question: “What is the appropriate type and amount of maintenance necessary to assure a specified level of performance for the asset?”
All of our asset healthcare tasks (preventive maintenance) need to answer this question, or we will never know if we have succeeded in our goals.
THE ASSET HEALTHCARE CLOSED-LOOP PROCESS
In this figure we see a five-step process that describes a self-improving method for Asset Healthcare development and execution. Steps 3 and 4, Load and Schedule Work and Prepare and Execute Scheduled Maintenance, are typical processes in the Planned Maintenance Cycle, and won’t get separate attention here. Steps 1 and 5, Create Measurement Process and Review and Analyze Variation, are also typical of any closed-loop process, but we will be identifying some new concepts here, so they will be covered, though not in great detail. Obviously the step that will get the most attention is number 2, Develop the Asset Care Program.
DEVELOPING THE ASSET HEALTHCARE MEASUREMENT PROCESS
We know we can’t permanently improve what we don’t measure. But in the plant environment the plethora of indicators that can be measured is overwhelming. There is a compelling need to simplify the measurement process, to make it manageable in an era of downsized workforces.
There are, of course, leading or process measures that are required. These include PM (Asset Healthcare task—or AHT) compliance, % AHT to total work hours, etc. We need a measure of results as well.
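As a rough illustration, the two process measures named above could be computed as follows. This is only a sketch: the function names and the sample figures are hypothetical, and we assume here that compliance means completed tasks divided by scheduled tasks for a given period.

```python
def aht_compliance(completed_tasks: int, scheduled_tasks: int) -> float:
    """Percentage of scheduled asset healthcare tasks (AHTs) that were completed."""
    if scheduled_tasks <= 0:
        raise ValueError("scheduled_tasks must be positive")
    return 100.0 * completed_tasks / scheduled_tasks


def aht_share_of_hours(aht_hours: float, total_work_hours: float) -> float:
    """AHT labor hours as a percentage of all maintenance work hours."""
    if total_work_hours <= 0:
        raise ValueError("total_work_hours must be positive")
    return 100.0 * aht_hours / total_work_hours


# Hypothetical figures for one week in one department.
print(aht_compliance(completed_tasks=45, scheduled_tasks=50))    # 90.0
print(aht_share_of_hours(aht_hours=300, total_work_hours=1200))  # 25.0
```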
We won’t dispute the value of measuring “Uptime” or “Overall Equipment Effectiveness”. These are excellent measures and give an overview to any plant that deploys them. Where they may fall short is in identifying the cause of a problem. They don’t do much to identify the “delta” that needs work.
We have seen only a single plant that has maintained a plant-wide measure of “Mean Time Between Failure”. This requires a lot of data and continuous effort for reporting. It fails, however, to guide one from a business perspective: Where do we place our efforts and emphasis?
Instead of the above measures, we’d like to introduce a concept we learned from one of our clients: Cost of UnReliability (CoUR). This is an extension of the Cost of Quality concept used to measure deviations in quality theory. Figure 2 shows a graph of four years of tracking CoUR in a major facility with many operating units. One can quickly grasp, with clear evidence, where to place attention!
Fundamentally, CoUR measures the production value of the downtime for a department or a unit, and adds in the cost of repair, both labor and materials. We record and maintain a database for those CoUR events greater than X dollars. X, of course, depends on the production value of your plant, and your visibility and dedication to recording incidents.
Key data elements include:
- Date and time of incident
- Location (Department, equipment center or unit) and specific equipment number that failed
- Downtime and valuation of downtime
- Repair costs (usually the work orders that apply)
- Failure reason code
- Failure description
Using the power of the database, all failures can be sorted by location, size, reason code, etc. Also, for this client, when the cost of a failure hits a threshold (e.g. $100,000), a Root Cause Failure Analysis is required.
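To make the record structure concrete, here is a minimal sketch of a CoUR event log in Python. The class and field names, the $5,000 recording threshold standing in for X, and the sample events are our illustrative assumptions; only the data elements themselves and the $100,000 example threshold come from the discussion above.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

RECORDING_THRESHOLD = 5_000  # "X": hypothetical minimum cost for an event to be recorded
RCFA_THRESHOLD = 100_000     # example threshold that triggers a Root Cause Failure Analysis


@dataclass
class CourEvent:
    occurred_at: datetime
    location: str                   # department, equipment center or unit
    equipment_number: str
    downtime_hours: float
    downtime_value_per_hour: float  # production value of one hour of downtime
    repair_labor_cost: float
    repair_material_cost: float
    reason_code: str
    description: str

    @property
    def cost(self) -> float:
        """Production value of the downtime plus repair labor and materials."""
        downtime_value = self.downtime_hours * self.downtime_value_per_hour
        return downtime_value + self.repair_labor_cost + self.repair_material_cost

    @property
    def requires_rcfa(self) -> bool:
        return self.cost >= RCFA_THRESHOLD


def cost_by(events, attribute):
    """Total CoUR grouped by an attribute such as 'location' or 'reason_code', largest first."""
    totals = defaultdict(float)
    for event in events:
        if event.cost >= RECORDING_THRESHOLD:
            totals[getattr(event, attribute)] += event.cost
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample events.
events = [
    CourEvent(datetime(2024, 3, 4, 14, 30), "Unit A", "P-101",
              6.0, 20_000, 4_000, 6_000, "BRG", "Pump bearing seized"),
    CourEvent(datetime(2024, 3, 9, 2, 15), "Unit B", "M-220",
              1.5, 8_000, 1_000, 500, "ELEC", "Motor tripped on overload"),
]

for location, total in cost_by(events, "location"):
    print(f"{location}: ${total:,.0f}")
print([e.equipment_number for e in events if e.requires_rcfa])
```

Grouping the same events by "reason_code" instead of "location" gives the view by failure cause. A spreadsheet or a small database table with the same columns would serve equally well.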
The advantage of CoUR is in the planning process. Practically, what has cost us money? Are there patterns? Where do we focus our efforts? It becomes a practical overall scorecard, to see if our CoUR is declining, while also directing our work towards specific failure causes. It records history in a way that is impractical for a CMMS, without the limitation of a huge data collection workload.
So, on to the task of creating the Asset Healthcare tasks appropriate for a plant or facility.
TYPICAL ASSET HEALTHCARE TASK DEVELOPMENT AND RATIONALE
We might spend a moment considering two questions. First, in the history of this plant, when were asset healthcare tasks created? And second, by what methods were they created?
We seldom find that greenfield plants develop their prevention program before the plant starts operations. Usually this simply hasn’t been part of the start-up plan. When it is, there isn’t sufficient time or money given to its development. And where in isolated cases AHT’s were created for specific equipment, it was usually done according to the vendor’s specifications, without the benefit of experience within the operational context.
The next time we might see PM’s developed is when there are significant failures that gain lots of attention. Sometimes these are one-time events, but reaction requires that we develop a PM, and it gets generated every month, forever. We may also put a team together to develop PM’s. These are done as well as possible, with best guesses as to appropriate tasks and frequencies.
These are usually the most valuable of the PM collection that gets printed out each period and distributed to the craftsmen.
We want to change these methods forever. What we seek is an effective, simple, measurable system that enables us to create a proactive maintenance strategy for every piece of equipment in the plant. Currently RCM, in its many flavors, is identified as the method to accomplish this task. In most applications, however, it is too cumbersome to apply to all the equipment in the plant. We propose a hybrid method that meets these characteristics:
- Covers the entire equipment spectrum
- Applies easy-to-understand rules that can be modified with experience
- Adds value during its development, not just in the future state
- Minimizes data reentry
- Can be implemented by the hourly workforce with minimal guidance beyond training
THE SEVEN STEPS TO DEVELOPING ASSET HEALTHCARE TASKS
Our Asset Healthcare System employs these steps:
- Select and Install the Software Tool (Asset Healthcare System, or AHS)
- Develop Hierarchy
- Develop Criticality
- Develop Equipment Condition
- Establish Strategies for Component Care
- Develop Failure Modes and Effects
- Develop Maintenance Activities
Working with one unit or department at a time, these steps develop the system of proactive Asset Healthcare. We discuss each one in overview.
1. Acquire, Install and Train in AHS Software. After working in many situations without a fully functional software tool, we have found that there is a better way. Find a good tool and use it to its maximum capability! We searched for and reviewed over 200 “RCM” software tools, and found a handful that met our requirements:
We use this tool for all purposes in the following steps.
2. Develop the Equipment Hierarchy. In many instances an equipment hierarchy exists in electronic form somewhere in the plant, and usually is embedded in the CMMS. We suggest going as many as four to five levels in describing the equipment hierarchy, depending on how far down it is necessary to go to get to a maintainable component. This may be a pump, motor, gearbox, or electrical panel. This initial identification of the equipment gives us the basis to develop a proactive maintenance strategy for every component.
One of the benefits of this step is that the equipment owners, the operators and the maintainers, perform this task. In doing so, they educate themselves about the equipment, going over drawings, listings, and manuals. We hear many positive comments from the team during this task, which takes several days: “I didn’t know that was how it worked!”, or “Is there really a filter there? We’ve never cleaned it!” Another opportunity is to identify for engineering where the drawings are out of date and where changes haven’t been documented.
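For readers who think in data structures, the hierarchy is simply a tree whose leaves of interest are the maintainable components. The following is a minimal sketch; the class name, the equipment names and the four-level example are hypothetical and are not drawn from any particular CMMS or AHS tool.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    maintainable: bool = False  # True for a pump, motor, gearbox, electrical panel, etc.
    children: list = field(default_factory=list)


def maintainable_components(node, path=()):
    """Yield (full path, hierarchy level) for every maintainable component under node."""
    here = path + (node.name,)
    if node.maintainable:
        yield " / ".join(here), len(here)
    for child in node.children:
        yield from maintainable_components(child, here)


# Hypothetical four-level hierarchy: plant, unit, system, component.
plant = Node("Plant", children=[
    Node("Unit A", children=[
        Node("Feed system", children=[
            Node("Feed pump P-101", maintainable=True),
            Node("Pump motor M-101", maintainable=True),
        ]),
    ]),
])

for component_path, level in maintainable_components(plant):
    print(level, component_path)
```

The resulting list of maintainable components is what the later steps (criticality, condition, strategy) attach their results to.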
3. Develop Criticality. In order to determine the level of maintenance a component should receive, we need to understand its value in the operating context. To keep it simple, we ask “How critical is the process of which this is a part?” Our answer set would be: a) Must be running all the time, b) Must run most of the time and on demand, or c) Must run occasionally. One can also use CoUR as a gauge of process criticality: for instance, using the value of an hour of downtime to set the ranges for the criteria.
Once the process has been classified on criticality, we classify the component. In this case, we use a number instead of a letter.
The result will be a table of equipment with associated criticalities, all entered into the Asset Healthcare System, as shown in Figure 6.
4. Develop Equipment Condition. For several reasons we now take time to evaluate the condition of the highest segment of critical equipment, at a minimum, all H-1’s and H-2’s. Our reasons to do this:
- We can get an immediate impact on plant performance and safety where we can eliminate defects on this highly critical equipment.
- In some cases we will identify conditions that require a longer term solution, e.g. a motor that is run beyond its limits. This gives time to plan and schedule intervention before the equipment fails.
- Having the operations staff evaluate the equipment creates the basis for ownership and for developing operator inspection “rounds”.
- This information is part of the annual planning process, to help determine the material and labor costs and schedule required to meet plan for the next year.
Once again we take a simplified approach. For each class of equipment, we create a template for evaluating component condition. Using a simple yes/no evaluation for each category we can evaluate the overall condition of the equipment. An example is shown in Figure 7. Any equipment whose composite health falls below a threshold, say 70%, is written up for attention with a work request.
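The yes/no evaluation lends itself to a very small calculation. The sketch below assumes that every template item carries equal weight and that composite health is simply the percentage of “yes” answers; the template items and function names are hypothetical, and only the 70% example threshold comes from the text.

```python
HEALTH_THRESHOLD = 70.0  # percent; the example threshold mentioned above


def composite_health(answers: dict) -> float:
    """Percentage of template items answered 'yes' (True), all items weighted equally."""
    if not answers:
        raise ValueError("the evaluation template has no items")
    return 100.0 * sum(answers.values()) / len(answers)


def needs_work_request(answers: dict, threshold: float = HEALTH_THRESHOLD) -> bool:
    """True when composite health falls below the threshold."""
    return composite_health(answers) < threshold


# Hypothetical evaluation of one pump against a five-item template.
pump_p101 = {
    "No visible leaks": True,
    "Vibration within normal range": False,
    "Guards in place": True,
    "Anchor bolts tight": False,
    "Lubrication level correct": True,
}

print(composite_health(pump_p101))    # 60.0
print(needs_work_request(pump_p101))  # True
```

A plant that considers some items more important than others could replace the simple count with weighted scores without changing the rest of the process.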
5. Develop Strategies for Component Care. At this point we have created the equipment list for the unit down to the maintainable component; we have classified the component’s criticality, and we know its condition and operating requirements. We are now in a position to classify the type of care (maintenance) it should receive.
Types of maintenance include:
- Predictive based on Time or History
- Condition Monitoring
- Predictive based on Condition Projections
- Continuous Monitoring
- Requires FMEA or Tap-Root Analysis
We identify which type of maintenance to perform based on a simple matrix, once again applied by the unit team. Figure 8 shows a model that uses CoUR as a basis for selecting the maintenance strategy.
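A matrix of this kind can be expressed as a simple lookup. The sketch below is purely illustrative: the dollar bands, the ordering of strategies and the inclusion of run-to-failure as the lowest band are our assumptions for the example and do not reproduce the matrix in the figure. Each unit team would substitute its own cut-offs.

```python
from bisect import bisect_right

# Hypothetical upper limits (dollars of CoUR per failure) separating the bands.
BAND_LIMITS = [1_000, 5_000, 25_000, 50_000, 100_000]

# One strategy per band, lowest CoUR first. The ordering is an assumption.
STRATEGIES = [
    "Run to failure",
    "Predictive based on Time or History",
    "Condition Monitoring",
    "Predictive based on Condition Projections",
    "Continuous Monitoring",
    "Requires FMEA or Tap-Root Analysis",
]


def select_strategy(expected_cour: float) -> str:
    """Return the care strategy for a component given its expected CoUR per failure."""
    if expected_cour < 0:
        raise ValueError("expected_cour cannot be negative")
    return STRATEGIES[bisect_right(BAND_LIMITS, expected_cour)]


print(select_strategy(500))      # Run to failure
print(select_strategy(30_000))   # Predictive based on Condition Projections
print(select_strategy(250_000))  # Requires FMEA or Tap-Root Analysis
```

Keeping the limits and the strategies in two plain lists makes it easy for the team to adjust the bands as experience accumulates.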
6. Develop Failure Modes and Effects. For equipment whose criticality is high, we catalog the ways in which it has failed in the past, based on experience of the team, and identify the cause and effects of those failures. Where there is high criticality, we need to specially design our maintenance activities based on the failure modes and causes.
FMEA is a significant part of performing an RCM study, which we identified as being a large and often tedious effort. The methods we are presenting here don’t change the nature of the task; they do, however, create a structure where only those items that require the analysis get the effort. In addition, every other component in the system has a clearly considered maintenance strategy at the same time. The Asset Healthcare System we are using does simplify the task of performing RCM analyses, however, and gives us an audit trail that identifies how we made our decisions.
7. Develop (Asset Healthcare) Maintenance Activities. We now have identified the appropriate strategy for every component in the equipment system. We proceed to design the healthcare task according to the strategy. This makes run-to-failure a legitimate proactive AHT, because it is the best identified action for the business need.
Each strategy implies a set of activities that will optimize its use within the unit. Thus for each component, we proceed to design its specific care needs, and if we have performed a Failure Modes/Effects Analysis, we design the care specifically to mitigate the failure cause.
It would be the subject of another article to cover in sufficient detail the specific design process for asset healthcare tasks. However, our software tool, if appropriately chosen, has industry-specific equipment healthcare tasks that serve as templates in this design. In many cases the existing preventive and predictive tasks, if they have been found to be the best strategy, can be used as a starting point as well.
COMPLETING OUR CLOSED-LOOP PROCESS
Load and Schedule Work is the next activity after we have completed the development of the Asset Healthcare Program. In this activity we:
- Finalize jobs, with tasks, parts, skills, tools, etc.
- Load into CMMS
- Set and optimize schedules as identified
To Prepare and Execute Scheduled Maintenance we:
- Develop the Weekly Schedule
- Identify that jobs have parts available
- Assure that the labor and equipment will be available
- Perform the scheduled asset care tasks and record the results (e.g. Conditions found, corrective maintenance required)
To Review and Analyze Variation we:
- Prepare Performance Indicator Reports (e.g. PM compliance, downtime)
- Review trends
- Review completed work orders for issues and opportunities
- Adjust frequencies as appropriate
- Flag failure modes for investigation & identify required changes in maintenance
BENEFITS WE HAVE SEEN
Operators and maintainers who apply this method to their production areas gain a much greater understanding of the equipment and production process, including:
- Equipment Function
- Component Criticality
- Proper maintenance activities and division of responsibilities
- Current condition of components
Another benefit is immediate improvement in operating procedures, equipment condition (through SWAT Teams), and levels of productivity. Improved cooperation between maintenance and production leads to significant gains in many areas. Finally, increased precision of maintenance, or performing the right prevention for problems, results in increased efficiency and decreased downtime.
Financial results include:
- Action Teams documented $1.5 million in benefits, more than enough to cover all the outside services
- A single large unit is producing at an increased rate valued at $15,000,000 in annual product
- Another refinery customized the process with our help, and identified a $30,000,000 opportunity achievable with this process. Since we trained them, they have been implementing it successfully without SAMI’s help
A new language can help us break the paradigm of predictive and preventive maintenance as suitable for all types of risks and conditions. The Asset Healthcare framework will simplify the effort to create a comprehensive maintenance program for equipment, and match the effort and type of intervention specifically to the criticality of the system and the component.
Our results include proactive maintenance for all components, an ability to create an activity-based maintenance budget, gaining control of the work schedule, improved equipment health and lower costs.