By Sean Rosier and Ryan Myers
May 2021
As discussed in Pinnacle Reliability's Economics of Reliability reports, as an industrial community, we're more reliable than we've ever been. However, the average reliability leader is challenged with greater complexity than ever before. Do the following programs sound familiar? Fixed equipment inspections, risk-based inspection, reliability-centered maintenance, spare parts optimization, process safety management compliance, turnaround planning, process hazard analysis, special emphasis, critical rotating equipment machine learning, structural inspection optimization, and the list goes on. With the ever-expanding innovation of new technologies, advances in data science and computational power capable of processing large volumes of data, and continued pressure from regulation and business performance, more organizations are looking to employ analytics to help drive reliability performance. And while doing this successfully can be powerful for the reliability of an operation, getting to success requires thinking about our programs differently and tearing down traditional reliability silos.
How to Use Data-Driven Reliability to Develop Commonality Between Reliability and Mechanical Integrity Programs
Pinnacle recently conducted an exercise in which representatives with a range of maintenance and reliability backgrounds and expertise came together to evaluate how data can be leveraged more simply across the entire reliability operation. It's important to note that the discussion focused on work processes and data management; it did not cover topics such as personnel, culture, and organizational structure, important as those are.
A foundational assumption of the exercise was that, in general, reliability and mechanical integrity (MI) programs are based on the same premise: collect data, analyze the data, take action based on the analysis, and adjust as needed. This process is repeated with the goal of improving the analytics each time. However, the majority of tasks performed in a facility are primarily data-gathering efforts. American Petroleum Institute (API) inspections, Nondestructive Examination (NDE), operator rounds, and predictive technologies all require data-gathering efforts that cost facilities millions of dollars a year. Once maintenance tasks are derived from this gathered data, it is easy to see how even minor inefficiencies can result in increased costs.
When asked in the exercise how to further merge a reliability and MI program, the response was: start by understanding the basics. More specifically, start by understanding what each program is trying to achieve and the mechanics of how the goals are achieved. Once these are identified, derive common processes that focus on the data that drives reliability rather than just the technologies themselves.
The following are some common questions that were considered during the exercise:
- What data and data structures are needed?
- How is the data being collected?
- How is probability of failure (PoF) analyzed?
- How is consequence of failure (CoF) determined?
- How are tasks being selected?
- How are the work process controls, such as planning, scheduling, and backlog and contractor management, functioning?
Example Integration: Failure Mode and Effects Analysis (FMEA)
As an example, consider just one area: Failure Mode and Effects Analysis (FMEA). The FMEA process appears intuitive and is a common tool used as the basis for both reliability and MI programs. As such, it is often perceived that FMEAs are used in the same manner for both reliability and MI; however, a few fundamental differences exist in how the FMEA is developed, how its tasks are utilized, and what insights its data provides.
Development
- Reliability: Assets managed as part of a reliability program generally have multiple performance requirements. Therefore, it is critical to start by defining all functions for an asset and then working through the process of identifying failure modes, failure mechanisms, and the subsequent mitigation tasks.
- MI: The primary function from an MI perspective is to prevent loss of containment. As such, the functional performance to prevent loss of containment is implied, and development focuses on the failure modes as the starting point.
Task Utilization
- Reliability: In a reliability program for non-fixed equipment, a combination of data-gathering, preventive, and corrective tasks is utilized to reduce accelerated degradation, take action to maintain the availability of an asset, or reset the asset's life cycle in the most cost-efficient manner.
- MI: Most tasks required for fixed equipment will typically be classified as data-gathering or corrective, not preventive. This is because, by design, most fixed assets will incur degradation, and preventing that degradation outright would require engineering alterations. Therefore, the focus from a task perspective is on collecting data to predict when a critical point will be reached, enabling corrective maintenance to be performed in a controlled, cost-effective manner.
Data Insights
- Reliability: Reliability programs assign tasks based on the failure mode, and the data collected will reflect the failure mechanism that is occurring. For example, a pump that is susceptible to failure may have an oil analysis task assigned; increased solids in the oil would indicate a higher failure potential.
- MI: The traditional MI assessment approach is to assign tasks based on expected failure mechanisms. However, the data collected will typically reflect the failure mode. The most common measurable data collected is thickness, which is typically a representation of the overall wall thickness and encompasses all active degradation mechanisms at that location. Consider a pipe subject to external atmospheric corrosion and internal sour water corrosion. Ultrasonic Thickness (UT) data would reveal an overall thickness and degradation rate, not how much wall loss or degradation was caused externally versus internally. Thus, the data collected represents the failure mode, thinning, rather than the specific failure mechanism, atmospheric corrosion or sour water corrosion, as the sketch below illustrates.
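To make this concrete, here is a minimal Python sketch, with hypothetical readings and dates, showing that a UT-derived rate is inherently the combined effect of every active mechanism at the CML:

```python
from datetime import date

def observed_corrosion_rate(t1: float, t2: float, d1: date, d2: date) -> float:
    """Overall wall-loss rate (mm/yr) between two UT thickness readings.

    The result is the sum of ALL active degradation mechanisms at the CML
    (e.g., external atmospheric plus internal sour water corrosion); the
    measurement alone cannot attribute the loss to a specific mechanism.
    """
    years = (d2 - d1).days / 365.25
    return (t1 - t2) / years

# Hypothetical readings at one CML on the pipe described above
rate = observed_corrosion_rate(9.50, 9.02, date(2016, 3, 1), date(2021, 3, 1))
print(f"Observed rate: {rate:.3f} mm/yr")  # combined, mechanism-agnostic rate
```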
These are just a few examples. However, one can quickly begin to see that excess data does not contribute to making an asset more reliable, especially when the following questions are considered:
- What data do I truly need and how often do I need it?
- How much information needs to be included in the FMEA?
- In what logic sequence does the information need to flow to conduct the FMEA?
- How does this information contribute to predicting PoF or CoF?
- Do the mitigation tasks from this information bring any benefit and can it be quantified?
During our exercise, we reduced the terms and verbiage to their simplest forms to focus on what was actually needed rather than on technical jargon. For example, we did not use traditional, familiar terms like "damage mechanism" or "mechanism activation." As a result, we were able to consolidate the headers of our common FMEA into six primary columns, sketched as a simple data structure after the list:
- Function: What are the asset’s performance requirements?
- Failure Mode: What can cause the loss of function?
- Failure Mechanism: Why is the failure occurring?
- Surveillance Technique: What data is being captured to trend and calculate the asset's probability of failure?
- Condition Monitoring Location (CML): Where, specifically, is the data being taken or originating from? For instance, you might be trending process flow through a process historian; the CML would be the specific transmitter number that is sending the data.
- Maintenance Task: What tasks will be performed to mitigate the failure once an established critical value threshold has been reached?
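As an illustration of how lean the consolidated structure can be, here is a minimal Python sketch of a record holding the six columns. The class and field names are our own illustration, not part of any standard:

```python
from dataclasses import dataclass, field

@dataclass
class FmeaRow:
    """One row of the consolidated six-column FMEA (names are illustrative)."""
    function: str                # the asset's performance requirement
    failure_mode: str            # what can cause the loss of function
    failure_mechanism: str       # why the failure is occurring
    surveillance_technique: str  # what data is captured to trend/calculate PoF
    cml: str                     # where the data is taken or originates
    maintenance_tasks: list[str] = field(default_factory=list)  # triggered at a critical threshold
```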
Through this consolidation, we were able to collaboratively develop a logic flow of data that can be used by all disciplines and is not dependent on equipment type or technology. This takes advantage of common data and overlapping activities that would traditionally be duplicated. We also found that task selection was much more concise and valuable as a result of the consolidation, which leads to improved methods of quantifying the value of tasks and increased integration of probability and consequence of failure analyses. Admittedly, a few assets might need an additional logic field; however, those can be handled in one-off analyses.
The following examples illustrate how this common structure can be used:
Centrifugal Pump:
| Function | Failure Mode | Failure Mechanism | Surveillance Technique | CML | Maintenance Task(s) |
|----------|--------------|-------------------|------------------------|-----|---------------------|
| Pump 100 gallons per minute | Bearing Failure | Misalignment | Vibration Analysis | Drive End Bearing Housing, Radial X, RMS Velocity | Align Shaft; Replace Bearing |
Accumulator Drum Boot:
| Function | Failure Mode | Failure Mechanism | Surveillance Technique | CML | Maintenance Task(s) |
|----------|--------------|-------------------|------------------------|-----|---------------------|
| Assist in process/chemical reactions to meet product specifications | Thinning | Ammonium Chloride Corrosion | Manual UT Scan | Cylindrical Shell CML 1-10 | Repair Asset |
| | | | Digital RT | Cylindrical Shell CML #1 | |
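Using the hypothetical FmeaRow record sketched earlier, both rows, one rotating asset and one fixed asset, fit the same structure:

```python
pump = FmeaRow(
    function="Pump 100 gallons per minute",
    failure_mode="Bearing Failure",
    failure_mechanism="Misalignment",
    surveillance_technique="Vibration Analysis",
    cml="Drive End Bearing Housing, Radial X, RMS Velocity",
    maintenance_tasks=["Align Shaft", "Replace Bearing"],
)

drum_boot = FmeaRow(
    function="Assist in process/chemical reactions to meet product specifications",
    failure_mode="Thinning",
    failure_mechanism="Ammonium Chloride Corrosion",
    surveillance_technique="Manual UT Scan",
    cml="Cylindrical Shell CML 1-10",
    maintenance_tasks=["Repair Asset"],
)

# The drum boot's second surveillance line (Digital RT at Cylindrical Shell
# CML #1) would simply be another FmeaRow sharing the same function, failure
# mode, and failure mechanism.
```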
In addition, now that you have a common analysis structure, you can start to merge reliability and MI aspects into a single analysis to ultimately focus on the reliability of the asset. For instance, let's consider a fin-fan heat exchanger. With the traditional analysis structure, you would have to build up the analysis for the corrosion modeling and then build a second, separate analysis to focus on the rotating parts. In this new data-driven model, both perspectives are built under one common analysis that clearly defines the respective failure modes, failure mechanisms, surveillance techniques, CMLs, and maintenance tasks to produce a complete analysis encompassing the total reliability of the asset. The entire analysis is rooted in what data to collect and analyze to predict when a maintenance task should be triggered to mitigate the failure of an asset before it occurs.
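Sketched with the same hypothetical record, a fin-fan analysis might hold both perspectives in one list; the specific modes, mechanisms, and CMLs below are illustrative, not drawn from the exercise:

```python
# Hypothetical fin-fan analysis: MI and reliability perspectives in one place
fin_fan_analysis = [
    FmeaRow(
        function="Cool process stream to target outlet temperature",
        failure_mode="Thinning",                      # fixed-equipment (MI) view
        failure_mechanism="Internal Corrosion",
        surveillance_technique="Manual UT Scan",
        cml="Header Box CML 1-4",
        maintenance_tasks=["Repair Asset"],
    ),
    FmeaRow(
        function="Cool process stream to target outlet temperature",
        failure_mode="Fan Bearing Failure",           # rotating (reliability) view
        failure_mechanism="Lubrication Degradation",
        surveillance_technique="Vibration Analysis",
        cml="Fan Shaft Bearing Housing, Radial X, RMS Velocity",
        maintenance_tasks=["Replace Bearing"],
    ),
]
```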
Considerations for Integration Strategies
To successfully use data to unify your programs, start with the end in mind. Visualize where you want the data to take you and work backward. This mindset will help you focus on acquiring only the information you need and performing only the tasks required to directly improve reliability. Next, you will need to determine the following (a short sketch after this list walks the chain backward for a thinning example):
- Key decisions you need to make for the process
- Type and format of information needed to make key decisions effectively and efficiently
- Calculations required to produce this information
- Raw data required to run these calculations
- Best way to acquire this raw data
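As a hypothetical walk backward through that chain for a thinning failure mode, where every threshold and reading below is an illustrative assumption rather than a recommended value:

```python
from datetime import date

MIN_ALLOWABLE_MM = 6.0  # hypothetical critical thickness threshold

# Raw data: two UT thickness readings at one CML (illustrative values)
(d1, t1), (d2, t2) = [(date(2016, 3, 1), 9.50), (date(2021, 3, 1), 7.10)]

# Calculation: observed wall-loss rate from the raw data
rate_mm_per_yr = (t1 - t2) / ((d2 - d1).days / 365.25)

# Information: years of remaining life before the threshold is reached
remaining_life = (t2 - MIN_ALLOWABLE_MM) / rate_mm_per_yr

# Key decision: trigger the corrective task while maintenance can still be
# planned and executed in a controlled, cost-effective manner
if remaining_life < 5.0:  # hypothetical planning horizon in years
    print(f"~{remaining_life:.1f} years left; schedule repair at next turnaround")
```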
As with all transformative efforts worth pursuing, the integration process will never be perfect. The key to success is to continually make tangible progress which can be practically implemented to drive quantifiable results.
Conclusion
It is possible to unify, and in many ways simplify, your reliability programs with data. The key to using data is understanding what data to use and how to use it, rather than relying on disparate methodologies. In today's reliability and MI programs, a number of interfaces and data sources create data management challenges. Employing common processes and tools will not only enable your organization to leverage the strengths of each discipline, but also start moving your program toward a data-driven maintenance and reliability operation.