Unplanned equipment failure costs manufacturing, energy, mining, and process industries billions in lost production and emergency repairs and, in the most serious cases, causes safety incidents that are both preventable and catastrophic. The two traditional approaches to preventing these failures (running equipment until it breaks and then fixing it, or replacing components on a fixed calendar schedule regardless of their actual condition) are demonstrably inadequate. Reliability-Centred Maintenance (RCM) is the methodology that replaces both approaches with a rigorous, evidence-based analysis of how equipment can fail, what the consequences of each failure mode are, and what maintenance tasks, if any, will prevent those failures cost-effectively.
RCM was originally developed for the commercial aviation industry in the late 1960s through the work of Stan Nowlan and Howard Heap at United Airlines, documented in a landmark 1978 report for the US Department of Defense that established the foundations of the methodology. It has since been applied across oil and gas, power generation, nuclear, manufacturing and defence, and increasingly in facilities management and healthcare. Its core insight, that most failures are not age-related and cannot be prevented by time-based replacement, changed the economics and effectiveness of industrial maintenance permanently.
Key Takeaways
| Key Figure | What It Means |
|---|---|
| 89% | Of the aircraft components in Nowlan and Heap’s original research, 89% showed a random or infant mortality-related failure pattern rather than an age-related one. Only the remaining 11% showed any age-related behaviour, so time-based replacement can be effective for at most about 11% of the items studied. |
| 7 questions | The RCM process is built around seven structured questions that, applied systematically to every function of every asset in scope, produce a defensible and optimised maintenance programme. |
| 25-35% | Typical reduction in total maintenance cost achieved by organisations that implement RCM rigorously, by eliminating unnecessary preventive maintenance tasks and replacing them with condition-based or run-to-failure strategies where appropriate. |
| SAE JA1011 | The standard that defines the minimum criteria any process must meet to be called RCM. Published by the Society of Automotive Engineers (now SAE International), it is the primary reference for RCM practitioners and certification bodies. |
- RCM is a structured methodology for determining what maintenance must be done to ensure that any physical asset continues to fulfil its intended functions in its current operating context.
- The core insight of RCM is that the right maintenance task for any asset function depends on the consequences of failure, the failure mode, and the detectability of the failure as it develops, not on a fixed schedule or the asset’s age.
- RCM produces four types of maintenance task: time-based (where age-related deterioration is demonstrated), condition-based (where a developing failure can be detected before the function is lost), failure-finding (for hidden failures that are not apparent until a demand is made), and run-to-failure (where the consequences of failure are acceptable and prevention is not cost-effective).
- RCM requires significant upfront investment in the analysis process but consistently produces a maintenance programme that is more effective at preventing failures, less costly to execute, and more rational in its allocation of maintenance resources than the time-based programmes it replaces.
Why Traditional Maintenance Strategies Fail
Traditional maintenance operates on two primary strategies: reactive maintenance (run-to-failure: fix it when it breaks) and preventive maintenance (time-based: replace or service it on a schedule). Both are appropriate for specific failure modes and asset types. Both are massively overused relative to what the evidence on equipment failure patterns actually supports.
The Nowlan and Heap research, which analysed thousands of failure records from commercial aircraft components, produced findings that fundamentally challenged the assumption that failure rates increase with age. They found six distinct failure patterns, of which three are age-related: the classic “bathtub” curve, wear-out failure with a defined useful life, and a failure probability that rises gradually without a distinct wear-out age. Only the first two show the defined wear-out age that time-based replacement relies on. The remaining three patterns, which accounted for approximately 89% of the components studied, showed no increase in failure probability with age and could not be prevented by time-based overhaul or replacement.
The implication is significant: applying time-based maintenance to assets with age-independent failure modes consumes maintenance budget, introduces human error (components incorrectly reinstalled after unnecessary overhaul), and fails to prevent the failures it is supposed to prevent. RCM replaces this assumption-driven approach with an evidence-based one: the maintenance task is selected based on a rigorous analysis of how each failure mode actually develops and what task, if any, can prevent it or reduce its consequences.
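One way to see why age matters is to model the conditional failure rate with a Weibull distribution. This is a common reliability-engineering assumption, not part of the original Nowlan and Heap analysis: a shape parameter `beta` below 1 gives a falling rate (infant mortality), `beta` equal to 1 a constant rate (random failure), and `beta` above 1 a rising rate (wear-out). The minimal Python sketch below, with illustrative parameter values, checks which case time-based replacement can actually help:

```python
# Illustrative sketch only: the Weibull model is an assumption used to show
# falling, constant and rising failure rates; it is not taken from Nowlan and Heap.


def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Conditional failure rate h(t) = (beta / eta) * (t / eta) ** (beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1)


def hazard_trend(beta: float, eta: float = 1000.0) -> str:
    """Compare the failure rate at a young age and at an old age."""
    young = weibull_hazard(100.0, beta, eta)
    old = weibull_hazard(900.0, beta, eta)
    if abs(old - young) < 1e-12:
        return "constant"
    return "increasing" if old > young else "decreasing"


patterns = {
    "infant mortality (beta=0.5)": 0.5,
    "random (beta=1.0)": 1.0,
    "wear-out (beta=3.0)": 3.0,
}

for name, beta in patterns.items():
    trend = hazard_trend(beta)
    # Swapping in a new part lowers the failure rate only if the rate rises
    # with age; with a falling rate, the new part starts in its riskiest period.
    verdict = "can help" if trend == "increasing" else "does not help"
    print(f"{name}: failure rate {trend}, time-based replacement {verdict}")
```

Only the rising-rate case benefits from replacement; in the other two, a new component is at least as likely to fail as the one it replaces.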
For organisations also managing asset integrity in high-hazard environments, our article on what is a safety management system covers how the safety management and maintenance management disciplines connect, particularly where equipment failure creates safety as well as operational consequences.
🔧 Build comprehensive asset and maintenance management capability
The Asset, Asset Integrity and Maintenance Management Certification Course develops the lifecycle management, maintenance strategy selection, and asset performance optimisation skills that maintenance and reliability professionals need to implement RCM and related disciplines effectively.
The Seven RCM Questions
SAE JA1011 defines RCM through seven questions that must be answered for each function of each asset in the analysis scope. These questions are sequential and interdependent: the answer to each question shapes what is possible in the subsequent questions.
| # | Question | What It Requires and Why It Matters |
|---|---|---|
| 1 | What are the functions and associated performance standards of the asset in its current operating context? | Every asset has a primary function (what it is supposed to do) and secondary functions (what else it must do to meet safety, environmental, control, or structural requirements). Performance standards define both the level of performance required and the minimum acceptable level below which the function has failed. Without clear functions and standards, failure cannot be defined. |
| 2 | In what ways can it fail to fulfil its functions (functional failures)? | A functional failure is any failure to fulfil a function at the required performance standard. For each function identified in Question 1, all the ways that function can fail must be enumerated. This step is often wider than it first appears: partial failures (the asset still works but below the required standard) are functional failures and must be included. |
| 3 | What causes each functional failure (failure modes)? | A failure mode is the specific physical or chemical process, design deficiency, human error, or external event that causes a functional failure. This is the level at which maintenance tasks are selected: the task must address the specific failure mode, not just the functional failure in general. Failure Mode and Effects Analysis (FMEA) is the standard tool for this step. |
| 4 | What happens when each failure occurs (failure effects)? | The failure effect describes what happens when a failure mode occurs: what evidence of failure is available, what physical damage or secondary effects follow, what impact on safety, environment, operations, and costs results. The failure effect description must be detailed enough to support the consequence evaluation in Question 5. |
| 5 | In what way does each failure matter (failure consequences)? | RCM classifies failure consequences into four categories: hidden failures (not evident during normal operations; their consequence is exposure to a multiple failure, for example an overpressure event occurring while a relief valve has already failed), safety and environmental failures (direct risk to people or environment), operational failures (economic consequences through production loss or quality impact), and non-operational failures (only the cost of repair). Consequence classification determines how much effort and cost is justified in preventing the failure. |
| 6 | What should be done to predict or prevent each failure (proactive tasks and task intervals)? | This is where the maintenance task is selected. RCM evaluates three types of proactive task: condition-monitoring tasks (if the failure has a detectable developing stage), scheduled restoration or replacement tasks (if the asset has a clearly defined useful life), and failure-finding tasks (for hidden failures, to ensure protection systems function when needed). A proactive task is only selected if it reduces the probability of the failure to an acceptable level and is technically feasible and cost-effective relative to the consequences of failure. |
| 7 | What should be done if a suitable proactive task cannot be found (default actions)? | If no technically feasible and cost-effective proactive task exists: for safety or environmental consequences, the design must be modified to remove the hazard or reduce consequences to an acceptable level; for operational consequences, run-to-failure may be acceptable if the consequences and costs are tolerable; for non-operational consequences, run-to-failure is almost always the correct decision. |
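The sequence of Questions 5 to 7 can be expressed as decision logic. The Python sketch below is a deliberate simplification, not the decision diagram from SAE JA1011 or any published RCM variant: the hypothetical `FailureMode` record stands in for the answers the team reaches in the workshop, and treating an unmanageable hidden failure as a redesign case is a conservative assumption rather than a rule.

```python
from dataclasses import dataclass
from enum import Enum


class Consequence(Enum):
    HIDDEN = "hidden"
    SAFETY_ENVIRONMENTAL = "safety/environmental"
    OPERATIONAL = "operational"
    NON_OPERATIONAL = "non-operational"


@dataclass
class FailureMode:
    # Hypothetical fields summarising the answers to Questions 3 to 5.
    description: str
    consequence: Consequence
    detectable_developing_stage: bool  # can the failure be seen developing?
    demonstrable_useful_life: bool     # is there a clear age-failure relationship?
    task_worth_doing: bool             # technically feasible and cost-effective?
    failure_finding_feasible: bool = False  # only relevant to hidden failures


def select_policy(fm: FailureMode) -> str:
    """Simplified pass through Questions 6 and 7 for one failure mode."""
    # Question 6: proactive tasks, in the order the table lists them.
    if fm.task_worth_doing and fm.detectable_developing_stage:
        return "condition-based task"
    if fm.task_worth_doing and fm.demonstrable_useful_life:
        return "scheduled restoration or replacement"
    if fm.consequence is Consequence.HIDDEN and fm.failure_finding_feasible:
        return "failure-finding task"
    # Question 7: default actions. Sending an unmanageable hidden failure
    # to redesign is a conservative simplification (assumption).
    if fm.consequence in (Consequence.SAFETY_ENVIRONMENTAL, Consequence.HIDDEN):
        return "redesign"
    if fm.consequence is Consequence.OPERATIONAL:
        return "run-to-failure if consequences are tolerable"
    return "run-to-failure"


relief_valve = FailureMode(
    description="Pressure relief valve stuck closed",
    consequence=Consequence.HIDDEN,
    detectable_developing_stage=False,
    demonstrable_useful_life=False,
    task_worth_doing=False,
    failure_finding_feasible=True,
)
print(select_policy(relief_valve))  # failure-finding task
```

In a real analysis each boolean is the outcome of technical debate and data review, which is why the facilitated team described later in this article matters more than the logic itself.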
The Four Maintenance Task Types in RCM
| Task Type | What It Involves and When It Applies | Examples |
|---|---|---|
| Condition-Based Tasks | Monitor the asset’s condition to detect deterioration before it causes functional failure. Applicable when the failure has a detectable developing stage and the task can detect it in time to act. | Vibration analysis, thermography, oil analysis, ultrasonics, performance trending |
| Time-Based Tasks | Restore or replace the asset or component before a defined age limit at which failure probability becomes unacceptable. Only applicable where a meaningful age-failure relationship exists and the useful life is demonstrable. | Gearbox overhaul at defined hours, brake pad replacement at defined cycles, filter replacement at defined intervals |
| Failure-Finding Tasks | Test hidden functions periodically to confirm they are capable of performing when required. Applicable for protective devices (pressure relief valves, fire suppression systems, emergency shutdowns) whose failure is not apparent during normal operations. | Monthly trip testing, annual fire suppression system activation test, quarterly emergency shutdown valve operation check |
| Run-to-Failure | Allow the asset to run until it fails, then repair or replace it. Appropriate where no proactive task is technically feasible or cost-effective and the consequences of failure are operationally and financially tolerable. | Low-cost, redundant, or easily replaced components with non-safety failure consequences |
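Condition-based and failure-finding tasks both need an interval, and RCM practice commonly derives it from failure behaviour rather than the calendar. The Python sketch below assumes two widely used approximations that are additions to this article: the P-F interval (the warning period between point P, where a potential failure becomes detectable, and point F, where the function is lost) for condition-based tasks, and FFI = 2 × U × MTBF for failure-finding tasks, where U is the tolerable unavailability of the protective device. The function names and figures are hypothetical.

```python
# Hypothetical helper functions; the formulas are common RCM approximations,
# not requirements of SAE JA1011.


def max_inspection_interval(pf_interval: float, lead_time: float) -> float:
    """Longest condition-monitoring interval that still leaves time to act.

    Worst case: the potential failure becomes detectable just after an
    inspection and is found one full interval later, leaving
    (pf_interval - interval) before functional failure.
    """
    if lead_time >= pf_interval:
        raise ValueError("P-F interval too short for condition monitoring to work")
    return pf_interval - lead_time


def failure_finding_interval(tolerable_unavailability: float, mtbf: float) -> float:
    """FFI = 2 * U * MTBF for a single protective device.

    Assumes random (constant-rate) failures of the device, tests that always
    reveal a failed device, and an interval much shorter than the MTBF.
    """
    return 2.0 * tolerable_unavailability * mtbf


# Illustrative figures: vibration shows bearing damage 8 weeks before seizure,
# and planning the repair takes 2 weeks.
print(max_inspection_interval(pf_interval=8.0, lead_time=2.0))  # 6.0 (weeks)

# A trip device with a 10-year MTBF that may be unavailable at most 1% of the time.
ffi_years = failure_finding_interval(tolerable_unavailability=0.01, mtbf=10.0)
print(f"{ffi_years * 12:.1f} months")  # 2.4 months
```

Many practitioners choose a condition-monitoring interval shorter than this maximum (half the P-F interval is a frequently quoted rule of thumb) to allow for uncertainty in the P-F estimate.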
📋 Build maintenance planning and scheduling skills alongside RCM capability
The Certified Maintenance Planner Course develops the maintenance planning, scheduling, and work management skills that complement RCM analysis by ensuring that the maintenance tasks it produces are planned and executed effectively in practice.
Implementing RCM: The Practical Approach
Scoping and Prioritisation
RCM analysis is resource-intensive: a thorough analysis of a complex asset system can take days of facilitated workshop time per system. No organisation has the resources to apply full RCM to every asset simultaneously. The implementation must therefore begin with scoping and prioritisation: identifying which assets and systems will be analysed first based on their criticality (consequence of failure), the availability of failure data, and the current state of maintenance documentation.
Asset criticality ranking, using a consequence matrix that scores each asset on safety, environmental, production, quality, and cost dimensions of potential failure, provides the prioritisation framework. Systems ranked highest for criticality go first into the RCM analysis; lower-criticality systems may receive a simplified analysis or be managed by a streamlined maintenance approach that applies RCM thinking without the full analytical rigour.
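A consequence matrix of this kind is straightforward to prototype. The Python sketch below is one possible scheme, not a standard one: it assumes a 1 to 5 severity score per dimension, ranks assets by their worst single dimension (with the total as a tie-breaker), and applies a hypothetical threshold of 4 for full RCM. The asset tags and scores are invented, and real matrices are usually calibrated to the organisation's own risk criteria.

```python
from dataclasses import dataclass

# The five consequence dimensions named in the text, each scored on a
# hypothetical 1 (negligible) to 5 (severe) scale.
DIMENSIONS = ("safety", "environmental", "production", "quality", "cost")


@dataclass
class Asset:
    tag: str
    scores: dict[str, int]


def criticality(asset: Asset) -> tuple[int, int]:
    """Rank key: worst single dimension first, then the total as a tie-breaker.

    Using the worst dimension stops a severe safety score from being diluted
    by low scores elsewhere.
    """
    values = [asset.scores[d] for d in DIMENSIONS]
    return max(values), sum(values)


def plan_analysis(assets: list[Asset], full_rcm_from: int = 4) -> list[tuple[str, str]]:
    """Order assets for analysis and assign full or streamlined RCM."""
    ranked = sorted(assets, key=criticality, reverse=True)
    return [
        (a.tag, "full RCM" if criticality(a)[0] >= full_rcm_from else "streamlined")
        for a in ranked
    ]


assets = [
    Asset("C-201 standby air compressor",
          dict(safety=1, environmental=1, production=2, quality=1, cost=2)),
    Asset("P-101 feed pump",
          dict(safety=2, environmental=3, production=5, quality=3, cost=4)),
    Asset("PSV-301 relief valve",
          dict(safety=5, environmental=4, production=3, quality=1, cost=2)),
]
for tag, approach in plan_analysis(assets):
    print(f"{tag}: {approach}")
```

Here the feed pump and relief valve go into full RCM first, while the standby compressor, whose failure is covered by redundancy, falls into the streamlined group.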
The Analysis Team
RCM analysis is a team activity. The RCM team for each system should include: the operations staff who use the equipment and observe its performance; the maintenance technicians who service and repair it; the reliability or engineering professional who provides technical depth on failure modes and mechanisms; and an RCM facilitator who knows the methodology and guides the team through the seven questions. The facilitator does not need to be the expert on the asset: their role is to ensure the process is followed correctly and that the team’s knowledge is captured systematically.
For organisations developing the competency to run their own RCM analyses, our article on how to identify skills gaps in your workforce covers the structured approach to mapping current technical capabilities against the skills that RCM implementation requires, which is the starting point for any training investment in this area.
From Analysis to Implementation
The output of the RCM analysis is a set of maintenance tasks, each with a defined task description, task interval, skill requirement, and the failure mode it addresses. These tasks must be translated into a maintenance programme, entered into the Computerised Maintenance Management System (CMMS), resourced with the appropriate skills and materials, and executed on schedule.
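Keeping the link between each task and the failure mode it addresses explicit makes the programme auditable once it is in the CMMS. A minimal Python sketch, assuming a hypothetical `MaintenanceTask` record and a plain CSV export as a stand-in for whatever import format a particular CMMS actually uses:

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class MaintenanceTask:
    # The four attributes named in the text, plus a hypothetical asset tag.
    asset_tag: str
    failure_mode: str
    description: str
    interval_days: int
    skill: str

    def __post_init__(self) -> None:
        # Every task must trace back to the failure mode that justifies it.
        if not self.failure_mode.strip():
            raise ValueError("task has no failure mode")
        if self.interval_days <= 0:
            raise ValueError("interval must be positive")


def to_csv(tasks: list[MaintenanceTask]) -> str:
    """Write tasks as CSV, standing in for a CMMS import format (assumption)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(MaintenanceTask)])
    writer.writeheader()
    for task in tasks:
        writer.writerow(asdict(task))
    return buffer.getvalue()


tasks = [
    MaintenanceTask("P-101", "Drive-end bearing wear", "Vibration survey", 28, "Vibration analyst"),
    MaintenanceTask("PSV-301", "Valve stuck closed", "Bench pop test", 365, "Mechanical technician"),
]
print(to_csv(tasks))
```

Rejecting a task without a failure mode at creation time is a design choice: it stops "just in case" tasks, which RCM is meant to eliminate, from drifting back into the programme.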
The implementation phase is where many RCM programmes falter: the analysis is completed, the tasks are well-defined, but the maintenance execution system is not reliable enough to deliver the tasks consistently. Investing in maintenance planning and scheduling capability alongside the RCM analysis ensures that the new maintenance programme is actually executed rather than deferred indefinitely into the backlog.
Living RCM: Maintaining the Analysis
RCM is not a one-off exercise. The maintenance programme it produces must be treated as a living document that is updated when failure experience reveals new failure modes not captured in the original analysis, when asset modifications change the failure patterns, or when operating context changes affect the consequences of failure. Building a formal review process into the RCM programme, using incident and near-miss data as the primary feedback mechanism, ensures that the maintenance programme continues to reflect the actual failure behaviour of the assets rather than the analysis team’s predictions at the time of the original study.
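Part of this feedback loop can be automated by comparing incoming failure records with the failure modes already in the analysis. The Python sketch below assumes failure modes are recorded with consistent codes (the codes shown are invented) and treats an analysed failure mode that recurs at least twice as a review trigger; that threshold is an illustrative assumption, not an RCM rule. Asset modifications and changes in operating context still need their own review step.

```python
from collections import Counter


def review_triggers(
    analysed_modes: set[str],
    failure_records: list[str],
    recurrence_limit: int = 2,
) -> dict[str, list[str]]:
    """Flag failure modes that should reopen the RCM analysis.

    analysed_modes: failure-mode codes covered by the current analysis.
    failure_records: failure-mode codes taken from incident and near-miss
        reports (a hypothetical coding scheme).
    """
    counts = Counter(failure_records)
    # Failures the original analysis never anticipated.
    new_modes = sorted(code for code in counts if code not in analysed_modes)
    # Analysed modes that keep recurring, suggesting the selected task or
    # its interval is not working as predicted.
    recurring = sorted(
        code for code, n in counts.items()
        if code in analysed_modes and n >= recurrence_limit
    )
    return {"new_failure_modes": new_modes, "recurring_despite_task": recurring}


analysed = {"BEARING-WEAR", "SEAL-LEAK", "MOTOR-WINDING"}
records = ["SEAL-LEAK", "IMPELLER-EROSION", "SEAL-LEAK", "SEAL-LEAK", "BEARING-WEAR"]
print(review_triggers(analysed, records))
```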
RCM and Total Productive Maintenance: How They Complement Each Other
Total Productive Maintenance (TPM) is the Lean-derived approach to maintenance improvement that focuses on equipment effectiveness, operator involvement in basic maintenance, and the elimination of the six big losses (breakdowns, setup and adjustment, minor stoppages, reduced speed, startup defects, and production defects). RCM and TPM address different dimensions of maintenance excellence and are most powerful when applied together.
RCM provides the analytical rigour to determine the right maintenance strategy for each failure mode. TPM provides the operational discipline, operator capability, and continuous improvement culture to execute that strategy reliably and to identify new improvement opportunities as they emerge. Organisations that implement both are better placed to reduce maintenance cost and equipment downtime together than those that rely on either discipline alone.
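TPM commonly tracks equipment effectiveness as Overall Equipment Effectiveness (OEE), the product of availability, performance and quality, with the six big losses mapped onto those three factors: breakdowns and setup losses reduce availability, minor stoppages and reduced speed reduce performance, and startup and production defects reduce quality. The minimal Python sketch below illustrates the calculation; the function name, parameters and shift figures are hypothetical.

```python
def oee(planned_min: float, downtime_min: float, ideal_cycle_min: float,
        total_count: int, good_count: int) -> dict[str, float]:
    """Overall Equipment Effectiveness = availability x performance x quality."""
    run_min = planned_min - downtime_min
    # Availability loss: breakdowns, setup and adjustment.
    availability = run_min / planned_min
    # Performance loss: minor stoppages and reduced speed.
    performance = ideal_cycle_min * total_count / run_min
    # Quality loss: startup defects and production defects.
    quality = good_count / total_count
    return {
        "availability": availability,
        "performance": performance,
        "quality": quality,
        "oee": availability * performance * quality,
    }


# Illustrative shift: 480 planned minutes, 60 minutes of breakdowns and setups,
# an ideal cycle of 1 minute per unit, 400 units produced, 380 of them good.
result = oee(480, 60, 1.0, 400, 380)
for name, value in result.items():
    print(f"{name}: {value:.1%}")  # last line: oee: 79.2%
```

In this example the breakdown and setup minutes are the largest single loss, which is exactly the kind of loss an RCM analysis of the line's critical equipment would target.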
This integration with continuous improvement thinking connects directly to the Lean Six Sigma disciplines covered in our companion article on Lean Six Sigma vs Six Sigma. The DMAIC methodology that Six Sigma provides is an effective tool for analysing chronic equipment reliability problems and designing the process improvements that prevent them from recurring.
Conclusion: Maintenance as a Strategic Function
RCM repositions maintenance from a cost to be minimised to a strategic capability to be optimised. Organisations that implement it rigorously consistently achieve lower total maintenance cost (by eliminating unnecessary preventive tasks), higher equipment availability (by preventing the failures that cause unplanned downtime), better safety outcomes (by ensuring protective devices are tested and functional), and a more rational and defensible allocation of maintenance resources than purely reactive or calendar-based strategies provide.
The methodology requires investment in facilitation, training, and the initial analysis. On asset systems where unplanned downtime is costly, that investment can often be recovered within the first year, and the resulting maintenance programme improves over time as failure experience is fed back into the analysis. RCM is not a programme to be run once and forgotten. It is a discipline of thinking about asset reliability that, embedded in the maintenance organisation’s culture, continuously improves both the asset and the people who manage it.
Related reading: RCM is, at its core, a structured way of identifying and managing the risk of equipment failure. Our article on how to build a project risk register that actually gets used covers the risk identification and management principles that underpin both project risk management and asset reliability management, demonstrating how the same analytical disciplines apply across different operational contexts.
Build complete maintenance engineering and asset management capability
Explore Alpha Learning Centre’s full range of Maintenance Engineering courses, from reliability-centred maintenance and asset integrity management to maintenance planning, scheduling, and predictive maintenance technologies.
