Failure Modes

Failure modes are the different ways in which a structure or system can fail or not function as intended. For example, consider the possible failure modes of a residential circuit breaker. One failure mode would be the breaker not interrupting the circuit during an electrical overload. Another would be the circuit remaining complete even when the mechanism is in the "opened" position. Complex systems can have a large number of potential failure modes, which demonstrates the need for a systematic approach to assessing them during the design process. Two of the most common methods for assessing failure modes are Failure Modes and Effects Analysis (FMEA) and Fault Tree Analysis (FTA)[1].

Failure Modes and Effects Analysis is a mostly qualitative method used to assess the potential failures of each individual part of a system and determine how these failures will affect the entire system. It is considered a "bottom up", inductive approach: it analyzes the pieces of the system (low level) and works up to the consequences of each failure, both for other dependent pieces of the system and for the system as a whole (high level). Failure modes that can lead to complete system failure are identified, along with an estimate of the likelihood of their occurrence. Changes to the design can then be made depending on the likelihood of a particular failure and the severity of its consequences. Severity in FMEA is usually reported on a ranked scale, such as 1 to 10. The specific processes and severity scales for performing FMEA vary by organization. FMEA typically assumes that only a single failure mode will occur at a time, rather than a combination[2-4]. Below is a simplified example of FMEA on a residential circuit breaker.

FMEA starts at a low level with a single part and defines its purpose, for example the actuator mechanism, which is designed to break the circuit in the event of a hazardous electrical load. The failure modes for that part and their resultant effects are then listed, such as the breaker not tripping during a current overload, which may result in damage to the house wiring or appliances, or even a house fire. This effect is then assigned a severity rating, for example 10, since it could result in loss of life. Sometimes possible causes for the failure are also listed.
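The worksheet described above can be sketched as a small data structure. This is only an illustration, not a standard FMEA format: the field names, the 1 to 10 likelihood scale, the "priority" score (severity times likelihood), and every numeric rating are assumptions made for the example, and the nuisance-trip row is a hypothetical entry added for contrast.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureModeEntry:
    """One row of a simplified, hypothetical FMEA worksheet."""
    part: str
    function: str
    failure_mode: str
    effect: str
    severity: int    # assumed scale: 1 (negligible) to 10 (possible loss of life)
    likelihood: int  # assumed scale: 1 (remote) to 10 (frequent)

    def __post_init__(self):
        # Reject ratings outside the assumed 1-10 scales.
        for name in ("severity", "likelihood"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")

    @property
    def priority(self) -> int:
        # Illustrative ranking score only; organizations define their own.
        return self.severity * self.likelihood


def rank_by_priority(entries):
    """Return the entries sorted from highest to lowest priority score."""
    return sorted(entries, key=lambda e: e.priority, reverse=True)


# All ratings below are made-up values for illustration.
worksheet = [
    FailureModeEntry(
        part="actuator mechanism",
        function="break the circuit during a hazardous electrical load",
        failure_mode="does not trip during an overload",
        effect="damaged wiring or appliances, possible house fire",
        severity=10,
        likelihood=2,
    ),
    FailureModeEntry(
        part="contacts",
        function="open the circuit when the mechanism is opened",
        failure_mode="circuit completes while in the opened position",
        effect="circuit stays energized when expected to be off",
        severity=8,
        likelihood=1,
    ),
    FailureModeEntry(
        part="actuator mechanism",
        function="break the circuit during a hazardous electrical load",
        failure_mode="trips under normal load (hypothetical entry)",
        effect="unnecessary loss of power to the circuit",
        severity=3,
        likelihood=4,
    ),
]

ranked = rank_by_priority(worksheet)
for entry in ranked:
    print(f"{entry.priority:>3}  {entry.part}: {entry.failure_mode}")
```

Each row is evaluated on its own, which mirrors the single-failure assumption noted earlier; combinations of failures are not represented in this sketch.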

Fault Tree Analysis, on the other hand, is a more quantitative method that can produce probabilities and can relate a system failure to multiple potential causes arising from lower level failure modes. It is a "top down", deductive approach that begins with an overall failure state of the system (high level) and then works down to find the potential causes, whether a single lower level failure or a combination of multiple lower level failures. The potential causes for a given overall failure are diagrammed using a Boolean logic diagram, and the individual probabilities for each step are then evaluated and combined to determine the probabilities of a system failure occurring due to specific causes[1]. Below is a simplified example applying FTA to a top level failure event of a residential electrical fire.

FTA starts with a top level failure event, such as a residential electrical fire. It then traces the system states that could result in such a failure, such as a faulty circuit breaker during a current overload, or uninsulated wires. Eventually the process arrives at basic events, which separately or in combination can cause these system states. For the case of uninsulated wires, basic events could be insulation decay due to age, or insulation chewed by rodents.
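The fault tree above can be sketched in code to show how Boolean gates combine probabilities. This is a minimal illustration under stated assumptions: the class names are hypothetical, all basic events are assumed to be statistically independent, and every probability is a made-up value rather than real reliability data. Under independence, an AND gate multiplies its input probabilities, and an OR gate yields one minus the product of the complements.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class BasicEvent:
    """A leaf of the fault tree with an assumed probability of occurring."""
    name: str
    probability: float

    def evaluate(self) -> float:
        return self.probability


@dataclass
class Gate:
    """An AND or OR gate combining lower level events or gates."""
    name: str
    kind: str      # "AND" or "OR"
    inputs: list   # BasicEvent or Gate objects

    def evaluate(self) -> float:
        probs = [node.evaluate() for node in self.inputs]
        if self.kind == "AND":
            # All inputs must occur (independence assumed).
            return prod(probs)
        if self.kind == "OR":
            # At least one input occurs (independence assumed).
            return 1.0 - prod(1.0 - p for p in probs)
        raise ValueError(f"unknown gate kind: {self.kind!r}")


# Hypothetical probabilities, chosen only for illustration.
uninsulated_wires = Gate("uninsulated wires", "OR", [
    BasicEvent("insulation decay due to age", 0.02),
    BasicEvent("insulation chewed by rodents", 0.01),
])
unprotected_overload = Gate("faulty breaker during overload", "AND", [
    BasicEvent("circuit breaker is faulty", 0.001),
    BasicEvent("current overload occurs", 0.05),
])
electrical_fire = Gate("residential electrical fire", "OR", [
    unprotected_overload,
    uninsulated_wires,
])

top_probability = electrical_fire.evaluate()
print(f"P({electrical_fire.name}) = {top_probability:.6f}")
```

The simple gate formulas hold only while each basic event appears once in the tree. If the same basic event feeds several branches, the branches are no longer independent and the tree has to be reduced (for example to minimal cut sets) before probabilities are combined.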

While any system could have a very large number of possible failure modes, in practice generally only likely or hazardous sources of failure are considered. Failure modes can also be more than the sum of their parts. Even though every piece of a design has its own set of failure modes, unexpected new failure modes may arise from emergent behaviors when the pieces are put together in a system.

References and Resources

  1. MIL-HDBK-338B, Military Handbook: Electronic Reliability Design Handbook (10-01-1998)
  2. MIL-STD-1629A, Military Standard: Procedures for Performing a Failure Mode, Effects and Criticality Analysis (11-24-1980) Cancelled, accessed from http://everyspec.com/MIL-STD/MIL-STD-1600-1699/MIL_STD_1629A_1556/
  3. Flores, M.D., & Malin, J.T. (2013). Failure Modes and Effects Analysis (FMEA) Assistant Tool Feasibility Study. https://ntrs.nasa.gov/citations/20130013157
  4. 431-REF-000370, Flight Assurance Procedure: Performing a Failure Mode and Effects Analysis, NASA/GSFC (08-10-1996)