Root Cause Analysis Bruce Thomadsen University of Wisconsin - Madison
Analysis Assume you have an event. What do you do? The analysis tries to get to the root cause.
RCA Team First, assemble a root-cause analysis team. The members should not include anyone involved in the event. The team should include individuals who understand the process, or how it should be, some who know about the environment around the process, and some from the administration. The team should have the principals tell what they think happened.
RCA Team Continued The principals should tell what they think went wrong and what they would suggest to fix it. They probably will be wrong on both counts. The team needs to make a process map of what the process is, what the principals think it is, and what it is supposed to be. Then they do the analysis. A good resource is http://www.patientsafety.gov/rca.html
Root Cause and Types of Errors What is root cause? First, let's look at one division of failures. There are active errors, those things that individuals do that prevent success; they are active because they cause something to happen. Then there are latent errors, which are the organizational or environmental conditions that would lead an individual to fail.
Latent Errors Active errors only affect the particular patient, while latent errors can affect all patients, right? Well, no. An active error, such as an incorrect calibration of a machine, could affect large numbers of patients, while a latent error might only lead to an event that injures one. Be that as it may, most often the active errors are a one-time, one-patient thing and the latent errors are systemic and make traps. Latent errors often are things like staffing, policies or training practices.
Back to Root Causes So, root causes are latent errors, right? Well, yes, if you can find them. What if you find latent errors that seem beyond fixing? What if you can't find any latent errors? Will people die?
Root Cause Again The reason one wants to find a latent error is that it could lead to many errors, so fixing it could prevent several errors. Fixing the latent error is not necessary to fix the problem (assuming there is a problem). What is needed is anything that interrupts the propagation of failures.
Propagation of Errors The propagation of errors is what the fault trees show. The goal (as we will see in the discussion on QM) is to prevent a failure from causing harm to the patient. That means blocking the paths on the fault tree from the particular failure to patient harm. The flow of a failure is termed propagation of error.
Propagation of Errors Preventing the root cause is akin to eliminating the right-most cause. That is neither necessary nor necessarily efficient. Stopping the flow anywhere works, and a downstream block may be better at preventing numerous failures from propagating.
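The point about downstream blocks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the graph below is a hypothetical, simplified propagation map (it is not the fault tree from the slides), it ignores AND gates, and the names `EDGES`, `count_paths`, and `paths_to_harm` are invented for the sketch.

```python
from typing import Dict, FrozenSet, List

# Hypothetical propagation map: each failure can propagate to the listed failures.
# Simplifying assumption: every link is an OR link (AND gates are ignored).
EDGES: Dict[str, List[str]] = {
    "wrong units": ["calibration error"],
    "measurement error": ["calibration error"],
    "calibration error": ["source strength error"],
    "wrong decay factor": ["source strength error"],
    "source strength error": ["dosimetry error"],
    "dose calculation error": ["dosimetry error"],
    "dosimetry error": ["patient harm"],
}


def count_paths(edges: Dict[str, List[str]], start: str, goal: str,
                blocked: FrozenSet[str]) -> int:
    """Count propagation paths from start to goal that avoid blocked nodes."""
    if start in blocked:
        return 0
    if start == goal:
        return 1
    return sum(count_paths(edges, nxt, goal, blocked)
               for nxt in edges.get(start, []))


def paths_to_harm(edges: Dict[str, List[str]],
                  blocked: FrozenSet[str] = frozenset()) -> int:
    """Count paths to patient harm from every failure that has no upstream cause."""
    has_upstream = {t for targets in edges.values() for t in targets}
    sources = [n for n in edges if n not in has_upstream]
    return sum(count_paths(edges, s, "patient harm", blocked) for s in sources)


print(paths_to_harm(EDGES))                                        # 4
print(paths_to_harm(EDGES, frozenset({"wrong units"})))            # 3
print(paths_to_harm(EDGES, frozenset({"source strength error"})))  # 1
```

In this toy map, removing one right-most cause closes a single path, while a block placed at the downstream "source strength error" node closes three of the four.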
Example Fault Tree You've Seen Before [Figure: fault tree for a dosimetry error, which arises through an OR gate from a source strength error or a dose calculation error; the frequencies 10/44, 3/44, and 7/44 appear beside these nodes. Other node labels: wrong units; calibration error; measurement error; wrong calibration; calculation error; failure of verification; error in data entry; wrong source data; wrong data (wrong decay factor); erroneous strength for source data; wrong data format (US/Euro); failure to enter or alter data (unit default); wrong source in device; discrepancy in strength between device and planning system; incorrect entry.]
Root Cause Again The reason one wants to find a latent error is that it could lead to a fix that prevents several errors. A single latent error may begin many branches of the fault tree.
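One way to see which causes begin many branches is to tally how often each cause turns up across several analyzed events. The sketch below assumes a hypothetical record of events and their causes; the event names, cause names, and the `common_causes` helper are all invented for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical record: each analyzed event with the causes found in its tree.
EVENT_CAUSES: Dict[str, List[str]] = {
    "wrong source position": ["poor lighting", "insufficient training", "no second check"],
    "wrong treatment time": ["insufficient training", "rushed schedule"],
    "wrong step size": ["insufficient training", "no second check"],
}


def common_causes(event_causes: Dict[str, List[str]],
                  min_events: int = 2) -> List[Tuple[str, int]]:
    """Return causes that appear in at least min_events events, most frequent first."""
    # set() makes sure a cause is counted at most once per event.
    counts = Counter(c for causes in event_causes.values() for c in set(causes))
    return [(cause, n) for cause, n in counts.most_common() if n >= min_events]


print(common_causes(EVENT_CAUSES))
# [('insufficient training', 3), ('no second check', 2)]
```

A cause that tops such a tally is a candidate latent error, since a single fix there would touch several branches at once.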
Event-analysis Diagram Just as a process tree or map helps to understand a process before performing an FMEA, an event-analysis diagram helps to understand an event. Often, the diagram is built by a team - which can take a long time with lots of arguing. Sometimes it is done by an individual - which is likely to misinterpret some parts of an event.
Root-cause-analysis Tree The main tool used in a root cause analysis is the root-cause-analysis tree, also called an RCA tree or an RCA diagram. The diagram starts with the event as a box at the top. Then, one asks what actions immediately preceded the event, that is, what caused it. These actions are boxes just below the Event box, and join it, usually through an AND gate.
Example [Figure: RCA tree. Event box: Fell down stairs. Cause boxes joined below it: Carrying laundry and could not see; Cat sitting on top of stairs.]
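The stairs example can be written down as a small data structure. This is a minimal sketch, not a standard RCA tool: the `Node` class and `render` function are invented here, and it assumes the figure shows two cause boxes joined to the event through an AND gate.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One box in an RCA tree, with the gate that joins its causes."""
    label: str
    gate: str = "AND"  # assumption: causes usually join through an AND gate
    causes: List["Node"] = field(default_factory=list)


def render(node: Node, depth: int = 0) -> List[str]:
    """Return the tree as indented text lines, event first, causes below."""
    lines = ["  " * depth + node.label]
    if node.causes:
        lines.append("  " * depth + f"  [{node.gate}]")
        for cause in node.causes:
            lines.extend(render(cause, depth + 2))
    return lines


tree = Node(
    "Fell down stairs",
    causes=[
        Node("Carrying laundry and could not see"),
        Node("Cat sitting on top of stairs"),
    ],
)
print("\n".join(render(tree)))
```

Each cause is itself a `Node`, so asking "what caused that?" simply means adding entries to that node's `causes` list.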
Causes The immediate causes are called proximal causes. For each proximal cause, ask what caused that action. Keep asking what caused each action until you get to the last action over which you had control. The causes for that last action, being out of your control, are of no interest to you, and define the limit of your universe.
Progenitor Causes The cause at the end of a path is the progenitor cause, i.e., that which started that path. The progenitor cause might be a root cause or it could just be a condition. An example of a condition would be, "The primary nurse was home sick." There isn't anything you could, or would, do to prevent this. Progenitor causes enter the diagram as ovals.
Looking for Root Causes The concept of root cause is not clear, although it sounds good. What you are looking for with root causes are those causes that you could change that would prevent events. You would like the root causes to be latent errors: situations in the organization that facilitate failure initiation and propagation. Often they are active errors: something somebody did.
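Finding progenitor causes amounts to following each path until nothing further is recorded. The sketch below assumes a hypothetical cause map built around the sick-nurse condition; every other entry, and the names `CAUSED_BY`, `CONDITIONS`, and `progenitor_causes`, are invented for illustration. Whether a progenitor cause counts as a mere condition is a judgment the team makes, so it is supplied here by hand.

```python
from typing import Dict, List, Set

# Hypothetical cause map: each item maps to the causes recorded for it.
CAUSED_BY: Dict[str, List[str]] = {
    "Medication given late": ["Covering nurse unfamiliar with the ward", "No handoff note"],
    "Covering nurse unfamiliar with the ward": ["Primary nurse was home sick"],
    "No handoff note": ["Handoff policy does not require a note"],
}

# Assumption: the team has judged these to be conditions, not fixable causes.
CONDITIONS: Set[str] = {"Primary nurse was home sick"}


def progenitor_causes(caused_by: Dict[str, List[str]], event: str) -> List[str]:
    """Return the causes at the end of each path below the event."""
    found: List[str] = []
    stack = [event]
    while stack:
        item = stack.pop()
        preceding = caused_by.get(item, [])
        if not preceding and item != event:
            found.append(item)  # nothing recorded before it: end of a path
        stack.extend(reversed(preceding))
    return found


for cause in progenitor_causes(CAUSED_BY, "Medication given late"):
    kind = "condition" if cause in CONDITIONS else "candidate root cause"
    print(f"{cause}: {kind}")
```

Run on this map, the sketch reports the sick nurse as a condition and the handoff policy as a candidate root cause.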
Environment Often the environment contributes to an event as a performance-shaping factor (PSF) or an enabling factor. This is included in the tree as a diamond, like a transistor*. [Figure labels: Light was low; Tripped over cat; Cat sitting on top of stairs.] *Transistor: an archaic electronic component that acts as a gate.
WARNING Don't worry about being able to read the type in the following figures. They are here to show you features of the trees.
Diagram for Example [Figure: RCA tree. Event: Wrong dose to patient; injury to patient. Box labels: Source was in the wrong location; No check of loading performed; Physicist was not there; Resident did not see the source drop; Resident twisted source bucket, source fell out; Lighting was poor; Applicator was difficult to load; Resident not as experienced as he thought; Resident chose to go alone; Resident chose not to disturb physicist; Loading was between patient's legs; Light was from head of bed.]
Example of a Root-cause-analysis Behavior Tree [Figure: tree for the event "Unintended Area". Boxes carry classification codes marked (P), (R), and (S). (P) codes: HRI, HRV, TM, TD, OK. (R) codes: Execution is erroneous; Identification not correct; Procedure is incorrect; Manual variability; Spontaneous human variability; Excessive demand on knowledge/training; Information not seen or sought; Distraction from other person. (S) codes: External interference; Inadequate search; Bounded rationality; Distraction; Incomplete rule. Box labels: The step size (parameter) was wrong; Failure to identify the error; Requirement for a manual entry; The computer would not transfer the file at that size; The step size stayed at the wrong size when the physicist moved it with the mouse; The arrow keys rotate through different sizes in the step field; The physicist did not notice it; The physicist not familiar with the program?; He was interrupted; The dosimetrist who normally checks the plan left with an emergency; The dosimetrist who checked was not familiar enough with the program; The physician just missed the error.]
Another example
[Figure: RCA tree. Event: Mutual Wrong Dose Input into Two Sides. Box labels: Treatment unit programmed with wrong times; No quality assurance; Dosimetrist entered wrong distance for; Physicist didn't notice dosimetrist's error; Concentration on the axes question; Vigilance relaxed after problem solving; Axes determination not planned ahead; Staff won't proceed without understanding steps; New procedure; Insufficient planning; Insufficient training.]
Red Arrows: Common causes [Figure labels: QC failure; Protocol violation; Lapse (following problem solving).]
[Figure labels: Deviation from protocol; Lack of training; Lack of communication, feedback (or feedforward); Distractions.]
[Figure: RCA tree. Event: Over Dose. Boxes carry classification codes marked (P), (R), and (S). (P) codes: HRM, HRV, HRI, HRC, OC, OM, TM. (R) codes: Detection missing; Distraction; Execution is erroneous; Manual variability; Spontaneous human variability; Stereotype takeover; Interfering task. (S) codes: Lack of vigilance; Distraction; Loss of coordination; External interference. Box labels: Complexity; Treated with wrong parameter setup; No one noticed the source remained in position much longer than before; Verification failure; Entering wrong treatment parameters; Mental letdown (fatigue) after problem-solving session; Distracted by the conversation; Did not follow QMP and have two persons check the program; The physician signed the program without really checking it; Made a mistake when keying in the parameter; Had to enter by hand; Rushed environment; Priority; Not familiar with the keyboard layout; Card reader was out of order; Patient could not wait; Improper schedule; Staff did not communicate well; Not enough physicists.]
1st of 3
Generalizations Almost always, a box will have at least two causes (i.e., the event is preceded by an AND gate). That is because humans do very well at dealing with problems one by one. When faced with two unusual situations at the same time, attention paid to one detracts from the other.
Generalizations Continued If you don't find two causes, often the cause is just a restatement of the action. The purpose of the exercise is to understand the event and what led to it, and to guide the development of preventive measures.
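The two-cause generalization can serve as a quick review check on a draft tree: list every box with exactly one recorded cause and ask whether that cause merely restates the action. The sketch below assumes a hypothetical draft tree stored as a dictionary; the entries and the `single_cause_boxes` helper are invented for illustration.

```python
from typing import Dict, List

# Hypothetical draft tree: each box maps to the causes recorded beneath it.
DRAFT_TREE: Dict[str, List[str]] = {
    "Fell down stairs": ["Tripped over cat", "Could not see the stairs"],
    "Tripped over cat": ["Cat sitting on top of stairs", "Light was low"],
    "Could not see the stairs": ["Vision was blocked"],  # reads like a restatement
}


def single_cause_boxes(tree: Dict[str, List[str]]) -> List[str]:
    """Return the boxes with exactly one cause, which deserve a second look."""
    return [box for box, causes in tree.items() if len(causes) == 1]


print(single_cause_boxes(DRAFT_TREE))
# ['Could not see the stairs']
```

A flagged box is not necessarily wrong; the check only points the team to places where a second cause may have been missed.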
A Very Useful Website http://www.patientsafety.gov/rca.html
VA Rules of Causation Rule 1 - Causal statements must clearly show the "cause and effect" relationship. Rule 2 - Negative descriptors (e.g., poorly, inadequate) are not used in causal statements. Rule 3 - Each human error must have a preceding cause. Rule 4 - Each procedural deviation must have a preceding cause. Rule 5 - Failure to act is only causal when there
Generalizations about Fixes All failures are system errors because the system did not prevent the propagation of the progenitor failure. All failures are human errors because somebody did something wrong. Latent errors are usually very hard to fix (like trying to make someone change their religion). Similar events can be prevented by eliminating progenitor causes OR by interrupting the propagation.
Root Causes One Last Time Are root causes always latent errors? Are root causes always progenitor causes?