Fixing the System, Not the Person - May 2026 - Week 20


Introduction

We close this series with the concept that ties all four weeks together: when something goes wrong, the most important question isn't who to blame; it's what in the system made the error possible. This week we examine how high-reliability organizations investigate incidents, design accountability without blame, and create systems where the right outcome doesn't depend entirely on any one person performing perfectly.

Monday – Why Blame Doesn't Make Us Safer

Good morning. Let's start this week's topic with an honest question: when something goes wrong here, what happens? Not officially, but actually. Does the conversation focus on what the system allowed to happen, or on what the person did wrong?

Blame is a natural human response to harm. It's psychologically satisfying. It creates a clear story: a cause, a responsible party, a resolution. And it's almost always wrong as an analysis.

Here's why: any time a human error occurs, it occurs inside a system that allowed it. The worker who skipped a step was in an environment where skipping that step was easy, fast, and had happened before without consequence. The worker who made the wrong decision was operating with incomplete information, under time pressure, in conditions that the procedure didn't fully account for. Blaming that worker and stopping there doesn't make the system safer. It just identifies who to be angry at.

The alternative is not to ignore accountability. It's to distinguish between individual choices made in bad faith, which are rare, and human errors made by capable people in flawed systems, which are common. Fixing the system protects everyone who comes after. Blaming the person protects no one.

Real-World Example

Aviation has spent decades developing a safety culture based on system analysis rather than individual blame, and the results are measurable. The airline industry is statistically one of the safest modes of transportation in the world, not because pilots and crew stopped making errors, but because the system is designed to catch and absorb errors before they become catastrophic.

A central mechanism is the Aviation Safety Reporting System (ASRS), a confidential, non-punitive reporting system run by NASA since 1976. Pilots, air traffic controllers, and maintenance crews can report safety concerns, near-misses, and their own errors without fear of disciplinary action. The data is analyzed for systemic patterns and fed back into safety improvements across the industry.

In the decades since its launch, the ASRS has received over one million reports. Aviation experts credit the system with identifying dozens of systemic hazards that would likely have resulted in accidents had they not been caught through voluntary reporting.

The lesson for manufacturing isn't to build an aviation reporting system. It's to recognize that the same mechanism works in any environment: when people can report errors and near-misses without fear, the data that emerges reveals system problems that can be fixed. When they can't, those problems stay hidden until they produce a serious outcome.

Discussion Prompt

Think about the last incident or near-miss that happened in your area. What was the official explanation? Did the investigation go deeper than the person who was involved? Did it ask what the system made easy that should have been hard? What would the answer to that question have been, if it had been asked?

Tuesday – How to Actually Investigate What Went Wrong

Yesterday we established that fixing the person rarely fixes the system. Today we look at how to investigate an incident in a way that identifies what can be changed.

Effective incident investigation asks 'why' at least five times. Not 'why did the worker do that?' as a judgment, but 'what created the conditions where that action was the one that happened?'

A worker didn't wear PPE. Why? The PPE was uncomfortable. Why? It was never properly fitted. Why? There's no fitting process — it's self-service from a general dispenser. Why? Nobody designed one into the program. Why? The program was built around availability, not use. Now you have a system problem, not a worker problem.

A machine guard was bypassed. Why? It caused frequent false stops. Why? The sensor was set with zero tolerance for vibration. Why? Default settings were never adjusted for this application. Why? No commissioning process required it. Now you have an engineering and process problem, not a reckless worker.

The 'five whys' isn't a magic formula; it's a discipline. It keeps the investigation moving toward causes that can actually be changed, rather than stopping at 'the worker made an error', which is always true and never sufficient.

Real-World Example

A corrugated packaging plant in Virginia had experienced a forklift-pedestrian incident that resulted in a minor injury to a warehouse associate. The initial investigation identified the forklift operator as having failed to yield at a marked crossing. A warning was issued and retraining was scheduled. The investigation was closed.

Three months later, a SAT member raised the incident at a meeting after a near-repeat occurred at the same crossing. The SAT asked to reopen the analysis and apply a more systematic approach. They mapped the physical environment: the crossing was at the end of a rack row, with a full rack obscuring the forklift operator's sightline until they were within approximately 12 feet of the crossing. At the posted speed limit, stopping distance from 12 feet was not achievable. The crossing, by physical geometry, required luck as much as compliance.
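
To make that concrete, a back-of-the-envelope stopping-distance calculation shows why geometry, not diligence, was the problem. The report doesn't state the posted limit, reaction time, or braking rate, so the figures below are illustrative assumptions (a 5 mph limit, a 1.5-second perception-reaction time, roughly 7 ft/s² of braking deceleration), not values from the investigation:

\[
v = 5~\text{mph} \approx 7.3~\text{ft/s}
\]
\[
d_{\text{reaction}} \approx v \, t_r = 7.3 \times 1.5 \approx 11~\text{ft}
\qquad
d_{\text{braking}} \approx \frac{v^2}{2a} = \frac{(7.3)^2}{2 \times 7} \approx 3.8~\text{ft}
\]
\[
d_{\text{total}} \approx 11 + 3.8 \approx 15~\text{ft} > 12~\text{ft of available sightline}
\]

Under these assumptions, reaction distance alone nearly consumes the available 12 feet before the brakes do anything. No amount of operator diligence closes that gap.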

The SAT's five-why analysis identified: the crossing location had been established before the rack configuration was modified three years earlier; no re-evaluation had been triggered by the rack change; and the facility had no formal process requiring traffic flow review when physical layouts changed.

Corrective actions: the crossing was relocated to a sightline-clear position. A convex mirror was added at the original location as a secondary measure. A formal requirement was added to the facility change management process that any physical layout change within 20 feet of a pedestrian crossing required a traffic flow review. No further pedestrian-forklift incidents occurred at that crossing in the following two years.

Discussion Prompt

Think about an incident or near-miss from your area, recent or historical. If you applied five whys to it, what do you think you'd find? Do you think the corrective action that was implemented actually addressed the root cause, or did it address the most visible symptom?

Wednesday – Just Culture — Accountability Without Blame

We've talked about why blame doesn't make us safer and how to investigate for system causes. Today we address the question that always comes up: if we don't blame people, does anyone get held accountable?

The answer is yes, and the framework that makes this possible is called Just Culture.

Just Culture distinguishes between three types of behavior when something goes wrong.

Human error is an unintentional mistake: the person did not choose to do the wrong thing; the system created conditions where the error was likely. The appropriate response is to fix the system.

At-risk behavior is a choice to take a shortcut or work around a rule, where the person didn't recognize the risk or felt pressure that outweighed the rule. The appropriate response is to coach, to understand what created the at-risk choice, and to change the conditions.

Reckless behavior is a conscious choice to take an unjustifiable risk: the person knew the risk and chose it anyway, without legitimate reason. The appropriate response is accountability; this is where discipline is appropriate.

Most incidents fall in the first two categories. Very few are genuinely reckless. A facility that applies the same disciplinary response to all three types treats its capable workforce as reckless and destroys the reporting culture that would help it identify and fix its actual system problems.

Real-World Example

A regional hospital system in the Mid-Atlantic had implemented a Just Culture framework after recognizing that their adverse event reporting rates were far below national benchmarks, not because fewer things were going wrong, but because staff were not reporting near-misses and errors for fear of consequences.

The implementation required significant supervisor training. The hardest part, the patient safety director said later, was helping managers understand the difference between holding someone accountable and holding a system accountable. 'We had managers who believed any incident on their unit reflected a failure of their staff. Teaching them to look at incidents as system failures, without letting people off the hook for genuinely reckless behavior, took real work.'

In the two years following implementation, voluntary near-miss reporting increased 340%. The quality team began identifying patterns in that data that led to seven specific system changes, including two that the patient safety director assessed as having prevented sentinel events.

A nurse who had been with the system for 11 years described the change at a department meeting: 'Before, if I made an error, my goal was to make sure no one found out, because finding out meant getting in trouble. Now my goal is to report it immediately, because reporting it means we'll figure out why it happened and stop it from happening again. That's a completely different relationship with my own mistakes.'

Discussion Prompt

How would you describe our facility's current approach when an incident happens? Is it closer to 'what did the person do wrong' or 'what did the system allow'? What would need to change about how we respond to incidents for people to feel genuinely safe reporting their own near-misses and errors?

Thursday – Learning Organizations — Turning Every Incident Into Improvement

We've talked about blame, investigation, and Just Culture. Today we look at what all that builds toward: a learning organization — one that systematically gets better from every incident, near-miss, and close call rather than just reacting and moving on.

A learning organization does four things well. It captures what happens: near-misses are reported at rates that reflect actual frequency, not just the small fraction that feel 'serious enough' to report. It analyzes for causes: investigations go past the immediate surface cause to the system conditions that allowed the event. It implements changes: corrective actions address root causes, not just immediate symptoms, and they're tracked to completion. And it shares what it learns: the insight from one incident is communicated broadly enough to prevent recurrence across the whole facility, not just in the area where it happened.

Most facilities do the first thing adequately. Many do the second thing partially. Fewer do the third consistently. Very few do the fourth well.

The fourth one, sharing learning, is where SATs have a specific and powerful role. A near-miss that happens in Cell 3 is a near-miss that Cell 5 hasn't had yet. Getting that information to Cell 5 before the miss becomes a hit is exactly what a functioning safety communication structure is for.

Real-World Example

A multi-site food manufacturing company with seven facilities in the Midwest made a deliberate decision after a serious laceration injury at one site to treat incident learning as a network responsibility rather than a site responsibility.

Before the change, each facility investigated its own incidents, implemented its own corrective actions, and reported summary statistics to corporate quarterly. There was no mechanism for Site A's learning to reach Site B in real time.

The company implemented a shared incident learning platform: within 72 hours of any recordable injury or high-potential near-miss, the investigating site was required to post a brief summary (what happened, initial findings, immediate actions taken) visible to safety leads at all seven sites. Within 30 days, a completed root cause summary and corrective action plan were shared, along with an explicit question: 'Does this hazard or system condition exist at your site?'

In the first year, sites flagged matching conditions in response to 14 of 23 shared incident summaries. In 8 of those 14 cases, the receiving site implemented a corrective action before any incident occurred at their facility. The corporate safety director presented the data to the executive team as a direct return on investment: 'Eight potential incidents we prevented by sharing what we learned somewhere else. Eight investigations we didn't have to do. Eight workers who went home without an injury. The platform cost less than one of those investigations would have.'

Discussion Prompt

When an incident or near-miss happens in one part of this facility, how does that information get to the rest of the facility? Is there a mechanism for learning from what happens here to prevent the same thing somewhere else? And are we sharing learning across shifts, not just across departments?

Friday – Series Wrap-Up — Building a System That Protects Everyone

Last one. Four weeks in. Let's close the series by pulling the whole arc together.

Week one: the safest systems make the safe choice the easy choice. Design, not discipline, is the most reliable safety strategy.

Week two: procedures get bypassed when they don't reflect real work, when the culture creates pressure to skip them, or when stopping feels riskier than continuing. Fixing procedures and culture is more effective than enforcing rules that workers have already learned to work around.

Week three: PPE is a critical last line of defense, but only when it's the right PPE, in good condition, worn correctly, on top of the higher-level controls it's meant to supplement, not replace.

Week four: when something goes wrong, the question that produces improvement is not 'who did this?' but 'what in our system made this possible?' And the culture that answers that question honestly, without blame, with genuine accountability for reckless behavior, and with learning shared broadly is the culture that gets safer over time.

None of these ideas require a large budget or a corporate initiative. They require people who are willing to look honestly at how work actually happens, who feel safe enough to say what they see, and who have a structure — like a SAT — to turn that honesty into action. That's what you are. That's what this work is for.

Real-World Example

A mid-size aerospace components facility in Connecticut, the same one referenced in the previous series, had spent 18 months building vocabulary around cognitive hazards. In the year that followed, they turned their attention to systems and design using the same approach: worker-led safety shares, SAT-driven investigation, and honest examination of the gap between policy and practice.

Their most significant improvement came from a root cause analysis of their five most frequent injury types over a three-year period. The analysis found that four of the five types shared a common systemic factor: they all occurred during non-routine tasks - setup changes, maintenance interventions, first articles, and recovery from equipment faults - where the written procedure either didn't exist, hadn't been updated, or had never been reviewed by the workers performing the task.

The SAT proposed and drove a Non-Routine Task Protocol: any task performed less often than once per week, or involving a procedure last updated more than 18 months ago, required a pre-task review involving at least one experienced worker and one person less familiar with the task, to surface both deep knowledge and fresh eyes.

In the 24 months following implementation, injuries during non-routine tasks dropped 71%. More significantly, the protocol generated 34 procedure updates in its first year, each one driven by a worker who said during the pre-task review: 'That's not how we actually do this.'

The plant manager said at a regional safety forum: 'We used to think safety was about getting people to follow the rules. Now we think safety is about making sure the rules reflect reality, and that the system is built so the right thing is also the natural thing. Our people were never the problem. The system we handed them was.'

Discussion Prompt

Series Wrap-Up: Over these four weeks, we've looked at design, procedures, PPE, and system thinking. What's one real, specific change that you want to see happen in this facility: to how our work is designed, to a procedure, to our PPE program, or to how we investigate and learn from incidents? Who do you need to talk to? And when are you going to do it?
