
This is Fine! A podcast about resilience engineering and software
Root Cause Analysis vs. Resilience Engineering w/special guest Lorin Hochstein
Oct 16, 2025
Lorin Hochstein, a software engineer and researcher on reliability, dives into the nuances between root cause analysis and resilience engineering. He reveals the origins of the 'Five Whys' and critiques its overuse in incident analyses. Hochstein argues for resilience-oriented methods, asserting they reveal deeper insights and prevent future failures. He discusses the Swiss Cheese Model, the limitations of assuming a single root cause, and introduces STAMP, a method from safety-critical fields, emphasizing the need for effective learning over mere fixes. A must-listen for tech teams!
AI Snips
Hosts Use A Hockey Injury To Introduce Analysis
- Clint sprained his wrist playing hockey and joked about personal post-incident analysis.
- The hosts use personal injury stories to frame how humans analyze incidents and risk.
Root Cause Means Different Things During vs. After an Incident
- Root-cause talk shifts meaning: during incidents people use it to mean "what's broken now," while afterwards it implies a definitive underlying reason.
- Lorin warns we should be careful using the term post-incident because it carries different expectations and risks misdirecting learning.
Resilience Analysis Reveals Systemic Patterns
- Incident analysis (the resilience approach) treats failures as multi-factor and interactive rather than traceable to a single cause.
- That approach yields broader organizational learning and surfaces patterns root-cause work often misses.

