

Runbooks: the Good, Bad and Ugly w/special guest Andrew Hatch
Jun 3, 2025
In this chat, Andrew Hatch, a seasoned software engineer from Cisco ThousandEyes, dives into the world of runbooks. He discusses their dual nature in incident management, highlighting both strengths and weaknesses. Andrew emphasizes the importance of understanding complex systems for creating effective runbooks, while also addressing the pitfalls of over-reliance on them during crises. The conversation touches on the transformation of static runbooks into dynamic resources through operations reviews, showcasing the value of adaptability and teamwork in tech resilience.
AI Snips
Runbooks: Helpful or Harmful
- Andrew Hatch used great runbooks early in his on-call experience to learn and troubleshoot incidents.
- He also encountered runbooks that misled him, causing wasted time and mistakes during incidents.
Runbook Value Diminishes with Complexity
- Runbooks work better for less complex systems with fewer variables.
- In highly complex environments, runbooks lose value because they cannot cover every possible interaction.
Runbooks as Training Wheels
- Use runbooks as training tools to boost confidence for new on-call engineers.
- Provide clear, straightforward steps to ease onboarding and reduce incident anxiety.