The InfoQ Podcast

Safely Changing Software to Avoid Incidents: A Conversation with Justin Sheehy

Sep 8, 2025
Justin Sheehy, Chief Architect at Akamai, shares insights on making software safer and more resilient. He argues that root cause analysis is futile and stresses the importance of a shared language for discussing incidents. He highlights the need for malleable, observable software and notes that all technology decisions are inherently business decisions. Sheehy also addresses how the rise of AI complicates engineers' ability to handle production incidents, making resilience even more crucial.
ANECDOTE

Origin Story: Becoming An Architect

  • Justin became an architect at Akamai when systems had unknown scale and required in-place improvement.
  • He and peers adopted the title to manage constraints and communicate design trade-offs.
INSIGHT

Changes Often Trigger Incidents

  • Many incidents are triggered by pushes of change rather than single root causes.
  • Treat changes as proximate causes and design systems to tolerate them.
ADVICE

Adopt A Shared Change-Safety Framework

  • Create a shared, simple framework of change-safety principles for your teams.
  • Use common language so SREs, DevOps, and architects can discuss and act quickly.