This is Fine! A podcast about resilience engineering and software

Colette Alexander and Clint Byrum
Jul 25, 2025 • 44min

How long should you wait after an incident to do your retro?

Corn sweat is a real thing: https://www.scientificamerican.com/article/humidity-from-corn-sweat-intensifies-extreme-heat-wave-in-midwest-u-s/
Also, plugging Tajin here, because: https://en.wikipedia.org/wiki/Taj%C3%ADn_seasoning
Wikipedia tells me Tajin is Mexican. I dunno, Clint.
Beaumaiden report, for those that didn’t listen to the prior episode where we mentioned it: https://dmaib.com/reports/2021/beaumaiden-grounding-on-18-october-2021
John Allspaw’s talk at Spotify that we referenced: https://www.youtube.com/watch?v=M8mYPyRG1fQ
Lorin’s Law is always a good plug: https://surfingcomplexity.blog/2017/06/24/a-conjecture-on-why-reliable-systems-fail/
Clint’s book recommendation: https://bookshop.org/p/books/the-15-commitments-of-conscious-leadership-a-new-paradigm-for-sustainable-success-diana-chapman/14574335?ean=9780990976905&next=t
Send us questions! at thisisfinepod.com or find us on LinkedIn here: https://www.linkedin.com/company/this-is-fine-a-podcast-about-software-and-resilience-engineering/
You can come to the Lund panelist event for RISF by signing up here: https://resilienceinsoftware.org/networks/events/133948
Jul 10, 2025 • 1h 5min

Lund University - Academic Theory and Practice

Join John Allspaw, a seasoned software systems engineer, Chad Todd, a recent master's graduate, and Jed Needle, an engineering manager, as they explore the evolution of resilience engineering at Lund University. They discuss navigating AI challenges, managing data overload, and the emotional journey of academia. Hear about the importance of community, the role of personalized tutoring in learning, and fascinating insights from cross-industry safety experiences that merge academic theory with real-world practice.
Jun 27, 2025 • 58min

What’s the ROI on Reliability and Resilience work?

Dave Woods’s talk at SRECon 25 was on Complexification and SRE: https://www.youtube.com/watch?v=lmBvUJnGUX4
Jens Rasmussen’s model is really well explained by Richard Cook’s talk at Velocity: https://www.youtube.com/watch?v=PGLYEDpNu60&t=3s
Lorin’s blog also has a good summary: https://surfingcomplexity.blog/2021/05/31/transgressing-the-boundaries-rasmussen-and-woods/
And finally, Jens Rasmussen’s original paper on the subject, Risk Management in a Dynamic Society: https://linkinghub.elsevier.com/retrieve/pii/S0925753597000520
SRECon 25 talk on Incident Metrics that Matter that was awesome: https://www.youtube.com/watch?v=QrR2SvpWvdg
Want to read about how things are getting a bit fash-y in tech these days?
https://www.newyorker.com/culture/infinite-scroll/techno-fascism-comes-to-america-elon-musk
https://www.theguardian.com/technology/ng-interactive/2025/jan/29/silicon-valley-rightwing-technofascism
Perrow/Normal Accidents: https://bookshop.org/p/books/normal-accidents-living-with-high-risk-technologies-updated-edition-revised-charles-perrow/10369279?ean=9780691004129
High Reliability Organizations (HROs):
Started (ish) with “A Rejoinder to Perrow”: https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1468-5973.1994.tb00047.x
And you can find Rochlin & La Porte behind a lot of the early writing on HROs, including https://www.jstor.org/stable/44637690?seq=1 and https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1468-5973.1996.tb00078.x
As well as Weick and Sutcliffe: https://bookshop.org/p/books/managing-the-unexpected-sustained-performance-in-a-complex-world-kathleen-m-sutcliffe/11267666?ean=9781118862414&
https://journals.sagepub.com/doi/10.2307/41165243
Jun 3, 2025 • 54min

Runbooks: the Good, Bad and Ugly w/special guest Andrew Hatch

In this chat, Andrew Hatch, a seasoned software engineer from Cisco ThousandEyes, dives into the intriguing world of runbooks. He discusses their dual nature in incident management, highlighting both strengths and weaknesses. Andrew emphasizes the importance of understanding complex systems for creating effective runbooks, while also addressing the pitfalls of over-reliance during crises. The conversation touches on the transformation of static runbooks into dynamic resources through operations reviews, showcasing the value of adaptability and teamwork in tech resilience.
May 21, 2025 • 55min

What is an incident? How come no one declares them?

Michael Wettick’s Lund thesis is great, and Laura Maguire’s paper on the Costs of Coordination that is a shortened version of her dissertation is worth a read!
Clint’s SRECon talk that he mentioned a couple times: https://www.youtube.com/watch?v=k4UaDDkLOhw
Lorin wrote a great article on incidents and improvisation: https://surfingcomplexity.blog/2023/06/11/when-theres-no-plan-for-this-scenario-youve-got-to-improvise/
Incident.io and the people who work there have hilarious LinkedIn posts about how people use incidents in their org.
We talked about BlackRock3 who do incident command training: https://www.blackrock3.com/
Brent Chapman has also done great incident command training and has done some talks on why IT incident management can learn from fire/emergency response management processes.
We have a LinkedIn! https://www.linkedin.com/company/this-is-fine-a-podcast-about-software-and-resilience-engineering/
And you can ask us questions here: https://forms.gle/rggrbGG6aFVrgZsv9
May 7, 2025 • 48min

Chaos Engineering w/special guest Casey Rosenthal

The O’Reilly book on Chaos Engineering by Casey and Nora Jones is here: https://www.oreilly.com/library/view/chaos-engineering/9781492043850/
Some of the Netflix posts introducing Chaos Monkey and Simian Army are here and here.
You can see Lorin Hochstein talking about Chaos Engineering at Netflix here.
The Void is an awesome collection of information on incidents throughout tech and you can find it here.
Casey mentioned Rasmussen’s model. Lorin has a great summary of that on his blog, but you can read the original paper by Rasmussen introducing this model here.
A report on the Netflix outage during Christmas of 2012.
A reminder - you can ask us questions for the podcast at www.thisisfinepod.com
Apr 26, 2025 • 47min

Burnout on Aisle 3

Clint wrote the Socio-Technical Reality Engineer as a blog post; it’s a good read.
The Burnout book by the Nagoski sisters is A+++ reading.
Those Found Responsible Have Been Sacked is by the late, great, Dr. Richard Cook and Chris Nemeth.
The Perverse Incentives of Reliability by Katie Wilde from Snyk at this year’s SRECon was just an incredible talk.
Colette mentioned the Beaumaiden report from DMAIB. She gave a talk for the DORA community on resilience engineering that you can see here.
Apr 9, 2025 • 55min

Resilience, Complexity, and Your Boss: a collab w/Punk Rock Safety

Ben (Goodheart), Dave (Provan) and Ron (Gantt) have the very awesome podcast Punk Rock Safety (punkrocksafety.com) - you can get your own punk rock safety merch at punkrocksafetymerch.com
Charles Perrow wrote Normal Accidents and talks about safety and power in his essay (book, really), Complex Organizations.
Drew Rae lit the stage on fire about safety work as soothing rather than actually improving safety: EHS Congress Berlin 2024 - Day2
Dr. Richard Cook’s concepts of ‘Above the Line/Below the Line’ got a shout out - here’s the paper, and here’s John Allspaw giving a talk about the concept.
Mar 28, 2025 • 49min

Live From SRECon

No video for this one because it didn’t really end up working.
We had some awesome people with us for this show: Eric Dobbs, Will Gallego, Juan Carlos Ramirez, and Martin Smith.
Dr Richard Cook’s talk on The Marvelous Resilience of Bone (one of our absolute favorites)
You can see the schedule for the SRECon 2025 Americas conference here
The keynotes from the day we recorded were Dr David Woods and Katie Wilde (from Snyk)
Mar 12, 2025 • 24min

Teaser Episode - Season 2

The XKCD comic that’s in Colette’s thesis is Dependency
Justin Reock is at DX
https://punkrocksafety.com/ are our mutual podcast friends
