
This is Fine! A podcast about resilience engineering and software
Ever wondered why things on the internet break? Do you work in software and wish that you could have a Dear-Abby-Like call-in show that could answer your deepest questions about how to make your workplace suck less? We're here to help!
Write us anonymously at our open question form
Email us at: thisisfine.softwarepodcast@gmail.com
Call us and leave a voicemail, or text us at: (401) 592-7574
Latest episodes

May 21, 2025 • 55min
What is an incident? How come no one declares them?
Michael Wettick’s Lund thesis is great, and Laura Maguire’s paper on the Costs of Coordination that is a shortened version of her dissertation is worth a read!
Clint’s SRECon talk that he mentioned a couple times: https://www.youtube.com/watch?v=k4UaDDkLOhw
Lorin wrote a great article on incidents and improvisation: https://surfingcomplexity.blog/2023/06/11/when-theres-no-plan-for-this-scenario-youve-got-to-improvise/
Incident.io and the people who work there have hilarious LinkedIn posts about how people use incidents in their org.
We talked about BlackRock3 who do incident command training: https://www.blackrock3.com/
Brent Chapman has also done great incident command training and has done some talks on why IT incident management can learn from fire/emergency response management processes.
We have a LinkedIn! https://www.linkedin.com/company/this-is-fine-a-podcast-about-software-and-resilience-engineering/
And you can ask us questions here: https://forms.gle/rggrbGG6aFVrgZsv9

May 7, 2025 • 48min
Chaos Engineering w/special guest Casey Rosenthal
The O’Reilly book on Chaos Engineering by Casey and Nora Jones is here: https://www.oreilly.com/library/view/chaos-engineering/9781492043850/
Some of the Netflix posts introducing Chaos Monkey and Simian Army are here and here.
You can see Lorin Hochstein talking about Chaos Engineering at Netflix here.
The Void is an awesome collection of information on incidents throughout tech and you can find it here.
Casey mentioned Rasmussen’s model. Lorin has a great summary of that on his blog, but you can read the original paper by Rasmussen introducing this model here.
A report on the Netflix outage during Christmas of 2012.
A reminder - you can ask us questions for the podcast at www.thisisfinepod.com

Apr 26, 2025 • 47min
Burnout on Aisle 3
Clint wrote the Socio-Technical Reality Engineer as a blog post; it’s a good read.
The Burnout book by the Nagoski sisters is A+++ reading.
Those Found Responsible Have Been Sacked is by the late, great Dr. Richard Cook and Chris Nemeth.
The Perverse Incentives of Reliability by Katie Wilde from Snyk at this year’s SRECon was just an incredible talk.
Colette mentioned the Beaumaiden report from DMAIB. She gave a talk for the DORA community on resilience engineering that you can see here.

Apr 9, 2025 • 55min
Resilience, Complexity, and Your Boss: a collab w/Punk Rock Safety
Ben (Goodheart), Dave (Provan) and Ron (Gantt) have the very awesome podcast Punk Rock Safety (punkrocksafety.com) - you can get your own punk rock safety merch at punkrocksafetymerch.com
Charles Perrow wrote Normal Accidents and talks about safety and power in his essay (book, really), Complex Organizations.
Drew Rae lit the stage on fire about safety work as soothing rather than actually improving safety: EHS Congress Berlin 2024 - Day2
Dr. Richard Cook’s concepts of ‘Above the Line/Below the Line’ got a shout out - here’s the paper, and here’s John Allspaw giving a talk about the concept.

Mar 28, 2025 • 49min
Live From SRECon
No video for this one because it didn’t really end up working.
We had some awesome people with us for this show:
Eric Dobbs
Will Gallego
Juan Carlos Ramirez
Martin Smith
Dr Richard Cook’s talk on The Marvelous Resilience of Bone (one of our absolute favorites)
You can see the schedule for the SRECon 2025 Americas conference here
The keynotes from the day we recorded were Dr David Woods and Katie Wilde (from Snyk)

Mar 12, 2025 • 24min
Teaser Episode - Season 2
The XKCD comic that’s in Colette’s thesis is Dependency
Justin Reock is at DX
https://punkrocksafety.com/ are our mutual podcast friends

Feb 12, 2025 • 49min
Episode 9 - Learning from Incidents with special guest Alex Elman
You can find ACL (Adaptive Capacity Labs), the folks who train software engineers how to do LFI and who we speak so fondly of, here.
Colette mentioned Allspaw’s take on Five Whys - if you want to know why we think there are better options for learning out there, you can read it here.
Alex did a great talk with Sarah Butt on some LFI related things at LFI Conf in 2023: https://www.youtube.com/watch?v=CbSiKAtO7Fk
And at SRECon: SREcon20 Americas - Are We Getting Better Yet? Progress Toward Safer Operations
Colette went to go see whales in the Baja through this tour; it was awesome.
Write to us at thisisfine.softwarepodcast@gmail.com or go fill out our form with a question at Thisisfinepod.com

Jan 29, 2025 • 37min
Episode 8 - Why Human Factors and Not Technical Ones
The spicy Allspaw take that inspired our listener is here: https://www.linkedin.com/posts/jallspaw_a-im-a-bit-salty-today-b-if-you-dont-activity-7287968197742411776-5_Ay
Charles Perrow is the guy who wrote Normal Accidents (https://bookshop.org/p/books/normal-accidents-living-with-high-risk-technologies-updated-edition-revised-charles-perrow/10369279?ean=9780691004129&next=t&next=t), which Colette is somewhat controversially a fan of, and thus a Perrow-ian? (a lot of resilience engineering people are not fans!)
Not many notes today, so go check out a page on one of Colette’s favorite chicken breeds: https://greenfirefarms.com/shetland_hen.html

Jan 22, 2025 • 54min
Episode 7 - AI and Resilience with special guest Courtney Nash
The VOID is one of our favorite things!
Some of Courtney’s inoculation of the MTTR virus can be found here:
An interview with InfoQ
A talk at SRE Con Americas in 2022
Courtney’s recent talk on Automation and AI
David Graeber’s Bullshit Jobs started as a talk and then a great book
Want to read more about HABA-MABA and CSE/RE? Lisanne Bainbridge’s The Ironies of Automation is a perennial recommendation in our show notes
The thread Courtney mentioned from Gergely Orosz

Jan 8, 2025 • 56min
Episode 6 - Can You Buy Resilience? With Special Guest Steve McGhee
Steve is the host of the Google SRE Prodcast, you should check it out!
Colette got her chickens from Greenfire Farms, and her chicken coop from Carolina Coops, if anyone is wondering.
The Chris Hayes podcast Colette mentioned about unconditional cash transfers is here.
Iain M. Banks is an author of The Culture series, a set of fiction books based in a post-scarcity society
If you didn’t get the Vizzini/Inigo Montoya references, you should probably find a way to see The Princess Bride.
Colette mentioned STAMP - which is more along the lines of reliability engineering than resilience engineering, technically, but is related. You can read about how Google is using it here.
Lord, you want the history of ITIL? Okay.
**** note, none of the below sponsor us (yet), so these are pure-hearted endorsements from Clint during the episode ****
Adaptive Capacity Labs will teach your teams how to be more resilient.
Incident.io is who Clint mentioned as one of the many incident automation tools out there (Rootly and FireHydrant are a couple others).
Backstage is an open source Spotify product, and anyone who’s worked at Spotify will talk your ear off about how great it is if you let us.
*************************
A new Resilience Engineering community that Colette and Clint are a part of has launched! You can find us at resilienceinsoftware.org and join to be a part of the conversation in Slack
And of course, you can email us at thisisfine.softwarepodcast@gmail.com or write to us via http://thisisfinepod.com