Slight Reliability

Stephen Townshend
Jul 29, 2025 • 32min

Mobile Observability with Hanson Ho (Episode 102)

This week I'm joined by the wonderful Hanson Ho to discuss the unique challenges and opportunities in making our mobile apps observable! We cover...
📱 The mobile/backend observability divide
✍️ The challenge of distributed tracing on mobile apps
🌏 The entire device runtime environment matters for your app
👤 The quest for user-centric mobile observability
✅ Advice on how to get started with mobile observability
...and much more.
You can find Hanson on:
LinkedIn: https://www.linkedin.com/in/hanson-ho/
Bluesky: https://bsky.app/profile/bidetofevil.wtf
You can find out more about Embrace at https://embrace.io/
You can find Stephen on:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Bluesky: https://bsky.app/profile/slightreliability.bsky.social
YouTube: https://www.youtube.com/c/SlightReliability
Instagram: https://www.instagram.com/slight_reliability/
TikTok: https://www.tiktok.com/@the_kiwi_sre
Jul 15, 2025 • 40min

Intro to Resilience Engineering with Michelle Casey (Episode 101)

This week I'm joined once more by SRE leader Michelle Casey, who gives a broad and shallow introduction to resilience engineering. We cover...
🏋️‍♀️ Reliability VS Robustness VS Resilience
🧩 What is a complex system?
🔢 Safety one/safety two
🧠 Mental models
😩 Human error
...and so much more.
Resources from this episode:
Four concepts for resilience (paper) by Dr. David Woods https://www.researchgate.net/publication/276139783_Four_concepts_for_resilience_and_the_implications_for_the_future_of_resilience_engineering
Building and revising adaptive capacity sharing for technical incident response (paper) by Dr Richard Cook and Dr Beth Long https://www.researchgate.net/publication/344259449_Building_and_revising_adaptive_capacity_sharing_for_technical_incident_response_A_case_of_resilience_engineering
Systems Thinking for Incident Analysis (talk) by Laura Nolan from LFI Conf 23 https://www.youtube.com/watch?v=-uXGg3g2yps
How Complex Systems Fail (website) by Dr. Richard Cook https://how.complexsystems.fail/
A Tale of Two Safeties (book) by Erik Hollnagel https://erikhollnagel.com/A Tale of Two Safeties.pdf
From Safety One to Safety Two (book) by Erik Hollnagel https://www.england.nhs.uk/signuptosafety/wp-content/uploads/sites/16/2015/10/safety-1-safety-2-whte-papr.pdf
Resilience: It's not you, it's the System (talk) by Dr Carl Horsley https://www.youtube.com/watch?v=ugC3GTKt23U
Above the line / Below the line (paper) by Dr Richard Cook (not original link) https://www.researchgate.net/figure/Above-the-Line-Below-the-Line-framework-adapted-with-permission-Cook-Woods-2016_fig3_333091997
How Your Systems Keep Running Day After Day (talk) by John Allspaw https://www.youtube.com/watch?v=xA5U85LSk0M
Behind Human Error (book) https://www.amazon.com.au/Behind-Human-Error-David-Woods/dp/0754678342
The Field Guide to Human Error Investigations (book) by Sidney Dekker https://www.humanfactors.lth.se/fileadmin/lusa/Sidney_Dekker/books/DekkersFieldGuide.pdf
The Howie Guide (paper) by Dr Laura Maguire, Nora Jones and Vanessa Granda https://howie-guide.pagerduty.com/
Resilience Engineering: Where do I start? (website) by Lorin Hochstein https://www.resilience-engineering-association.org/resources/where-do-i-start/
The STELLA report (paper) https://snafucatchers.github.io/
DORA Community Discussion - Resilience Engineering (discussion) https://www.youtube.com/watch?v=g3cEJ7njJbc
This Is Fine! (podcast) by Colette Alexander and Clint Byrum https://www.thisisfinepod.com/the-pod
Jun 24, 2025 • 48min

Learning with John Allspaw (Episode 100)

John Allspaw, co-founder of Adaptive Capacity Labs and former CTO at Etsy, dives into the essential art of learning from incidents. He challenges the notion of perfect handovers and explains why traditional incentives fail to eliminate errors. The conversation then turns to organizational learning and to treating incidents as indicators of systemic issues. Allspaw also champions resilience engineering in software development, urging a community-focused approach to foster adaptability and insight in chaotic environments.
Jun 3, 2025 • 29min

Focusing on What Matters with Trent Hornibrook (Episode 99)

This week I'm joined by SRE leader Trent Hornibrook who shares a story about how he improved on-call early in his career, and then we explore the broader theme of focusing on the things that matter in observability, incident response, on-call, and beyond. We discuss...
🔌 Empowering engineers to implement change in your org
🧑‍🍼 Focusing on what matters (customer & business > technology)
👀 Not just adding more monitoring as the output of each PIR
😎 How autonomy can lead to accountability
🌳 How to influence change in an organisation
...and much more.
You can find Trent on:
LinkedIn: https://www.linkedin.com/in/trenthornibrook/
You can find Stephen on:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Bluesky: https://bsky.app/profile/slightreliability.bsky.social
YouTube: https://www.youtube.com/c/SlightReliability
Instagram: https://www.instagram.com/slight_reliability/
TikTok: https://www.tiktok.com/@the_kiwi_sre
May 20, 2025 • 32min

The Root Cause Fallacy with Andrew Hatch (Episode 98)

This week I'm joined by SRE leader Andrew Hatch from Cisco ThousandEyes to talk about a dirty word in the resilience community... root cause. In this excellent conversation we explore...
🌌 Is the root cause of every incident the big bang?
🦖 How the value of root cause degrades as complexity increases
🫣 That if the culture is not blameless, people will hide things
🌳 Alternative approaches to root cause analysis such as branching timelines
🙋 Getting someone without skin in the game to facilitate your blameless post-mortems
...and much more.
You can find Andrew on:
LinkedIn: https://www.linkedin.com/in/hatchman76/
Check out Andrew's SREcon21 talk 'Learning from Complex Systems' which covers many of the topics introduced in this episode: https://www.youtube.com/watch?v=5pKGW61Ryvo
You can find Stephen on:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Bluesky: https://bsky.app/profile/slightreliability.bsky.social
YouTube: https://www.youtube.com/c/SlightReliability
Instagram: https://www.instagram.com/slight_reliability/
TikTok: https://www.tiktok.com/@the_kiwi_sre
May 6, 2025 • 33min

Synthetic Monitoring with David Dick (Episode 97)

This week I'm joined by David Dick from 2 Steps to (finally!) discuss synthetic monitoring. We cover...
🤖 What is synthetic monitoring?
🦾 What are the benefits and drawbacks to using it?
☢️ Non-web based synthetics (the tough stuff)
🍹 Combining RUM and synthetics
🫢 Does synthetics need an OTEL-like framework?
...and much more.
You can find David on:
LinkedIn: https://www.linkedin.com/in/david-dick/
You can find more about 2 Steps at https://2steps.io/#
You can find Stephen on:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Bluesky: https://bsky.app/profile/slightreliability.bsky.social
YouTube: https://www.youtube.com/c/SlightReliability
Instagram: https://www.instagram.com/slight_reliability/
TikTok: https://www.tiktok.com/@the_kiwi_sre
Apr 23, 2025 • 31min

Tech Leadership with Milan Brown (Episode 96)

This week I'm joined by Cin7 Engineering Director Milan Brown to unpack the challenges of technology management and leadership. We discuss...
✖️ Theory X vs Theory Y management
🗣️ Intention based leadership and communication
🏢 Conditions in an org for people to thrive
😵‍💫 How do you learn to manage and lead?
🫤 Managing people when you're not an expert in what they do
...and much more.
Resources mentioned during the episode:
Turn The Ship Around! (book): https://davidmarquet.com/turn-the-ship-around-book/
Agile Conversations (book): https://itrevolution.com/product/agile-conversations/
Drive (book): https://www.danpink.com/books/drive/
Radical Candor (book): https://www.radicalcandor.com/the-book/
The Team Canvas (technique): https://theteamcanvas.com/
The Engineer/Manager Pendulum (article): https://charity.wtf/2017/05/11/the-engineer-manager-pendulum/
Retromat (tool for running retrospectives): https://retromat.org/
You can find Milan on:
LinkedIn: https://www.linkedin.com/in/milan-brown/
You can find Stephen on:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Bluesky: https://bsky.app/profile/slightreliability.bsky.social
YouTube: https://www.youtube.com/c/SlightReliability
Instagram: https://www.instagram.com/slight_reliability/
TikTok: https://www.tiktok.com/@the_kiwi_sre
Mar 29, 2025 • 36min

Finding Tech Work with Leon Adato (Episode 95)

This week Leon Adato and I break down the state of applying for roles in tech. We cover...
📝 What a resume or CV is and is not
🤝 Leveraging your connections rather than relying on applying cold
🪄 How most job descriptions are works of fiction
🦾 White-fonting to game AI resume assessment
🧪 Experimental ways we could recruit
...and our pitch for Kubernetes the Rock Opera (and much more)
You can find Leon's job postings weekly on his website: https://www.adatosystems.com/category/joblistings/
You can find Leon on:
LinkedIn: https://www.linkedin.com/in/leonadato/
Bluesky: https://bsky.app/profile/leonadato.bsky.social
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Bluesky: https://bsky.app/profile/slightreliability.bsky.social
YouTube: https://www.youtube.com/c/SlightReliability
Instagram: https://www.instagram.com/slight_reliability/
TikTok: https://www.tiktok.com/@the_kiwi_sre
Mar 22, 2025 • 31min

Getting a Start in SRE with Priyam Kumar (Episode 94)

This week Priyam Kumar shares his story of moving from a massive organisation to a startup and the challenges and growth that came from that. We discuss...
🪖 War stories and examples of production incidents
🩹 The "hacks" we build to keep things running (and how maybe that's just normal)
😎 Keeping it simple... YAGNI (You Ain't Gonna Need It!)
🧯 The perils of getting stuck in reactive mode
📖 Areas of learning if you want to get into SRE
...and much much more.
You can find Priyam on:
LinkedIn: https://www.linkedin.com/in/priyam-kumar/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Bluesky: https://bsky.app/profile/slightreliability.bsky.social
YouTube: https://www.youtube.com/c/SlightReliability
Instagram: https://www.instagram.com/slight_reliability/
TikTok: https://www.tiktok.com/@the_kiwi_sre
Mar 11, 2025 • 39min

SRE Leadership with Michelle Casey (Episode 93)

This week Michelle Casey shares her insights as a 'head of' engineering manager in the SRE context. This was one of my favourite conversations on the podcast so far. We cover topics such as...
🤷🏽 Why move into leadership?
👁️ Learning from other leaders
💎 What is unique about SRE leadership?
👑 Women in engineering leadership
...and we go through some feedback I got as a leader recently.
Resources that Michelle mentions during the episode:
The Five Dysfunctions of a Team (book): https://www.tablegroup.com/topics-and-resources/teamwork-5-dysfunctions/
The Phoenix Project (novel): https://itrevolution.com/product/the-phoenix-project/
The Unicorn Project (novel): https://itrevolution.com/product/the-unicorn-project/
How Complex Systems Fail (website): https://how.complexsystems.fail/
How Your Systems Keep Running Day After Day (talk): https://www.youtube.com/watch?v=xA5U85LSk0M
The Curse of the Systems Thinker (article): https://blog.relyabilit.ie/the-curse-of-systems-thinkers/
Confessions of an SRE Manager (talk): https://www.usenix.org/conference/srecon23americas/presentation/hatch
Gender Decoder (website): https://gender-decoder.katmatfield.com/
You can find Michelle on:
LinkedIn: https://www.linkedin.com/in/michelle-casey-00b39837/
Steve Licks Instagram: https://www.instagram.com/tailsofstevielicks?igsh=MWFhenVzdzh6Zmtudw%3D%3D
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Bluesky: https://bsky.app/profile/slightreliability.bsky.social
YouTube: https://www.youtube.com/c/SlightReliability
Instagram: https://www.instagram.com/slight_reliability/
TikTok: https://www.tiktok.com/@the_kiwi_sre
