

Slight Reliability
Stephen Townshend
Learning SRE, one day at a time.
Episodes

May 9, 2023 • 32min
Slight Reliability Episode 54 - Trends in Incident Management with Andy Thurai
In this episode Stephen Townshend chats to Andy Thurai (VP and Principal Analyst at Constellation Research) about Andy's latest report, "Trends in Incident Management 2023". They chat about "mean time to innocence" and status pages, debate whether AI or ML has real value for incident management, and ponder why anyone would willingly decide to become an incident commander.
You can find Andy's report here: https://www.constellationr.com/research/2023-trends-incident-management
You can find Andy on LinkedIn here: https://www.linkedin.com/in/andythurai/
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre

May 2, 2023 • 28min
Slight Reliability Episode 53 - DORA Metrics with Tim Wheeler
In this episode Stephen Townshend chats to Tim Wheeler (Director of Engineering Services at SquaredUp) about his work implementing and continually monitoring DORA metrics. They chat about customising each metric to your own unique context, avoiding the weaponisation of metrics, the "tools will solve this for me" trap, and much more.
The books mentioned during this episode were: Accelerate, The DevOps Handbook, The Phoenix Project, The Unicorn Project, Lean Enterprise, and Sooner, Safer, Happier. Tim also mentioned the work of Bryan Finster (https://twitter.com/BryanFinster).
You can find Tim on LinkedIn: https://www.linkedin.com/in/timjameswheeler/
You can find out more about SquaredUp at https://squaredup.com/
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre

Apr 25, 2023 • 9min
Slight Reliability Episode 52 - Double, Double, Toil and Trouble!
In this episode Stephen explores the SRE concept of "toil". What is it? How can we measure it? How do we reduce it?
Also in this episode: whether we can make non-technology systems observable (as we do technology ones), and the ineffectiveness of change advisory boards (CAB). Stephen also mentions his upcoming attendance at SREcon, AWS Summit, and SLOconf.
Shout outs to Steve McGhee, Dom Finn, and Shea Stewart.
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre

Apr 18, 2023 • 30min
Slight Reliability Episode 51 - The reliability.org Community with Anurag Gupta
In this episode Stephen Townshend and Anurag Gupta discuss the new reliability.org community, a place for SREs and reliability engineers to share experiences, ask questions, and find community. They discuss the value of community and sharing your thoughts, collaboration between organisations, vicious versus virtuous cycles for reliability, and much more.
You can join us in the community by visiting https://www.reliability.org/
You can find Anurag on LinkedIn: https://www.linkedin.com/in/awgupta/
You can find out more about Shoreline by visiting https://www.shoreline.io/
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre

Apr 11, 2023 • 39min
Slight Reliability Episode 50 - The 50th Episode Special with Bruce Cullen
In this episode Bruce Cullen interviews Stephen Townshend about the past, present, and future of the Slight Reliability podcast. They discuss their shared backgrounds in software testing, the different career paths that testing has opened up, and much more!
Bruce is the Director of Engineering at SquaredUp. You can find him on LinkedIn: https://www.linkedin.com/in/bruce-cullen/
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre

Apr 4, 2023 • 39min
Slight Reliability Episode 49 - Implementing Observability in the Real World with Ivan Merrill
In this episode Ivan Merrill from Fiberplane shares his experiences implementing observability within some of the large, complex organisations he has worked for.
You can find Ivan on LinkedIn: https://www.linkedin.com/in/ivan-merrill-1a05223/
You can find out more about Fiberplane here: https://fiberplane.com/
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre

Mar 21, 2023 • 8min
Slight Reliability Episode 48 - Blind Insight
In this episode I discuss the word "insight" within the context of observability. Is insight something tools can provide? Is it something you can reproduce?
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre

Mar 14, 2023 • 33min
Slight Reliability Episode 47 - Cloud Dependency Reliability with Jeff Martens and Ryan Duffield
In this episode Stephen Townshend discusses our increased dependency on third-party cloud services, and what this means for reliability, with Jeff Martens and Ryan Duffield from https://metrist.io/.
You can find Jeff:
On LinkedIn: https://www.linkedin.com/in/jmartens/
On Twitter: https://twitter.com/Jmartens
You can find Ryan:
On StackOverflow: https://stackoverflow.com/users/2696/ryan-duffield
On GitHub: https://github.com/rduffield
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre

Mar 7, 2023 • 10min
Slight Reliability Episode 46 - Raw Telemetry
In this episode I propose the use of scatterplots of raw data to better understand how our systems are behaving and what our customers are experiencing. The ideas from this episode come from my time as a performance engineer, working with legends in that space: Richard Leeke (https://www.linkedin.com/in/richard-leeke-450448/) and Neil Davies (https://www.linkedin.com/in/neildaviesnz/).
For some basic examples of scatterplots and what they show you versus line charts, check out an article I wrote back in 2017 called "Let's Talk About Averages": https://www.linkedin.com/pulse/lets-talk-averages-stephen-townshend/
Another proponent of scatterplots is Stijn Schepers (https://www.linkedin.com/in/stijnschepers/). Here's an article he wrote about it in 2019: https://www.linkedin.com/pulse/performance-testing-act-like-detective-use-raw-data-stijn-schepers/
Neil Davies' article on tornado scatters, "Chasing Tornadoes", can be found here: http://www.performance-workshop.org/wp/wp-content/uploads/2013/12/Chasing_Tornadoes_Davies.pdf
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre

Feb 28, 2023 • 49min
Slight Reliability Episode 45 - Telemetry Fluency with Paige Cruz
In this episode we discuss uplifting telemetry knowledge within engineering teams to enrich their work (and their lives) with Paige Cruz from Chronosphere. We cover why not to take a chainsaw to your observability in order to cut costs, the dark side of auto-instrumentation, storytelling with live data, and much more.
The book that Paige recommends at the end is "Effective Monitoring and Alerting for Web Operations": https://www.oreilly.com/library/view/effective-monitoring-and/9781449333515/
You can check out Chronosphere here: https://chronosphere.io/
You can find Paige on LinkedIn: https://www.linkedin.com/in/paigerduty/
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre