

Slight Reliability
Stephen Townshend
Learning SRE, one day at a time.
Episodes

Apr 4, 2023 • 39min
Slight Reliability Episode 49 - Implementing Observability in the Real World with Ivan Merrill
In this episode Ivan Merrill from Fiberplane shares his experiences implementing observability within some of the large, complex organisations he's worked for in the past. You can find Ivan on LinkedIn: https://www.linkedin.com/in/ivan-merrill-1a05223/ You can find out more about Fiberplane here: https://fiberplane.com/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre

Mar 21, 2023 • 8min
Slight Reliability Episode 48 - Blind Insight
In this episode I discuss the word "insight" within the context of observability. Is insight something tools can provide? Is it something you can reproduce? You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre

Mar 14, 2023 • 33min
Slight Reliability Episode 47 - Cloud Dependency Reliability with Jeff Martens and Ryan Duffield
In this episode Stephen Townshend discusses our increased dependency on third-party cloud services, and what this means for reliability, with Jeff Martens and Ryan Duffield from https://metrist.io/. You can find Jeff... On LinkedIn: https://www.linkedin.com/in/jmartens/ On Twitter: https://twitter.com/Jmartens You can find Ryan... On StackOverflow: https://stackoverflow.com/users/2696/ryan-duffield On GitHub: https://github.com/rduffield You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre

Mar 7, 2023 • 10min
Slight Reliability Episode 46 - Raw Telemetry
In this episode I propose the use of scatterplots of raw data to better understand how our systems are behaving and what our customers are experiencing. The ideas from this episode come from my time as a performance engineer, working with legends in that space: Richard Leeke (https://www.linkedin.com/in/richard-leeke-450448/) and Neil Davies (https://www.linkedin.com/in/neildaviesnz/). For some basic examples of scatterplots, and what they show you versus line charts, check out an article I wrote back in 2017 called "Let's Talk About Averages": https://www.linkedin.com/pulse/lets-talk-averages-stephen-townshend/ Another proponent of scatterplots is Stijn Schepers (https://www.linkedin.com/in/stijnschepers/). Here's an article he wrote about it in 2019: https://www.linkedin.com/pulse/performance-testing-act-like-detective-use-raw-data-stijn-schepers/ Neil Davies' article on tornado scatters, "Chasing Tornadoes", can be found here: http://www.performance-workshop.org/wp/wp-content/uploads/2013/12/Chasing_Tornadoes_Davies.pdf You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre

Feb 28, 2023 • 49min
Slight Reliability Episode 45 - Telemetry Fluency with Paige Cruz
In this episode we discuss uplifting telemetry knowledge within engineering teams to enrich their work (and their lives) with Paige Cruz from Chronosphere. We cover why not to take a chainsaw to your observability in order to cut costs, the dark side of auto-instrumentation, storytelling with live data, and much more. The book that Paige recommends at the end is "Effective Monitoring and Alerting for Web Operations": https://www.oreilly.com/library/view/effective-monitoring-and/9781449333515/ You can check out Chronosphere here: https://chronosphere.io/ You can find Paige on LinkedIn: https://www.linkedin.com/in/paigerduty/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre

Feb 21, 2023 • 39min
Slight Reliability Episode 44 - Cognitive Overload with Paige Cruz
In this episode we discuss cognitive overload in SRE with Paige Cruz from Chronosphere. We cover what cognitive load is, what causes it, and some potential antidotes and preventative measures. You can check out Chronosphere here: https://chronosphere.io/ You can find Paige on LinkedIn: https://www.linkedin.com/in/paigerduty/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre

Feb 14, 2023 • 10min
Slight Reliability Episode 43 - Beyond Observability
In this episode I discuss my "bigger picture" perspective of what observability needs to be, and why it's important that we include the business and the customer in what we monitor in the Digital Era. The books I highlight in this episode are...
Observability Engineering: https://www.oreilly.com/library/view/observability-engineering/9781492076438/
Sooner, Safer, Happier: https://soonersaferhappier.com/book/
The Phoenix Project: https://www.oreilly.com/library/view/the-phoenix-project/9781457191350/
The Unicorn Project: https://www.oreilly.com/library/view/the-unicorn-project/9781098124175/
Accelerate: https://www.oreilly.com/library/view/accelerate/9781457191435/
You can grab a copy of the 2022 State of DevOps report at: https://cloud.google.com/devops/state-of-devops The blog I mentioned was The Insight Industrial Complex: https://benn.substack.com/p/insight-industrial-complex You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre

Feb 7, 2023 • 37min
Slight Reliability Episode 42 - Reliability Insights with José Velez
In this episode we speak to José Velez from Rely about reliability at scale, a top-down approach to SLOs, the potential and limitations of AI and ML in operations, the question of service ownership, utilising the business criticality of services in how we monitor the underlying infrastructure, and much more. You can check out Rely at https://www.rely.io/ You can find José on LinkedIn: https://www.linkedin.com/in/josevelez-relyio/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre

Jan 31, 2023 • 32min
Slight Reliability Episode 41 - Testing with Traces (with Ken Hamric)
In this episode we speak to Ken Hamric about distributed tracing, leveraging tracing for better testing, and observability driven development. The tool that Henrik Rexed integrated with Tracetest was Kuberhealthy (https://www.cncf.io/projects/kuberhealthy/) and you can watch a video of him discussing it in combination with Tracetest here: https://youtu.be/PKQQEeeMYxg?t=2492 Ken also mentioned Charity Majors' writing about observability driven development: https://thenewstack.io/a-next-step-beyond-test-driven-development You can check out Tracetest:
- The official website: https://tracetest.io/
- GitHub repo: https://github.com/kubeshop/tracetest
- Discord channel: https://discord.com/channels/884464549347074049/963470167327772703
You can find Ken on LinkedIn: https://www.linkedin.com/in/ken-hamric-016b1420/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre

Jan 24, 2023 • 11min
Slight Reliability Episode 40 - Drowning in an Observability Data Lake
In this episode Stephen explores the pros and cons of centralising observability data. Is it practical to stand up a complex and costly data storage and retrieval solution? Is there another way? You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre