

Slight Reliability
Stephen Townshend
Learning SRE, one day at a time.

Feb 21, 2023 • 39min
Slight Reliability Episode 44 - Cognitive Overload with Paige Cruz
In this episode we discuss cognitive overload in SRE with Paige Cruz from Chronosphere. We cover what cognitive load is and what causes it, as well as some potential antidotes and preventative measures.
You can check out Chronosphere here: https://chronosphere.io/
You can find Paige on LinkedIn: https://www.linkedin.com/in/paigerduty/
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre

Feb 14, 2023 • 10min
Slight Reliability Episode 43 - Beyond Observability
In this episode I discuss my "bigger picture" perspective on what observability needs to be, and why, in the Digital Era, it's important that we include the business and the customer in what we monitor.
The books I highlight in this episode are:
Observability Engineering: https://www.oreilly.com/library/view/observability-engineering/9781492076438/
Sooner, Safer, Happier: https://soonersaferhappier.com/book/
The Phoenix Project: https://www.oreilly.com/library/view/the-phoenix-project/9781457191350/
The Unicorn Project: https://www.oreilly.com/library/view/the-unicorn-project/9781098124175/
Accelerate: https://www.oreilly.com/library/view/accelerate/9781457191435/
You can grab a copy of the 2022 State of DevOps report at: https://cloud.google.com/devops/state-of-devops
The blog post I mentioned was The Insight Industrial Complex: https://benn.substack.com/p/insight-industrial-complex
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre

Feb 7, 2023 • 37min
Slight Reliability Episode 42 - Reliability Insights with José Velez
In this episode we speak to José Velez from Rely about reliability at scale, a top-down approach to SLOs, the potential and limitations of AI and ML in operations, the question of service ownership, using the business criticality of services to shape how we monitor the underlying infrastructure, and much more.
You can check out Rely at https://www.rely.io/
You can find José on LinkedIn: https://www.linkedin.com/in/josevelez-relyio/
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre

Jan 31, 2023 • 32min
Slight Reliability Episode 41 - Testing with Traces (with Ken Hamric)
In this episode we speak to Ken Hamric about distributed tracing, leveraging tracing for better testing, and observability-driven development.
The tool that Henrik Rexed integrated with Tracetest was Kuberhealthy (https://www.cncf.io/projects/kuberhealthy/), and you can watch a video of him discussing it in combination with Tracetest here: https://youtu.be/PKQQEeeMYxg?t=2492
Ken also mentioned Charity Majors' writing about observability-driven development: https://thenewstack.io/a-next-step-beyond-test-driven-development
You can check out Tracetest:
- The official website: https://tracetest.io/
- GitHub repo: https://github.com/kubeshop/tracetest
- Discord channel: https://discord.com/channels/884464549347074049/963470167327772703
You can find Ken on LinkedIn: https://www.linkedin.com/in/ken-hamric-016b1420/
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre

Jan 24, 2023 • 11min
Slight Reliability Episode 40 - Drowning in an Observability Data Lake
In this episode Stephen explores the pros and cons of centralising observability data. Is it practical to stand up a complex and costly data storage and retrieval solution? Is there another way?
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre

Jan 17, 2023 • 42min
Slight Reliability Episode 39 - The Future of SRE with Adriana Villela and Ana Margarita Medina
This week I am joined by Ana Margarita Medina and Adriana Villela, the hosts of the On-Call Me Maybe podcast, to discuss what we'd like to see for SRE in 2023. We talk about observability, SRE recruitment, what organisations need in place to set SRE up for success, and much more.
You can find the On-Call Me Maybe podcast on most podcast platforms or go directly to the website here: https://oncallmemaybe.com/
Twitter: https://twitter.com/oncallmemaybe
Mastodon: https://mastodon.social/@oncallmemaybe
You can find Adriana on:
LinkedIn: https://www.linkedin.com/in/adrianavillela/
Twitter: https://twitter.com/adrianamvillela
Mastodon: @adrianamvillela@hachyderm.io
Blog: https://adri-v.medium.com/
You can find Ana on:
LinkedIn: https://www.linkedin.com/in/anammedina/
Twitter: https://twitter.com/Ana_M_Medina
Mastodon: @anamedina@hachyderm.io
You can find me on:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre

Jan 9, 2023 • 10min
Slight Reliability Episode 38 - SRE Reading
To begin 2023 I share the books I read last year in my quest to be a better SRE.
Here is a list of all the books mentioned during the episode:
The Phoenix Project by Gene Kim, Kevin Behr, and George Spafford https://www.amazon.com/Phoenix-Project-DevOps-Helping-Business/dp/0988262592
Site Reliability Engineering (by Google) https://sre.google/sre-book/table-of-contents/
Sooner, Safer, Happier by Jonathan Smart https://soonersaferhappier.com/book/
The Toyota Way by Jeffrey Liker https://www.amazon.com/Toyota-Way-Second-Management-Manufacturer/dp/1260468518
Remote: Office Not Required by Jason Fried https://www.amazon.com/Remote-Office-Required-Jason-Fried/dp/0091954673
Driving Digital Strategy by Sunil Gupta https://www.amazon.com/Driving-Digital-Strategy-Reimagining-Business/dp/163369268X
Team Topologies by Matthew Skelton and Manuel Pais https://teamtopologies.com/book
Accelerate by Nicole Forsgren, Jez Humble, and Gene Kim https://www.amazon.com/Accelerate-Software-Performing-Technology-Organizations/dp/1942788339
The Manager’s Path by Camille Fournier https://www.oreilly.com/library/view/the-managers-path/9781491973882/
Staff Engineer by Will Larson https://staffeng.com/book
Getting Things Done by David Allen https://gettingthingsdone.com/books/
Thinking, Fast and Slow by Daniel Kahneman https://www.amazon.com/Thinking-Fast-Slow-Daniel-Kahneman/dp/0374533555
Lean Enterprise by Jez Humble, Joanne Molesky, and Barry O’Reilly https://www.oreilly.com/library/view/lean-enterprise/9781491946527/
You can find me on:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre

Dec 19, 2022 • 46min
Slight Reliability Episode 37 - Observability New Year's Resolutions with Henrik Rexed
This week Henrik Rexed and Stephen Townshend discuss their New Year's resolutions for observability. They cover OpenTelemetry and a unified query language, continuous profiling, raw data analysis, instrumenting code, using distributed tracing as part of testing, and much more.
Some of the tools or resources mentioned during the episode include:
https://tracetest.io/ (distributed tracing for testing)
https://github.com/open-telemetry/opamp-go (OTEL orchestration)
https://ebpf.io/ (for continuous profiling)
You can find Henrik on LinkedIn: https://www.linkedin.com/in/hrexed/ and Twitter: https://twitter.com/Hrexed
You can find the Is It Observable? series on YouTube: https://www.youtube.com/@IsitObservable
And the Perfbytes Podcast on most podcast platforms: https://www.perfbytes.com/p/perfbytes.html
You can find me on:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre

Dec 12, 2022 • 28min
Slight Reliability Episode 36 - Starting an SRE Team from Scratch with Gwen Berry and Steve Gill
This week we talk to Steve Gill and Gwen Berry from IAG to discuss their experiences forming an SRE incubator team (starting SRE from scratch in a large enterprise). We discuss on-call, SLOs, single pane of glass, pivoting, chaos engineering, and much more.
You can find Steve on LinkedIn: https://www.linkedin.com/in/stevegill239/
You can find Gwen on LinkedIn: https://www.linkedin.com/in/gwen-berry-56324418b/
You can find me on:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre

Dec 5, 2022 • 16min
Slight Reliability Episode 35 - SRE Trends from re:Invent 2022
This week I share the observations I made at AWS re:Invent relating to SRE work, including the lack of SREs at the event, data warehouses for observability data, the use of topologies to understand complexity, FinOps, serverless, making sense of enormous amounts of data... and more.
You can find me on:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre