Slight Reliability

Stephen Townshend
Jul 24, 2024 • 36min

Slight Reliability Episode 87 - Measuring the value of SRE with Artem Yakimenko

In Episode 80 Niall Murphy talked about the need for SREs to be better at articulating the value of our work. In this episode I'm joined by Artem Yakimenko, ex-Googler and Engineering Director (SRE) at Culture Amp, to discuss how we might achieve this.

We discuss both quantifiable and qualitative approaches, including leveraging the untapped data in support tickets, customer sentiment and rankings, the relationship between finance and performance, the link between user design and performance, and so much more.

Books mentioned in the episode:
100 Things Every Designer Needs to Know About People
By Susan Weinschenk
https://www.amazon.com.au/Things-Every-Designer-Needs-People/dp/0321767535

You can find Artem on LinkedIn: https://www.linkedin.com/in/temikus/

You can find the official Slight Reliability podcast website at: https://slightreliability.com/

You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre
YouTube: https://www.youtube.com/c/SlightReliability
Instagram: https://www.instagram.com/slight_reliability/
TikTok: https://www.tiktok.com/@the_kiwi_sre

This episode was sponsored by SquaredUp. SquaredUp combines all your data with awesome dashboards, analytics, health rollup, and notifications, into a unified observability portal. Using a data mesh architecture, SquaredUp is a beautifully simple way to get instant access to the insights that matter, whenever you need them. If you want to know more head over to https://squaredup.com/ to sign up for your free account.
Jun 8, 2024 • 26min

Slight Reliability Episode 86 - Evolving SLOs with Dom Finn

In the world of SRE we constantly talk about defining SLOs, but what about evolving them over time? This week I chat with SRE Tech Lead Dom Finn about just that. We cover the relationship between reliability and user analytics, latency classes as a way to speak SLOs with business stakeholders, the role of NFRs and how the thresholds differ from SLOs, and much more.

Books mentioned in the episode:
The Beginning of Infinity: Explanations That Transform the World
By David Deutsch
https://www.amazon.com.au/Beginning-Infinity-Explanations-Transform-World/dp/0143121359

Turn The Ship Around!
By David Marquet
https://davidmarquet.com/turn-the-ship-around-book/

You can find Dom on LinkedIn: https://www.linkedin.com/in/dom-finn/

You can find the official Slight Reliability podcast website at: https://slightreliability.com/

You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre
YouTube: https://www.youtube.com/c/SlightReliability
Instagram: https://www.instagram.com/slight_reliability/
TikTok: https://www.tiktok.com/@the_kiwi_sre

This episode was sponsored by SquaredUp. SquaredUp combines all your data with awesome dashboards, analytics, health rollup, and notifications, into a unified observability portal. Using a data mesh architecture, SquaredUp is a beautifully simple way to get instant access to the insights that matter, whenever you need them. If you want to know more head over to https://squaredup.com/ to sign up for your free account.
May 2, 2024 • 11min

Slight Reliability Episode 85 - Feeling SaaSsy

This week I talk about the impact of SaaS-first technology strategies on the work of an SRE. I pose questions about observability, ownership, on-call, and how much control we have over reliability.

You can find the Bleeding Tech blog on Medium: https://medium.com/@stownshend

You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre
YouTube: https://www.youtube.com/c/SlightReliability
Instagram: https://www.instagram.com/slight_reliability/
TikTok: https://www.tiktok.com/@the_kiwi_sre
Mar 30, 2024 • 28min

Slight Reliability Episode 84 - Clinical Troubleshooting with Dan Slimmon

This week I chat with Dan Slimmon about applying the approach doctors use to treat patient symptoms during incident response.

You can find Dan's blog at https://blog.danslimmon.com/ or connect with him on LinkedIn here: https://www.linkedin.com/in/danslimmon/

You can find the official Slight Reliability podcast website at: https://slightreliability.com/

You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre
YouTube: https://www.youtube.com/c/SlightReliability
Instagram: https://www.instagram.com/slight_reliability/
TikTok: https://www.tiktok.com/@the_kiwi_sre

This episode was sponsored by SquaredUp. SquaredUp combines all your data with awesome dashboards, analytics, health rollup, and notifications, into a unified observability portal. Using a data mesh architecture, SquaredUp is a beautifully simple way to get instant access to the insights that matter, whenever you need them. If you want to know more head over to https://squaredup.com/ to sign up for your free account.
Mar 5, 2024 • 31min

Slight Reliability Episode 83 - An Unfulfilled Promise with Itiel Shwartz

This week I hear about all things Kubernetes from Komodor CTO and co-founder Itiel Shwartz. We chat about the promise that was made when Kubernetes first entered the industry, the challenge of getting developers engaged and capable of working in Kubernetes, my hate/hate relationship with Helm but its important contribution to the Kubernetes project, Kubernetes observability, and so much more.

You can find the Kubernetes for Humans podcast here: https://komodor.com/blog/the-kubernetes-for-humans-podcast/
Or find out more about Komodor here: https://komodor.com/
Or find Itiel on LinkedIn: https://www.linkedin.com/in/itiel-shwartz-18542853/

You can find the official Slight Reliability podcast website at: https://slightreliability.com/

You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre
YouTube: https://www.youtube.com/c/SlightReliability
Instagram: https://www.instagram.com/slight_reliability/
TikTok: https://www.tiktok.com/@the_kiwi_sre

This episode was sponsored by SquaredUp. SquaredUp combines all your data with awesome dashboards, analytics, health rollup, and notifications, into a unified observability portal. Using a data mesh architecture, SquaredUp is a beautifully simple way to get instant access to the insights that matter, whenever you need them. If you want to know more head over to https://squaredup.com/ to sign up for your free account.
Feb 13, 2024 • 26min

Slight Reliability Episode 82 - CI/CD with Amin Astaneh

This week I sit down and have a discussion with Amin Astaneh (from Certo Modo) about CI/CD. We cover the power of the standard change as a way to navigate ITIL while still implementing DevOps practices, what to monitor to make your CI/CD observable, single piece flow, testing in production, and so much more.

You can find Amin on his company website https://certomodo.io, LinkedIn: https://www.linkedin.com/in/aminastaneh/ and Twitter: https://twitter.com/aastaneh

You can find the official Slight Reliability podcast website at: https://slightreliability.com/

You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre
YouTube: https://www.youtube.com/c/SlightReliability
Instagram: https://www.instagram.com/slight_reliability/
TikTok: https://www.tiktok.com/@the_kiwi_sre

This episode was sponsored by SquaredUp. SquaredUp combines all your data with awesome dashboards, analytics, health rollup, and notifications, into a unified observability portal. Using a data mesh architecture, SquaredUp is a beautifully simple way to get instant access to the insights that matter, whenever you need them. If you want to know more head over to https://squaredup.com/ to sign up for your free account.
Feb 6, 2024 • 10min

Slight Reliability Episode 81 - Incident Management in Non-Prod Environments

"Environment issues are just incidents that happened to occur in a non-production environment"... so why do we treat them so differently?

In this first episode of the 2024 season I reflect on how we handle incidents in non-prod environments.

(Note: I had a few issues with noise suppression in OBS Studio cutting off the start of some words. I will sort it out for the next episode.)

You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre
YouTube: https://www.youtube.com/c/SlightReliability
Instagram: https://www.instagram.com/slight_reliability/
TikTok: https://www.tiktok.com/@the_kiwi_sre
Nov 22, 2023 • 37min

Slight Reliability Episode 80 - What's Been Bugging Niall Murphy

This week I speak with Niall Murphy, renowned speaker and co-author of the original SRE book and the SRE workbook.

We chat about the state of SRE in the current macro-economic climate and how we're not yet doing a very good job at articulating the value of SRE to leaders, the relationship between velocity and reliability, the value of new features versus reliability improvements, and *much* more.

You can find Niall at:
LinkedIn: https://www.linkedin.com/in/niallm/
X: https://twitter.com/niallm
Website: https://relyabilit.ie/
(and his company Stanza: https://www.stanza.systems/)

You can find the official Slight Reliability podcast website at: https://slightreliability.com/

You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
X: https://twitter.com/the_kiwi_sre
Instagram: https://www.instagram.com/slight_reliability/
Nov 21, 2023 • 45min

Slight Reliability Episode 76 - Sampling Distributed Traces with Paige Cruz

Paige Cruz (from Chronosphere) is back. This week we discuss sampling. What is sampling? Why do it? What kinds of sampling are there?

You can check out Chronosphere's cloud native observability platform here: https://chronosphere.io/

You can find Paige on:
LinkedIn: https://www.linkedin.com/in/paigerduty/
X: https://twitter.com/paigerduty

You can find the official Slight Reliability podcast website at: https://slightreliability.com/

You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
X: https://twitter.com/the_kiwi_sre
Instagram: https://www.instagram.com/slight_reliability/
Nov 20, 2023 • 38min

Slight Reliability Episode 79 - Incident Story Time with Valeska Victoria

This week Valeska Victoria returns to share some of her experiences working as an SRE at eBay.

We look at the cascading effect of production issues in complex integrated environments (how there's often no single root cause), how well developers understand the way infrastructure works, the importance of ownership of and accountability for reliability, and much more.

You can find Valeska on:
LinkedIn: https://www.linkedin.com/in/valeska-victoria/

You can find the official Slight Reliability podcast website at: https://slightreliability.com/

You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
X: https://twitter.com/the_kiwi_sre
Instagram: https://www.instagram.com/slight_reliability/
