

Reliability Enablers
Ash Patel & Sebastian Vietz
Software reliability is a tough topic for engineers in many organizations. The Reliability Enablers (Ash Patel and Sebastian Vietz) know this from experience. Join us as we demystify reliability jargon like SRE and DevOps. We interview experts and share practical insights. Our mission is to help you boost your success in reliability-enabling areas like observability, incident response, release engineering, and more. read.srepath.com
Episodes

Dec 5, 2023 • 35min
#18 Winning at SRE in Banking and Telecom (with Troy Koss)
Ash Patel talks with Troy Koss, Director of SRE at Capital One, an early adopter of DevOps and SRE in the banking sector. He shares insights on working in regulated industries like banking and telecom, drawing on his early work experience at Verizon, a US telecom company. Troy also shares his thoughts on building stronger SRE individual contributors and emphasizes that education is pivotal to ongoing reliability success. This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit read.srepath.com

Nov 27, 2023 • 46min
#17 Lessons from SRE's Wild West Days (with Rick Boone)
Ash Patel talks with Rick Boone who is a pioneer in SRE, having been an early AppOps engineer at Facebook and Uber's first SRE hire. He shares amazing stories from those pioneering days. Rick also draws from his experience to share his insights on how to build stronger SRE teams, as well as support effective career progression for individual contributor SREs.

Nov 21, 2023 • 39min
#16 Acing Cloud Infra in Digital Media Giant (with Sreejith Chelanchery)
Ash Patel interviews Sreejith Chelanchery, SVP of Delivery and Infrastructure Engineering at Dotdash Meredith. Sreejith shares his journey from programming analyst in Bangalore, India, to executive responsible for platform engineering, DevOps, and SRE at a media giant in New York City. He gives a glimpse into how his team saved his organization over $9 million in cloud computing costs, how they started an internal developer platform well before Backstage was around, and more. Sreejith also sheds light on how changemakers and advocates like SREs can win over business and other non-technical stakeholders.

Nov 14, 2023 • 43min
#15 Growing Reliability Engineering Across 5+ Companies (with Nash Seshan)
Ash Patel talks with Nash Seshan, who has supported reliability work in over 5 organizations, including Cisco, eBay, Dropbox, Lyft, Netflix, and Wayfair. He shares what he learned from reliability work at these big brands. Nash also draws on his experience as co-founder of a Y Combinator-funded startup to discuss effective engineering leadership, and gives his take on issues with ill-conceived automation.

Nov 7, 2023 • 42min
#14 Faster Incident Resolution through Data-Driven Notebooks (with Ivan Merrill)
Ash Patel talks with Ivan Merrill of Fiberplane about wrangling the big data that incidents and systems generate through collaborative notebooks. Ivan also touches on how open-source tools like Autometrics enable deeper observability of code by increasing the granularity of data used for incident response and retrospectives.

Oct 31, 2023 • 33min
#13 Making Sense of OpenTelemetry and Observability (with Adriana Villela)
Ash Patel talks with Adriana Villela (CNCF Ambassador, OpenTelemetry contributor, and senior developer advocate at Lightstep) about the promise of OpenTelemetry for observability teams, as well as the challenges of doing it right. She also touches on engineering leadership topics, recalling her experience as a leader of platform engineering and observability teams.

Oct 24, 2023 • 29min
#12 From Incident Firefighting to Reliability First (with Robert Ross)
Ash Patel talks with Robert Ross of FireHydrant about his experience offering incident management software to SREs and other software incident responders. Highlights include defining the broader concept of reliability, making smarter choices for handling incidents, and more.

Oct 17, 2023 • 27min
#11 Rising to Staff Engineer in DevOps and SRE (with Rajesh Reddy N)
Ash Patel interviews Rajesh Reddy N about his experiences as a senior DevOps and SRE individual contributor. Rajesh shares his insights on having systems to minimize alert fatigue, the importance of security in DevOps, and more.

Oct 10, 2023 • 24min
#10 Using AI for Kubernetes troubleshooting self-service (with Kyle Forster)
Ash Patel interviews Kyle Forster of RunWhen about his experiences as an ex-Google director helping SREs and running an AI-based company that supports Kubernetes troubleshooting. Their conversation covers themes like enabling junior SREs, the role of SRE in shift-left, and handling misaligned incentive models in organizations.

Oct 2, 2023 • 29min
#9 Inside Booking.com's Site Reliability Engineering practice (with Samuele Tonon and Yoann Fouquet)
In this episode of the SREpath Podcast, Ash Patel interviews two SRE managers from Booking.com, Samuele and Yoann, to gain insights into their experiences and strategies for developing a successful SRE practice within a large organization. Yoann is a senior manager responsible for managing SRE teams and serves as the SRE Craft lead. Samuele is an SRE engineering manager working in the Big Data department and manages a team of eight to nine people. Yoann officially began his journey in SRE in 2017, transitioning from a consultancy role to an engineer focused on reliability. Samuele's background included network engineering and DevOps roles before he joined Booking.com in 2018 as an SRE. Booking.com initially didn't have SREs but started adopting SRE practices in 2017 as they transitioned from a monolithic architecture to microservices. The SRE team at Booking.com grew from around 20-30 members to nearly 200, with various teams handling infrastructure, central roles, and embedded roles with product teams. Learn more about the challenges they faced and tackled by listening to the episode.