Google SRE Prodcast

Latest episodes

Dec 11, 2024 • 36min

Imperative vs. Declarative Change Workflows with Dominic Hutton & Niccolo' Cascarano

Dominic Hutton, Staff SRE at HashiCorp with a rich background in engineering, teams up with Niccolo' Cascarano, Senior Staff SRE at Google and an expert in continuous delivery systems. They explore configuration management, comparing imperative and declarative change workflows. Listeners will learn how declarative methods help tame complexity, while imperative approaches can suit quick tasks. The importance of managing scripts, navigating synchronization pitfalls, and fostering collaboration between development and operations also takes center stage.
Dec 4, 2024 • 41min

Human Factors in Complex Systems with Casey Rosenthal and John Allspaw

Casey Rosenthal, Founder of Cirrusly.ai, and John Allspaw, Principal of Adaptive Capacity Labs, delve into the complexities of resilience in software engineering. They emphasize the human factors that shape system reliability and adaptability during failures. The discussion exposes the pitfalls of traditional incident metrics and argues for understanding the qualitative impact of incidents on users. They also tackle the cultural challenges organizations face in incident management, highlighting the need for transparency and better communication.
Nov 20, 2024 • 34min

Embracing Complexity with Christina Schulman & Dr. Laura Maguire

In this episode of the Prodcast, we are joined by guests Christina Schulman (Staff SRE, Google) and Dr. Laura Maguire (Principal Engineer, Trace Cognitive Engineering). They emphasize the human element of SRE and the importance of fostering a culture of collaboration, learning, and resilience in managing complex systems. They touch on the need for diverse perspectives and collaboration in incident response and the necessity of embracing complexity, and they explore concepts such as aerodynamic stability, among other topics.
Nov 13, 2024 • 33min

Maglev: load balancing at Google with Cody Smith and Trisha Weir

Cody Smith, CTO and co-founder of Camu Energy, spent over 14 years at Google and contributed to Maglev. Trisha Weir, with 21 years at Google, is an SRE Department Lead. They trace the evolution of Maglev, a network load balancer essential for traffic management in data centers. Their discussion highlights the significance of psychological safety and collaboration in technical innovation. They also delve into challenges faced during system rollouts, debugging practices, and the shift from manual to automated network provisioning, blending technical and teamwork insights.
Oct 30, 2024 • 42min

Profiling data with Pat Somaru and Narayan Desai

In this episode, guests Narayan Desai (Principal SRE, Google) and Pat Somaru (Senior Production Engineer, Meta) join hosts Steve McGhee and Florian Rathgeber to discuss the challenges of observability and working with profiling data. The discussion covers intriguing topics like noise reduction, workload modeling, and the need for better tools and techniques to handle high-cardinality data.
Oct 23, 2024 • 32min

Google Public DNS (8.8.8.8) with Wilmer van der Gaast and Andy Sykes

This episode features Google engineers Wilmer van der Gaast (Production on-call) and Andy Sykes (Senior Staff Systems Engineer, SRE), who join hosts Steve McGhee and Jordan Greenberg to discuss the development and maintenance of Google Public DNS (8.8.8.8). They highlight the initial motivations for creating the service, technical challenges like cache poisoning and load balancing, and the collaborative effort between SRE and SWE teams to address these issues. They also reflect on the evolving nature of SRE and offer advice for aspiring SREs.
Oct 16, 2024 • 34min

SRE in the Retail and Gaming Worlds with Jordan Chernev & Scott Bowers

Guests Jordan Chernev (Senior Technology Executive) and Scott Bowers (SRE, Gearbox Software), who hail from the retail and gaming industries respectively, join hosts Steve McGhee and Jordan Greenberg to discuss the unique challenges of Site Reliability Engineering in their industries. They cover the importance of aligning SLOs with user experience, strategies for handling traffic spikes, communicating with users during outages, and investing in reliability.
Oct 9, 2024 • 44min

Incident Response with Sarah Butt and Vrai Stacey

Sarah Butt (Principal Engineer, Centralized Incident Response, Salesforce) and Vrai Stacey (Staff Software Engineer, Google) join hosts Steve McGhee and Jordan Greenberg to dive into incident response, particularly the tooling and software used for reliability incidents. Tune in for an in-depth discussion of topics such as the importance of communication and collaboration during incidents and the role of tooling in supporting incident response processes. Sarah and Vrai also share personal takeaways from incidents they have experienced.
Oct 2, 2024 • 42min

Building Reliable Systems with Silvia Botros and Niall Murphy

Silvia Botros (SRE Architect, Twilio | Author of "High Performance MySQL, 4th edition") and Niall Murphy (Co-founder & CEO, Stanza) join hosts Steve McGhee and Jordan Greenberg to discuss cultural shifts in database engineering, rate limiting, load shedding, holistic approaches to reliability, proactive measures to build customer trust, and much more!
Sep 25, 2024 • 29min

Creating Systems that are Safe with Liz Fong-Jones

Liz Fong-Jones, a former Google SRE and current Field CTO at honeycomb.io, dives into observability. She shares insights on how observability has evolved from traditional monitoring, likening it to medical diagnostics. Liz emphasizes its critical role in improving user satisfaction through Service Level Objectives (SLOs) and discusses the balance between human insight and machine learning in system analysis. She also highlights the transformation of Site Reliability Engineering, advocating for collaboration and hands-on experience in modern software development.
