Mentioned in 1 episode

Site Reliability Engineering

How Google Runs Production Systems
Book • 2016
This book provides insights into how Google's Site Reliability Engineering (SRE) teams manage and maintain large-scale systems.

It covers principles, practices, and management strategies that enable scalable, reliable, and efficient systems.

The book is divided into sections covering an introduction to SRE, its core principles, day-to-day practices, and management best practices.

Mentioned by

Mentioned by Justin Garrison as a book that highlights the power that Google gave to its SRE team to say no to supporting unreliable services.
Animating the Stack with Sam Rose
Mentioned by Stephen Welch while discussing large-scale databases and reliable software systems.
453: Big Global Problems Worth Solving with Machine Learning
