SRE handbook

How Google Runs Production Systems
Book
Niall Murphy's book is an important resource for site reliability engineering (SRE) and for system administrators.

In the book, the author presents a set of guidelines and practices for managing and operating large-scale, distributed systems.

The book has chapters on monitoring, automation, incident response, and capacity planning.

It also discusses the cultural and organizational aspects of SRE and uses examples and case studies to illustrate the concepts presented. The book is widely recommended and well known within the field.

Mentioned by

Demetrios Brinkmann
as the author of the seminal book on SRE.
#1 [Interview] Building developer communities of craft with Demetrios Brinkmann
