SRE handbook
How Google Runs Production Systems
Niall Murphy's book is an important resource for practitioners of site reliability engineering (SRE) and for system administrators.
In the book, the author presents a set of guidelines and practices for managing and operating large-scale, distributed systems.
The book has chapters on monitoring, automation, incident response, and capacity planning.
It also discusses the cultural and organizational aspects of SRE, and it illustrates the concepts with examples and case studies. The book is widely recommended and well known within the field.
Mentioned by Demetrios Brinkmann as the author of the seminal book on SRE.

#1 [Interview] Building developer communities of craft with Demetrios Brinkmann