
Site Reliability Engineering – (Still) Monitoring Distributed Systems
Coding Blocks
00:00
Is the Alarm Blaring a Jo?
Engineers could, quote, fix the scheduler by manually interacting with it. But they didn't trust themselves to fix the root cause of an alarm that wasn't blurring or blaring. Do not think about alert in isolation. You must consider them in the context of the entire system and make decisions that are good for the long term health of the whole system. The episode ends on a cliffhanger: "We got to the end of the book"
Transcript
Play full episode