
Ep. #7, The March 2023 Datadog Outage with Laura de Vesine
Heavybit Podcasts
The Impact of Out of Band Monitoring on Engineering
Most of our monitoring is built on Datadog, but as responsible engineers, we do not monitor Datadog exclusively using the same infrastructure stack. So that team got alerted, got online very quickly, and saw this was substantial impact. They escalated it to a severity that automatically pages our sort of incident response on-call rotation. Because our tooling was impacted, it took them around 10 minutes to open an incident and escalate it to me. And then, because it was one in the morning and I had literally just gone to bed, it took me a couple of minutes to get online.