

Keep on-call simple (Ship It! #36)
Jan 20, 2022
Ildar Iskhakov and Matvey Kukuy, co-founders of Amixr (Grafana OnCall), join a lively discussion on simplifying on-call processes. They share insights on the complexities of being on call, the emotional toll it takes, and the importance of clear communication during incidents. The duo dives into the tech stack that powers their solution, emphasizing operational efficiency with tools like Django and Kubernetes. They also talk about mastering on-call alert notifications and creating customizable incident management tools to empower engineers while minimizing alert fatigue.
AI Snips
Nighttime On-Call Experience
- Ildar described being woken at night while on call and fixing incidents with help from his team.
- Most issues required an immediate, collaborative response rather than a solo fix.
Manage Incidents Proactively
- Incidents will inevitably occur and require proper follow-up to prevent recurrence.
- Hold regular meetings to discuss incidents and add mitigation action items to your backlog.
Slack Notification Mishap
- Ildar recounted accidentally sending notifications to the entire organization via Slack.
- The event became an unintended growth hack, generating more interest in their product.