Engineering Enablement by Abi Noda

How Slack fully automates deploys and anomaly detection with Z-scores | Sean McIlroy (Slack)

Apr 23, 2024
Sean McIlroy of Slack's Release Engineering team discusses how Slack fully automated its deploys and anomaly detection. Topics include the challenges of the deployment process, using Z-scores for anomaly detection, managing noise, automating deployments for efficiency, and handling large deployments to improve reliability.
ANECDOTE

Manual Deploy Process

  • Slack used a deploy commander schedule with engineers manually overseeing deployments.
  • This process was stressful, especially for less experienced engineers, because it required monitoring numerous graphs at once.
INSIGHT

Anomaly Detection Challenge

  • Anomaly detection is challenging because it requires distinguishing behavior that is objectively bad from behavior that is merely anomalous.
  • Z-scores offer a solution by measuring the size of a spike relative to historical data, enabling identification of anomalies without static thresholds.
INSIGHT

Z-Score Implementation

  • A Z-score measures how far a data point is from the mean, in units of standard deviations.
  • Slack uses a Z-score of 3 as its threshold, chosen based on analysis of past incidents; a score that high indicates a significant spike and a potential anomaly.
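
The Z-score check described above can be sketched in a few lines of Python. This is a minimal illustration, not Slack's implementation: the function names, the use of the sample standard deviation, the one-sided comparison (upward spikes only), and the handling of a flat history are all assumptions made for the example. Only the threshold of 3 comes from the episode summary.

```python
from statistics import mean, stdev

# Threshold mentioned in the episode summary; everything else here is illustrative.
Z_THRESHOLD = 3.0


def z_score(history, value):
    """Return how many standard deviations `value` lies from the mean of `history`.

    `history` needs at least two points so a sample standard deviation exists.
    """
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation (an assumption)
    if sigma == 0:
        # Flat history: no spread to measure against, so any change is treated
        # as infinitely unusual (an assumption for this sketch).
        if value == mu:
            return 0.0
        return float("inf") if value > mu else float("-inf")
    return (value - mu) / sigma


def is_anomalous(history, value, threshold=Z_THRESHOLD):
    """Flag upward spikes whose Z-score meets or exceeds the threshold."""
    return z_score(history, value) >= threshold


# Hypothetical errors-per-minute samples taken before a deploy.
baseline = [10, 12, 11, 9, 10, 11, 12, 10, 9, 11]
print(is_anomalous(baseline, 12))  # within the normal spread of the baseline
print(is_anomalous(baseline, 20))  # far above anything in the baseline
```

Because the comparison is relative to each metric's own history, the same threshold can be applied to metrics with very different scales, which is the advantage over static thresholds noted above.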