This episode explores the challenges of deploying a web application with a monolithic codebase at Slack and the use of Z-scores to improve anomaly detection. It covers the move from static to dynamic alert thresholds, which reduces false positives and improves alert accuracy. It also highlights how UIs and automation built around anomaly detection make monitoring and deployment more efficient.
This week we’re joined by Sean McIlroy from Slack’s Release Engineering team to learn about how they’ve fully automated their deployment process. This conversation covers Slack’s original release process, key changes Sean’s team has made, and the latest challenges they’re working on today.
Timestamps:
- (1:34): The Release Engineering team
- (2:13): How the monolith has served Slack
- (3:24): How the deployment process used to work
- (6:23): The complexity of the deploy itself
- (7:39): Early ideas for improving the deployment process
- (9:07): Why anomaly detection is challenging
- (10:32): What a Z-score is
- (13:23): Managing noise with Z-scores
- (16:49): Presenting this information to people that need it
- (19:54): Taking humans out of the process
- (23:13): Handling rollbacks
- (25:27): Not overloading developers with information
- (28:26): Handling large deployments