How Slack fully automates deploys and anomaly detection with Z-scores | Sean Mcllroy (Slack)
Apr 23, 2024
Sean Mcllroy from Slack's Release Engineering team discusses fully automating deploys and anomaly detection. Topics include the challenges of the deployment process, using Z-scores for anomaly detection, managing noise, automating deployments for efficiency, and handling large deployments to improve reliability.
Slack moved from manually supervised deploys to fully automated deploys, using Z-scores for anomaly detection.
Slack deliberately keeps its monolithic architecture because of the productivity benefits and the more focused use of resources it allows.
Deep dives
Automating Deployment Process at Slack
The episode discusses how Sean and his team at Slack fully automated the deployment process. Initially, Slack relied on a manual rotation of deploy commanders who oversaw each release. Sean's team first built tools to help deploy commanders with their tasks, and that work ultimately led to fully automated deploys. The episode also covers the challenges the team faced along the way and the new problems it is working on now.
Monolithic Code Base at Slack
Although much of the industry has moved away from monoliths, Slack's large monolithic web app continues to serve it well. The decision to keep the monolith rather than break it up rests on the productivity benefits it brings and on keeping resources focused. The application's scale does create operational challenges, but dedicated teams manage the underlying infrastructure effectively.
Transition to Automated Deployments
The podcast discusses Slack's transition from manual deployment processes to automated systems. Previously, engineers took turns on a deploy commander rotation, overseeing each deployment by hand. Using Z-scores to detect anomalies automatically removed the need for a human to watch every release, which led to the full automation of the deployment process.
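A Z-score measures how many standard deviations a new observation lies from the mean of a baseline: z = (x - mean) / stdev. The sketch below shows the basic idea applied to a deploy metric. It is a minimal illustration under stated assumptions, not Slack's implementation: the function names, the one-sided check (only flagging values that rise, as an error rate would during a bad deploy), the threshold of 3.0, and the sample error-rate data are all hypothetical.

```python
import math
from statistics import mean, stdev

def z_score(value: float, baseline: list[float]) -> float:
    """How many standard deviations `value` lies from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation; needs at least 2 points
    if sigma == 0:
        # Flat baseline: any deviation at all is infinitely unusual.
        return 0.0 if value == mu else math.copysign(math.inf, value - mu)
    return (value - mu) / sigma

def is_anomalous(value: float, baseline: list[float], threshold: float = 3.0) -> bool:
    """One-sided check (hypothetical policy): flag only values far ABOVE normal."""
    return z_score(value, baseline) > threshold

# Hypothetical error-rate samples (% of requests) collected before the deploy.
baseline = [0.10, 0.12, 0.11, 0.09, 0.10, 0.11, 0.12, 0.10]
print(is_anomalous(0.45, baseline))  # True: more than 30 standard deviations above the mean
print(is_anomalous(0.11, baseline))  # False: within normal variation
```

Normalizing by the baseline's own spread is what makes a single threshold reusable across metrics with very different scales, which is the main appeal of Z-scores over fixed per-metric limits.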
Enhancing Developer Experience
Improvements to Slack's deployment process focus on developer experience and productivity. Initiatives include automated rollbacks to resolve errors quickly, keeping developers informed without overwhelming them, and addressing the challenges of releasing builds sequentially. Together, these efforts aim to streamline deployments, improve efficiency, and maintain reliability across the platform.
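Automated rollbacks and managing noise pull in opposite directions: roll back too eagerly and a single noisy data point aborts a healthy deploy; too cautiously and a real regression lingers. One common way to balance the two, shown here as a hypothetical sketch rather than Slack's actual policy, is to trigger a rollback only after several consecutive post-deploy samples exceed the Z-score threshold. The function name `should_roll_back`, the threshold of 3.0, the streak length of 3, and the sample data are all illustrative assumptions.

```python
import math
from statistics import mean, stdev

def should_roll_back(samples: list[float], baseline: list[float],
                     threshold: float = 3.0, consecutive: int = 3) -> bool:
    """True once `consecutive` samples in a row sit more than `threshold` sigma above baseline."""
    mu, sigma = mean(baseline), stdev(baseline)
    streak = 0
    for value in samples:
        if sigma > 0:
            z = (value - mu) / sigma
        else:
            # Flat baseline: treat any increase as anomalous.
            z = math.inf if value > mu else 0.0
        # Extend the streak on an anomalous sample; a normal sample resets it.
        streak = streak + 1 if z > threshold else 0
        if streak >= consecutive:
            return True
    return False

# Hypothetical pre-deploy error rates (% of requests).
baseline = [0.10, 0.12, 0.11, 0.09, 0.10, 0.11, 0.12, 0.10]
print(should_roll_back([0.10, 0.45, 0.11, 0.10], baseline))  # False: one isolated blip
print(should_roll_back([0.10, 0.45, 0.50, 0.48], baseline))  # True: sustained spike
```

Raising `consecutive` trades detection speed for fewer false alarms; the right value depends on how often the metric is sampled and how quickly a bad build needs to be pulled.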
This week we’re joined by Sean Mcllroy from Slack’s Release Engineering team to learn about how they’ve fully automated their deployment process. This conversation covers Slack’s original release process, key changes Sean’s team has made, and the latest challenges they’re working on today.