Building Resilient Software and Ensuring Service Quality
This chapter discusses the challenges of keeping a service like Slack stable and the importance of recovering quickly when something breaks. The speakers mention instances where Slack has gone down and describe the measures they have implemented to detect and recover from problems instantaneously. They also discuss the engineering challenge of handling network flakiness and the need to communicate constantly with vendors.