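Cloudflare's Bot Management config refresh pulled doubled data from ClickHouse after a database change. The query that builds the feature file returned duplicate rows, so the Rust code exceeded its 200-item feature limit and called unwrap on the resulting error, which panicked the process.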
The panic crashed a critical proxy service, which then returned HTTP 500 errors. The result was a global outage that lasted several hours before a fix and cleanup restored service.
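To make the failure mode in the summary above concrete, here is a minimal Rust sketch. Everything in it is hypothetical: the names load_features and query_rows, the placeholder database names db_a and db_b, and the count of 120 features are assumptions chosen only so that doubling crosses the 200-item limit. It is not Cloudflare's code. It only shows how a duplicated query result combined with unwrap turns a recoverable error into a panic.

```rust
// Hypothetical sketch of the failure mode; names, row format, and counts are
// illustrative assumptions, not Cloudflare's actual code.

const MAX_FEATURES: usize = 200; // fixed capacity, mirroring the 200-item limit

#[derive(Debug)]
struct TooManyFeatures {
    count: usize,
}

/// Builds a feature list from metadata rows, refusing to exceed MAX_FEATURES.
fn load_features(rows: &[(&str, String)]) -> Result<Vec<String>, TooManyFeatures> {
    if rows.len() > MAX_FEATURES {
        return Err(TooManyFeatures { count: rows.len() });
    }
    Ok(rows.iter().map(|(_, name)| name.clone()).collect())
}

/// Simulates a metadata query that returns each column once per visible database.
fn query_rows(visible_databases: &[&'static str], columns: usize) -> Vec<(&'static str, String)> {
    let mut rows = Vec::new();
    for db in visible_databases {
        for i in 0..columns {
            rows.push((*db, format!("feature_{i}")));
        }
    }
    rows
}

fn main() {
    // Before the change: one database is visible and 120 features fit under the limit.
    let before = query_rows(&["db_a"], 120);
    println!("before: {} features", load_features(&before).unwrap().len());

    // After the change: a second database becomes visible, so every row appears twice.
    let after = query_rows(&["db_a", "db_b"], 120);
    // unwrap() on the Err(TooManyFeatures) panics instead of handling the error.
    let result = std::panic::catch_unwind(|| load_features(&after).unwrap());
    println!("after: panicked = {}", result.is_err());
}
```

The limit check itself behaves correctly here; the crash comes from the caller treating its error as impossible.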
ANECDOTE
Personal Site Taken Offline
Matthew Sanabria's home-hosted personal site went offline during the outage because he uses Cloudflare's DNS and tunnel features to front it.
His experience shows that even small personal services can be taken down by an outage at a large provider.
ADVICE
Handle Errors Explicitly In Production
Avoid calling unwrap on fallible results in production code and explicitly handle error cases to prevent panics.
Add defensive limits, graceful degradation, and proper error handling to reduce the blast radius when the data a service consumes changes unexpectedly.
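A minimal Rust sketch of this advice, under stated assumptions: FeatureStore, validate, and MAX_FEATURES are hypothetical names, and keeping the last known-good config is just one possible form of graceful degradation, not an approach prescribed by the hosts or by Cloudflare. The refresh returns a Result that is matched explicitly rather than unwrapped, so an oversized config is rejected and the previous one keeps serving.

```rust
// Hypothetical sketch of explicit error handling for a config refresh.
// FeatureStore, validate, and MAX_FEATURES are illustrative names.
use std::fmt;

const MAX_FEATURES: usize = 200; // defensive upper bound on features

#[derive(Debug, PartialEq)]
struct ConfigError {
    count: usize,
    limit: usize,
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} features exceeds the limit of {}", self.count, self.limit)
    }
}

/// Rejects a refreshed feature list that exceeds MAX_FEATURES.
fn validate(raw: &[String]) -> Result<Vec<String>, ConfigError> {
    if raw.len() > MAX_FEATURES {
        return Err(ConfigError { count: raw.len(), limit: MAX_FEATURES });
    }
    Ok(raw.to_vec())
}

/// Holds the last known-good feature list.
struct FeatureStore {
    active: Vec<String>,
}

impl FeatureStore {
    /// Applies a refresh only if it validates; on error it logs and keeps
    /// serving the previous config. Returns true if the refresh was applied.
    fn refresh(&mut self, raw: &[String]) -> bool {
        match validate(raw) {
            Ok(features) => {
                self.active = features;
                true
            }
            Err(e) => {
                eprintln!(
                    "rejecting config refresh: {e}; keeping {} active features",
                    self.active.len()
                );
                false
            }
        }
    }
}

fn main() {
    let good: Vec<String> = (0..120).map(|i| format!("feature_{i}")).collect();
    let mut store = FeatureStore { active: Vec::new() };
    store.refresh(&good);

    // Simulate a refresh in which every row is duplicated: 120 -> 240 entries.
    let doubled: Vec<String> = good.iter().chain(good.iter()).cloned().collect();
    let applied = store.refresh(&doubled);
    println!("applied = {applied}, active = {}", store.active.len()); // applied = false, active = 120
}
```

The doubled refresh is logged and rejected while the 120-feature config stays active. Whether to reject, truncate, or fail closed is a product decision; rejecting and keeping the old config is simply the easiest option to show.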
This cannot keep happening. Another day, another outage. On this week's episode, Kris and Matt talk about the recent Cloudflare outage. And boy, do they have thoughts. We really hope you enjoy this exchange of monologues.
If you prefer to watch this episode, you can view it on YouTube: https://youtu.be/LsOgDolc9Fw
This week's episode of break continues the conversation with a few more monologues and some reflections on the state of things. Watch it on YouTube or listen with your favorite podcasting app! Learn more by going to https://break.show/17.
And we've got bonus content for our supporters, where you'll hear about the Cloudflare outage in a bit more depth and hear the duo's take on being a generalist versus a specialist. Not a supporter yet? Fix that today by heading over to https://fallthrough.fm/subscribe, where you'll get not only extra content but also higher-quality audio. Sign up today!
Thanks for tuning in and happy listening!
Table of Contents:
Prologue (00:00:00)
Chapter 1: The Cloudflare Outage (00:02:17)
Chapter 2: Too Much Centralization? (00:20:24)
Chapter 3: Communication Matters (00:26:22)
Chapter 4: Magic Numbers Take Down The Internet [Extended] (00:29:50)
Chapter 5: Programming Language Hate and AI versus Tools (00:30:19)
Chapter 6: The Generalist and The Specialist [Extended] (00:49:29)