Every map of a complex territory is inherently wrong, but without one, we're completely lost. So what happens when the territory is the vast, ever-changing landscape of distributed systems?
In this episode, David Wynn sits down with Kyle Kingsbury, the renowned researcher behind Jepsen, to discuss a monumental effort to chart this landscape: the new Distributed Systems Reliability Glossary.
Kyle explains why he and T.W. Lim from Antithesis felt the need to "put this all in one place," creating a practical roadmap for testers and engineers navigating the field. They explore the challenge of writing "directionally correct" definitions, the surprising "urban legends" that persist in system design (like yelling at hard drives to increase error rates), and why even the most rigorous formal models can drift from the code they're meant to describe.
Tune in for a deep dive into the subtle bugs that defy simple explanations, the future of reliability in the age of AI-generated code, and the one problem Kyle is still determined to solve in his own work: reproducibility.