The BugBash Podcast

Every map is wrong, but we made one anyway

Sep 3, 2025
Kyle Kingsbury, a leading distributed systems researcher known as Aphyr, joins David Wynn to share his insights on creating the Distributed Systems Reliability Glossary. They discuss the surprising effects of cosmic rays on computing and the challenges of cloud data integrity. Kyle reveals the complexities behind testing distributed systems, including adversarial methods and the importance of clear definitions. The conversation also touches on AI-generated code's impact on software reliability and the ongoing quest for reproducibility in tech.
INSIGHT

Glossary As A Practical Map

  • The Distributed Systems Reliability Glossary maps terms to practical, testing-focused definitions for engineers and testers.
  • Kyle aims for "directionally correct" definitions that guide where to look next rather than exhaustive formalism.
ADVICE

Use It To Start Testing

  • Use the glossary as a jumping-off point to decide what properties to test and which faults to inject.
  • Start with simple, practical definitions and follow links to deeper papers when needed.
INSIGHT

Words Mean Different Things

  • Many distributed-systems terms are used inconsistently across contexts, causing confusion.
  • The glossary deliberately picks particular terms (e.g., "disk fault" vs. "storage fault") to give practitioners clear, consistent definitions.