The BugBash Podcast

Antithesis
Dec 10, 2025 • 1h 19min

Hypothesis vs. Hallucinations: Property Testing AI-Generated Code

Large Language Models can generate code in a flash, but that code is notoriously unreliable. Traditional unit tests often can’t put enough guardrails in place to ensure correctness… even if they’re written by the LLM itself. This is where property-based testing (PBT) becomes essential. Today, we're joined by David R. MacIver, creator of the PBT library Hypothesis, and now an Antithesis employee! We discuss how to build the robust feedback loops needed to make AI-generated code trustworthy. We'll cover why standard AI coding benchmarks are flawed, how Hypothesis makes PBT approachable, and the challenge of getting developers to think in "invariants." David also shares his perspective on the future of AI in software engineering. If you want to build a reliability backstop for your code, vibed or otherwise, stick around.
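For a concrete sense of the "invariants" discussed in the episode, here is a minimal Hypothesis sketch; the encode/decode functions are illustrative stand-ins, not code from the show.

```python
# A minimal Hypothesis property: decoding an encoded list should always
# give back the original input. The encode/decode functions are toy
# stand-ins for whatever code (human- or AI-written) is under test.
from hypothesis import given, strategies as st

def encode(xs):
    return ",".join(str(x) for x in xs)

def decode(s):
    return [int(x) for x in s.split(",")] if s else []

@given(st.lists(st.integers()))
def test_round_trip(xs):
    # The invariant holds for every input Hypothesis generates,
    # not just the handful of cases we would write by hand.
    assert decode(encode(xs)) == xs
```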
Nov 26, 2025 • 40min

From the Lab to Production: Making Cutting-Edge Testing Practical

Software testing research is exploding, but in practice, most companies' testing approaches seem stuck in the past. Where does that gap come from? It often boils down to the distance between academic promises and the practical needs of developers who need usable tools and fast results. In this episode, David talks with Rohan Padhye, head of the PASTA research group at Carnegie Mellon University, who has lived on both sides of that divide. They explore how fuzz testing crossed that chasm—from industry curiosity to academic focus and back again—and what it will take for other techniques to do the same. Rohan shares insights on designing testable software, building a robust testing culture, and what truly makes a "good" property for finding bugs.
Nov 12, 2025 • 40min

Ergonomics, reliability, durability

Integrating non-deterministic, non-durable elements like AI agents into our workflows tends to lead to a lot of do-overs. But restarting AI processes can be costly, burning through tokens and losing valuable progress. Wouldn’t it be easier if there were always a clear checkpoint to restart a task from? Today I talk with Qian Li, co-founder of the DBOS durable execution engine, about reliability, ergonomics, and actually understanding your software. We discuss the long history of checkpointing, mental models, and how durable execution allows systems to resume right where they left off after a crash, making your software resilient by default. Learn how this architectural pattern can improve an AI-assisted workflow, or any complex system that could use better developer ergonomics.
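As a rough illustration of the checkpointing idea behind durable execution, here is a hand-rolled sketch; this is not DBOS's API, and the step names and file path are invented for the example.

```python
# Toy durable-execution pattern: each step's result is persisted, so a
# re-run after a crash skips work that already completed and resumes
# from the last recorded checkpoint.
import json
import os

CHECKPOINT_FILE = "workflow_state.json"  # illustrative path

def load_state():
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            return json.load(f)
    return {}

def run_step(state, name, fn):
    # If this step finished before a crash, reuse its recorded result.
    if name in state:
        return state[name]
    result = fn()
    state[name] = result
    with open(CHECKPOINT_FILE, "w") as f:
        json.dump(state, f)
    return result

def workflow():
    state = load_state()
    doc = run_step(state, "fetch", lambda: "raw document")
    summary = run_step(state, "summarize", lambda: f"summary of {doc!r}")
    return run_step(state, "store", lambda: f"stored: {summary}")
```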
Oct 30, 2025 • 55min

No actually, you can property test your UI

How do you test for bugs that only appear when a user clicks frantically, or when asynchronous data loads in an unexpected order? Standard UI tests often miss the subtle stuff that happens all the time in stateful, dynamic applications. In this episode, Paul Ryan and I sit down with Oskar Wickström, creator of the Quickstrom framework, among other things, to explore how to apply generative testing to the complex world of user interfaces. Oskar argues that you don't need to be a formal methods genius to get real value out of the approach. Even simple properties can uncover deep bugs, like ensuring a loading spinner eventually disappears or that the screen never goes blank. If you've been intrigued by property-based testing but intimidated by the thought of writing complex formal models for UIs, stick around.
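Quickstrom specifications are written in a temporal logic, but the core idea can be sketched in plain Python: a property is a predicate over the whole sequence of UI states observed during a generated interaction. The state fields below are invented for illustration.

```python
# A "trace" is a list of observed UI states, e.g. dicts of element
# visibility recorded while a generated sequence of clicks runs.
def spinner_eventually_disappears(trace):
    # Once the spinner shows up, it must be gone by the final state.
    if any(state.get("spinner_visible") for state in trace):
        return not trace[-1].get("spinner_visible")
    return True

def screen_never_blank(trace):
    # An invariant that must hold in every observed state.
    return all(state.get("has_visible_content") for state in trace)

# Example trace: the spinner appears, then content finishes loading.
trace = [
    {"spinner_visible": True,  "has_visible_content": True},
    {"spinner_visible": False, "has_visible_content": True},
]
assert spinner_eventually_disappears(trace)
assert screen_never_blank(trace)
```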
Oct 15, 2025 • 53min

Slow down to go fast: TDD in the age of AI with Clare Sudbery

AI coding assistants promise incredible speed, but what happens when you run straight into a wall of buggy code and technical debt? In this episode, Clare Sudbery, a software engineer with over 25 years of experience, discusses a crucial paradox for modern developers. The secret to harnessing AI's power isn't to move faster, but to slow down. Clare explains why deliberate, rigorous practices like Test-Driven Development (TDD) are the essential "guardrails" needed to guide AI tools toward reliable, high-quality software. You'll learn why "more, smaller steps" is the key to tackling technical debt and how throwing your code away might be the most productive thing you do all week.
Oct 1, 2025 • 1h 11min

Fixing five "two-year" bugs per day

Some bugs are so rare, they can take years to track down and fix. What if you could find and fix five of them per day? For Joran Dirk Greef, the creator of the TigerBeetle database, that's not a wild dream — it's how his team works every day. While most people think building a new database takes a decade, Joran's team built TigerBeetle in just three and a half years. The key is a unique philosophy for writing software called "Tiger Style". Joran joins the show to share the secrets behind their speed and safety. You'll hear why he thinks picking C would have been a "fatal" mistake, how a strict rule about memory can force you to write better code, and why Zig was the perfect choice for TigerBeetle. The key to it all is a powerful testing method that Joran calls "existential" for any important project. If you want to hear more about how his team turns squashing impossible bugs into their normal day-to-day, stay tuned.
Sep 18, 2025 • 42min

No really, some bugs aren’t real

When is a bug not really a bug? In this episode, host David Wynn talks with SRE veteran Dan Slimmon about a radical idea: chasing perfect code might not be the best way to make your service reliable. Dan argues that once your code is "good enough," most outages aren't caused by code defects. They're caused by weird interactions between different parts of a system or by users doing things you would never expect. He shares wild stories from his career, including how a tiny database hiccup created a massive, repeating traffic jam and how a single user crashed servers by uploading a 3.2-gigabyte config file. This conversation will make you rethink what you thought you knew about bugs, quality, and what "reliability" truly means.
Sep 3, 2025 • 53min

Every map is wrong, but we made one anyway

Kyle Kingsbury, a leading distributed systems researcher known as Aphyr, joins David Wynn to share his insights on creating the Distributed Systems Reliability Glossary. They discuss the surprising effects of cosmic rays on computing and the challenges of cloud data integrity. Kyle reveals the complexities behind testing distributed systems, including adversarial methods and the importance of clear definitions. The conversation also touches on AI-generated code's impact on software reliability and the ongoing quest for reproducibility in tech.
Aug 20, 2025 • 53min

Fail loudly, fail fast, fail in production

Is it just a fact of life that software is broken? Our industry often operates as if the answer is "yes." We write tests, we fix bugs, but we seem to accept a certain level of failure as the cost of doing business. Our guest today is tired of it. Isaac Van Doren is a software engineer at Paytient, a healthcare payment solutions provider, and he’s "sick of software being broken all the time". Isaac makes the provocative case for a radical cultural shift in how we approach software reliability. He argues that we need to move beyond the narrow view that reliability simply equals testing and instead adopt practices that force us to be explicit about the rules of our systems. Listen to explore a different philosophy of development—one where engineers are fully responsible for defining business logic, assertions are a tool for building a "theory of the system", and failures in production are not just bugs, but immediate, unmissable signals that our understanding was wrong. This conversation will challenge your assumptions and give you a new vocabulary for building software that, as Isaac puts it, "actually works".
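A minimal sketch of the "theory of the system" idea: an assertion states a rule the engineer believes must always hold, so a violation in production fails loudly instead of silently corrupting data. The domain and field names here are invented for illustration, not taken from the episode.

```python
# An assertion encodes a business rule we believe is always true; if it
# fires in production, our theory of the system was wrong, and we get an
# immediate, unmissable signal rather than a quiet bug.
def apply_payment(balance_cents: int, payment_cents: int) -> int:
    assert payment_cents > 0, "payments must be positive"
    new_balance = balance_cents - payment_cents
    assert new_balance >= 0, "balance must never go negative"
    return new_balance
```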
Aug 6, 2025 • 1h 8min

Scaling Correctness: Marc Brooker on a Decade of Formal Methods at AWS

Marc Brooker, Distinguished Engineer at AWS, shares insights from his nearly 17 years of experience building essential cloud services like S3 and Lambda. He reveals how AWS's journey into formal methods transformed software correctness, enhancing both reliability and development speed. The discussion highlights innovative testing strategies, the challenges of applying these methods in complex systems, and the game-changing potential of AI in programming. From the intricate landscape of verification to the tech scene in Cape Town, Marc offers a glimpse into the future of software development.
