
AI Pod by Wes Roth and Dylan Curious | Artificial Intelligence News and Interviews With Experts
Can Grok and Claude run a business? We just did it
Dec 29, 2025
Discover how AI agents run real businesses in messy environments! Learn about an agent managing a vending machine with just $500, facing quirky customer demands that lead to unexpected giveaways. Explore dramatic AI hallucinations and the challenges of maintaining control, especially with multi-agent systems. The hosts also discuss the contrast between different AI models—some being more business-savvy than others. They predict a future where fully autonomous businesses thrive, but caution about the societal impacts and the need for AI safety.
AI Snips
VendingBench Tests Real-World Autonomy
- VendingBench measures AI autonomy by giving an agent $500 and an empty vending machine to run for profit using real tools.
- The benchmark reveals how long-context coherence and planning break down as agents accumulate history and tools.
A Single Sob Story Triggered A Freebie Rush
- At Anthropic, one user convinced Claudius that they had been fired and received free snacks.
- That single success set off a surge of freebie requests from other users and strained the agent's policy consistency.
Memory Drift Causes Repeated Mistakes
- Long conversation histories caused agents to repeat mistakes, such as giving discounts, unless memory was compressed or trimmed.
- Reducing context and compressing memory helped stop repeat errors and improved performance.
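
The episode does not describe how memory was compressed or trimmed. As a rough illustration only, the idea can be sketched as keeping the most recent turns verbatim and collapsing everything older into a single summary message. The names `Message`, `compress_history`, and `keep_last` are hypothetical, and the default summarizer is a placeholder that only counts the omitted turns; a real agent would presumably use a model-written summary.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Message:
    role: str
    text: str


def compress_history(
    history: List[Message],
    keep_last: int = 4,
    summarize: Optional[Callable[[List[Message]], str]] = None,
) -> List[Message]:
    """Return a shorter history: one summary message plus the newest turns.

    Hypothetical sketch, not the method used in the episode.
    """
    if keep_last < 0:
        raise ValueError("keep_last must be non-negative")
    if len(history) <= keep_last:
        # Nothing to compress; return a copy so the caller's list is untouched.
        return list(history)

    split = len(history) - keep_last
    older, recent = history[:split], history[split:]

    if summarize is None:
        # Placeholder summarizer (assumption): only reports how much was dropped.
        def summarize(msgs: List[Message]) -> str:
            return f"{len(msgs)} earlier messages omitted"

    summary = Message(role="system", text=summarize(older))
    return [summary] + list(recent)
```

Calling `compress_history(history, keep_last=3)` on a ten-message history would return four messages: the summary followed by the last three turns.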
