AI Pod by Wes Roth and Dylan Curious | Artificial Intelligence News and Interviews With Experts

Can Grok and Claude run a business? We just did it

Dec 29, 2025
Discover how AI agents run real businesses in messy environments! Learn about an agent managing a vending machine with just $500, facing quirky customer demands that lead to unexpected giveaways. Explore dramatic AI hallucinations and the challenges of maintaining control, especially with multi-agent systems. The hosts also discuss the contrast between different AI models—some being more business-savvy than others. They predict a future where fully autonomous businesses thrive, but caution about the societal impacts and the need for AI safety.
AI Snips
INSIGHT

VendingBench Tests Real-World Autonomy

  • VendingBench measures AI autonomy by giving an agent $500 and an empty vending machine to run for profit using real tools.
  • The benchmark reveals how long-context coherence and planning break down as agents accumulate history and tools.
ANECDOTE

A Single Sob Story Triggered A Freebie Rush

  • At Anthropic, one user convinced Claudius that they had been fired and got free snacks, which prompted many others to ask for freebies.
  • That single success caused a surge of freebie requests and stressed the agent's policy consistency.
INSIGHT

Memory Drift Causes Repeated Mistakes

  • Long conversation histories caused agents to repeat mistakes, such as giving discounts, unless memory was compressed or trimmed.
  • Reducing context and compressing memory helped stop repeat errors and improved performance.
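
The idea of trimming and compressing an agent's memory can be illustrated with a minimal Python sketch. This is not the method used in the episode or in VendingBench; the `Message` class, the `pinned` flag, and the `compress_history` function are hypothetical names introduced only for illustration. The sketch assumes that standing policies are marked as pinned, that only the most recent turns are kept verbatim, and that everything else is collapsed into a single placeholder line.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """One entry in an agent's conversation history (hypothetical structure)."""
    role: str             # e.g. "user", "agent", or "system"
    text: str
    pinned: bool = False  # pinned entries (such as standing policies) are never dropped


def compress_history(history, keep_recent=4):
    """Keep pinned messages and the last `keep_recent` messages.

    All other older messages are replaced by one placeholder line, so the
    context stays short while standing policies remain visible.
    """
    if keep_recent < 0:
        raise ValueError("keep_recent must be non-negative")

    cutoff = max(len(history) - keep_recent, 0)
    older, recent = history[:cutoff], history[cutoff:]

    pinned = [m for m in older if m.pinned]
    dropped = len(older) - len(pinned)

    compressed = list(pinned)
    if dropped:
        compressed.append(Message("system", f"[{dropped} earlier messages omitted]"))
    compressed.extend(recent)
    return compressed


# Usage: one pinned policy followed by ten customer requests.
history = [Message("system", "Policy: no free items.", pinned=True)]
history += [Message("user", f"request {i}") for i in range(10)]

short = compress_history(history, keep_recent=3)
for m in short:
    print(f"{m.role}: {m.text}")
```

A real agent would more likely replace the placeholder with a model-written summary of the dropped turns; the fixed placeholder here is a simplification that keeps the sketch self-contained.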