AI Pod by Wes Roth and Dylan Curious | Artificial Intelligence News and Interviews With Experts

Can Grok and Claude run a business? We just did it

Dec 29, 2025

How do AI agents run real businesses in messy environments? This episode covers an agent that manages a vending machine with just $500 and faces quirky customer demands that lead to unexpected giveaways. It also explores dramatic AI hallucinations and the difficulty of maintaining control, especially in multi-agent systems. The hosts contrast different AI models, some of which are more business-savvy than others. They predict a future in which fully autonomous businesses thrive, but caution about the societal impacts and the need for AI safety.

AI Snips

INSIGHT

VendingBench Tests Real-World Autonomy

VendingBench measures AI autonomy by giving an agent $500 and an empty vending machine to run for profit using real tools.
The benchmark reveals how long-context coherence and planning break down as agents accumulate history and tools.

ANECDOTE

A Single Sob Story Triggered A Freebie Rush

At Anthropic, one user convinced Claudius that they had been fired and got free snacks, which prompted many others to ask for freebies.
That single success caused a surge of freebie requests and strained the agent's ability to apply its policy consistently.

INSIGHT

Memory Drift Causes Repeated Mistakes

Long conversation histories caused agents to repeat mistakes, such as giving discounts, unless their memory was compressed or trimmed.
Reducing context and compressing memory helped stop repeat errors and improved performance.
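
The trimming and compression idea above can be sketched in a few lines of Python. This is only an illustration, not the setup used in the episode: the `AgentMemory` class, its `max_recent` parameter, and the first-sentence "compression" are all assumptions made for the example. A real agent would more likely ask a model to write the summary.

```python
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    """Hypothetical rolling memory: recent messages verbatim, older ones summarized."""

    max_recent: int = 4                      # assumed window size, not from the episode
    summary: str = ""
    recent: list = field(default_factory=list)

    def add(self, message: str) -> None:
        """Store a message and fold anything beyond the window into the summary."""
        self.recent.append(message)
        while len(self.recent) > self.max_recent:
            oldest = self.recent.pop(0)
            self.summary = self._compress(self.summary, oldest)

    @staticmethod
    def _compress(summary: str, message: str, limit: int = 200) -> str:
        # Placeholder compression: keep only the first sentence of the old message
        # and cap the summary length so the context cannot grow without bound.
        note = message.split(".")[0].strip()
        combined = f"{summary} | {note}" if summary else note
        return combined[-limit:]

    def context(self) -> list:
        """Return what would be sent to the model on the next turn."""
        header = [f"Summary of earlier events: {self.summary}"] if self.summary else []
        return header + self.recent
```

With `max_recent=2`, adding five messages leaves the last two verbatim and folds the first three into a single summary line, so the context stays at three entries no matter how long the conversation runs.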
