
Autonomous Agents Beyond the Hype
The MLSecOps Podcast
Benchmark Limits and Real-World Performance
Gavin critiques benchmarks, discusses training-data leakage, and explains why passing tests doesn't guarantee safe real-world use.
Transcript


