The 100-person AI lab that became Anthropic and Google's secret weapon | Edwin Chen (Surge AI)

Lenny's Podcast: Product | Career | Growth

Benchmarks can misrepresent model progress

Edwin explains benchmark flaws, how labs game leaderboards, and why benchmarks diverge from real-world performance.

Play episode from 16:30