Why you should write your own LLM benchmarks — with Nicholas Carlini, Google DeepMind

Latent Space: The AI Engineer Podcast

NOTE

Trust in Verification: Distinguishing Contribution from Automation

Claims about model capabilities need to be verifiable, because there is often a gap between what a model is said to have done and what it actually contributed. People frequently assert that a model accomplished a complex task when their own input made up most of the effort. A clear distinction is therefore needed between what the model genuinely achieved and what the human contributed. At the same time, a model's ability to handle the mundane, familiar parts of a task is valuable in its own right, which answers the objection that models only replicate prior human achievements. Much of research consists of executing well-established processes: novel insights do arise, but most of the work is repetitive, and that is the part models can effectively automate.
