Why you should write your own LLM benchmarks — with Nicholas Carlini, Google DeepMind

Latent Space: The AI Engineer Podcast

Navigating Outdated Coding Formats and Benchmark Robustness

This chapter explores the challenges of getting models to recognize obsolete encoding formats such as uuencode, drawing on personal experience with the models' performance limits. It also discusses using adversarial examples in benchmarks to test model robustness and to detect overfitting.
