Why you should write your own LLM benchmarks — with Nicholas Carlini, Google DeepMind

Latent Space: The AI Engineer Podcast

CHAPTER

Navigating Outdated Coding Formats and Benchmark Robustness

This chapter explores the challenges of training models to recognize obsolete encoding formats such as uuencoding, drawing on personal experience with the limits of model performance. It also discusses how adversarial examples can be used in benchmarks to improve model robustness and to evaluate overfitting in machine learning.
