False positives: benchmarking pitfalls

Greg warns against RL-environment overfitting and vanity metrics, urging work on true generalization rather than benchmark hacking.
