
128 - Dynamic Benchmarking, with Douwe Kiela
NLP Highlights
Dynamic Benchmarks and Models in the Loop of Benchmark Creation
The idea is humans and models in the loop. If you're familiar with thinking about adversarial processes, people often have some sort of algorithm or model try to adversarially find mistakes that models make. What we're doing here is slightly different, because the adversary is not an algorithm or model; it's actually a human. So a human is talking to a model, and their job is essentially to find things that the model cannot yet do. If we keep doing this over time, the idea is that, again, we get good metrics, so we see how well these models do. We also collect a lot of data that can then be used for training, so that we get even better models, which we can then put back in the loop.
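The cycle described here, where humans probe a model, the examples that fool it become new training data, and the retrained model goes back in the loop, can be sketched in a few lines. This is a hypothetical toy illustration, not Dynabench's actual API; the function names, the memorizing "model", and the `train` step are all assumptions for the sake of the sketch.

```python
# Toy sketch of the human-and-model-in-the-loop benchmarking cycle.
# All names here are illustrative; a real system would use crowdworkers
# and an actual NLP model instead of a lookup table.

def dynamic_benchmark_round(model, human_examples, train):
    """One round: humans probe the model; fooling examples become training data."""
    fooling = [(x, y) for x, y in human_examples if model(x) != y]
    error_rate = len(fooling) / len(human_examples)  # the round's metric
    new_model = train(model, fooling)                # retrain on collected data
    return new_model, error_rate

# Toy "model": a lookup table that defaults to "pos" for unseen inputs.
known = {"good": "pos", "bad": "neg"}
model = lambda x: known.get(x, "pos")

def train(model, fooling):
    known.update(dict(fooling))  # memorize the examples that fooled the model
    return lambda x: known.get(x, "pos")

# Humans craft adversarial examples; "not good" and "terrible" fool round 1.
probes = [("good", "pos"), ("not good", "neg"), ("terrible", "neg")]
model, err1 = dynamic_benchmark_round(model, probes, train)
model, err2 = dynamic_benchmark_round(model, probes, train)
print(err1, err2)  # error drops after retraining on the fooling examples
```

In practice, humans would write fresh adversarial examples each round rather than reusing the same probes, which is what keeps the benchmark dynamic instead of saturating.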