
02: Unleashing LLMs in Production: Challenges and Opportunities with Chip Huyen
Replit AI Podcast
Human Eval: A Benchmark for Open Source
The team at Replit is developing a bespoke Python code model. They are benchmarking it against the Codex paper, which OpenAI released around 2021; the benchmark it introduced is called HumanEval, and they're hiring people to work on these problems. "We ran HumanEval out of curiosity," says Cevallos. "For us, it was about: what are users getting out of it?"
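For context, HumanEval comes from OpenAI's 2021 Codex paper: 164 hand-written Python problems, each scored by running the model's completion against unit tests (pass@k). Below is a minimal sketch of how one might run it with OpenAI's open-source human-eval harness; generate_completion is a hypothetical stand-in for whatever model is being benchmarked, not Replit's actual setup.

    # Sketch: scoring a code model with OpenAI's human-eval harness
    # (install from github.com/openai/human-eval).
    from human_eval.data import read_problems, write_jsonl

    def generate_completion(prompt: str) -> str:
        # Hypothetical stand-in for the model under test. Returning "pass"
        # keeps the script runnable end to end (every task will simply fail
        # its unit tests); swap in a real inference call here.
        return "    pass\n"

    problems = read_problems()  # dict of 164 Python tasks keyed by task_id
    samples = [
        {"task_id": task_id, "completion": generate_completion(spec["prompt"])}
        for task_id, spec in problems.items()
    ]
    write_jsonl("samples.jsonl", samples)

    # Then compute pass@k with the bundled checker:
    #   evaluate_functional_correctness samples.jsonl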