
02: Unleashing LLMs in Production: Challenges and Opportunities with Chip Huyen

Replit AI Podcast

CHAPTER

HumanEval: A Benchmark for Open Source

The team at LLM Packers is developing a bespoke Python model. They are benchmarking it against HumanEval, the benchmark from the Codex paper that OpenAI released in 2021, and they are hiring people to work on these problems. "We ran HumanEval out of curiosity," says Cevallos. "For us, it was about what are users getting out of it?"
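The Codex paper scores HumanEval results with pass@k. For each problem, the model generates n samples, c of which pass the problem's unit tests, and pass@k is estimated as 1 - C(n-c, k) / C(n, k), averaged over all problems. The sketch below shows that estimator only. It does not generate or execute any code, and the function names `pass_at_k` and `mean_pass_at_k`, along with the sample counts in the usage example, are hypothetical and exist purely for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them pass the tests."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples fail, every size-k subset contains a passing sample.
    if n - c < k:
        return 1.0
    # 1 minus the probability that a random size-k subset contains only failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given hypothetical (n, c) counts per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical counts: 10 samples per problem, with 3, 0, and 10 passing.
    results = [(10, 3), (10, 0), (10, 10)]
    print(mean_pass_at_k(results, k=1))
```

Python's exact integer `comb` keeps this simple. The Codex paper itself uses an equivalent running-product form for numerical stability when n is large.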

