
02: Unleashing LLMs in Production: Challenges and Opportunities with Chip Huyen
Replit AI Podcast
Human Eval: A Benchmark for Open Source
The team at Replit is developing a bespoke Python code model. They are benchmarking it against the Codex paper, which OpenAI released around 2021; the benchmark it introduced is called HumanEval, and they're hiring people to work on these problems. "We ran HumanEval out of curiosity," says Cevallos. "For us, it was about: what are users getting out of it?"
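For context, HumanEval comes from OpenAI's 2021 Codex paper: 164 hand-written Python problems, each scored by running the model's completion against unit tests (pass@k). Below is a minimal sketch of how one might run it with OpenAI's open-source human-eval harness; generate_completion is a hypothetical stand-in for whatever model is being benchmarked, not Replit's actual setup.

    # Sketch: scoring a code model with OpenAI's human-eval harness
    # (install from github.com/openai/human-eval).
    from human_eval.data import read_problems, write_jsonl

    def generate_completion(prompt: str) -> str:
        # Hypothetical stand-in for the model under test. Returning "pass"
        # keeps the script runnable end to end (every task will simply fail
        # its unit tests); swap in a real inference call here.
        return "    pass\n"

    problems = read_problems()  # dict of 164 Python tasks keyed by task_id
    samples = [
        {"task_id": task_id, "completion": generate_completion(spec["prompt"])}
        for task_id, spec in problems.items()
    ]
    write_jsonl("samples.jsonl", samples)

    # Then compute pass@k with the bundled checker:
    #   evaluate_functional_correctness samples.jsonl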