ICLR 2024 — Best Papers & Talks (Benchmarks, Reasoning & Agents) — ft. Graham Neubig, Aman Sanger, Moritz Hardt

Latent Space: The AI Engineer Podcast

The Promise of Open Source AI and New Evaluation Benchmarks

This chapter covers the OpenDevin project and its plans for agent development within open-source AI. It also discusses the collaboration with the Qwen team and introduces SWE-bench, a new benchmark for evaluating language models on practical software engineering tasks.
