ICLR 2024 — Best Papers & Talks (Benchmarks, Reasoning & Agents) — ft. Graham Neubig, Aman Sanger, Moritz Hardt

Latent Space: The AI Engineer Podcast

The Promise of Open Source AI and New Evaluation Benchmarks

This chapter covers the OpenDevin project and its plans for agent development within open-source AI. It also discusses collaboration with the Qwen team and introduces SWE-bench, a new benchmark for evaluating language models on real-world software engineering tasks.

Play episode from 01:03:43