ICLR 2024 — Best Papers & Talks (Benchmarks, Reasoning & Agents) — ft. Graham Neubig, Aman Sanger, Moritz Hardt

Latent Space: The AI Engineer Podcast

NOTE

Improving Model Performance on SWE-bench

Improving the retrieval (RAG) system can greatly enhance model performance on SWE-bench, where even strong existing models like Claude 3 Opus currently resolve only 3.8% of issues. Each SWE-bench instance contains the full codebase, around 3,000 files, whereas the gold patch usually modifies only one or two of them. A set of 19,000 unverified instances, which lack test cases, is used for training. Models are tasked with generating patch files that specify the file edits. Various models, including ChatGPT and GPT-4, are evaluated, and Code Llama is fine-tuned on long-context RAG examples to produce SWE-Llama 7B and 13B, the only open-source models with non-zero performance on SWE-bench.
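
To make the retrieval point concrete, here is a minimal sketch, assuming patches are written in unified-diff format. It lists the files a patch edits and checks what fraction of them the retriever actually surfaced. The function names (`files_in_patch`, `retrieval_recall`) and the example patch are hypothetical illustrations, not part of the SWE-bench tooling.

```python
import re


def files_in_patch(patch_text: str) -> set[str]:
    """Return the file paths edited by a unified-diff patch."""
    paths = set()
    for line in patch_text.splitlines():
        # A "+++ b/<path>" header names the post-edit version of each file.
        match = re.match(r"^\+\+\+ b/(.+)$", line)
        if match:
            paths.add(match.group(1))
    return paths


def retrieval_recall(retrieved: list[str], gold_patch: str) -> float:
    """Fraction of gold-patch files that appear among the retrieved files."""
    gold = files_in_patch(gold_patch)
    if not gold:
        return 0.0
    return len(gold & set(retrieved)) / len(gold)


# Hypothetical one-file patch, invented for illustration only.
EXAMPLE_PATCH = """\
diff --git a/pkg/utils.py b/pkg/utils.py
--- a/pkg/utils.py
+++ b/pkg/utils.py
@@ -1,3 +1,3 @@
 def add(a, b):
-    return a - b
+    return a + b
"""

if __name__ == "__main__":
    print(files_in_patch(EXAMPLE_PATCH))
    print(retrieval_recall(["pkg/utils.py", "pkg/other.py"], EXAMPLE_PATCH))
```

Because a gold patch touches only one or two files out of thousands, a retriever that misses those files leaves the model with no chance of producing the right edit, which is why this kind of recall matters so much.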
