Advancements in Language Model Training and Evaluation

This chapter covers AI2's release of Tulu 3 405B, highlighting its improvements in scalability and reinforcement learning methods. It emphasizes the importance of curated, high-quality data and introduces new benchmarks for reasoning in language models. The discussion of ZebraLogic and distributed training techniques also offers insights into optimizing model performance and addressing challenges in federated learning.

Play episode from 56:50

Episode notes

Our 199th episode with a summary and discussion of last week's big AI news!
Recorded on 02/09/2025

Join our brand new Discord here! https://discord.gg/nTyezGSKwP

Hosted by Andrey Kurenkov and Jeremie Harris.
Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai

Read our text newsletter and comment on the podcast at https://lastweekin.ai/.

In this episode:

- OpenAI launches its deep research feature, which lets models generate detailed reports after prolonged inference periods, competing directly with Google's Gemini 2.0 reasoning models.
- France and the UAE jointly announce plans to build a massive AI data center in France, aiming to become a competitive player in the AI infrastructure landscape.
- Mistral introduces a mobile app, broadening its consumer AI lineup amidst market skepticism about its ability to compete against larger firms like OpenAI and Google.
- Anthropic unveils 'Constitutional Classifiers,' a method showing strong defenses against universal jailbreaks, and launches a $20K challenge to find weaknesses.

Timestamps + Links:

(00:00:00) Intro / Banter
(00:02:27) News Preview
(00:03:28) Response to listener comments
Tools & Apps
- (00:08:01) OpenAI now reveals more of its o3-mini model’s thought process
- (00:16:03) Google’s Gemini app adds access to ‘thinking’ AI models
- (00:21:04) OpenAI Unveils A.I. Tool That Can Do Research Online
- (00:31:09) Mistral releases its AI assistant on iOS and Android
- (00:36:17) AI music startup Riffusion launches its service in public beta
- (00:39:11) Pikadditions by Pika Labs lets users seamlessly insert objects into videos
Applications & Business
Projects & Open Source
Research & Advancements
- (01:10:34) LIMO: Less is More for Reasoning
- (01:16:39) s1: Simple test-time scaling
- (01:19:17) ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning
- (01:23:55) Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch
Policy & Safety
- (01:33:16) Anthropic offers $20,000 to whoever can jailbreak its new AI safety system

See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.
