Advancements in Reasoning Models and Reinforcement Learning

This chapter explores the latest developments in reasoning models, with a focus on PRIME, an online reinforcement learning approach that uses process rewards. It discusses the training challenges that models like o1 and o3 face due to limited data, and presents Eurus-2-7B-PRIME as a breakthrough in online reinforcement learning. The chapter also covers representation shifts in language models and the use of metagenomic approaches for pathogen detection.

Episode notes

Our 196th episode with a summary and discussion of last week's* big AI news!
*and sometimes last last week's
Recorded on 01/10/2025

Join our brand new Discord here! https://discord.gg/nTyezGSKwP

Hosted by Andrey Kurenkov and Jeremie Harris.
Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai

Check out our text newsletter and comment on the podcast at https://lastweekin.ai/.

Sponsors:

The Generator - An interdisciplinary AI lab empowering innovators from all fields to bring visionary ideas to life by harnessing the capabilities of artificial intelligence.

In this episode:

- Nvidia announced a $3,000 personal AI supercomputer called Digits, featuring the GB10 Grace Blackwell Superchip, aiming to lower the barrier for developers working on large models.
- The U.S. Department of Justice finalized a rule restricting the transmission of specific data types to countries of concern, including China and Russia, under Executive Order 14117.
- Meta allegedly trained Llama on pirated content from LibGen, with internal concerns about the legality confirmed through court filings.
- Microsoft paused construction on a section of a large data center project in Wisconsin to reassess based on new technological changes.

If you would like to become a sponsor for the newsletter, podcast, or both, please fill out this form.

Timestamps + Links:

(00:00:00) Intro / Banter
(00:04:52) Sponsor Break
Tools & Apps
- (00:05:55) Nvidia announces $3,000 personal AI supercomputer called Digits
- (00:10:23) Meta removes AI character accounts after users criticize them as ‘creepy and unnecessary’
Applications & Business
Projects & Open Source
- (00:41:59) Cosmos World Foundation Model Platform for Physical AI
- (00:48:21) Microsoft releases Phi-4 language model on Hugging Face
Research & Advancements
Policy & Safety

See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.
