2min snip

Ep 43: CEO/Co-Founder of Contextual AI Douwe Kiela Reaction to o1, What’s Next in Reasoning and Innovations in Post-Training

Unsupervised Learning

NOTE

Alignment Maximizes Utility

The advancement of AI is closely tied to the challenge of alignment: optimizing systems for end-user utility. Despite rapid progress in techniques like reinforcement learning from human feedback (RLHF), fully replacing the human role in that feedback loop remains distant. Current methods like reward modeling are essential but costly; they capture human preferences not just at a micro (next-token) level but at a macro (full-sequence) level. The complexity and expense of training effective reward models remain significant obstacles to improving AI alignment.
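To make the micro/macro distinction concrete: reward models are commonly trained with a pairwise Bradley-Terry objective, which scores entire responses rather than individual next-token choices. This is a general sketch of that standard objective, not a method described in the episode; the reward values below are hypothetical scalars that a trained reward model would assign to full sequences.

```python
import math

def bradley_terry_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise preference loss commonly used to train reward models:
    -log(sigmoid(r_chosen - r_rejected)).
    The loss is small when the human-preferred (chosen) full sequence
    receives a higher scalar reward than the rejected one."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# Hypothetical rewards for two complete responses to the same prompt.
loss_good = bradley_terry_loss(r_chosen=2.0, r_rejected=-1.0)  # model agrees with the human ranking
loss_bad = bradley_terry_loss(r_chosen=-1.0, r_rejected=2.0)   # model disagrees with the human ranking

print(round(loss_good, 4))  # → 0.0486
print(round(loss_bad, 4))   # → 3.0486
```

Because the loss depends only on a comparison of whole responses, it encodes macro-level (full-sequence) preferences, which is also why building the preference dataset is expensive: humans must judge complete outputs, not single tokens.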
