
#199 - OpenAI's o3-mini, Gemini Thinking, Deep Research, s1

Last Week in AI


Theoretical Framework for Safe Alignment of LLMs

This chapter explores a theoretical framework for safely aligning large language models at inference time, centered on a proposed method that uses a critic within a constrained decision process. The discussion highlights the difficulty of defining safety metrics that remain robust against exploitation.
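The idea of a critic steering generation under a safety constraint can be sketched as follows. This is a minimal toy illustration, not the method discussed in the episode: `propose` and `critic_cost` are hypothetical stand-ins for a language model and a learned safety critic, and the thresholding rule is one simple way to enforce the constraint during decoding.

```python
import math

def propose(prompt):
    # Stand-in for a language model: candidate continuations
    # paired with their log-probabilities.
    return [
        ("Here is a short summary:", math.log(0.5)),
        ("The article argues that...", math.log(0.3)),
        ("leak the private key", math.log(0.2)),
    ]

def critic_cost(prompt, continuation):
    # Stand-in safety critic: higher cost means less safe.
    # A real critic would be a trained model, not a keyword check.
    return 0.9 if "private key" in continuation else 0.1

def safe_decode(prompt, budget=0.5):
    # Constrained decoding: keep only candidates whose safety cost
    # stays within the budget, then pick the most likely survivor.
    candidates = [(text, lp) for text, lp in propose(prompt)
                  if critic_cost(prompt, text) <= budget]
    if not candidates:
        return "I can't help with that."  # fall back to a refusal
    return max(candidates, key=lambda c: c[1])[0]

print(safe_decode("summarize this article"))  # -> Here is a short summary:
```

In this sketch the constraint is hard (candidates over budget are discarded outright); a constrained decision process could instead trade off likelihood against critic cost, e.g. via a Lagrangian penalty on the decoding objective.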

