“Tell me about yourself: LLMs are aware of their implicit behaviors” by Martín Soto, Owain_Evans

LessWrong (Curated & Popular)

Exploring Behavioral Self-Awareness in Language Models

This chapter explores how large language models can articulate their own implicit behaviors without being shown explicit examples of them. It highlights what such self-awareness could mean for AI safety, particularly for recognizing risky behaviors and addressing potential vulnerabilities.