Mentioned in 1 episode

Refusal in Language Models

Research on Refusal Mechanisms in AI
Book • 2024
Recent studies have focused on understanding and controlling refusal behavior in language models.

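This work includes identifying a single direction in a model's internal activations that mediates refusal, a finding reported across various models, and proposing methods to mitigate false refusal, where a model declines requests that are actually benign.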
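To make the idea of a "direction that mediates refusal" concrete, here is a minimal sketch using one common technique: take the difference between the mean activation on harmful prompts and the mean activation on harmless prompts, normalize it, and "ablate" it from an activation by subtracting the activation's projection onto that unit vector. This is an illustrative assumption, not the method of any specific paper. The function names (`refusal_direction`, `ablate`) and the tiny 3-dimensional vectors are hypothetical stand-ins for real model activations, which are far higher-dimensional and come from a forward pass.

```python
import math
from typing import List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def refusal_direction(harmful: List[Vector], harmless: List[Vector]) -> Vector:
    """Hypothetical helper: unit-norm difference-in-means direction,
    i.e. (mean(harmful) - mean(harmless)) / ||mean(harmful) - mean(harmless)||."""
    diff = [a - b for a, b in zip(mean(harmful), mean(harmless))]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def ablate(x: Vector, r_hat: Vector) -> Vector:
    """Hypothetical helper: remove the component of x along the unit vector
    r_hat, computing x - (x . r_hat) * r_hat."""
    proj = sum(a * b for a, b in zip(x, r_hat))
    return [a - proj * b for a, b in zip(x, r_hat)]


if __name__ == "__main__":
    # Made-up toy "activations" for illustration only, not real model data.
    harmful = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
    harmless = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
    r_hat = refusal_direction(harmful, harmless)
    print(r_hat)                            # [1.0, 0.0, 0.0]
    print(ablate([3.0, 2.0, 5.0], r_hat))   # [0.0, 2.0, 5.0]
```

After ablation the activation has no component along the estimated direction, so in this toy setup nothing remains that distinguishes "harmful" from "harmless" along that axis, while the other coordinates are left unchanged.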

These efforts aim to improve the safety and reliability of AI systems by refining their ability to refuse harmful or inappropriate requests.

Mentioned by


Mentioned by Andrey Kurenkov when discussing a paper on refusal in language models.
#171 - Apple Intelligence, Dream Machine, SSI Inc
