Responsible AI in the Generative Era with Michael Kearns - #662

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

CHAPTER

Navigating Performance Trade-offs in AI Alignment

This chapter explores the complexities of Reinforcement Learning from Human Feedback (RLHF) in language models, emphasizing the performance trade-offs introduced by safety measures such as toxicity suppression. It illustrates how alignment constraints can affect model capabilities, using targeted advertising as an example of the balance between operational effectiveness and ethical considerations.
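The trade-off described above can be illustrated with a toy sketch (not the episode's actual method): an RLHF-style reward that combines a helpfulness score with a weighted toxicity penalty. The candidate responses, scores, and the `lam` weight here are all hypothetical, chosen only to show how strengthening the safety penalty can change which response the model prefers.

```python
def aligned_reward(helpfulness, toxicity, lam):
    """Combined reward: helpfulness minus a weighted toxicity penalty."""
    return helpfulness - lam * toxicity

# Hypothetical candidate responses scored as (helpfulness, toxicity).
candidates = {
    "blunt_but_useful": (0.9, 0.6),
    "polite_and_safe":  (0.7, 0.1),
}

def best_response(lam):
    """Return the candidate with the highest combined reward for a given penalty weight."""
    return max(candidates, key=lambda name: aligned_reward(*candidates[name], lam))

# With no toxicity penalty the blunter, more helpful answer wins;
# with a strong penalty the safer answer is preferred instead.
print(best_response(0.0))  # -> blunt_but_useful
print(best_response(1.0))  # -> polite_and_safe
```

Raising `lam` suppresses toxic outputs but can also demote the most task-effective response, which is the performance cost of alignment the chapter discusses.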
