
Are Emergent Behaviors in LLMs an Illusion? with Sanmi Koyejo - #671
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Evaluating Trustworthiness in Language Models
This chapter explores a recent research paper assessing trustworthiness in language models, focusing on eight evaluation perspectives, including toxicity and fairness. It discusses the development of scalable evaluation methods and a GitHub toolbox for model assessment, and addresses the balance between adhering to ethical guidelines and following instructions. The chapter also highlights ongoing challenges in benchmarking these models and their implications for trust and performance evaluation in AI.