Are Emergent Behaviors in LLMs an Illusion? with Sanmi Koyejo - #671

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

00:00

Evaluating Trustworthiness in Language Models

This chapter discusses a recent research paper that assesses the trustworthiness of language models from eight evaluation perspectives, including toxicity and fairness. It covers the development of scalable evaluation methods and a GitHub toolbox for model assessment, and addresses the tension between adhering to ethical guidelines and following instructions. The chapter also highlights open challenges in benchmarking these models and what they imply for evaluating trust and performance in AI.
