Are Emergent Behaviors in LLMs an Illusion? with Sanmi Koyejo - #671

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

CHAPTER

Evaluating Trustworthiness in Language Models

This chapter explores a recent research paper that assesses trustworthiness in language models from eight evaluation perspectives, including toxicity and fairness. It discusses the development of scalable evaluation methods and a GitHub toolbox for model assessment, and addresses the balance between ethical guidelines and instruction-following behavior. The chapter also highlights ongoing challenges in benchmarking these models and their implications for evaluating trust and performance in AI.

