

Breaking Down EvalGen: Who Validates the Validators?
May 13, 2024
This episode delves into the complexities of using Large Language Models for evaluation, highlighting the need for human validation to align LLM-generated evaluators with user preferences. Topics include developing criteria for acceptable LLM outputs, evaluating email responses, evolving evaluation criteria, template management, LLM validation, and the iterative process of building effective evaluation criteria.
Chapters
Introduction
00:00 • 2min
Evaluating Language Model Outputs Using the EvalGen Framework
02:05 • 28min
Exploring the Evaluation Process of Data Sets in Email Responses
29:49 • 2min
Exploring the Evolution of Evaluation Criteria
31:41 • 3min
Discussion on Template Management and LLM Validation for Responses
35:09 • 6min
Exploring the Validation Process and Criteria Development for Evaluations
41:00 • 4min