Breaking Down EvalGen: Who Validates the Validators?
May 13, 2024
This podcast delves into the complexities of using Large Language Models for evaluation, highlighting the need for human validation in aligning LLM-generated evaluators with user preferences. Topics include developing criteria for acceptable LLM outputs, evaluating email responses, evolving evaluation criteria, template management, LLM validation, and the iterative process of building effective evaluation criteria.
44:47
Podcast summary created with Snipd AI
Quick takeaways
EvalGen aligns LLM-generated evaluators with user preferences for a more efficient evaluation process.
Regular refinement of evaluation criteria based on observations ensures accurate outcomes and user alignment.
Deep dives
Overview of the EvalGen Framework
The podcast episode discusses the paper “Who Validates the Validators?”, which introduces the EvalGen framework, aimed at aligning the criteria used in LLM-assisted evaluation with user preferences. The framework is embodied in an open-source tool, also called EvalGen, that addresses the challenge of evaluating LLM outputs efficiently given the high volume of queries and the manual effort involved. It focuses on transparency and alignment with user goals to enhance the evaluation process.
Criteria-Based Evaluation Workflow
The evaluation workflow for LLM-assisted evals involves inputs, outputs, evaluator prompts, and test results. The paper suggests developing evaluators to assess LLM outputs and validating their quality through alignment with user preferences. The framework emphasizes improving these evaluators by checking them against user-defined criteria and the user’s own grades.
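As a rough illustration of that workflow, the sketch below shows what a single criteria-based judge might look like in code, using the email-response example mentioned in the summary. `call_llm`, `JUDGE_TEMPLATE`, `EvalResult`, and `judge` are hypothetical names, and the prompt wording is illustrative rather than taken from the paper or the episode.

```python
from dataclasses import dataclass


def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; swap in your own client."""
    raise NotImplementedError("plug in an actual LLM client here")


@dataclass
class EvalResult:
    criterion: str
    passed: bool


JUDGE_TEMPLATE = """You are grading an LLM-drafted email reply.
Original request:
{input}

Drafted reply:
{output}

Criterion: {criterion}
Answer with a single word: PASS or FAIL."""


def judge(input_text: str, output_text: str, criterion: str) -> EvalResult:
    """Ask the judge model whether the output satisfies one criterion."""
    verdict = call_llm(JUDGE_TEMPLATE.format(
        input=input_text, output=output_text, criterion=criterion))
    return EvalResult(criterion=criterion,
                      passed=verdict.strip().upper().startswith("PASS"))
```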
Importance of Iterative Criteria Refinement
Participants in the user study found the need to continually refine and reinterpret evaluation criteria based on observations during the grading process. This iterative approach allows users to adjust criteria, observe results, and fine-tune the evaluation process to ensure accurate and reliable outcomes. Regularly updating criteria is crucial to maintain relevance and alignment with evolving user expectations.
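To make the refinement loop concrete, one simple approach is to measure, after each grading pass, how often each criterion’s judge agrees with the human grades, and to flag low-agreement criteria for rewording. This is a sketch under stated assumptions: it reuses the hypothetical `judge` helper from the previous snippet, and the `graded_examples` record format and the 0.8 agreement threshold are invented for illustration.

```python
def agreement(criterion: str, graded_examples: list[dict]) -> float:
    """Fraction of human-graded examples where the LLM judge agrees with the human."""
    if not graded_examples:
        return 0.0
    hits = 0
    for ex in graded_examples:  # each ex: {"input": ..., "output": ..., "human_pass": bool}
        hits += int(judge(ex["input"], ex["output"], criterion).passed == ex["human_pass"])
    return hits / len(graded_examples)


def flag_for_revision(criteria: list[str], graded_examples: list[dict],
                      threshold: float = 0.8) -> list[str]:
    """Return criteria whose judges fall below the agreement threshold and may need rewording."""
    return [c for c in criteria if agreement(c, graded_examples) < threshold]
```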
Enhancing User Interaction and Transparency
User feedback and customization of metrics play a significant role in aligning evaluations with user expectations, particularly for subjective or complex criteria. Participants expressed skepticism about using LLM judges for evaluations, highlighting the need for transparent and user-controlled evaluation criteria. Enhancing user interaction and providing explanations for evaluation outcomes can improve trust and understanding in the evaluation process.
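One lightweight way to provide such explanations is to have the judge return a short rationale alongside its verdict, so users can inspect why an output passed or failed. Again a sketch: it reuses the hypothetical `call_llm` stub from the first snippet, and the prompt format and parsing are illustrative, not from the paper or the podcast.

```python
EXPLAIN_TEMPLATE = """Criterion: {criterion}

Output under evaluation:
{output}

Reply on two lines:
1. PASS or FAIL
2. A one-sentence reason for your verdict."""


def judge_with_rationale(output_text: str, criterion: str) -> tuple[bool, str]:
    """Return a verdict plus a short rationale the user can inspect."""
    reply = call_llm(EXPLAIN_TEMPLATE.format(criterion=criterion, output=output_text))
    lines = [ln.strip() for ln in reply.strip().splitlines() if ln.strip()]
    passed = bool(lines) and "PASS" in lines[0].upper()
    rationale = lines[1] if len(lines) > 1 else ""
    return passed, rationale
```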
Episode notes
Due to the cumbersome nature of human evaluation and the limitations of code-based evaluation, Large Language Models (LLMs) are increasingly being used to assist humans in evaluating LLM outputs. Yet LLM-generated evaluators often inherit the problems of the LLMs they evaluate, requiring further human validation.
This week’s paper explores EvalGen, a mixed-initiative approach to aligning LLM-generated evaluation functions with human preferences. EvalGen assists users both in developing criteria for acceptable LLM outputs and in developing functions that check outputs against those criteria, ensuring evaluations reflect the users’ own grading standards.
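A small scoring sketch can make the alignment idea concrete: given human pass/fail grades on a sample of outputs, a candidate assertion can be scored by how many human-rejected outputs it catches and how many human-approved outputs it wrongly fails, roughly in the spirit of the coverage and false-failure-rate discussion in the paper. The function and metric definitions below are simplified stand-ins, not the paper’s exact formulation.

```python
def alignment_scores(assertion, graded_outputs):
    """Score one candidate assertion against a user's grades.

    assertion: callable(output) -> bool, True meaning the output passes.
    graded_outputs: list of (output, human_pass) pairs from a grading session.
    Returns (coverage, false_failure_rate).
    """
    human_bad = [o for o, ok in graded_outputs if not ok]
    human_good = [o for o, ok in graded_outputs if ok]
    # Coverage: share of human-rejected outputs that the assertion also fails.
    coverage = sum(not assertion(o) for o in human_bad) / max(len(human_bad), 1)
    # False failure rate: share of human-approved outputs the assertion wrongly fails.
    false_failure_rate = sum(not assertion(o) for o in human_good) / max(len(human_good), 1)
    return coverage, false_failure_rate


# Toy usage: a code-based assertion that fails drafts over a word budget.
def within_word_budget(draft: str, limit: int = 120) -> bool:
    return len(draft.split()) <= limit

graded = [
    ("Thanks for reaching out - here is the summary you asked for.", True),
    ("word " * 300, False),  # a human rejected this overly long draft
]
print(alignment_scores(within_word_budget, graded))  # -> (1.0, 0.0)
```

One natural selection rule, per criterion, is to keep the candidate assertion with the highest coverage whose false failure rate stays under a user-chosen bound; the user then validates that choice against their own grades.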