
E4: Evaluating Large Language Models with Nathan Lambert
Practically Intelligent
00:00
Exploring EleutherAI's Test Harness and Its Applications
This chapter discusses EleutherAI's test harness, an evaluation tool for NLP models that gives researchers a common entry point for running evaluations, helps address discrepancies between reported results, and can be used for generative tasks as well as simpler metrics.