E4: Evaluating Large Language Models with Nathan Lambert

Practically Intelligent

Exploring EleutherAI's Evaluation Harness and Its Applications

This chapter discusses EleutherAI's evaluation harness, an open-source tool for evaluating NLP models. It gives researchers a common interface for running evaluations, helps explain discrepancies between reported results, and supports both generative tasks and simpler metrics.
