
Evaluating Language Models

The Data Exchange with Ben Lorica


How to Test a Language Model for General Purpose Tasks

HELM tests language models through API access. It measures seven different metrics, including accuracy and calibration. HELM has also done some earlier work on adversarial attacks on language models. There are also issues of fairness and bias.
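To make the accuracy and calibration metrics more concrete, here is a minimal sketch of how each might be computed from model outputs. It assumes a classification-style task where the model returns a predicted label plus a confidence score. For calibration it uses expected calibration error (ECE) with equal-width bins, one common way of measuring it. The function names `accuracy` and `expected_calibration_error` are hypothetical and are not HELM's actual code or API.

```python
from typing import List, Tuple


def accuracy(preds: List[int], labels: List[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    if not labels or len(preds) != len(labels):
        raise ValueError("preds and labels must be non-empty and equal length")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def expected_calibration_error(
    confidences: List[float], correct: List[bool], n_bins: int = 10
) -> float:
    """Equal-width binned ECE (an assumed choice, not necessarily HELM's).

    Groups predictions by confidence, then takes the size-weighted average
    of |mean confidence - empirical accuracy| across the non-empty bins.
    """
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("inputs must be non-empty and equal length")
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Map a confidence in [0, 1] to a bin; conf == 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        avg_acc = sum(ok for _, ok in b) / len(b)
        ece += (len(b) / n) * abs(avg_conf - avg_acc)
    return ece


if __name__ == "__main__":
    preds, labels = [1, 0, 1, 1], [1, 0, 0, 1]
    confs = [0.9, 0.8, 0.7, 0.95]
    correct = [p == y for p, y in zip(preds, labels)]
    print(accuracy(preds, labels))
    print(expected_calibration_error(confs, correct))
```

A model can score well on accuracy while being badly calibrated: in the toy run above, the one wrong answer still carries 0.7 confidence, which pushes ECE up even though three of four answers are right. That gap is why a benchmark might report calibration separately rather than folding it into accuracy.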

