Hallucination Trivia Test for Large Language Models

The project attempts to create a common metric for testing large language models on their progress in eliminating hallucinations. On the H3.0 trivia test, the score went up to 85.7%, compared with 59.3 for GPT-3.5 and 55.6 for GPT-3. The latest LLaMA model scored 49.7, and the Alpaca model came in lower at 44.

