AI isn’t very good at history, new paper finds

TechCrunch Industry News

Examining the Historical Limitations of AI Models

This chapter explores the limitations of large language models in handling historical inquiries, revealing that even top models like GPT-4 achieve only around 46% accuracy on historical questions. It introduces the HIST-LLM benchmark and discusses the implications of these findings, while remaining hopeful about the future role of LLMs in historical research.
