
AI isn’t very good at history, new paper finds
TechCrunch Industry News
Examining the Historical Limitations of AI Models
This chapter explores the limitations of large language models in answering historical questions, reporting that even top models such as GPT-4 achieve only around 46% accuracy on historical subjects. It introduces the HIST-LLM benchmark and discusses the implications of these findings, while keeping a hopeful outlook on the future role of LLMs in historical research.