Mentioned in 3 episodes

Humanity's Last Exam

A Multi-Modal Benchmark at the Frontier of Human Knowledge

Book • 2025

Humanity's Last Exam (HLE) is a benchmarking project aimed at assessing the capabilities of large language models (LLMs) across a wide range of subjects, including mathematics, humanities, and the natural sciences.

Developed by more than a thousand experts worldwide, HLE consists of 3,000 multiple-choice and short-answer questions suitable for automated grading.

Each question has a known, unambiguous answer that cannot be found quickly through internet retrieval.

The benchmark highlights the significant gap between current LLM capabilities and expert human knowledge, providing a critical tool for research and policymaking in AI development.

Mentioned by

Mentioned as one of the benchmarks xAI used to test Grok 4's performance.

#216 - Grok 4, Project Rainier, Kimi K2

Mentioned by Alex Volkov when discussing AI breakthroughs.

📆 ThursdAI - Feb 6 - OpenAI DeepResearch is your personal PHD scientist, o3-mini & Gemini 2.0, OmniHuman-1 breaks reality & more AI news

Mentioned by Sarah Guo as one of Dan Hendrycks' publications.

National Security Strategy and AI Evals on the Eve of Superintelligence with Dan Hendrycks
