Latent Space: The AI Engineer Podcast

Artificial Analysis: The Independent LLM Analysis House — with George Cameron and Micah Hill-Smith

Jan 9, 2026
Join George Cameron, co-founder of Artificial Analysis and benchmarking guru, along with Micah Hill-Smith, who crafted the evaluation methodology and unique benchmarks. They share their journey from a basement project to a vital tool for AI model assessment. Discover why independent evaluations matter, how their 'mystery shopper' strategy keeps benchmarks honest, and the innovative Omniscience index that prioritizes accurate responses. Learn about the evolving AI landscape and their predictions for future developments in benchmarking and transparency.
ANECDOTE

Side Project Became Industry Staple

  • Artificial Analysis began as a side project while Micah built a legal AI assistant and needed independent benchmarks.
  • A Swyx retweet helped the project go viral and grow into a full-time company.
ADVICE

Always Run Your Own Evals

  • Run benchmarks yourself and control the prompts, because labs use different prompting setups and cherry-pick favorable results.
  • Standardize evaluation across models to prevent inflated or non-comparable scores.
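The advice above can be sketched as a small eval loop: every model sees the exact same prompt template and is graded identically. This is a minimal illustration, not the Artificial Analysis harness; the model callables and prompt template are hypothetical stand-ins for real API calls.

```python
# Hypothetical standardized prompt used for every model (assumption,
# not the Artificial Analysis template).
PROMPT_TEMPLATE = "Answer with a single letter.\nQuestion: {question}\nChoices: {choices}"

def evaluate(models, dataset):
    """Run every model on identical prompts and grade identically."""
    scores = {}
    for name, model in models.items():
        correct = 0
        for item in dataset:
            prompt = PROMPT_TEMPLATE.format(
                question=item["question"], choices=item["choices"]
            )
            answer = model(prompt).strip().upper()
            correct += answer == item["answer"]
        scores[name] = correct / len(dataset)
    return scores

# Stub "models" standing in for real model API calls:
dataset = [
    {"question": "2 + 2 = ?", "choices": "A) 3  B) 4", "answer": "B"},
    {"question": "Capital of France?", "choices": "A) Paris  B) Rome", "answer": "A"},
]
models = {"always_b": lambda p: "B", "always_a": lambda p: "A"}
print(evaluate(models, dataset))  # {'always_b': 0.5, 'always_a': 0.5}
```

Because the prompt, parsing, and grading are shared, score differences reflect the models rather than per-lab prompt engineering.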
ADVICE

Reduce Variance With Repeats

  • Use repeated runs and calculate confidence intervals to reduce variance on small-sample evals.
  • Report 95% confidence intervals, increasing the number of repeats until the interval is tight enough, before publishing final scores.
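As a sketch of the repeat-and-interval idea: collect the score from each repeated run, then compute the mean and a normal-approximation 95% confidence interval. This is an illustrative helper under that assumption, not the show's actual methodology.

```python
import statistics

def score_with_ci(run_scores):
    """Mean and normal-approximation 95% CI from repeated eval runs."""
    n = len(run_scores)
    mean = statistics.fmean(run_scores)
    sem = statistics.stdev(run_scores) / n ** 0.5  # standard error of the mean
    half_width = 1.96 * sem                        # 95% z-interval half-width
    return mean, (mean - half_width, mean + half_width)

# Example: five repeated runs of the same small eval; add repeats
# until the interval is narrow enough to publish.
runs = [0.81, 0.79, 0.83, 0.80, 0.82]
mean, (lo, hi) = score_with_ci(runs)
print(f"{mean:.3f} [{lo:.3f}, {hi:.3f}]")  # 0.810 [0.796, 0.824]
```

More repeats shrink the standard error roughly as 1/sqrt(n), so doubling precision costs about four times the runs.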