
Latent Space: The AI Engineer Podcast
Artificial Analysis: The Independent LLM Analysis House — with George Cameron and Micah Hill-Smith
Jan 9, 2026
Join George Cameron, co-founder of Artificial Analysis and benchmarking guru, along with Micah Hill-Smith, who crafted the evaluation methodology and unique benchmarks. They share their journey from a basement project to a vital tool for AI model assessment. Discover why independent evaluations matter, how their 'mystery shopper' strategy keeps benchmarks honest, and the innovative Omniscience index that prioritizes accurate responses. Learn about the evolving AI landscape and their predictions for future developments in benchmarking and transparency.
AI Snips
Side Project Became Industry Staple
- Artificial Analysis began as a side project while Micah built a legal AI assistant and needed independent benchmarks.
- A Swyx retweet helped the project go viral and grow into a full-time company.
Always Run Your Own Evals
- Run benchmarks yourself and control the prompts, because labs use different prompting and cherry-pick favorable results.
- Standardize evaluation across models so scores stay comparable and aren't inflated (a sketch follows this list).
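
A minimal sketch of what "standardize evaluation across models" can look like in code, assuming hypothetical helpers (`call_model`, `grade`, `PROMPT_TEMPLATE`); this is an illustration, not the actual Artificial Analysis harness:

```python
# Every model sees the exact same prompt template and the exact same grader,
# so accuracy numbers are comparable across models.

PROMPT_TEMPLATE = "Answer concisely.\n\nQuestion: {question}\nAnswer:"

def call_model(model: str, prompt: str) -> str:
    """Hypothetical: send the prompt to the named model's API and return its text."""
    raise NotImplementedError

def grade(answer: str, reference: str) -> bool:
    """Hypothetical grader shared by every model (here: simple exact match)."""
    return answer.strip().lower() == reference.strip().lower()

def evaluate(models: list[str], dataset: list[dict]) -> dict[str, float]:
    """Score every model with identical prompts and an identical grader."""
    results: dict[str, float] = {}
    for model in models:
        correct = sum(
            grade(
                call_model(model, PROMPT_TEMPLATE.format(question=item["question"])),
                item["reference"],
            )
            for item in dataset
        )
        results[model] = correct / len(dataset)
    return results
```

The point of the design is that nothing model-specific leaks into the prompt or the grading, which is what makes scores published side by side meaningful.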
Reduce Variance With Repeats
- Use repeated runs and calculate confidence intervals to reduce variance on small-sample evals.
- Increase the number of repeats until the 95% confidence intervals are tight enough, then publish the final scores (see the sketch after this list).
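
A short sketch of the repeats-plus-confidence-interval idea, assuming a hypothetical `run_eval_once` function that returns one accuracy score per benchmark run; the 1.96 factor is the standard z-value for a 95% interval:

```python
import math
import statistics

def run_eval_once(model: str) -> float:
    """Hypothetical: run the benchmark once and return an accuracy in [0, 1]."""
    raise NotImplementedError

def score_with_confidence(model: str, repeats: int = 8) -> tuple[float, float]:
    """Average `repeats` runs and return (mean accuracy, 95% CI half-width)."""
    scores = [run_eval_once(model) for _ in range(repeats)]
    mean = statistics.mean(scores)
    # Standard error of the mean shrinks with sqrt(repeats), so more repeats
    # directly tighten the interval.
    sem = statistics.stdev(scores) / math.sqrt(repeats)
    return mean, 1.96 * sem
```

In practice you would keep raising `repeats` until the half-width falls below whatever threshold you consider publishable (for example, under one percentage point).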


