

Complete Beginner's Course on AI Evaluations: Step by Step (2025) | Aman Khan
Aug 24, 2025
Aman Khan, Head of Product at Arize AI, specializes in AI evaluations and large language models. He shares insights on creating AI evaluations, including the four essential types every product manager should know. The episode features a live demo of building evals for a customer support agent, covering the importance of a golden dataset and aligning AI judgments with human judgment. Aman emphasizes effective prompt crafting and the iterative process behind improving AI performance in real-world applications, particularly customer service scenarios.
AI Snips
Evals Are Non-Negotiable
- LLMs hallucinate, so evals are essential to avoid product and brand harm.
- Companies selling models also recommend running evals before deployment.
Four Core Eval Types
- There are four core eval types: code-based, human, LLM-as-judge, and user metrics.
- Each serves a different role, from deterministic checks to measuring business impact; a sketch of the first three follows.
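A minimal sketch of what the first three types can look like in code, assuming a simple Python harness; the fourth type, user metrics, typically comes from product analytics rather than an eval script. The function names, judge prompt, and refund check are illustrative assumptions, not details from the episode.

```python
# Illustrative sketches of three eval types; the names and checks
# are assumptions for demonstration, not taken from the episode.

def code_based_eval(response: str) -> bool:
    """Code-based eval: a deterministic check, e.g. the reply
    must mention the refund policy."""
    return "refund" in response.lower()

def human_eval(response: str) -> bool:
    """Human eval: a reviewer manually labels the response."""
    label = input(f"Acceptable? (y/n)\n{response}\n> ")
    return label.strip().lower() == "y"

JUDGE_PROMPT = """You are grading a customer support reply.

Reply: {response}

Does the reply correctly follow the refund policy?
Answer with exactly one word: pass or fail."""

def llm_as_judge_eval(response: str, call_llm) -> bool:
    """LLM-as-judge eval: a second model grades the first model's
    output. `call_llm` is any function mapping a prompt string to
    the judge model's text reply."""
    verdict = call_llm(JUDGE_PROMPT.format(response=response))
    return verdict.strip().lower().startswith("pass")
```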
Start With A Structured Prompt
- Start your agent with a clear prompt and include input variables like the user question, product info, and policy; see the template sketch below.
- Use tools (e.g., Anthropic's Workbench) to generate and iterate on prompt templates quickly.
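A minimal sketch of such a structured prompt template for the customer support agent, using plain Python string formatting; the variable names, policy text, and example values are illustrative assumptions, not taken from the live demo.

```python
# A structured prompt template with explicit input variables;
# names and example values are illustrative assumptions.

SUPPORT_AGENT_PROMPT = """You are a customer support agent for an online store.

Product info:
{product_info}

Store policy:
{policy}

Customer question:
{user_question}

Answer using only the product info and policy above. If the policy
does not cover the question, say you will escalate to a human agent."""

# Fill the template for one test case.
prompt = SUPPORT_AGENT_PROMPT.format(
    product_info="Wireless headphones, model WH-100, 30-day return window.",
    policy="Refunds allowed within 30 days of purchase with proof of receipt.",
    user_question="Can I return these headphones after three weeks?",
)
print(prompt)
```

Keeping each input as a named variable makes the template easy to rerun against a golden dataset: swap in each test case's values, collect the agent's outputs, and score them with the evals above.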