

Evaluating LLMs the Right Way: Lessons from Hex's Journey
Jun 11, 2024
Bryan Bischof, AI lead at Hex, shares insights on evaluating LLMs to build reliable AI agents. Hex's evaluation framework enabled the team to ship with confidence. Key ingredients of that success include choosing the right tools, using a reactive DAG for course-correction, building user-centric evaluators, gating releases on meaningful metrics, and scrutinizing model inputs and outputs. The episode also covers data analysis in AI engineering, developing agent capabilities, using agents efficiently at Hex, prompt engineering with GPT-4 Turbo, the challenges of evaluating language models, and empowering non-technical experts to build AI products.
AI Snips
Constrain Agent Plans
- Constrain agent behavior by being prescriptive about the types of plans the agent can generate.
- Define specific types of steps the agent can execute, rather than allowing complete freedom (see the sketch after this list).
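A minimal sketch of what such a constrained plan schema could look like; the step kinds, `PlanStep` class, and `parse_plan` helper are illustrative assumptions, not Hex's actual code. The agent may only emit steps whose type appears in a fixed enum, and anything else is rejected before execution:

```python
# Hypothetical constrained plan schema (not Hex's actual implementation).
from dataclasses import dataclass
from enum import Enum


class StepKind(Enum):
    """The only step types the agent is allowed to emit (illustrative set)."""
    SQL = "sql"            # run a SQL query against the warehouse
    PYTHON = "python"      # run a Python cell
    CHART = "chart"        # render a visualization
    MARKDOWN = "markdown"  # write explanatory text


@dataclass
class PlanStep:
    kind: StepKind
    description: str


def parse_plan(raw_steps: list[dict]) -> list[PlanStep]:
    """Validate a model-generated plan, rejecting any step type we don't support."""
    steps = []
    for raw in raw_steps:
        try:
            kind = StepKind(raw["kind"])
        except (KeyError, ValueError):
            raise ValueError(f"unsupported plan step: {raw!r}")
        steps.append(PlanStep(kind=kind, description=raw.get("description", "")))
    return steps


# A plan containing an unknown step type (e.g. "send_email") fails validation up front.
parse_plan([{"kind": "sql", "description": "pull last 30 days of signups"}])
```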
Reactive DAG for Agents
- Hex's AI agents operate within a Directed Acyclic Graph (DAG), managing dependencies between steps.
- This DAG enables reactivity, allowing upstream changes to automatically trigger downstream updates (see the sketch below).
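A minimal sketch of one way to get that reactivity; the `ReactiveDAG` class and its methods are hypothetical, not Hex's implementation. Each node declares its upstream dependencies, and invalidating a node marks everything downstream stale so it recomputes on the next run:

```python
# Hypothetical reactive DAG: upstream changes propagate to downstream nodes.
from collections import defaultdict


class ReactiveDAG:
    def __init__(self):
        self.funcs = {}                      # node name -> callable(inputs dict) -> value
        self.deps = {}                       # node name -> list of upstream node names
        self.downstream = defaultdict(set)   # node name -> nodes that depend on it
        self.values = {}                     # cached outputs
        self.stale = set()                   # nodes that must be recomputed

    def add_node(self, name, func, deps=()):
        self.funcs[name] = func
        self.deps[name] = list(deps)
        for dep in deps:
            self.downstream[dep].add(name)
        self.stale.add(name)

    def invalidate(self, name):
        """Mark a node and everything downstream of it as needing recomputation."""
        self.stale.add(name)
        for child in self.downstream[name]:
            self.invalidate(child)

    def run(self, name):
        """Compute a node, recomputing any stale upstream nodes first."""
        if name in self.stale or name not in self.values:
            inputs = {dep: self.run(dep) for dep in self.deps[name]}
            self.values[name] = self.funcs[name](inputs)
            self.stale.discard(name)
        return self.values[name]


# Usage: changing the upstream query invalidates and re-runs the chart step.
dag = ReactiveDAG()
dag.add_node("query", lambda _: [1, 2, 3])
dag.add_node("chart", lambda inp: f"chart of {inp['query']}", deps=["query"])
print(dag.run("chart"))   # computes query, then chart
dag.invalidate("query")   # an upstream change propagates downstream
print(dag.run("chart"))   # recomputes both
```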
Iterative Agent Development
- Hex built individual capabilities first, letting users work with them directly before they were incorporated into agents.
- This iterative process provided valuable insights into user behavior and context requirements.