Engineering Enablement by DX

Running data-driven evaluations of AI engineering tools

Dec 12, 2025
Abi Noda, CEO of DX and a developer productivity expert, joins Laura Tacho to discuss the rapidly evolving landscape of AI engineering tools. They delve into the importance of data-driven evaluations, outlining practical methods for shortlisting tools and structuring trials that reflect real development workflows. Noda emphasizes the need for clear goals and representative cohorts to measure effectiveness. The conversation highlights essential frameworks, like the AI Measurement Framework, to ensure impactful tool adoption and avoid costly missteps.
AI Snips
ADVICE

Begin With A Clear Research Question

  • Start evaluations with a clear research question tied to a business outcome, not just curiosity about a new tool.
  • Work backward from that goal to design metrics, cohorts, and trial scope for reliable results.
ADVICE

Keep Shortlists Small And Scalable

  • Shortlist a small set of tools (commonly 2–3 plus your incumbent) to keep trials manageable.
  • Scale how many tools you trial simultaneously to match your developer population and experimentation capacity.
ADVICE

Group Tools By Use Case And Mode

  • Group tools by use case and interaction mode and run separate evaluations per category.
  • Avoid comparing agentic IDEs directly against chat-only assistants because they serve different needs.