Measuring Agent Performance Against Expected Outcomes

Spencer describes evaluating agents by how close their outputs come to expected answers, and using heatmap-style analysis to see where they fail.

Segment starts at 47:30 in the episode.

Episode notes

Spencer Reagan leads R&D at Airia, working on secure AI-agent orchestration, data governance systems, and real-time signal fusion technologies for regulated and defense environments.

Overcoming Challenges in AI Agent Deployment: The Sweet Spot for Governance and Security // MLOps Podcast #349 with Spencer Reagan, R&D at Airia.

Join the Community:

https://go.mlops.community/YTJoinIn

Get the newsletter: https://go.mlops.community/YTNewsletter

Shoutout to Airia for powering this MLOps Podcast episode.

// Abstract

Spencer Reagan thinks it might be, and he’s not shy about saying so. In this episode, he and Demetrios Brinkmann get real about the messy, over-engineered state of agent systems, why LLMs still struggle in the wild, and how enterprises keep tripping over their own data chaos. They unpack red-teaming, security headaches, and the uncomfortable truth that most “AI platforms” still don’t scale. If you want a sharp, no-fluff take on where agents are actually headed, this one’s worth a listen.

// Bio

Passionate about technology, software, and building products that improve people's lives.

// Related Links

Website: https://airia.com/

Machine Learning, AI Agents, and Autonomy // Egor Kraev // MLOps Podcast #282 - https://youtu.be/zte3QDbQSek

Re-Platforming Your Tech Stack // Michelle Marie Conway & Andrew Baker // MLOps Podcast #281 - https://youtu.be/1ouSuBETkdA

~~~~~~~~ ✌️Connect With Us ✌️ ~~~~~~~

Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExplore

Join our Slack community [https://go.mlops.community/slack]

Follow us on X/Twitter [@mlopscommunity](https://x.com/mlopscommunity) or [LinkedIn](https://go.mlops.community/linkedin)

MLOps Swag/Merch: [https://shop.mlops.community/]

Connect with Demetrios on LinkedIn: /dpbrinkm

Connect with Spencer on LinkedIn: /spencerreagan/

Timestamps:

[00:00] AI industry future

[00:55] Use cases in software

[05:44] LLMs for data normalization

[11:02] ROI and overengineering

[15:58] Street width history

[20:58] High ROI examples

[25:16] AI building challenges

[33:37] Budget control challenges

[39:30] Airia Orchestration platform

[46:25] Agent evaluation breakdown

[53:48] Wrap up