Diamond Bishop, Director of Engineering and AI at Datadog, discusses innovative approaches to building AI agents for production incident management. He emphasizes the shift from simple workflow automation to AI agents with real decision-making autonomy. The conversation highlights how enterprise AI earns trust and reliability through root cause identification, enabling proactive fixes before engineers need to step in. Diamond also explores the significance of adopting standards like Anthropic's MCP for seamless tool integration across diverse environments.
INSIGHT
Defining True AI Agents
AI agents are systems with autonomy over their control flow, not just fixed workflows or simple chatbots.
True agents observe, act, and decide dynamically, for example by skipping steps or gathering more data.
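To make the distinction concrete, a minimal agent loop might look like the sketch below. This is a hypothetical illustration, not Datadog's implementation; every name in it (call_llm, TOOLS, run_agent) is a placeholder. The point is that the model, not a fixed pipeline, decides the next step at runtime.

```python
# Hypothetical agent loop: the LLM chooses the next action each turn, so control
# flow (skip a step, gather more data, finish) is decided at runtime rather than
# hard-coded as a fixed workflow. All names here are placeholders.

TOOLS = {
    "fetch_logs": lambda query: f"logs matching {query}",          # stand-in tools
    "check_deploys": lambda service: f"recent deploys of {service}",
}

def call_llm(context: str) -> dict:
    """Placeholder for a real model call that returns the next action."""
    # A real implementation would send `context` to a foundation model and parse
    # a structured reply like {"action": ..., "args": ..., "done": ..., "answer": ...}.
    return {"action": "finish", "args": {}, "done": True,
            "answer": "root cause: faulty deployment"}

def run_agent(incident: str, max_steps: int = 5) -> str:
    context = f"Incident: {incident}"
    for _ in range(max_steps):
        decision = call_llm(context)             # the model picks the next step
        if decision["done"]:
            return decision["answer"]
        observation = TOOLS[decision["action"]](**decision["args"])
        context += f"\nObservation: {observation}"
    return "no conclusion reached; escalate to on-call"
```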
ANECDOTE
AI Prevents Midnight Engineer Alerts
Datadog's Bits AI agent analyzes logs and runbooks to diagnose issues before engineers wake up.
It can identify root causes like faulty deployments or dependent service failures, saving time during outages.
ADVICE
Build Trust Via Precise Evaluations
Build trust in AI agents by establishing precise, scenario-specific evaluation metrics.
Share clear precision and recall statistics to show when and how the agent performs well.
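As a hedged illustration of that advice, the sketch below computes precision and recall per incident scenario from hand-labeled agent verdicts. The scenario names and data are invented for the example; real numbers would come from reviewed incidents.

```python
from collections import defaultdict

# Invented per-incident labels: did the agent flag this scenario as the root
# cause (predicted), and was it actually the root cause (actual)?
results = [
    # (scenario, predicted, actual)
    ("faulty_deploy", True, True),
    ("faulty_deploy", True, False),
    ("dependency_outage", False, True),   # miss: real cause, but not flagged
    ("dependency_outage", True, True),
]

def per_scenario_metrics(results):
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for scenario, predicted, actual in results:
        c = counts[scenario]
        if predicted and actual:
            c["tp"] += 1
        elif predicted:
            c["fp"] += 1
        elif actual:
            c["fn"] += 1
    metrics = {}
    for scenario, c in counts.items():
        precision = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
        recall = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
        metrics[scenario] = {"precision": precision, "recall": recall}
    return metrics

print(per_scenario_metrics(results))
# {'faulty_deploy': {'precision': 0.5, 'recall': 1.0},
#  'dependency_outage': {'precision': 1.0, 'recall': 0.5}}
```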
What happens when you build AI agents trusted enough to handle production incidents while engineers sleep? At Datadog, that question sparked a fundamental rethink of how enterprise AI systems earn developer trust in critical infrastructure environments.
Diamond Bishop, Director of Eng/AI, outlines for Ravin how their Bits AI initiative evolved from basic log analysis to sophisticated incident response agents. By focusing first on root cause identification rather than full automation, they're delivering immediate value while building the confidence needed for deeper integration.
But that's just one part of Datadog's systematic approach. From adopting Anthropic's MCP standard for tool interoperability to implementing a multi-model foundation model strategy, they're creating AI systems that can evolve with rapidly improving underlying technologies while maintaining enterprise reliability standards.
Topics discussed:
Defining AI agents as systems with control flow autonomy rather than simple workflow automation or chatbot interfaces.
Building enterprise trust in AI agents through precision-focused evaluation systems that measure performance across specific incident scenarios.
Implementing root cause identification agents that diagnose production issues before engineers wake up during critical outages.
Adopting Anthropic's MCP standard for tool interoperability to enable seamless integration across different agent platforms and environments (a rough MCP tool sketch follows this list).
Using LLM-as-judge evaluation methods combined with human alignment scoring to continuously improve agent reliability and performance (see the judge sketch after this list).
Managing a multi-model foundation model strategy that allows switching between OpenAI, Anthropic, and open-source models based on the task.
Balancing organizational AI adoption through decentralized experimentation with centralized procurement standards and security compliance oversight.
Developing LLM observability products that cluster errors and provide visibility into token usage and model performance.
Navigating the bitter lesson principle by building evaluation frameworks that can quickly test new foundation models.
Predicting timeline and bottlenecks for AGI development based on current reasoning limitations and architectural research needs.
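For the MCP interoperability topic above, here is a hedged sketch of exposing a single diagnostic tool over MCP. It assumes the official MCP Python SDK's FastMCP helper; the exact API may differ between SDK versions, and the tool itself is an invented example rather than a Datadog integration.

```python
# Hedged sketch assuming the MCP Python SDK's FastMCP helper.
# The tool body is a stub invented for illustration.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("incident-tools")

@mcp.tool()
def recent_deploys(service: str) -> str:
    """Return recent deployments for a service (stubbed for illustration)."""
    # A real tool would query a deploy-tracking system here.
    return f"No recent deployments found for {service}."

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio so any MCP-capable agent can call it
```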
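For the LLM-as-judge topic above, a rough sketch of the pattern: an LLM grades each agent diagnosis, and the judge's verdicts are periodically checked against human reviewers for alignment. call_judge_model and the sample data are hypothetical placeholders, not the evaluation system discussed in the episode.

```python
# Hedged sketch of LLM-as-judge plus a human-alignment check.
# call_judge_model is a placeholder for any chat-completion API call.

def call_judge_model(prompt: str) -> str:
    """Placeholder: send the prompt to a judge LLM and return its verdict."""
    return "pass"

def judge(incident_summary: str, agent_diagnosis: str) -> bool:
    prompt = (
        "You are grading an incident-response agent.\n"
        f"Incident: {incident_summary}\n"
        f"Agent diagnosis: {agent_diagnosis}\n"
        "Answer 'pass' if the diagnosis names a plausible root cause, else 'fail'."
    )
    return call_judge_model(prompt).strip().lower() == "pass"

def human_alignment(judge_labels: list[bool], human_labels: list[bool]) -> float:
    """Fraction of cases where the LLM judge agrees with human reviewers."""
    agree = sum(j == h for j, h in zip(judge_labels, human_labels))
    return agree / len(human_labels) if human_labels else 0.0

# Example: compare judge verdicts to human review on an invented sample.
judge_labels = [judge("checkout latency spike", "bad deploy of payments-svc")]
human_labels = [True]
print("judge/human agreement:", human_alignment(judge_labels, human_labels))
```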