

Episode 51: Why We Built an MCP Server and What Broke First
Jun 26, 2025
In this discussion, Philip Carter, Product Management Director at Salesforce and former Principal PM at Honeycomb, shares lessons from building LLM-powered features. He explains what it takes to connect these systems to real production data, and digs into the challenges of tool use, prompt templates, and flaky model behavior. He also discusses building an MCP server to enhance observability in AI systems, its role in improving user experience, and the pitfalls of shipping such features in a SaaS product.
AI Snips
Spreadsheet-Driven LLM Eval Process
- Philip Carter described collecting real user inputs and outputs for their LLM feature in spreadsheets and analyzing them row by row.
- By iterating on judgments between himself and an LLM judge until the two agreed, the team built a tightly aligned evaluation system that drove better product performance (a rough sketch of that alignment loop follows below).
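A minimal sketch of that spreadsheet-driven alignment loop, assuming rows of (user input, model output, human judgment) exported from the spreadsheet; `llm_judge` is a hypothetical stand-in for a real model call, not the actual judge prompt discussed in the episode.

```python
from typing import Callable

# Each row mirrors a spreadsheet line: the real user input, the feature's
# output, and the human judgment ("good" / "bad").
Row = tuple[str, str, str]

def llm_judge(user_input: str, output: str) -> str:
    """Hypothetical stand-in for a real LLM-as-judge call. In practice this
    would send a judging prompt to a model and parse its verdict into the
    same label set the human uses."""
    return "good" if output.strip() else "bad"

def alignment_report(rows: list[Row], judge: Callable[[str, str], str]) -> list[Row]:
    """Compare human labels with judge labels; return the disagreements."""
    agree = 0
    disagreements: list[Row] = []
    for user_input, output, human_label in rows:
        if judge(user_input, output) == human_label:
            agree += 1
        else:
            disagreements.append((user_input, output, human_label))
    print(f"agreement: {agree}/{len(rows)} ({agree / len(rows):.0%})")
    return disagreements

if __name__ == "__main__":
    sample = [
        ("slow queries on checkout", "filtered traces by endpoint = /checkout", "good"),
        ("why is p99 latency up?", "", "bad"),
        ("errors in the payments service", "ran a query with no filters", "bad"),
    ]
    for user_input, _output, human_label in alignment_report(sample, llm_judge):
        # Disagreements are the rows worth re-reading: either the judge prompt
        # needs tightening or the human label was wrong.
        print(f"REVIEW  human={human_label!r}  input={user_input!r}")
```

Running it prints an agreement rate plus the disagreement rows to re-read, which is the loop the snip describes: re-examine disagreements, tweak the judge prompt or the label, and repeat until human and judge line up.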
Levers to Align the LLM and Fix Errors
- Improve an LLM-driven system by tuning prompts, adding few-shot examples, and applying deterministic post-processing rules (sketched after this list).
- Use observability to spot errors on real user data and fix them quickly; that is where the practical improvements come from.
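A rough sketch of those three levers, under the assumption that the feature translates natural-language questions into a structured query; the prompt text, few-shot examples, and `fix_common_errors` rule are illustrative, not the prompts used in the episode.

```python
import json

# Lever 1: a prompt template that can be tuned independently of the code.
SYSTEM_PROMPT = (
    "You translate natural-language questions into a JSON query. "
    "Respond with JSON only, using the keys 'filter' and 'group_by'."
)

# Lever 2: few-shot examples drawn from real (input, good output) pairs.
FEW_SHOT = [
    {"question": "slowest endpoints today",
     "query": {"filter": "duration_ms > 0", "group_by": "endpoint"}},
    {"question": "errors by service",
     "query": {"filter": "status_code >= 500", "group_by": "service"}},
]

def build_messages(question: str) -> list[dict]:
    """Assemble chat messages: system prompt, few-shot pairs, then the real question."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for ex in FEW_SHOT:
        messages.append({"role": "user", "content": ex["question"]})
        messages.append({"role": "assistant", "content": json.dumps(ex["query"])})
    messages.append({"role": "user", "content": question})
    return messages

# Lever 3: deterministic post-processing that cleans up known model quirks
# instead of re-prompting.
def fix_common_errors(raw: str) -> dict:
    """Strip markdown fences the model sometimes adds, then fill missing keys."""
    text = raw.strip().removeprefix("```json").removeprefix("```").removesuffix("```").strip()
    query = json.loads(text)
    query.setdefault("group_by", None)  # tolerate a missing optional key
    return query

if __name__ == "__main__":
    print(build_messages("p99 latency by region")[-1])
    print(fix_common_errors('```json\n{"filter": "duration_ms > 500"}\n```'))
```

The design point is that each lever fails independently: prompt and few-shot changes shift model behavior, while the post-processing rule catches the residual, repeatable mistakes deterministically.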
MCP Enables Live Data Integration
- MCP bridges general-purpose LLMs with live APIs, unlocking broad workflow orchestration across enterprise tools (see the sketch below).
- Real-world data volume and idiosyncrasies strain LLM reliability and context-window limits, so keeping the integration working takes ongoing engineering effort.
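A minimal sketch of that kind of bridge, assuming the official MCP Python SDK's `FastMCP` helper (`pip install mcp`); the API URL, the `query_events` tool, and the character cap are hypothetical, and a real server would need auth, pagination, and smarter summarization than a hard truncation.

```python
import json
import urllib.parse
import urllib.request

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("observability-bridge")

# Crude guard so large real-world result sets don't blow the context window.
MAX_CHARS = 20_000

@mcp.tool()
def query_events(service: str, limit: int = 50) -> str:
    """Fetch recent events for a service from a (hypothetical) live API and
    return a JSON string small enough for the model to fit in context."""
    url = (
        "https://api.example.com/v1/events"
        f"?service={urllib.parse.quote(service)}&limit={limit}"
    )
    with urllib.request.urlopen(url, timeout=10) as resp:
        events = json.load(resp)
    payload = json.dumps(events)
    if len(payload) > MAX_CHARS:
        # Real data is bigger and messier than demo data: trim rather than fail.
        payload = payload[:MAX_CHARS] + "... (truncated)"
    return payload

if __name__ == "__main__":
    # Serve over stdio so an MCP-capable client can launch and talk to it.
    mcp.run(transport="stdio")
```

The tool boundary is where the "what broke first" problems show up: the model only sees whatever the tool returns, so result-size control and error handling live on the server side, not in the prompt.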