MLOps.community

Hardening Agents for E-commerce Scale: From RL Alignment to Reliability // Panel 2

Dec 2, 2025
In this panel, four practitioners discuss building e-commerce agents. Arushi Jain, an applied scientist at Microsoft, covers post-training techniques that improve AI reliability. Swati Bhatia of Google Cloud explains how Direct Preference Optimization (DPO) was used to fine-tune support routing. Audi Liu of Inworld AI discusses architectural trade-offs in voice models and their effect on accuracy. Isabella Piratininga of iFood highlights personalization challenges in Brazil. Together, they explore the complexities of automating customer interactions and the future of AI.
AI Snips
ANECDOTE

DPO+PEFT Fixed A Broken Support Router

  • Swati Bhatia described an agent that kept misrouting complex support tickets until a DPO+PEFT adapter quickly brought it back into policy compliance.
  • The team raised policy adherence from 45% to 90% within 48 hours using human-reviewed training data.
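
The snip does not describe the training code, so the following is only a minimal sketch of the standard DPO objective for a single preference pair, such as a human-reviewed "correct route" versus "incorrect route" for one ticket. The function name, argument names, and `beta=0.1` are illustrative assumptions, not details from the episode. In a PEFT setup one would typically expect the policy log-probabilities to come from the base model plus a small trainable adapter, and the reference log-probabilities from the frozen base model, but that too is an assumption here.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of responses.

    Each argument is the summed log-probability of a full response under
    the policy or the frozen reference model. All names are hypothetical.
    """
    # How much more (or less) the policy likes each response than the reference.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)

    # loss = -log(sigmoid(logits)), written in a numerically stable form.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# Policy identical to the reference: the loss is log(2).
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)

# Policy shifted toward the chosen response: the loss drops below log(2).
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

The loss falls only when the policy raises the chosen response relative to the rejected one by more than the reference model does, which is why the quality of the human-reviewed pairs matters so much.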
INSIGHT

Post-Training Replaced Prompt Engineering

  • Arushi Jain said post-training has replaced prompt engineering for many Copilot features at Microsoft.
  • She credits grounded post-training on Microsoft data for reducing variance and hallucinations in production.
ANECDOTE

Cascading Beat End-To-End Speech Models

  • Audi Liu described a production improvement from replacing end-to-end speech models with a cascaded STT+LLM+TTS architecture.
  • The cascaded setup increased tool-calling accuracy and customer retention despite ~200 ms of extra latency.