

Compound AI Systems with Philip Kiely - Weaviate Podcast #105!
Oct 17, 2024
Philip Kiely, who leads developer relations at Baseten, shares insights on compound AI systems and how they have evolved. He discusses breaking tasks into multiple stages to get better performance from AI models. The conversation covers advances in multimodal AI and strategies for deploying these systems efficiently. Kiely emphasizes the benefits of smaller models and constrained generation techniques, alongside architectural tips for deploying on Kubernetes. He also compares several model serving frameworks, focusing on how to optimize AI performance while minimizing cost.
AI Snips
Two Diverging Model Trends
- Models are trending in two directions: general-purpose "everything" models and specialist models.
- Both approaches offer complementary tradeoffs in capability, cost, and efficiency.
Enforce Structured Output Server-Side
- Constrain model outputs on the server side with a schema so that every response is valid.
- Mask disallowed tokens during inference so that structured outputs are deterministically guaranteed to be 100% schema-valid.
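To make the token-masking idea concrete, here is a minimal, self-contained Python sketch. It assumes a toy nine-token vocabulary, a toy "schema" that only accepts `{"answer":"yes"}` or `{"answer":"no"}`, and a hypothetical `fake_model_logits` function standing in for a real model's forward pass; none of these names come from the episode or from any serving framework. At each step the allowed next tokens are looked up in a transition table, every other token's logit is set to negative infinity, and decoding picks the best remaining token, so the output is schema-valid no matter how the model scores tokens.

```python
import math
import random

# Toy vocabulary (an assumption for illustration; real tokenizers are far larger).
VOCAB = ["{", "}", '"answer"', ":", '"yes"', '"no"', "hello", "42", "<eos>"]

# Toy "schema": the only accepted outputs are {"answer":"yes"} and {"answer":"no"}.
SCHEMA = [
    ["{", '"answer"', ":", '"yes"', "}"],
    ["{", '"answer"', ":", '"no"', "}"],
]


def build_fsm(valid_sequences):
    """Compile valid token sequences into a transition table (a prefix trie).

    Maps each reachable prefix (a tuple of tokens) to the set of tokens allowed
    next. A complete sequence may only be followed by "<eos>".
    """
    allowed = {}
    for seq in valid_sequences:
        for i in range(len(seq)):
            allowed.setdefault(tuple(seq[:i]), set()).add(seq[i])
        allowed.setdefault(tuple(seq), set()).add("<eos>")
    return allowed


def mask_logits(logits, allowed_tokens):
    # Disallowed tokens get -inf, so they can never be selected.
    return [score if tok in allowed_tokens else -math.inf
            for tok, score in zip(VOCAB, logits)]


def fake_model_logits(prefix, rng):
    # Hypothetical stand-in for a model forward pass: random, schema-unaware scores.
    return [rng.uniform(-5.0, 5.0) for _ in VOCAB]


def constrained_generate(fsm, rng, max_tokens=16):
    """Greedy decoding where every step is masked by the transition table."""
    prefix = ()
    for _ in range(max_tokens):
        logits = mask_logits(fake_model_logits(prefix, rng), fsm[prefix])
        best = max(range(len(VOCAB)), key=lambda i: logits[i])
        token = VOCAB[best]
        if token == "<eos>":
            break
        prefix += (token,)
    return "".join(prefix)


if __name__ == "__main__":
    fsm = build_fsm(SCHEMA)
    for seed in range(3):
        print(constrained_generate(fsm, random.Random(seed)))
```

Greedy argmax is used for simplicity; sampling from the masked distribution gives the same validity guarantee, because masked tokens have zero probability. Production systems compile a real JSON schema or grammar against the model's full tokenizer vocabulary, which is where the one-time compilation cost comes from.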
One-Time Cost For Constrained Generation
- Compiling the schema into a state machine for constrained generation incurs a one-time latency cost.
- Cache the compiled state machine so that subsequent inferences with the same schema stay fast.