Weaviate Podcast

Compound AI Systems with Philip Kiely - Weaviate Podcast #105!

Oct 17, 2024
Philip Kiely, who leads developer relations at Baseten, shares insights on compound AI systems and how they have evolved. He discusses breaking tasks into multiple stages to get better performance from AI models. The conversation covers advancements in multimodal AI and strategies for deploying these systems efficiently. Kiely emphasizes the benefits of smaller models and constrained generation techniques, and offers architectural tips for Kubernetes deployment. He also compares several model serving frameworks, focusing on optimizing AI performance while minimizing cost.
INSIGHT

Two Diverging Model Trends

  • Models are trending in two directions: "everything" models and specialist models.
  • Both approaches offer complementary tradeoffs in capability, cost, and efficiency.
ADVICE

Enforce Structured Output Server-Side

  • Constrain model outputs at the server with schemas to ensure validity.
  • Use token masking during inference to get deterministic, 100% valid structured outputs.
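The token-masking idea in this snip can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the seven-token vocabulary, the hand-written `TRANSITIONS` state machine for the schema `{"ok": <boolean>}`, and the `fake_model_logits` stand-in that always prefers a free-text token. A real inference server would derive the state machine from the schema and apply the mask to the model's actual logits.

```python
import json
import math

# Toy vocabulary (assumption); a real tokenizer has tens of thousands of entries.
VOCAB = ['{', '}', '"ok"', ':', 'true', 'false', 'hello']

# Hypothetical hand-written state machine for the schema {"ok": <boolean>}.
# Maps state -> {allowed token: next state}.
TRANSITIONS = {
    "start": {'{': "key"},
    "key": {'"ok"': "colon"},
    "colon": {':': "value"},
    "value": {'true': "close", 'false': "close"},
    "close": {'}': "done"},
}


def mask_logits(logits, state):
    """Set the logit of every token the schema forbids in this state to -inf."""
    allowed = TRANSITIONS[state]
    return [
        logit if token in allowed else -math.inf
        for token, logit in zip(VOCAB, logits)
    ]


def fake_model_logits(step):
    """Stand-in for a model: it always prefers the invalid token 'hello'."""
    logits = [0.1 * ((i + step) % 5) for i in range(len(VOCAB))]
    logits[VOCAB.index('hello')] = 5.0
    return logits


def constrained_decode():
    """Greedy decoding where only schema-valid tokens can ever be chosen."""
    state, output, step = "start", [], 0
    while state != "done":
        masked = mask_logits(fake_model_logits(step), state)
        best = max(range(len(VOCAB)), key=masked.__getitem__)
        token = VOCAB[best]
        output.append(token)
        state = TRANSITIONS[state][token]
        step += 1
    return "".join(output)


result = constrained_decode()
parsed = json.loads(result)  # parses even though the model preferred 'hello'
```

Because forbidden tokens are masked before a token is selected, the output parses as JSON regardless of what the unconstrained model would have produced.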
INSIGHT

One-Time Cost For Constrained Generation

  • Building the state machine for constrained generation adds a one-time latency cost.
  • Cache the generated state machine so that subsequent inferences with the same schema run at full speed.
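One way to picture the caching described here is a lookup keyed on a canonical form of the schema, sketched below. The names `compile_state_machine` and `get_state_machine` are hypothetical, and the "compilation" is a trivial placeholder rather than a real schema-to-state-machine build; only the caching pattern is the point.

```python
import json
from functools import lru_cache


@lru_cache(maxsize=128)
def compile_state_machine(schema_key: str):
    """Placeholder for the expensive build step (assumption).

    A real server would turn the schema into a token-level state machine
    here; this sketch only records the property names.
    """
    schema = json.loads(schema_key)
    return tuple(sorted(schema.get("properties", {})))


def get_state_machine(schema: dict):
    """Return the compiled state machine, building it only on first use."""
    # Canonicalize so that key order in the request does not defeat the cache.
    schema_key = json.dumps(schema, sort_keys=True)
    return compile_state_machine(schema_key)
```

The first request for a given schema pays the build cost; later requests with an equivalent schema, even with keys in a different order, are served from the cache. `compile_state_machine.cache_info()` reports the hits and misses.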