Compound AI Systems with Philip Kiely - Weaviate Podcast #105!

Oct 17, 2024

Philip Kiely, who leads developer relations at Baseten, shares insights on compound AI systems and their evolution. He discusses breaking tasks into multiple stages for better AI model performance. The conversation covers advancements in multimodal AI and strategies for deploying these systems efficiently. Kiely emphasizes the benefits of smaller models and constrained generation techniques, alongside architectural tips for Kubernetes deployment. He also compares several model serving frameworks, focusing on optimizing AI performance while minimizing costs.

AI Snips

INSIGHT

Two Diverging Model Trends

Models are trending in two directions: general-purpose "everything" models and specialist models.
Both approaches offer complementary tradeoffs in capability, cost, and efficiency.

ADVICE

Enforce Structured Output Server-Side

Constrain model outputs at the server with schemas to ensure validity.
Use token masking during inference so that structured outputs are valid against the schema 100% of the time.
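
As a rough illustration of the token-masking idea above, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the seven-token vocabulary, the hand-written state machine for a toy schema `{"ok": <boolean>}`, and the `stub_model` function that stands in for a real model. It is not how Baseten or any particular inference server implements the feature; it only shows how masking disallowed tokens before selection forces a schema-valid result.

```python
import math

# Hypothetical toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = ['{', '"ok"', ':', 'true', 'false', '}', 'hello']

# Hypothetical state machine for the toy schema {"ok": <boolean>}.
# Maps state -> {allowed token: next state}.
TRANSITIONS = {
    0: {'{': 1},
    1: {'"ok"': 2},
    2: {':': 3},
    3: {'true': 4, 'false': 4},
    4: {'}': 5},
}
FINAL_STATE = 5


def mask_logits(logits, allowed):
    """Replace the logit of every token outside `allowed` with -inf."""
    return [x if tok in allowed else -math.inf for tok, x in zip(VOCAB, logits)]


def constrained_decode(logits_fn):
    """Greedy decoding that only ever picks tokens the state machine allows."""
    state, out = 0, []
    while state != FINAL_STATE:
        allowed = TRANSITIONS[state]
        masked = mask_logits(logits_fn(out), allowed)
        best = max(range(len(VOCAB)), key=lambda i: masked[i])
        token = VOCAB[best]
        out.append(token)
        state = allowed[token]
    return ''.join(out)


def stub_model(_prefix):
    """Stand-in for a model; it always prefers the free-text token 'hello'."""
    return [0.1, 0.2, 0.3, 1.5, 0.4, 0.2, 9.0]


print(constrained_decode(stub_model))
```

Even though the stub assigns its highest score to `hello`, that token is masked out at every step, so the decoded string always parses as JSON matching the toy schema.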

INSIGHT

One-Time Cost For Constrained Generation

Building the state machine for constrained generation adds latency once, on the first request for a given schema.
Cache the compiled state machine so that subsequent inferences for the same schema run at full speed.
