Compound AI Systems with Philip Kiely - Weaviate Podcast #105!

Weaviate Podcast

Optimizing AI Deployment on Kubernetes

This chapter explores architectural considerations for deploying compound AI systems on Kubernetes, with an emphasis on how multiple models coexist and how resources are allocated among them. It discusses autoscaling models based on traffic patterns and contrasts smaller language models with larger, state-of-the-art ones. The chapter also covers advances in memory management and optimization strategies, focusing on the vLLM and TensorRT-LLM frameworks for improved performance.

Play episode from 17:25
