
Compound AI Systems with Philip Kiely - Weaviate Podcast #105!
Weaviate Podcast
Optimizing AI Deployment on Kubernetes
This chapter explores architectural considerations for deploying compound AI systems on Kubernetes, emphasizing how multiple models coexist on shared infrastructure and how resources are allocated among them. It discusses autoscaling models based on traffic patterns and contrasts smaller language models with larger, state-of-the-art ones. The chapter also covers advances in memory management and optimization strategies, focusing on the vLLM and TensorRT-LLM frameworks for improved performance.