LLM-D, with Clayton Coleman and Rob Shaw

Kubernetes Podcast from Google

Optimizing LLM Deployment on Kubernetes

This chapter examines the nuances of running large language models (LLMs) on Kubernetes, focusing on the challenges they present compared to traditional applications. The discussion covers advances in model architectures, deployment patterns, and the role of open-source collaboration in improving resource utilization and deployment efficiency. The speakers also highlight performance optimizations and the integration of solutions such as the Inference Gateway and vLLM to enhance operational capabilities.
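
To make the deployment pattern concrete, here is a minimal sketch (not from the episode) that uses the official Kubernetes Python client to run vLLM's OpenAI-compatible server as a Deployment. The name llm-demo and the model choice are hypothetical placeholders, and it assumes a cluster with a GPU node, the NVIDIA device plugin, and the public vllm/vllm-openai container image.

```python
# Illustrative sketch only: deploy vLLM's OpenAI-compatible server on
# Kubernetes. Assumes a GPU node with the NVIDIA device plugin installed.
# The name "llm-demo" and the model are hypothetical placeholders.
from kubernetes import client, config


def make_vllm_deployment() -> client.V1Deployment:
    container = client.V1Container(
        name="vllm",
        image="vllm/vllm-openai:latest",          # vLLM's published serving image
        args=["--model", "facebook/opt-125m"],    # small model, for illustration
        ports=[client.V1ContainerPort(container_port=8000)],  # vLLM default port
        resources=client.V1ResourceRequirements(
            limits={"nvidia.com/gpu": "1"},       # request one GPU for the pod
        ),
    )
    pod_template = client.V1PodTemplateSpec(
        metadata=client.V1ObjectMeta(labels={"app": "llm-demo"}),
        spec=client.V1PodSpec(containers=[container]),
    )
    return client.V1Deployment(
        api_version="apps/v1",
        kind="Deployment",
        metadata=client.V1ObjectMeta(name="llm-demo"),
        spec=client.V1DeploymentSpec(
            replicas=1,
            selector=client.V1LabelSelector(match_labels={"app": "llm-demo"}),
            template=pod_template,
        ),
    )


if __name__ == "__main__":
    config.load_kube_config()  # use local kubeconfig credentials
    apps = client.AppsV1Api()
    apps.create_namespaced_deployment(namespace="default",
                                      body=make_vllm_deployment())
    print("Created deployment llm-demo")
```

In practice a Service and a routing layer, such as the Inference Gateway discussed in the episode, would sit in front of these pods to handle traffic splitting and load-aware request scheduling.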
