
Generative AI on Kubernetes

Kubernetes Bytes

CHAPTER

Optimizing Inference Engines on Kubernetes with TGI X Gen and More

This chapter delves into the advantages of using the TGI X Gen inference engine from Hugging Face on Kubernetes, highlighting its ease of use and memory management capabilities. It explores optimized inference engines such as TGI, vLLM, and TensorRT-LLM for deploying models effectively on Kubernetes clusters. The discussion extends to tools like Ollama and Ray, building infrastructure for retrieval-augmented generation (RAG) pipelines, and leveraging LLMs for contextual question answering, concluding with reflections on the episode and a look at upcoming content from the guest.
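As a concrete starting point for the deployment topic discussed in the episode, running TGI on a Kubernetes cluster typically means putting Hugging Face's published container image behind a Deployment and Service. The sketch below is a minimal, hedged example: the container image is Hugging Face's real TGI image, but the object names, model ID, replica count, and GPU resource figure are illustrative assumptions, not values from the episode.

```shell
# Minimal sketch: deploy Hugging Face TGI on Kubernetes.
# Image is Hugging Face's published TGI image; the model ID, names,
# and resource figures are illustrative assumptions.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tgi-server
spec:
  replicas: 1
  selector:
    matchLabels: {app: tgi}
  template:
    metadata:
      labels: {app: tgi}
    spec:
      containers:
      - name: tgi
        image: ghcr.io/huggingface/text-generation-inference:latest
        args: ["--model-id", "mistralai/Mistral-7B-Instruct-v0.2"]
        ports:
        - containerPort: 80
        resources:
          limits:
            nvidia.com/gpu: 1   # schedule onto a GPU node
---
apiVersion: v1
kind: Service
metadata:
  name: tgi-server
spec:
  selector: {app: tgi}
  ports:
  - port: 80
EOF
```

Once the pod is ready, clients inside the cluster can send prompts to the service's `/generate` HTTP endpoint; swapping in a different engine (vLLM, TensorRT-LLM) is largely a matter of changing the image and its arguments.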

