15min chapter


Generative AI on Kubernetes

Kubernetes Bytes

CHAPTER

Optimizing Inference Engines on Kubernetes with TGI (Text Generation Inference) and More

This chapter covers the advantages of running Hugging Face's TGI (Text Generation Inference) engine on Kubernetes, highlighting its ease of use and memory-management capabilities. It surveys optimized inference engines such as TGI, vLLM, and TensorRT-LLM for deploying models effectively on Kubernetes clusters. The discussion extends to tools like Ollama and Ray, building infrastructure for retrieval-augmented generation (RAG) pipelines, and using LLMs for contextual question answering, concluding with reflections on the episode and a preview of upcoming content from the guest.
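As a concrete illustration of the serving pattern discussed here — TGI running behind a Kubernetes Service — the sketch below builds a request for TGI's `/generate` endpoint. The `/generate` route and the `inputs`/`parameters` request shape match TGI's documented API; the in-cluster Service name and the parameter values are illustrative assumptions, not something stated in the episode.

```python
import json


def build_generate_request(prompt, max_new_tokens=128, temperature=0.7):
    """Build the JSON body for TGI's POST /generate endpoint.

    TGI (Text Generation Inference) accepts an "inputs" string plus a
    "parameters" object controlling sampling and output length.
    """
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
        },
    }


# Inside the cluster, a client would POST this payload to the TGI Service,
# e.g. http://tgi.default.svc.cluster.local/generate (hypothetical Service
# name following Kubernetes' <service>.<namespace>.svc.cluster.local DNS).
payload = build_generate_request("What is Kubernetes?")
print(json.dumps(payload))
```

Keeping the payload construction separate from the HTTP call makes it easy to swap engines later: vLLM and TensorRT-LLM expose different request schemas, so only this function would change.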

00:00
