Deploy and fine-tune LLMs on Kubernetes using KAITO
Aug 7, 2024
Sachi Desai, a Product Manager specializing in AI technologies, and Paul Yu, a Senior Cloud Advocate at Microsoft, dive into the KAITO project for deploying open source LLMs on Kubernetes. They discuss how KAITO simplifies running AI applications alongside LLMs and lets users bring and fine-tune their own models. The conversation highlights parameter-efficient techniques like LoRA and QLoRA for efficient model training, and emphasizes the role of community engagement in improving AI model deployment and shaping future capabilities.
KAITO simplifies the deployment and management of large language models on Kubernetes, addressing common infrastructure challenges for AI workloads.
KAITO's fine-tuning capabilities let organizations optimize AI model performance on new datasets while keeping costs under control.
Deep dives
Three Year Milestone of the Podcast
The hosts reflect on the journey of the podcast, celebrating its three-year anniversary and over 75 episodes produced. They express gratitude towards listeners for their support and highlight the growth of the audience through word of mouth. The hosts acknowledge the opportunity to engage with industry experts and attend significant events like KubeCon and Red Hat Summit, where they meet listeners in person. Their commitment to continue the podcast remains strong, emphasizing the importance of sharing knowledge on cloud-native technology.
Introduction of Azure Container Storage
Microsoft's Azure Container Storage (ACS) is now generally available, offering a software-defined, cloud-native storage solution derived from OpenEBS. It supports multiple underlying storage options, including Azure Disks and local ephemeral disks. Notable features include storage pool expansion, replication across nodes, and support for customer-managed encryption keys. The hosts provide resources in the show notes for listeners who want to follow ACS's journey and explore its features.
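To make the storage-pool model concrete, here is a minimal sketch of how a workload might consume ACS storage. It assumes an Azure Disk-backed storage pool named "azuredisk" has already been created, for which ACS generates a storage class following its `acstor-<pool-name>` naming convention; the claim name and size are illustrative.

```yaml
# Hypothetical PVC consuming an Azure Container Storage pool.
# Assumes a storage pool named "azuredisk" exists, exposed via
# the generated "acstor-azuredisk" storage class.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: acs-example-pvc
spec:
  accessModes:
    - ReadWriteOnce          # single-node read/write access
  storageClassName: acstor-azuredisk
  resources:
    requests:
      storage: 100Gi         # carved out of the shared storage pool
```

Pods then reference the claim like any other PVC; ACS handles provisioning from the pool behind the scenes.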
Overview of the Kaito Project
The KAITO project, short for Kubernetes AI Toolchain Operator, addresses infrastructure provisioning challenges for AI workloads within AKS. It simplifies deploying large language models and manages the underlying infrastructure transparently for users. With KAITO, organizations can efficiently deploy AI models on their own clusters, making it easier to address the data privacy and latency concerns associated with using external AI services. The project aims to automate the cumbersome steps involved in onboarding and deploying AI models in Kubernetes clusters.
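The workflow described above centers on a single custom resource. As a rough sketch, a KAITO inference Workspace can look like the following; the GPU instance type and preset model name are illustrative examples, not the only supported values.

```yaml
# Sketch of a KAITO Workspace that deploys an open source LLM
# for inference. The operator provisions the GPU node pool and
# runs the preset model; instance type and preset are examples.
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-falcon-7b
resource:
  instanceType: "Standard_NC12s_v3"   # GPU SKU to provision
  labelSelector:
    matchLabels:
      apps: falcon-7b
inference:
  preset:
    name: "falcon-7b-instruct"        # one of the supported preset models
```

Applying this with `kubectl apply` is all that is needed; KAITO handles node provisioning, image pulls, and exposing an inference endpoint inside the cluster.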
Fine Tuning and Model Deployment with Kaito
KAITO enables fine-tuning of AI models, improving their performance on new datasets without starting from scratch. Users can choose parameter-efficient fine-tuning techniques to optimize resource usage, which is critical for effective cost management. The process includes downloading datasets, training model adaptations, and storing those adaptations as container images for later use in inference jobs. This streamlined approach significantly reduces the time needed to prepare models for inference, showcasing KAITO's potential for optimizing AI workflows.
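The fine-tuning flow described above (dataset in, trained adapter out as a container image) can be sketched as a tuning Workspace. This is an illustrative example, not a definitive spec: the instance type, preset model, dataset URL, registry path, and secret name are all placeholders you would replace with your own values.

```yaml
# Sketch of a KAITO Workspace for parameter-efficient fine-tuning.
# QLoRA is used as the tuning method; all URLs, image names, and
# the push secret below are hypothetical placeholders.
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-tuning-phi-3
resource:
  instanceType: "Standard_NC6s_v3"    # GPU SKU for the tuning job
  labelSelector:
    matchLabels:
      app: tuning-phi-3
tuning:
  preset:
    name: phi-3-mini-4k-instruct      # base model to fine-tune
  method: qlora                       # parameter-efficient tuning technique
  input:
    urls:
      - "https://example.com/dataset.parquet"   # training dataset (placeholder)
  output:
    image: "myregistry.azurecr.io/adapters/phi-3-adapter:latest"  # adapter image (placeholder)
    imagePushSecret: myregistrysecret # secret with registry push credentials
```

Once the job completes, the trained adapter lives in the container registry and can be referenced by an inference Workspace, which is what makes the adapter-as-image approach convenient for reuse.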
In this episode of the Kubernetes Bytes podcast, Bhavin sits down with Sachi Desai, Product Manager, and Paul Yu, Sr. Cloud Advocate at Microsoft, to talk about the open source KAITO project. KAITO is the Kubernetes AI Toolchain Operator that enables AKS users to deploy open source LLMs on their Kubernetes clusters. They discuss how KAITO helps with running AI-enabled applications alongside LLMs, how it helps users bring their own models and run them as containers, and how it helps them fine-tune open source LLMs on their Kubernetes clusters.
Check out our website at https://kubernetesbytes.com/
Jumpstart AI Workflows With Kubernetes AI Toolchain Operator - The New Stack - https://thenewstack.io/jumpstart-ai-workflows-with-kubernetes-ai-toolchain-operator
https://paulyu.dev/article/soaring-with-kaito/
Concepts - Fine-tuning language models for AI and machine learning workflows - Azure Kubernetes Service | Microsoft Learn - https://learn.microsoft.com/en-us/azure/aks/concepts-fine-tune-language-models
Keep up to date on the most recent announcements by following some of the KAITO engineers on LinkedIn: