In this podcast, Manjot Pahwa, Rahul Parundekar, and Patrick Barker discuss the integration of Kubernetes and large language models (LLMs), the challenges Kubernetes presents for data scientists, and the considerations for hosting LLM applications in production. They also explore how LLMs can be abstracted on Kubernetes, cost considerations, and the pros and cons of using Kubernetes for LLM training versus inferencing. Additionally, they touch on using Kubernetes for real-time online inference and the availability of abstractions like Metaflow.
Quick takeaways
Kubernetes offers scalability and reliability for ML workloads, but large language models introduce challenges of their own due to their sheer size and the long startup times that follow from it.
Platforms that automate LLM deployment on Kubernetes can simplify the process for data scientists, but robust platforms are recommended for complex deployments and iterative improvements.
Deep dives
Kubernetes as a reliable and scalable workload orchestrator
Kubernetes was designed to provide scalability and reliability for workloads, offering an abstraction layer and orchestration capabilities. While originally focused on traditional workloads, Kubernetes has been adapted for machine learning (ML) workloads. The platform abstracts away hardware, improving efficiency and scalability. However, challenges exist when dealing with large language models (LLMs) due to their size, such as longer startup times and potential issues with node provisioning and networking. Service-oriented architectures and inferencing workloads can also benefit from Kubernetes' scalability and composability.
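As a concrete illustration of that orchestration layer, here is a minimal sketch using the official Kubernetes Python client to declare an LLM inference server as a GPU-backed Deployment. The image name, labels, and resource figures are assumptions for illustration, not details from the episode; `nvidia.com/gpu` is the standard resource name exposed by the NVIDIA device plugin.

```python
# Minimal sketch: an LLM inference server as a Kubernetes Deployment,
# declared with the official Python client. Image, labels, and resource
# sizes below are illustrative assumptions.
from kubernetes import client, config

config.load_kube_config()  # local kubeconfig; in-cluster config also works

container = client.V1Container(
    name="llm-server",
    image="ghcr.io/example/llm-server:latest",  # hypothetical image
    ports=[client.V1ContainerPort(container_port=8080)],
    resources=client.V1ResourceRequirements(
        # Large models need a GPU and plenty of memory; pulling a multi-GB
        # image is part of the slow startup discussed above.
        limits={"nvidia.com/gpu": "1", "memory": "24Gi"},
        requests={"cpu": "4", "memory": "24Gi"},
    ),
)

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="llm-server"),
    spec=client.V1DeploymentSpec(
        replicas=2,  # Kubernetes handles scheduling, restarts, and scale-out
        selector=client.V1LabelSelector(match_labels={"app": "llm-server"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "llm-server"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(
    namespace="default", body=deployment
)
```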
Simplifying LLM deployment with Kubernetes
To simplify LLM deployment for data scientists and improve velocity, there are platforms that provide one-line commands or decorators to containerize and deploy models on Kubernetes. These platforms automate the orchestration process, making it easier for data scientists to focus on their code and iterate on the models. However, for more complex and repeatable deployments, it is recommended to rely on a robust platform built on Kubernetes that abstracts away complexities and supports iterative improvements of LLM applications.
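Below is a hypothetical sketch of what such a decorator-based API can look like. The `deploy` decorator and its arguments are invented for illustration and do not quote any specific platform; a real system would build a container image from the function and submit Kubernetes objects for it.

```python
# Hypothetical sketch of the "one decorator to deploy" pattern described
# above. No real platform API is being quoted here.
import functools

def deploy(image: str, gpus: int = 0, replicas: int = 1):
    """Attach deployment metadata to a function; a real platform would
    containerize the function and create a Deployment from this spec."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapped(*args, **kwargs):
            return fn(*args, **kwargs)
        wrapped.deployment = {
            "image": image,
            "gpus": gpus,
            "replicas": replicas,
            "entrypoint": fn.__name__,
        }
        return wrapped
    return decorator

@deploy(image="ghcr.io/example/llm-server:latest", gpus=1, replicas=2)
def generate(prompt: str) -> str:
    # The model call itself would run inside the deployed container.
    return f"echo: {prompt}"

print(generate.deployment)  # what a platform would turn into K8s objects
```

The appeal of this pattern is that the data scientist writes only the function; the platform owns containerization, scheduling, and rollout.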
Business opportunities for Kubernetes and LLM applications
The current landscape of deploying LLMs on Kubernetes is evolving rapidly, with new products and solutions emerging. The market presents opportunities for businesses to address challenges in LLM infrastructures, such as managing GPU access, reducing costs, and improving reliability and latency. While Kubernetes provides a strong foundation, there is a need to abstract complexities and create thought-leading platforms that address the problems faced by organizations and enterprises in deploying and managing LLM applications.
Trade-offs in using Kubernetes for LLM training and inferencing
Kubernetes offers advantages and trade-offs for LLM training and inferencing. For training, Kubernetes helps with scalability, reliability, and orchestration, making it valuable for handling batch workloads. It enables the composability and extensibility of libraries and tools needed for LLM training. However, challenges arise in managing large container sizes, startup times, GPU shortages, and hardware abstractions. For inferencing, Kubernetes provides scalability for service-oriented architectures but requires further development to address challenges related to hardware abstractions and real-time online inference. The choice to use Kubernetes depends on factors such as organization size, capital availability, and the specific requirements of the LLM application.
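On the training side, batch workloads map naturally onto Kubernetes Jobs. A minimal sketch with the Kubernetes Python client follows; the trainer image, command, and GPU count are hypothetical.

```python
# Minimal sketch: an LLM fine-tuning run as a Kubernetes batch Job.
# Kubernetes retries failed pods and frees the GPUs when the Job completes.
from kubernetes import client, config

config.load_kube_config()

job = client.V1Job(
    metadata=client.V1ObjectMeta(name="llm-finetune"),
    spec=client.V1JobSpec(
        backoff_limit=2,  # retry the pod up to twice on failure
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",  # batch semantics: run to completion
                containers=[
                    client.V1Container(
                        name="trainer",
                        image="ghcr.io/example/llm-trainer:latest",  # hypothetical
                        command=["python", "train.py", "--epochs", "3"],
                        resources=client.V1ResourceRequirements(
                            limits={"nvidia.com/gpu": "4"},  # multi-GPU training
                        ),
                    )
                ],
            ),
        ),
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```

An inference service, by contrast, runs as a long-lived Deployment behind a Service, scaled on request load rather than run to completion.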
MLOps Coffee Sessions #178: LLM on K8s Panel from the LLMs in Production Conference Part 2, with Manjot Pahwa, Rahul Parundekar, and Patrick Barker, hosted by Shrinand Javadekar of Outerbounds, Inc.
// Abstract
Large Language Models require a new set of tools... or do they? K8s is a beast and we like it that way. How can we best leverage all the battle-hardened tech that K8s has to offer to make sure that our LLMs go brrrrrrr? Let's talk about it in this chat.
// Bio
Shrinand Javadekar
Shri Javadekar is currently an engineer at Outerbounds, focused on building a fully managed, large-scale platform for running data-intensive ML/AI workloads. Earlier, he co-founded an MLOps company and served as its head of engineering. He led the design, development, and operations of Kubernetes-based infrastructure at Intuit, running thousands of applications built by hundreds of teams and transacting billions of dollars. He was a founding engineer of the Argo open-source project and also spent time at multiple startups that were acquired by large organizations such as EMC/Dell and VMware.
Manjot Pahwa
Manjot is an investor at Lightspeed India, focusing on SaaS and enterprise tech. She has over a decade of operating experience across fintech, SaaS, and developer tools, spanning geographies including the US, Singapore, and India.
Before joining Lightspeed, Manjot headed Stripe in India, successfully obtaining the payment aggregator license, growing the team from ~10 to 100+, and driving acquisitions in the region during that time.
Rahul Parundekar
Rahul has 13+ years of experience building AI solutions and leading teams. He is passionate about building artificial intelligence (AI) solutions that improve the human experience. He is currently the founder of AI Hero, a platform that helps you fix and enrich your data with ML. At AI Hero, he has also been a big proponent of declarative MLOps, using Kubernetes to operationalize the training and serving lifecycle of ML models, and has published several tutorials on his Medium blog.
Before AI Hero, he was the Director of Data Science (ML Engineering) at Figure-Eight (acquired by Appen), a data annotation company, where he built a data pipeline and model-serving architecture that served 36 models (NLP, computer vision, audio, etc.) and up to 1M predictions per day.
Patrick Barker
Patrick started his career in Big Data back when that was cool, then moved into Kubernetes near its inception. He has contributed major features to the Kubernetes API and built several platforms on top of it.
In recent years he has moved into AI, with a focus on distributed machine learning. He is now working with a startup to reshape the world of AI agents.
// MLOps Jobs board
https://mlops.pallet.xyz/jobs
// MLOps Swag/Merch
https://mlops-community.myshopify.com/
// Related Links
Website: https://www.angellist.com/venture/relay
Foundation by Isaac Asimov: https://www.amazon.com/Foundation-Isaac-Asimov/dp/0553293354
AngelList Relay blog: https://www.angellist.com/blog/introducing-angellist-relay
--------------- ✌️Connect With Us ✌️ -------------
Join our slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Shri on LinkedIn: https://www.linkedin.com/in/shrijavadekar/
Connect with Manjot on LinkedIn: https://www.linkedin.com/in/manjotpahwa/
Connect with Rahul on LinkedIn: https://www.linkedin.com/in/rparundekar/
Connect with Patrick on LinkedIn: https://www.linkedin.com/in/patrickbarkerco/