Efficient GPU infrastructure at LinkedIn // Animesh Singh // MLOps Podcast #299

Challenges of GPU Inferencing in Generative AI

This chapter explores the complexity and cost of GPU inference for generative AI models, emphasizing the need to plan ahead and invest in infrastructure. It also discusses integrating large language models with recommendation systems, optimizing them for latency, and shifting from bespoke models to centralized approaches for greater efficiency.
