Latent Space: The AI Engineer Podcast

Everything you need to run Mission Critical Inference (ft. DeepSeek v3 + SGLang)

Jan 19, 2025
Join Amir Haghighat, co-founder of Baseten, and Yineng Zhang, lead software engineer at Baseten, as they dive into the groundbreaking DeepSeek v3 model. At 671 billion parameters, it has shaken up LLM inference platforms. They unpack the complexities of deploying massive models, discuss the innovations of SGLang, and dig into the challenges of caching technologies. With insights on optimizing AI workflows and a manifesto for mission-critical inference, this conversation is a must-listen for AI enthusiasts!

DeepSeek v3's Popularity and Challenge

  • DeepSeek v3 is the leading open-source LLM, making it highly sought after.
  • Its massive size, however, presents unique serving challenges.

Serving DeepSeek v3's Massive Model

  • DeepSeek v3's large size requires H200s for serving due to memory needs; even 8x H100s are insufficient.
  • FP8 precision and long loading times pose additional challenges for debugging and performance testing.
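The memory claim above can be sanity-checked with back-of-the-envelope arithmetic (a sketch only; real deployments also need headroom for the KV cache, activations, and CUDA overhead, so the weights-only figure understates true requirements):

```python
# Rough weights-only memory estimate for DeepSeek v3 on an 8-GPU node.
# Overheads (KV cache, activations) are deliberately ignored here.
PARAMS = 671e9          # DeepSeek v3 parameter count
BYTES_PER_PARAM = 1     # FP8 weights: 1 byte per parameter

H100_GB = 80            # HBM per H100 GPU
H200_GB = 141           # HBM per H200 GPU
GPUS = 8                # a single 8-GPU node

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9

print(f"FP8 weights:  {weights_gb:.0f} GB")       # ~671 GB for weights alone
print(f"8x H100 HBM:  {GPUS * H100_GB} GB")       # 640 GB -- weights don't fit
print(f"8x H200 HBM:  {GPUS * H200_GB} GB")       # 1128 GB -- fits, with room for KV cache
```

Even in FP8, the weights alone exceed the 640 GB of an 8x H100 node, which is why serving lands on H200s (or multi-node setups).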

DeepSeek v3 User Motivations

  • Baseten observes that DeepSeek v3 users primarily seek better control, cost, latency, and throughput.
  • Users often migrate from cloud providers due to limitations or the desire for model ownership.