Building the AI Hyperscaler with Kubernetes

Jun 28, 2024

Brandon Jacobs, infrastructure architect at CoreWeave, discusses how CoreWeave uses Kubernetes to build an AI hyperscaler. The conversation covers managing Day 0 and Day 2 operations for AI labs, lessons learned, and best practices for a Kubernetes-based cloud. Topics include leveraging bare metal Kubernetes for GPU workloads, storage options for AI labs, observability, monitoring, handling CVEs, and customer cluster support.

Chapters

Intro

00:00 • 6min

Implementing Kubernetes at CoreWeave Cloud

05:35 • 9min

Leveraging Bare Metal Kubernetes for GPU Workloads

14:29 • 23min

Storage Options for AI Labs and Kubernetes Challenges

37:23 • 6min

Importance of Observability, Monitoring, and Handling CVEs in a Kubernetes Environment

43:49 • 3min

Customer Cluster Support and Expansion

46:26 • 4min

Exploring CoreWeave's Kubernetes Journey and Future Plans

50:35 • 4min