Kubernetes Podcast from Google

65k nodes on GKE, with Maciej Rozacki and Wojciech Tyczyński

Nov 13, 2024
Maciej Rozacki, a Product Manager for AI training on GKE, and Wojciech Tyczyński, a Software Engineer focused on Kubernetes scalability, discuss GKE's support for 65,000-node clusters. They cover the innovations that made this scale possible, the complexities of managing large clusters, and how these advances serve AI workloads. They also emphasize the role of open-source contributions and community engagement in shaping the future of Kubernetes scalability.
INSIGHT

GKE's Massive Scale Increase

  • GKE now supports 65,000-node clusters, up from 15,000, a more than fourfold increase.
  • This expansion is driven by the increasing demand for large-scale AI training.
INSIGHT

AI's Impact on Infrastructure

  • AI workloads require tightly coupled computation across numerous machines.
  • Kubernetes facilitates this by enabling colocation and dynamic resource allocation.
ANECDOTE

Spanner-Based Storage

  • GKE replaced etcd with its own Spanner-based storage for improved scalability and flexibility.
  • This multi-tenant solution makes control planes stateless and speeds up operations.