VerticalPodAutoscaler Went Rogue: It Took Down Our Cluster, with Thibault Jamet
Sep 16, 2025
Thibault Jamet, Head of Runtime at Adevinta, shares his experience running a multi-tenant Kubernetes platform. He walks through a chaotic incident in which the Vertical Pod Autoscaler (VPA) triggered critical pod evictions. Thibault explains VPA's architecture and the debugging process that uncovered hidden limits in Kubernetes. He stresses the importance of monitoring webhook latency and pod eviction rates to catch such issues early. Listeners come away with concrete lessons on scaling challenges and on operational strategies for keeping high-performance systems running.
AI Snips
Running SHIP At Massive Scale
- SHIP is Adevinta's Kubernetes runtime spanning ~30 clusters across four regions and peaking at 300k RPS.
- Thibault runs the platform that serves marketplaces and manages thousands of pods on ~2,000 nodes.
How VPA's Three Parts Coordinate
- VPA has three cooperating components: recommender, updater, and mutating webhook.
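- The recommender suggests resource requests from observed usage, the updater evicts pods whose requests no longer match the recommendation, and the mutating webhook rewrites pod specs at admission so that recreated pods receive the recommended values.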
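To make the coordination concrete, here is a minimal Python sketch of the three-part loop. Everything in it is hypothetical and simplified: `recommend`, `should_evict`, and `admit` are illustrative names, only CPU requests are modelled, and the target is taken as the nearest-rank 90th percentile of usage samples with fixed ±20% bounds. The real VPA recommender builds decaying usage histograms and applies safety margins, and the real webhook patches pods through the Kubernetes admission API rather than a function call.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Pod:
    name: str
    cpu_request_m: int  # CPU request in millicores


@dataclass(frozen=True)
class Recommendation:
    lower_m: int
    target_m: int
    upper_m: int


def recommend(usage_m: list[int], margin_pct: int = 20) -> Recommendation:
    """Hypothetical recommender: nearest-rank 90th percentile of usage as the
    target, with fixed +/- margin_pct bounds (not VPA's real algorithm)."""
    if not usage_m:
        raise ValueError("need at least one usage sample")
    ordered = sorted(usage_m)
    rank = -(-9 * len(ordered) // 10)  # ceil(0.9 * n) using integer maths
    target = ordered[rank - 1]
    return Recommendation(
        lower_m=target * (100 - margin_pct) // 100,
        target_m=target,
        upper_m=target * (100 + margin_pct) // 100,
    )


def should_evict(pod: Pod, rec: Recommendation) -> bool:
    """Toy updater: evict when the current request is outside the bounds."""
    return not (rec.lower_m <= pod.cpu_request_m <= rec.upper_m)


def admit(pod: Pod, rec: Recommendation) -> Pod:
    """Toy mutating webhook: a (re)created pod gets the target request."""
    return replace(pod, cpu_request_m=rec.target_m)


usage = [100, 120, 150, 180, 200, 210, 220, 250, 300, 400]
rec = recommend(usage)               # target 300m, bounds [240m, 360m]
pod = Pod("web-1", cpu_request_m=100)
if should_evict(pod, rec):           # 100m is below the lower bound
    pod = admit(pod, rec)            # the replacement is admitted at 300m
print(pod)
```

In the evict-and-recreate mode sketched here, the updater does not resize a pod in place: it evicts, the pod's controller recreates it, and the webhook applies the target on admission. For controller-managed pods, every eviction therefore implies another pass through the webhook, which is one reason to watch webhook latency and eviction rate together.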
Prometheus Alert Led To Bigger Discovery
- A Prometheus alert about missing metrics triggered an investigation, which revealed a growing number of pod evictions.
- The team temporarily stopped VPA recommendations and applied static sizes, which only mitigated symptoms briefly.
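The advice to watch eviction rates can be illustrated with a small sliding-window alert. This is a hypothetical Python sketch: the `EvictionRateMonitor` class, the five-minute window, and the 20-evictions-per-minute threshold are illustrative assumptions, not values from the episode. In a real cluster this logic would more likely live in a Prometheus alerting rule over an eviction counter than in application code.

```python
from collections import deque


class EvictionRateMonitor:
    """Hypothetical sliding-window alert on pod evictions.

    Timestamps are in seconds and must be recorded in non-decreasing order.
    The default window and threshold are illustrative, not from the episode.
    """

    def __init__(self, window_s: float = 300.0, max_per_min: float = 20.0):
        self.window_s = window_s
        self.max_per_min = max_per_min
        self._events = deque()  # eviction timestamps, oldest first

    def record(self, ts: float) -> None:
        """Register one eviction observed at time ts."""
        self._events.append(ts)

    def rate_per_min(self, now: float) -> float:
        """Evictions per minute over the window ending at `now`."""
        # Discard evictions that have fallen out of the window.
        while self._events and self._events[0] <= now - self.window_s:
            self._events.popleft()
        return len(self._events) * 60.0 / self.window_s

    def firing(self, now: float) -> bool:
        """True when the windowed rate exceeds the threshold."""
        return self.rate_per_min(now) > self.max_per_min


mon = EvictionRateMonitor(window_s=300, max_per_min=20)
for t in range(0, 300, 30):          # steady background: one eviction every 30 s
    mon.record(t)
print(mon.firing(now=300))           # 1.8/min, below the threshold
for i in range(150):                 # burst: 150 evictions within one minute
    mon.record(300 + i * 0.4)
print(mon.firing(now=360))           # 31.4/min, so the alert fires
```

Alerting on a windowed rate rather than on individual evictions keeps routine rescheduling quiet while still surfacing a sustained climb like the one the team found.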