VerticalPodAutoscaler Went Rogue: It Took Down Our Cluster, with Thibault Jamet

Sep 16, 2025

Thibault Jamet, Head of Runtime at Adevinta, shares his experience running a multi-tenant Kubernetes platform. He walks through an incident in which the Vertical Pod Autoscaler (VPA) caused evictions of critical pods. Thibault explains the architecture of VPA and the debugging process that revealed hidden limits in Kubernetes. He emphasizes monitoring webhook latency and pod eviction rates to catch such issues early, and shares lessons on scaling challenges and operational strategies for high-performance systems.

ANECDOTE

Running SHIP At Massive Scale

SHIP is Adevinta's Kubernetes runtime spanning ~30 clusters across four regions and peaking at 300k RPS.
Thibault runs the platform that serves marketplaces and manages thousands of pods on ~2,000 nodes.

INSIGHT

How VPA's Three Parts Coordinate

VPA has three cooperating components: recommender, updater, and mutating webhook.
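The recommender suggests resource requests, the updater evicts pods whose requests no longer match the recommendation, and the webhook mutates pod specs on admission so that recreated pods start with the recommended values.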
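
As a rough illustration of how these three roles interact, here is a toy model in Python. It is a sketch under simplifying assumptions, not VPA's actual code: the names `recommend`, `needs_eviction`, `admit`, and `reconcile` are hypothetical, and the "peak plus 15%" formula and 10% tolerance are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    cpu_request_m: int  # CPU request in millicores


def recommend(usage_samples_m):
    """Recommender stand-in: peak observed usage plus 15% headroom.

    Hypothetical formula chosen only for illustration.
    """
    return max(usage_samples_m) * 115 // 100


def needs_eviction(pod, target_m, tolerance=0.10):
    """Updater stand-in: flag pods whose request is too far from the target.

    The 10% tolerance is an assumption made for this sketch.
    """
    return abs(pod.cpu_request_m - target_m) > tolerance * target_m


def admit(pod, target_m):
    """Webhook stand-in: rewrite the request when the pod is (re)admitted."""
    return Pod(pod.name, target_m)


def reconcile(pods, target_m):
    """One pass: evict mismatched pods and re-admit them with the target."""
    result, evicted = [], []
    for pod in pods:
        if needs_eviction(pod, target_m):
            evicted.append(pod.name)
            # The recreated pod passes through the webhook on admission.
            result.append(admit(pod, target_m))
        else:
            result.append(pod)
    return result, evicted


if __name__ == "__main__":
    target = recommend([200, 400, 300])
    pods = [Pod("a", 100), Pod("b", 450)]
    new_pods, evicted = reconcile(pods, target)
    print(target, evicted, new_pods)
```

The point of the sketch is the division of labor: nothing resizes a running pod in place here. A change in the recommendation only takes effect through an eviction followed by a mutated re-admission, which is why eviction rate and webhook latency are natural signals to watch.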

ANECDOTE

Prometheus Alert Led To Bigger Discovery

A Prometheus alert about missing metrics triggered an investigation that revealed a growing number of pod evictions.
The team temporarily stopped VPA recommendations and applied static sizes, which mitigated the symptoms only briefly.
