KubeFM
Discover all the great things happening in the world of Kubernetes, learn (controversial) opinions from the experts and explore the successes (and failures) of running Kubernetes at scale.
Episodes
Sep 30, 2025 • 48min
Scaling CI horizontally with Buildkite, Kubernetes, and multiple pipelines, with Ben Poland
Ben Poland walks through Faire's complete CI transformation, from a single Jenkins instance struggling with thousands of lines of Groovy to a distributed Buildkite system running across multiple Kubernetes clusters.
He details the technical challenges of running CI workloads at scale, including API rate limiting, etcd pressure points, and the trade-offs of splitting monolithic pipelines into service-scoped ones.
You will learn:
- How to architect CI systems that match team ownership and eliminate shared failure points across services
- Kubernetes scaling patterns for CI workloads, including multi-cluster strategies, predictive node provisioning, and handling API throttling
- Performance optimization techniques like Git mirroring, node-level caching, and spot instance management for variable CI demands
- Migration strategies and lessons learned from moving away from monolithic CI, including proof-of-concept approaches and avoiding the sunk cost fallacy
Sponsor: This episode is brought to you by Testkube, where teams run millions of performance tests in real Kubernetes infrastructure. From air-gapped environments to massive scale deployments, orchestrate every testing tool in one platform. Check it out at testkube.io
More info: Find all the links and info for this episode here: https://ku.bz/klBmzMY5-
Sep 23, 2025 • 53min
Not Every Problem Needs Kubernetes, with Danyl Novhorodov
Danyl Novhorodov, a veteran .NET engineer and architect at Eneco, presents his controversial thesis that 90% of teams don't actually need Kubernetes. He walks through practical decision-making frameworks, explores powerful alternatives like BEAM runtimes and Actor models, and explains why starting with modular monoliths often beats premature microservices adoption.
You will learn:
- The COST decision framework: how to evaluate infrastructure choices based on Complexity, Ownership, Skills, and Time rather than industry hype
- Platform engineering vs. managed services: how to honestly assess whether your team can compete with AWS, Azure, and Google's managed container platforms
- Evolutionary architecture approach: why modular monoliths with clear boundaries often provide better foundations than distributed systems from day one
Sponsor: This episode is brought to you by Testkube, where teams run millions of performance tests in real Kubernetes infrastructure. From air-gapped environments to massive scale deployments, orchestrate every testing tool in one platform. Check it out at testkube.io
More info: Find all the links and info for this episode here: https://ku.bz/BYhFw8RwW
Sep 16, 2025 • 38min
VerticalPodAutoscaler Went Rogue: It Took Down Our Cluster, with Thibault Jamet
Thibault Jamet, Head of Runtime at Adevinta, shares his expertise running a multi-tenant Kubernetes platform. He dives into a chaotic incident where the Vertical Pod Autoscaler led to critical pod evictions. Thibault discusses the architecture of VPA and the debugging process that revealed hidden limits in Kubernetes. He emphasizes the importance of monitoring webhook latency and pod eviction rates to catch issues early. Listeners gain invaluable lessons on scaling challenges and operational strategies for maintaining high-performance systems.
Sep 15, 2025 • 22min
The Making of Flux: The Origin, a KubeFM Original Series
This episode unpacks the technical and governance milestones that secured Flux's place in the cloud-native ecosystem, from a 45-minute production outage that led to the birth of GitOps to the CNCF process that defines project maturity and the handover of stewardship after Weaveworks' closure.
You will learn:
- How a single incident pushed Weaveworks to adopt Git as the source of truth, creating the foundation of GitOps.
- How Flux sustained continuity after Weaveworks shut down through community governance.
- Where Flux is heading next with security guidance, Flux v2, and an enterprise-ready roadmap.
Sponsor: Join the Flux maintainers and community at FluxCon, November 11th in Salt Lake City.
More info: Find all the links and info for this episode here: https://ku.bz/5Sf5wpd8y
Sep 9, 2025 • 26min
Predictive vs Reactive: A Journey to Smarter Kubernetes Scaling, with Jorrick Stempher
Jorrick Stempher, a junior software engineer and student at Windesheim, discusses his team's innovative predictive scaling system for Kubernetes clusters, leveraging machine learning. They utilize the Prophet model to forecast load patterns, enabling preemptive scaling decisions that improve response times dramatically. Stempher dives into the Node Ranking Index for efficient resource management and shares insights on real-world challenges like data validation and load testing. The conversation highlights practical approaches to optimize Kubernetes scalability in dynamic environments.
Sep 2, 2025 • 35min
Solving Cold Starts: Using Istio to Warm Up Java Pods, with Frédéric Gaudet
If you're running Java applications in Kubernetes, you've likely experienced the pain of slow pod startups affecting user experience during deployments and scaling events.
Frédéric Gaudet, Senior SRE at BlaBlaCar, shares how his team solved the cold start problem for their 1,500 Java microservices using Istio's warm-up capabilities.
You will learn:
- Why Java applications struggle with cold starts and how JIT compilation affects initial request latency in Kubernetes environments
- How Istio's warm-up feature works to gradually ramp up traffic to new pods
- Why other common solutions fail, including resource over-provisioning, init containers, and tools like GraalVM
- Real production impact from implementing this solution, including dramatic improvements in message moderation SLOs at BlaBlaCar's scale of 4,000 pods
Sponsor: This episode is brought to you by Testkube, the ultimate Continuous Testing Platform for Cloud Native applications. Scale fast, test continuously, and ship confidently. Check it out at testkube.io
More info: Find all the links and info for this episode here: https://ku.bz/grxcypt9j
Aug 26, 2025 • 28min
Teaching Kubernetes to Scale with a MacBook Screen Lock, with Brian Donelan
Brian Donelan, VP of Cloud Platform Engineering at JPMorgan Chase, shares his innovative side project that automates Kubernetes workload scaling based on MacBook screen lock status. He connects macOS notifications to CloudWatch, achieving impressive 80% cost savings by scaling resources to zero when idle. The discussion highlights KEDA's unique event-driven scaling capabilities, creative metrics for different industries, and strategies for optimizing cloud resource usage, making workload management more efficient and sustainable.
Aug 19, 2025 • 41min
Building a Carbon and Price-Aware Kubernetes Scheduler, with Dave Masselink
Data centers consume over 4% of global electricity and this number is projected to triple in the next few years due to AI workloads.
Dave Masselink, founder of Compute Gardener, discusses how he built a Kubernetes scheduler that makes scheduling decisions based on real-time carbon intensity data from power grids.
You will learn:
- How carbon-aware scheduling works: using real-time grid data to shift workloads to periods when electricity generation has lower carbon intensity, without changing energy consumption
- Technical implementation details: building custom Kubernetes schedulers using the scheduler plugin framework, including pre-filter and filter stages for carbon and time-of-use pricing optimization
- Energy measurement strategies: approaches for tracking power consumption across CPUs, memory, and GPUs
Sponsor: This episode is brought to you by Testkube, the ultimate Continuous Testing Platform for Cloud Native applications. Scale fast, test continuously, and ship confidently. Check it out at testkube.io
More info: Find all the links and info for this episode here: https://ku.bz/zk2xM1lfW
Aug 12, 2025 • 33min
How Policies Saved us a Thousand Headaches, with Alessandro Pomponio
Alessandro Pomponio from IBM Research explains how his team transformed their chaotic bare-metal clusters into a well-governed, self-service platform for AI and scientific workloads. He walks through their journey from manual cluster interventions to a fully automated GitOps-first architecture using ArgoCD, Kyverno, and Kueue to handle everything from policy enforcement to GPU scheduling.
You will learn:
- How to implement GitOps workflows that reduce administrative burden while maintaining governance and visibility across multi-tenant research environments
- Practical policy enforcement strategies using Kyverno to prevent GPU monopolization, block interactive pod usage, and automatically inject scheduling constraints
- Fair resource sharing techniques with Kueue to manage scarce GPU resources across different hardware types while supporting both specific and flexible allocation requests
- Organizational change management approaches for gaining stakeholder buy-in, upskilling admin teams, and communicating policy changes to research users
Sponsor: This episode is brought to you by Testkube, the ultimate Continuous Testing Platform for Cloud Native applications. Scale fast, test continuously, and ship confidently. Check it out at testkube.io
More info: Find all the links and info for this episode here: https://ku.bz/5sK7BFZ-8
Jun 24, 2025 • 20min
Dear friend, you have built a Kubernetes, with Mac Chaffee
Mac Chaffee, a platform engineer and security champion, dives into the underestimated complexities of running modern applications. He discusses how overconfidence can lead to costly mistakes, particularly when teams reject proven tools like Kubernetes. Mac highlights the tipping point where DIY solutions become burdensome and stresses the importance of mentorship in preventing poor technical decisions. He advocates for transparency in technology, urging teams to establish effective guardrails rather than hiding complexity.