KubeFM
Discover all the great things happening in the world of Kubernetes, learn (controversial) opinions from the experts and explore the successes (and failures) of running Kubernetes at scale.
Latest episodes
Jun 24, 2025 • 20min
Dear friend, you have built a Kubernetes, with Mac Chaffee
Mac Chaffee, a platform engineer and security champion, dives into the underestimated complexities of running modern applications. He discusses how overconfidence can lead to costly mistakes, particularly when teams reject proven tools like Kubernetes. Mac highlights the tipping point where DIY solutions become burdensome and stresses the importance of mentorship in preventing poor technical decisions. He advocates for transparency in technology, urging teams to establish effective guardrails rather than hiding complexity.
Jun 17, 2025 • 23min
Beyond Kubernetes: Serverless Execution Models for Variable Workloads, with Marc Campora
Marc Campora, a systems consultant with experience in high-throughput platforms, shares his analysis of a real customer deployment with 500+ microservices. He breaks down the cost implications, technical constraints, and operational trade-offs between Kubernetes containers and AWS Lambda functions based on actual production data and migration assessments.
You will learn:
Cost analysis frameworks for comparing Lambda vs Kubernetes across different traffic patterns, including specific examples of 3x savings potential and the 80/20 rule for service utilization
Migration complexity factors when moving existing microservices to Lambda, including cold start issues, runtime model changes, and why it's often a complete rewrite rather than a simple port
Decision criteria for choosing between platforms based on traffic consistency, computational requirements, and operational overhead tolerance
Sponsor
This episode is sponsored by Learnk8s: get started on your Kubernetes journey through comprehensive online, in-person or remote training.
More info
Find all the links and info for this episode here: https://ku.bz/5gMTkzLhV
Jun 10, 2025 • 36min
Shared Nothing, Shared Everything: The Truth About Kubernetes Multi-Tenancy, with Molly Sheets
Molly Sheets, Director of Engineering for Kubernetes at Zynga, leads platform engineering behind popular games like Words with Friends. She discusses how her team shifted from a one-cluster-per-team model to a more efficient multi-tenant architecture. Molly highlights the dangers of slowing deployment speeds and shares practical strategies for resource allocation and SLOs. She also delves into the unique challenges of Kubernetes in the gaming sector and candidly addresses the balance between technical roles and her journey as a new parent.
Jun 3, 2025 • 48min
My pipelines from GitLab Commit to ArgoCD got beaten by FTP, with David Pech
David Pech, a Staff Cloud Ops Engineer at Wrike with all CNCF certifications, shares his insights on cloud-native adoption challenges. He recounts how a sophisticated GitLab CI/CD setup was overtaken by simple FTP due to cultural resistance. David discusses the hidden costs of complex tooling and the importance of team readiness over technical superiority. He offers practical strategies for gradual cloud transitions, emphasizing in-house expertise and management advocacy, while also reflecting on his own journey through cloud technologies and Docker misconceptions.
May 27, 2025 • 36min
Performance testing Kubernetes workloads, with Stephan Schwarz
If you're tasked with performance testing Kubernetes workloads without much guidance, this episode offers clear, experience-based strategies that go beyond theory.
Stephan Schwarz, a DevOps engineer at iits-consulting, walks through his systematic approach to performance testing Kubernetes applications. He covers everything from defining what performance actually means, to the practical methodology of breaking individual pods to understand their limits, and navigating the complexities of Kubernetes-specific components that affect test results.
You will learn:
How to establish baseline performance metrics by systematically testing individual pods, disabling autoscaling features, and documenting each incremental change to understand real application limits
Why shared Kubernetes components skew results and how ingress controllers, service meshes, and monitoring stacks create testing challenges that require careful consideration of the entire request chain
Practical approaches to HPA configuration, including how to account for scaling latency, the time delays inherent in Kubernetes scaling operations, and planning for spare capacity based on your SLA requirements
The role of observability tools like OpenTelemetry in production environments where load testing isn't feasible, and how distributed tracing helps isolate performance bottlenecks across interdependent services
Sponsor
This episode is sponsored by Learnk8s: get started on your Kubernetes journey through comprehensive online, in-person or remote training.
More info
Find all the links and info for this episode here: https://ku.bz/yY-FnmGfH
May 20, 2025 • 33min
Managing 100s of Kubernetes Clusters using Cluster API, with Zain Malik
Discover how to manage Kubernetes at scale with declarative infrastructure and automation principles.
Zain Malik shares his experience managing multi-tenant Kubernetes clusters with up to 30,000 pods across clusters capped at 950 nodes. He explains how his team transitioned from Terraform to Cluster API for declarative cluster lifecycle management, contributing upstream to improve AKS support while implementing GitOps workflows.
You will learn:
How to address challenges in large-scale Kubernetes operations, including node pool management inconsistencies and lengthy provisioning times
Why Cluster API provides a powerful foundation for multi-cloud cluster management, and how to extend it with custom operators for production-specific needs
How implementing GitOps principles eliminates manual intervention in critical operations like cluster upgrades
Strategies for handling production incidents and bugs when adopting emerging technologies like Cluster API
Sponsor
This episode is sponsored by Learnk8s: get started on your Kubernetes journey through comprehensive online, in-person or remote training.
More info
Find all the links and info for this episode here: https://ku.bz/5PLksqVlk
May 13, 2025 • 46min
Super-Scaling Open Policy Agent with Batch Queries, with Nicholaos Mouzourakis
Nicholaos Mouzourakis, a Staff Product Security Engineer at Gusto, dives into the intricacies of scaling authorization within Kubernetes using Open Policy Agent (OPA). He explains how traditional approaches fall short in microservices and shares his team's journey optimizing OPA performance through batch queries for impressive efficiency gains. Nicholaos also highlights surprising interactions between Kubernetes CPU limits and Go's performance, alongside deployment strategies that ensure smooth operations in production. His unique transition from the gaming industry enriches his insights.
May 6, 2025 • 34min
Kubernetes upgrades: beyond the one-click update, with Tanat Lokejaroenlarb
Discover how Adevinta manages Kubernetes upgrades at scale in this episode with Tanat Lokejaroenlarb. Tanat shares his team's journey from time-consuming blue-green deployments to efficient in-place upgrades for their multi-tenant Kubernetes platform SHIP, detailing the engineering decisions and operational challenges they overcame.
You will learn:
How to transition from blue-green to in-place Kubernetes upgrades while maintaining service reliability
Techniques for tracking and addressing API deprecations using tools like Pluto and Kube-no-trouble
Strategies for minimizing SLO impact during node rebuilds through serialized approaches and proper PDB configuration
Why a phased upgrade approach with "cluster waves" provides safer production deployments even with thorough testing
Sponsor
This episode is sponsored by Learnk8s: get started on your Kubernetes journey through comprehensive online, in-person or remote training.
More info
Find all the links and info for this episode here: https://ku.bz/VVHFfXGl_
Apr 29, 2025 • 34min
From Fragile to Faultless: Kubernetes Self-Healing In Practice, with Grzegorz Głąb
Discover how to build resilient Kubernetes environments at scale with practical automation strategies from an engineer who's tackled complex production challenges.
Grzegorz Głąb, Kubernetes Engineer at Cloud Kitchens, shares his team's journey developing a comprehensive self-healing framework. He explains how they addressed issues ranging from spot node preemptions to network packet drops caused by unbalanced IRQs, providing concrete examples of automation that prevents downtime and improves reliability.
You will learn:
How managed Kubernetes services like AKS provide benefits but require customization for specific use cases
The architecture of an effective self-healing framework using DaemonSets and deployments with Kubernetes-native components
Practical solutions for common challenges like StatefulSet pods stuck on unreachable nodes and cleaning up orphaned pods
Techniques for workload-level automation, including throttling CPU-hungry pods and automating diagnostic data collection
Sponsor
This episode is sponsored by Learnk8s: get started on your Kubernetes journey through comprehensive online, in-person or remote training.
More info
Find all the links and info for this episode here: https://ku.bz/yg_fkP0LN
Apr 22, 2025 • 1h 3min
Replacing StatefulSets with a custom Kubernetes operator in our Postgres cloud platform, with Andrew Charlton
Discover why standard Kubernetes StatefulSets might not be sufficient for your database workloads and how custom operators can provide better solutions for stateful applications.
Andrew Charlton, Staff Software Engineer at Timescale, explains how they replaced Kubernetes StatefulSets with a custom operator called Popper for their PostgreSQL Cloud Platform. He details the technical limitations they encountered with StatefulSets and how their custom approach provides more intelligent management of database clusters.
You will learn:
Why StatefulSets fall short for managing high-availability PostgreSQL clusters, particularly around pod ordering and volume management
How Timescale's instance matching approach solves complex reconciliation challenges when managing heterogeneous database workloads
The benefits of implementing discrete, idempotent actions rather than workflows in Kubernetes operators
Real-world examples of operations that became possible with their custom operator, including volume downsizing and availability zone consolidation
Sponsor
This episode is brought to you by mirrord: run local code like in your Kubernetes cluster without deploying first.
More info
Find all the links and info for this episode here: https://ku.bz/fhZ_pNXM3