

KubeFM
Discover all the great things happening in the world of Kubernetes, learn (controversial) opinions from the experts and explore the successes (and failures) of running Kubernetes at scale.
Episodes

Dec 2, 2025 • 31min
A Journey Through Kafkian SplitDNS in a Multitenant Kubernetes, with Fabián Sellés Rosa
Fabián Sellés Rosa, Tech Lead of the Runtime team at Adevinta, walks through a real engineering investigation that started with a simple request: allowing tenants to use third-party Kafka services. What seemed straightforward turned into a complex DNS resolution problem that required testing seven different approaches before a working solution was found.
You will learn:
- Why Kafka's multi-step DNS resolution creates unique challenges in multi-tenant environments, where bootstrap servers and dynamic broker lists complicate standard DNS approaches
- The iterative debugging process from Route 53 split DNS through Kubernetes native pod DNS config, custom DNS servers, Kafka proxies, and CoreDNS solutions
- How to implement the final solution using node-local DNS and CoreDNS templating with practical details including ndots configuration and Kyverno automation
- Platform engineering evaluation criteria for assessing solutions based on maintainability, self-service capability, and evolvability in multi-tenant environments
Sponsor: This episode is sponsored by LearnKube. Get started on your Kubernetes journey through comprehensive online, in-person or remote training.
More info: Find all the links and info for this episode here: https://ku.bz/NsBZ-FwcJ

Nov 25, 2025 • 25min
More Kubernetes Than I Bargained For, with Amos Wenger
Amos Wenger, a developer and writer specializing in hands-on Kubernetes experiences, shares his intriguing saga of adding a home computer to his production K3s cluster. He delves into how this decision led to TLS certificate renewal failures due to NAT issues with consumer routers. The conversation highlights debugging tools like K9s and netshoot, and the unexpected IPv6 behavior encountered. Amos offers best practices for managing mixed infrastructure and encourages listeners to avoid mixing home nodes with production for a smoother experience.

Nov 18, 2025 • 26min
The Karpenter Effect: Redefining Kubernetes Operations, with Tanat Lokejaroenlarb
Tanat Lokejaroenlarb, an SRE at Adevinta leading the runtime team for the Ship platform, dives into his journey migrating to AWS Karpenter. He reveals how replacing EKS Managed Node Groups enhanced Kubernetes operations, cutting costs by €30,000 monthly. Tanat discusses innovative strategies like decoupling upgrades, implementing automated instance selection, and using Kyverno for policy automation. Learn about over-provisioning with low-priority pods and the significant performance benefits of AMD adoption. His insights are a must for anyone in cloud-native operations!

Nov 11, 2025 • 33min
Building Kubernetes (a lite version) from scratch in Go, with Owumi Festus
Festus Owumi walks through his project of building a lightweight version of Kubernetes in Go. He removed etcd (replacing it with in-memory storage), skipped containers entirely, dropped authentication, and focused purely on the control plane mechanics. Through this process, he demonstrates how the reconciliation loop, API server concurrency handling, and scheduling logic actually work at their most basic level.
You will learn:
- How the reconciliation loop works: the core concept of desired state vs current state that drives all Kubernetes operations
- Why the API server is the gateway to etcd: how Kubernetes prevents race conditions using optimistic concurrency control and why centralized validation matters
- What the scheduler actually does: beyond simple round-robin assignment, understanding node affinity, resource requirements, and the complex scoring algorithms that determine pod placement
- The complete pod lifecycle: a step-by-step walkthrough from kubectl command to running pod, showing how independent components work together like an orchestra
Sponsor: This episode is sponsored by StormForge by CloudBolt. Automatically rightsize your Kubernetes workloads with ML-powered optimization.
More info: Find all the links and info for this episode here: https://ku.bz/pf5kK9lQF

Nov 4, 2025 • 42min
Graphs in your head, or how to assess a Kubernetes workload, with Oleksii Kolodiazhnyi
In this episode, Oleksii Kolodiazhnyi, a Senior Architect at Mirantis with a wealth of experience in Kubernetes, shares his insights on assessing complex systems. He introduces his 'Graphs in Your Head' approach, emphasizing top-down assessment strategies that begin with business needs. Oleksii discusses practical visualization tools like KubeView and K9s for understanding resource interactions. He also highlights documentation strategies for both technical teams and business stakeholders, ensuring seamless communication and onboarding.

Oct 28, 2025 • 35min
Our Journey to GitOps: Migrating to ArgoCD with Zero Downtime, with Andrew Jeffree
Andrew Jeffree, a Staff Cloud Infrastructure Engineer at SafetyCulture, shares insights from migrating over 250 microservices to GitOps with Argo CD, all while ensuring zero downtime. He discusses the switch from a complex Helm setup to a CUE-based domain-specific language, which enhances developer experience with better validation. Key topics include strategies for seamless migration, automated reconciliation, and the importance of empathy in engineering design. Jeffree also highlights the benefits of codeless approvals and adapting tools to minimize operational pain points.

Oct 21, 2025 • 50min
The Double-Edged Sword of AI-Assisted Kubernetes Operations, with Mai Nishitani
Mai Nishitani, Director of Enterprise Architecture at NTT Data and AWS Community Builder, demonstrates how Model Context Protocol (MCP) enables Claude to directly interact with Kubernetes clusters through natural language commands.
You will learn:
- How MCP servers work and why they're significant for standardizing AI integration with DevOps tools, moving beyond custom integrations to a universal protocol
- The practical capabilities and critical limitations of AI in Kubernetes operations
- Why fundamental troubleshooting skills matter more than ever as AI abstractions can fail in unexpected ways, especially during crisis scenarios and complex system failures
- How DevOps roles are evolving from manual administration toward strategic architecture and orchestration
Sponsor: This episode is brought to you by Testkube, where teams run millions of performance tests in real Kubernetes infrastructure. From air-gapped environments to massive scale deployments, orchestrate every testing tool in one platform. Check it out at testkube.io
More info: Find all the links and info for this episode here: https://ku.bz/3hWvQjXxp

Oct 20, 2025 • 27min
The Making of Flux: The Future, a KubeFM Original Series
In this closing episode, Bryan Ross (Field CTO at GitLab), Jane Yan (Principal Program Manager at Microsoft), Sean O’Meara (CTO at Mirantis) and William Rizzo (Strategy Lead, CTO Office at Mirantis) discuss how GitOps evolves in practice.
- How enterprises are embedding Flux into developer platforms and managed cloud services.
- Why bridging CI/CD and infrastructure remains a core challenge, and how GitOps addresses it.
- What leading platform teams (GitLab, Microsoft, Mirantis) see as the next frontier for GitOps.
Sponsor: Join the Flux maintainers and community at FluxCon, November 11th in Atlanta.
More info: Find all the links and info for this episode here: https://ku.bz/tVqKwNYQH

Oct 14, 2025 • 43min
The Data Engineer's guide to optimizing Kubernetes, with Niels Claeys
Niels Claeys, a lead engineer at Dataminded and expert in Kubernetes optimization, shares insights on building Conveyor, a data platform processing over 1.5 million core hours monthly. He reveals how switching scheduler strategies can cut costs significantly while enhancing resource use. Niels also discusses achieving 97% spot instance utilization and the importance of multi-type diversification. He emphasizes the need for simplicity in coding and effective communication in tech, alongside practical tips for scaling and optimizing workloads.

Oct 13, 2025 • 23min
The Making of Flux: The Scale, a KubeFM Original Series
In this episode, Philippe Ensarguet, VP of Software Engineering at Orange, and Arnab Chatterjee, Global Head of Container & AI Platforms at Nomura, share how large enterprises are adopting Flux to drive reliable, compliant, and scalable platforms.
- How Orange uses Flux to manage bare-metal Kubernetes through its SYLVR project.
- Why FSIs rely on GitOps to balance agility with governance.
- How Flux helps enterprises achieve resilience, compliance, and repeatability at scale.
Sponsor: Join the Flux maintainers and community at FluxCon, November 11th in Atlanta.
More info: Find all the links and info for this episode here: https://ku.bz/tWcHlJm7M


