
Platform Engineering Podcast
The Platform Engineering Podcast is a show about the real work of building and running internal platforms — hosted by Cory O’Daniel, longtime infrastructure and software engineer, and CEO/cofounder of Massdriver.
Each episode features candid conversations with the engineers, leads, and builders shaping platform engineering today. Topics range from org structure and team ownership to infrastructure design, developer experience, and the tradeoffs behind every “it depends.”
Cory brings two decades of experience building platforms — and now spends his time thinking about how teams scale infrastructure without creating bottlenecks or burning out ops. This podcast isn’t about trends. It’s about how platform engineering actually works inside real companies.
Whether you're deep into Terraform/OpenTofu modules, building golden paths, or just trying to keep your platform from becoming a dumpster fire — you’ll probably find something useful here.
Latest episodes

Jul 16, 2025 • 49min
Building Better Platforms with Dapr: Abstractions, Portability, and Durable Systems with Mark Fussell
Cloud lock-in isn't just about where your data lives; it's about how deeply cloud-specific code permeates your applications. Mark Fussell, co-creator of Dapr and CEO of Diagrid, joins Cory O'Daniel to explore how Dapr provides clean abstractions for common distributed system patterns, enabling teams to build portable applications without sacrificing cloud-native capabilities.
The conversation covers:
- How Dapr creates a clean separation between application code and underlying infrastructure services like messaging, state management, and secrets
- Why platform teams struggle with tight coupling between applications and infrastructure, and how Dapr solves this problem
- The benefits of Dapr's sidecar architecture for local development, testing, and production environments
- How Dapr automatically handles cross-cutting concerns like security, observability, and resiliency without boilerplate code
- Introduction to Dapr's workflow engine for durable execution and the emerging world of stateful AI agents
Whether you're a platform engineer struggling with cloud lock-in or a developer tired of rewriting code for different infrastructures, this conversation demonstrates how Dapr can simplify your distributed systems while maintaining access to the unique capabilities of each cloud provider.
Guest: Mark Fussell, Co-founder of Dapr and CEO of Diagrid
Mark Fussell is the CEO of Diagrid, a cutting-edge company that simplifies building and scaling cloud-native applications. As the co-founder of Dapr (Distributed Application Runtime), Mark has played a pivotal role in shaping the future of modern application development by empowering developers to build resilient, distributed systems with ease. With decades of experience in the software industry, Mark has been a driving force behind innovative solutions that bridge the gap between developers and complex infrastructure.
- Diagrid
- Dapr
Links to interesting things from this episode:
- "XML Bible" by Elliotte Rusty Harold
- OpenTelemetry
- SPIFFE
- DataGalaxy case study
- Cloud Native Computing Foundation

Jul 2, 2025 • 48min
What CVEs Did for Security, CREs Are Doing for Reliability
Did you know that software engineers often "learn things the hard way" because they lack a standardized system to share knowledge about reliability issues? While security professionals have CVEs to catalog vulnerabilities, reliability engineers have been left to reinvent the wheel with each new bug or outage.
Tony Meehan, co-founder and CTO of Prequel, introduces us to Common Reliability Enumerations (CREs) - an open-source approach that's doing for reliability what CVEs did for security. After spending a decade at the NSA hunting vulnerabilities, Tony recognized that the same community-driven approach could revolutionize how we handle reliability issues.
This conversation covers:
- How CREs help developers detect and mitigate reliability issues before they cause outages
- The open-source tools Preq and CRE that allow teams to leverage community knowledge
- Practical ways to implement these tools in your development workflow (locally, in CI/CD, and production)
- How this approach can reduce cloud costs by identifying issues rather than over-provisioning
- Tips for debugging mysterious production issues when no CRE exists yet
Guest: Tony Meehan, CTO at Prequel
Tony is an engineering leader obsessed with bugs. He dedicated a decade to vulnerability and exploit development at the National Security Agency (NSA) before leading Engineering at Endgame and Elastic. In 2023, Tony co-founded Prequel to change the way application failure is detected and resolved.
- Tony Meehan, X
- prequel.dev
- github.com/prequel-dev
- Prequel, X
Links to interesting things from this episode:
- Blog post about the partial outage at Endgame
- Common Reliability Enumeration (CRE)
- Preq
- XKCD: Standards
- Episode on security with Danny Allan from Snyk
- Brendan Gregg's blog

May 28, 2025 • 57min
From DevOps to 'Vibe Coding': Gene Kim on AI-Assisted Development and Platform Engineering
Gene Kim, co-founder of IT Revolution and author of The Phoenix Project, discusses the revolutionary concept of Vibe Coding in software development. He reveals how AI is making once-daunting projects possible in mere weeks and enabling developers to write thousands of lines of code daily. Kim debunks myths about AI replacing developers, emphasizing its role in enhancing creativity and ambition. He also shares insights on avoiding common pitfalls in AI implementation, the importance of feedback loops, and the future of AI-assisted coding for tech leaders.

Apr 30, 2025 • 45min
Snyk’s Danny Allan on Making Security Developer-Friendly
Security often feels like a roadblock to developers, but what if it could be seamlessly integrated into the development process? As software delivery becomes increasingly automated and self-service, the traditional approach to security needs a major overhaul.
Danny Allan, CTO at Snyk, shares practical insights on transforming security from a bottleneck into an enabler of developer productivity. Drawing from his extensive experience at IBM, VMware, and Veeam, Allan discusses how security teams can shift left effectively without creating friction.
Key topics covered:
- Building successful security champions programs that cultivate curiosity rather than relying solely on senior developers
- Practical approaches to embedding security controls into development pipelines, from IDE integration to PR checks
- Strategies for measuring security team success beyond just vulnerability counts
- The role of pre-hardened containers and infrastructure-as-code scanning in platform security
- How AI is transforming both code generation and security tooling, including Snyk's approach to vulnerability detection
Love the show? Subscribe, rate, review, & share! http://platformengineeringpod.com/

Apr 16, 2025 • 41min
vCluster with Lukas Gentele: Rethinking Kubernetes Multi-Tenancy
Are your platform teams constantly saying "no" to requests for new Kubernetes clusters? The traditional approach to Kubernetes multi-tenancy forces organizations to choose between cluster sprawl or restrictive namespaces - neither of which fully meets the needs of modern development teams.
Lukas Gentele, CEO and co-founder of Loft Labs, shares how vCluster is transforming the way organizations handle multi-tenancy in Kubernetes. By running virtual Kubernetes control planes inside namespaces, vCluster enables teams to experiment with different versions, operators, and configurations while maintaining efficient resource usage.
Key topics covered:
- How vCluster solves the limitations of namespace-based multi-tenancy
- Running multiple Kubernetes versions in the same cluster for testing and gradual upgrades
- Managing bare metal GPU resources efficiently for AI/ML workloads
- Balancing standardization with developer autonomy in platform engineering
- Using virtual clusters for cost-effective testing across multiple Kubernetes versions
Whether you're a platform engineer looking to say "yes" more often or a development team seeking greater autonomy within Kubernetes, this discussion offers practical insights into modern multi-tenancy approaches.

Apr 2, 2025 • 54min
Building Real-World Platforms: Abby Bangser on CNCF, Kratix, & Syntasso
When organizations grow beyond using third-party platforms, they face a critical challenge: how to build internal platforms that enable teams to work efficiently while maintaining security and compliance. Abby Bangser, founding principal engineer at Syntasso, shares insights on creating real-world platforms that strike the right balance between standardization and flexibility.
Key insights:
- The shift from external platforms to internal ones often comes from specific business needs, like compliance requirements
- Successful platform engineering requires finding the right balance between prescriptive standards and flexible customization
- Platforms should offer multiple levels of abstraction, from simplified "paved paths" to advanced customization options
- Platform teams should watch how users interact with their services to identify emerging patterns and needs

Mar 19, 2025 • 48min
Smart TV Testing Made Simple with Dave Lucia of TV Labs
Testing smart TV applications presents unique challenges that traditional web testing approaches can't solve. Dave Lucia, CTO and co-founder of TV Labs, shares how his team built a platform that virtualizes televisions and set-top boxes to help media companies test their smart TV apps on physical devices.
Learn about TV Labs' innovative architecture and how they handle everything from camera-based testing systems to their custom Lua-based DSL for faster test execution. A key highlight is how choosing Elixir as their primary technology has enabled TV Labs to build a robust orchestration system. The language's built-in capabilities for fault tolerance, process isolation, and distributed computing make it particularly well-suited for managing concurrent connections and real-time state across multiple devices.
The discussion also explores practical insights about system architecture, including how TV Labs leverages Phoenix presence for real-time device state tracking and achieves microsecond-level performance for message broadcasting.

Feb 26, 2025 • 1h 3min
Trust, Lock-in, And Better Infrastructure Management
Why do 70% of organizations still struggle to adopt infrastructure as code? Sören Martius, CPO and co-founder of Terramate, joins Cory O'Daniel to tackle the challenges of modern infrastructure management and the delicate balance between vendor trust and lock-in.
The conversation explores practical solutions for common infrastructure challenges, from managing monolithic state files to orchestrating complex deployments. Martius shares insights on:
- When to maintain a monolithic state file versus breaking it into smaller units
- How infrastructure needs evolve as engineering teams grow beyond 100 people
- Why anti-lock-in features build trust with operations teams
- The role of AI in detecting and remediating infrastructure misconfigurations
For teams wrestling with infrastructure complexity or evaluating new tools, this discussion offers practical perspectives on building scalable, maintainable infrastructure while avoiding common pitfalls around vendor lock-in and team adoption.

Feb 5, 2025 • 57min
Meeting Developers In Their Existing Workflows: The Terrateam Advantage
Building infrastructure tooling doesn't require massive VC funding or a huge team - just ask Malcolm Matalka, co-founder of bootstrapped Terrateam. Malcolm shares his journey from real estate websites to investment banking to biotech, before landing in infrastructure automation.
Learn how Terrateam takes a unique "libraries over frameworks" approach to development, prioritizing simplicity and control by carefully selecting dependencies and building critical components in-house. Malcolm explains how this philosophy leads to more maintainable code and better security outcomes.
As an early participant in the OpenTofu fork, Malcolm provides insights into the community response and adoption challenges. He discusses how Terrateam helps teams streamline their infrastructure workflows by integrating directly with existing tools and processes rather than forcing new ones.
For platform engineers looking to simplify their infrastructure management, Malcolm describes the ideal Terrateam user as someone who wants infrastructure changes to flow naturally through their existing development process without added complexity.

Jan 22, 2025 • 1h 16min
Beyond GitOps: Rethinking Cloud Self-Service with Dave Williams
Dave Williams, co-founder of Massdriver, shares his extensive experience in cloud engineering and interactive art. He challenges the limits of GitOps, suggesting it may hinder teams more than help. The discussion reveals innovative strategies for creating automated self-service platforms that enhance developer productivity without sacrificing security. Williams emphasizes the need for proactive compliance and tailored infrastructure approaches, making a strong case for evolving cloud operations to empower engineers and streamline workflows.