

Platform Engineering Podcast
Cory O'Daniel, CEO of Massdriver
The Platform Engineering Podcast is a show about the real work of building and running internal platforms — hosted by Cory O’Daniel, longtime infrastructure and software engineer, and CEO/cofounder of Massdriver.
Each episode features candid conversations with the engineers, leads, and builders shaping platform engineering today. Topics range from org structure and team ownership to infrastructure design, developer experience, and the tradeoffs behind every “it depends.”
Cory brings two decades of experience building platforms — and now spends his time thinking about how teams scale infrastructure without creating bottlenecks or burning out ops. This podcast isn’t about trends. It’s about how platform engineering actually works inside real companies.
Whether you're deep into Terraform/OpenTofu modules, building golden paths, or just trying to keep your platform from becoming a dumpster fire — you’ll probably find something useful here.
Episodes

Jul 30, 2025 • 50min
From React to Dagster: Pete Hunt on Data, Infra, and AI-Ready Platforms
Is Postgres actually a better message queue than Kafka? That provocative question is just one of the topics Pete Hunt takes on in this conversation about data orchestration, platform engineering, and the evolution of infrastructure.
Pete Hunt, CEO of Dagster Labs and a co-founder of React at Facebook, draws on his time at Instagram and Twitter to discuss how different platform team approaches affect product development. Having worked under both Facebook's clear delineation between product and infrastructure teams and Twitter's DevOps-style ownership model, Pete offers valuable comparisons of these contrasting philosophies.
The conversation explores:
How Dagster provides a higher-level abstraction for data teams, making it easier to track and debug data assets rather than just managing workflows
The challenges of modern data platforms, and why many organizations struggle with complex, distributed systems that could be simplified
A practical approach to migrating from Airflow to Dagster with the "Airlift" toolkit, which allows for incremental, low-risk transitions
How AI development is fueling demand for better data orchestration as companies build applications that rely on properly managed data pipelines
Pete also shares his approach to balancing technical debt and product development with a "quarter on, quarter off" cadence, which lets teams both ship features and clean up the corners that inevitably get cut under deadline pressure.
For platform engineers, data teams, and technical leaders navigating the intersection of infrastructure and AI, this episode offers practical insights on creating abstractions that deliver real operational value without unnecessary complexity.
Guest: Pete Hunt, CEO of Dagster
Pete is the CEO of Dagster Labs, which he joined as Head of Engineering in early 2022 before moving into the CEO role later that same year.
Before Dagster, Pete co-founded Smyte, an anti-abuse startup acquired by Twitter; after the acquisition, he continued at Twitter as a senior staff engineer.
Earlier in his career, Pete was one of the first engineers to work on Instagram after its acquisition by Facebook in 2012. There, he led development for Instagram's web and analytics teams and became a co-founder of the React.js project, helping turn an internal experiment into one of the most widely used front-end frameworks in the world. He was also part of the early community around GraphQL and has remained deeply engaged in open source and developer tooling.
Pete brings a pragmatic, hands-on perspective to modern data infrastructure. Having been both a founder and an engineer, he focuses on reducing complexity and fatigue in data teams by building tools that actually work together. At Dagster, he remains close to the code and actively involved in technical decisions, combining leadership with deep technical fluency.
Pete Hunt, X
Dagster
Dagster Pipes
Dagster Airlift
Links to interesting things from this episode:
React
“Postgres: a Better Message Queue than Kafka?”
Airflow
Kubeflow
CAPES
Fargate

Jul 16, 2025 • 49min
Building Better Platforms with Dapr: Abstractions, Portability, and Durable Systems with Mark Fussell
Cloud lock-in isn't just about where your data lives; it's about how deeply cloud-specific code permeates your applications. Mark Fussell, co-creator of Dapr and CEO of Diagrid, joins Cory O'Daniel to explore how Dapr provides clean abstractions for common distributed-system patterns, enabling teams to build portable applications without sacrificing cloud-native capabilities.
The conversation covers:
How Dapr creates a clean separation between application code and underlying infrastructure services like messaging, state management, and secrets
Why platform teams struggle with tight coupling between applications and infrastructure, and how Dapr addresses this problem
The benefits of Dapr's sidecar architecture for local development, testing, and production environments
How Dapr automatically handles cross-cutting concerns like security, observability, and resiliency without boilerplate code
An introduction to Dapr's workflow engine for durable execution and the emerging world of stateful AI agents
Whether you're a platform engineer struggling with cloud lock-in or a developer tired of rewriting code for different infrastructures, this conversation shows how Dapr can simplify your distributed systems while preserving access to the unique capabilities of each cloud provider.
Guest: Mark Fussell, Co-founder of Dapr and CEO of Diagrid
Mark Fussell is the CEO of Diagrid, a company that simplifies building and scaling cloud-native applications. As the co-founder of Dapr (Distributed Application Runtime), Mark has played a pivotal role in shaping modern application development by helping developers build resilient, distributed systems with ease.
With decades of experience in the software industry, Mark has been a driving force behind solutions that bridge the gap between developers and complex infrastructure.
Diagrid
Dapr
Links to interesting things from this episode:
"XML Bible" by Elliotte Rusty Harold
OpenTelemetry
SPIFFE
DataGalaxy case study
Cloud Native Computing Foundation

Jul 2, 2025 • 48min
What CVEs Did for Security, CREs Are Doing for Reliability
Did you know that software engineers often "learn things the hard way" because they lack a standardized system for sharing knowledge about reliability issues? While security professionals have CVEs to catalog vulnerabilities, reliability engineers have been left to reinvent the wheel with each new bug or outage.
Tony Meehan, co-founder and CTO of Prequel, introduces us to Common Reliability Enumerations (CREs), an open-source approach that's doing for reliability what CVEs did for security. After spending a decade at the NSA hunting vulnerabilities, Tony recognized that the same community-driven approach could revolutionize how we handle reliability issues.
This conversation covers:
How CREs help developers detect and mitigate reliability issues before they cause outages
The open-source tools Preq and CRE, which let teams leverage community knowledge
Practical ways to use these tools in your development workflow (locally, in CI/CD, and in production)
How this approach can reduce cloud costs by identifying the underlying issues instead of over-provisioning
Tips for debugging mysterious production issues when no CRE exists yet
Guest: Tony Meehan, CTO at Prequel
Tony is an engineering leader obsessed with bugs. He dedicated a decade to vulnerability and exploit development at the National Security Agency (NSA) before leading engineering at Endgame and Elastic. In 2023, Tony co-founded Prequel to change the way application failures are detected and resolved.
Tony Meehan, X
prequel.dev
github.com/prequel-dev
Prequel, X
Links to interesting things from this episode:
Blog post about the partial outage at Endgame
Common Reliability Enumeration (CRE)
Preq
XKCD: Standards
Episode on security with Danny Allan from Snyk
Brendan Gregg's blog

May 28, 2025 • 57min
From DevOps to 'Vibe Coding': Gene Kim on AI-Assisted Development and Platform Engineering
Gene Kim, co-founder of IT Revolution and author of The Phoenix Project, discusses the revolutionary concept of Vibe Coding in software development. He reveals how AI is making once-daunting projects possible in mere weeks and enabling developers to write thousands of lines of code daily. Kim debunks myths about AI replacing developers, emphasizing its role in enhancing creativity and ambition. He also shares insights on avoiding common pitfalls in AI implementation, the importance of feedback loops, and the future of AI-assisted coding for tech leaders.

Apr 30, 2025 • 45min
Snyk’s Danny Allan on Making Security Developer-Friendly
Security often feels like a roadblock to developers, but what if it could be seamlessly integrated into the development process? As software delivery becomes increasingly automated and self-service, the traditional approach to security needs a major overhaul.
Danny Allan, CTO at Snyk, shares practical insights on transforming security from a bottleneck into an enabler of developer productivity. Drawing on his extensive experience at IBM, VMware, and Veeam, Allan discusses how security teams can shift left effectively without creating friction.
Key topics covered:
Building successful security champions programs that cultivate curiosity rather than relying solely on senior developers
Practical approaches to embedding security controls into development pipelines, from IDE integration to PR checks
Strategies for measuring security team success beyond vulnerability counts
The role of pre-hardened containers and infrastructure-as-code scanning in platform security
How AI is transforming both code generation and security tooling, including Snyk's approach to vulnerability detection
Guest: Danny Allan, Chief Technology Officer at Snyk
As CTO, Danny has end-to-end ownership of Snyk's current core offerings and roadmap, as well as the company's near-term platform vision. Before joining Snyk, he was CTO at Veeam and Desktone (acquired by VMware) and Director of Security Research at IBM. In his free time, he loves scuba diving, cycling, and hockey (like a true Canadian!).
Snyk, website
Snyk, X
Snyk, YouTube
Snyk, GitHub
Snyk, Discord
The Secure Developer Podcast by Snyk
Links to interesting things from this episode:
DistroList
Chainguard
Verizon Data Breach Investigations Report
Hack This Site
Model Context Protocol

Apr 16, 2025 • 41min
vCluster with Lukas Gentele: Rethinking Kubernetes Multi-Tenancy
Are your platform teams constantly saying "no" to requests for new Kubernetes clusters? The traditional approach to Kubernetes multi-tenancy forces organizations to choose between cluster sprawl and restrictive namespaces, neither of which fully meets the needs of modern development teams.
Lukas Gentele, CEO and co-founder of Loft Labs, shares how vCluster is transforming the way organizations handle multi-tenancy in Kubernetes. By running virtual Kubernetes control planes inside namespaces, vCluster lets teams experiment with different versions, operators, and configurations while keeping resource usage efficient.
Key topics covered:
How vCluster addresses the limitations of namespace-based multi-tenancy
Running multiple Kubernetes versions in the same cluster for testing and gradual upgrades
Managing bare-metal GPU resources efficiently for AI/ML workloads
Balancing standardization with developer autonomy in platform engineering
Using virtual clusters for cost-effective testing across multiple Kubernetes versions
Whether you're a platform engineer looking to say "yes" more often or a development team seeking greater autonomy within Kubernetes, this discussion offers practical insights into modern multi-tenancy approaches.
Guest: Lukas Gentele, CEO & Co-Founder at Loft Labs
Lukas Gentele is the CEO and co-founder of Loft Labs, which delivers Kubernetes-native tools, functionality, and frameworks purpose-built for platform engineers to manage, activate, and optimize their platform stack. Gentele is a dynamic leader with wide-ranging expertise in enterprise architecture, distributed systems, and developer productivity solutions. Prior to Loft, Gentele served as co-founder and CEO at covexo GmbH and Webmans. Gentele often speaks at conferences such as KubeCon, writes articles for leading industry journals, and likes to share his experiences at meetups.
Gentele holds a Bachelor of Science in Computer Science and Information Systems and a Master of Science in Computer Science & Management of Enterprise Information Systems, both from the University of Mannheim.
Lukas Gentele, LinkedIn
Loft Labs
vCluster Slack channel
Links to interesting things from this episode:
"Kubernetes the Hard Way" by Kelsey Hightower
DevSpace
“Inception”

Apr 2, 2025 • 54min
Building Real-World Platforms: Abby Bangser on CNCF, Kratix, & Syntasso
When organizations grow beyond using third-party platforms, they face a critical challenge: how to build internal platforms that let teams work efficiently while maintaining security and compliance. Abby Bangser, founding principal engineer at Syntasso, shares insights on creating real-world platforms that strike the right balance between standardization and flexibility.
Key insights:
The shift from external platforms to internal ones often comes from specific business needs, like compliance requirements
Successful platform engineering requires finding the right balance between prescriptive standards and flexible customization
Platforms should offer multiple levels of abstraction, from simplified "paved paths" to advanced customization options
Platform teams should watch how users interact with their services to identify emerging patterns and needs
Guest: Abby Bangser, Founding Principal Engineer at Syntasso
A hands-on software delivery professional with a passion for using quality as the foundation for fast, value-focused delivery. Abby embodies the benefits of being a specialising generalist, with experience and interests spanning roles traditionally titled Business Analyst, Quality Analyst, Developer, DevOps, Platform Engineering, SRE, and Infrastructure Engineering, and she leverages those skills to encourage a well-rounded approach to quality software delivery.
The next step for software professionals is the ability to drive thoughtful creation and evaluation of data from the operation of live services. With this in mind, Abby is excited to be working in an environment where engineers build and run their own software and understand the value not only of monitoring and logging but also of striving for an observable system.
Her goal now is to use these tools to foster even more collaboration across disciplines like user research, quality, operations, infrastructure, and software development, finding innovative ways to identify unknown issues that users face.
Abby Bangser, Bluesky
Syntasso
CNCF
Links to interesting things from this episode:
ThoughtWorks
Massdriver
Kratix
OpenTofu
“Let a 1,000 flowers bloom. Then rip 999 of them out by the roots.” by Charity Majors

Mar 19, 2025 • 48min
Smart TV Testing Made Simple with Dave Lucia of TV Labs
Testing smart TV applications presents unique challenges that traditional web testing approaches can't solve. Dave Lucia, CTO and co-founder of TV Labs, shares how his team built a platform that virtualizes access to televisions and set-top boxes, helping media companies test their smart TV apps on physical devices.
Learn about TV Labs' architecture and how they handle everything from camera-based testing systems to their custom Lua-based DSL for faster test execution. A key highlight is how choosing Elixir as their primary technology has enabled TV Labs to build a robust orchestration system. The language's built-in capabilities for fault tolerance, process isolation, and distributed computing make it particularly well suited to managing concurrent connections and real-time state across multiple devices.
The discussion also covers practical lessons in system architecture, including how TV Labs uses Phoenix Presence for real-time device state tracking and achieves microsecond-level performance for message broadcasting.
Guest: Dave Lucia, CTO & Co-Founder at TV Labs
Dave is a technology leader with deep experience designing and scaling systems across industries including media, sports betting, finance, and developer tooling. He is a prominent member of the BEAM community and speaks regularly at conferences such as Code BEAM SF, ElixirConf, The Big Elixir, and RabbitMQ Summit.
Dave Lucia, Website
Dave Lucia, X
Dave Lucia, Bluesky
TV Labs
TV Labs, LinkedIn
Links to interesting things from this episode:
Appium
“The Road to 2 Million Websocket Connections in Phoenix”
“From $erverless to Elixir”
eBPF

Feb 26, 2025 • 1h 3min
Trust, Lock-in, And Better Infrastructure Management
Why do 70% of organizations still struggle to adopt infrastructure as code? Sören Martius, CPO and co-founder of Terramate, joins Cory O'Daniel to tackle the challenges of modern infrastructure management and the delicate balance between vendor trust and lock-in.
The conversation explores practical solutions for common infrastructure challenges, from managing monolithic state files to orchestrating complex deployments. Martius shares insights on:
When to maintain a monolithic state file versus breaking it into smaller units
How infrastructure needs evolve as engineering teams grow beyond 100 people
Why anti-lock-in features build trust with operations teams
The role of AI in detecting and remediating infrastructure misconfigurations
For teams wrestling with infrastructure complexity or evaluating new tools, this discussion offers practical perspectives on building scalable, maintainable infrastructure while avoiding common pitfalls around vendor lock-in and team adoption.
Guest: Sören Martius, Founder at Terramate
Sören is an entrepreneur and technologist who loves building and delivering digital products and managing and scaling engineering teams for many kinds of businesses. His technology interests include DLTs, distributed networks, machine learning, microservices, serverless compute, Docker (and Kubernetes), AWS, Spark, Scala, Go, Elixir & OTP, Python, Rust, and TypeScript, among many others. Sören values simplicity, pragmatism, and common sense while bridging business, product, and technology.
Sören Martius, X
Terramate, website
Terramate, GitHub
Links to interesting things from this episode:
Terragrunt
Wiz
Stakpak
Reclaim AI
Fyxer
Cursor
Windsurf

Feb 5, 2025 • 57min
Meeting Developers In Their Existing Workflows: The Terrateam Advantage
Building infrastructure tooling doesn't require massive VC funding or a huge team; just ask Malcolm Matalka, co-founder of the bootstrapped Terrateam. Malcolm shares his journey from real estate websites to investment banking to biotech before landing in infrastructure automation.
Learn how Terrateam takes a "libraries over frameworks" approach to development, prioritizing simplicity and control by carefully selecting dependencies and building critical components in-house. Malcolm explains how this philosophy leads to more maintainable code and better security outcomes.
As an early participant in the OpenTofu fork, Malcolm offers insights into the community response and adoption challenges. He discusses how Terrateam helps teams streamline their infrastructure workflows by integrating directly with existing tools and processes rather than forcing new ones.
For platform engineers looking to simplify their infrastructure management, Malcolm describes the ideal Terrateam user as someone who wants infrastructure changes to flow naturally through their existing development process without added complexity.
Guest: Malcolm Matalka, Software Engineer, Co-Founder of Terrateam
As a co-founder of Terrateam, Malcolm helps teams deliver infrastructure faster with Terrateam's tools and services, which use Terraform and OpenTofu to automate, manage, and scale cloud infrastructure for developers and organizations.
With over 20 years of experience in software engineering, he has a strong background in cloud computing, distributed systems, and infrastructure.
Malcolm is also passionate about aerospace and bioinformatics, which led him to found Cosmo Labs AB and to work as a software consultant at Abiogenesis Computer Systems Lab. Cosmo Labs AB provided software solutions for satellite communication, orbital mechanics, and data analysis.
At Abiogenesis Computer Systems Lab, the team tackled bioinformatics challenges such as genome sequencing, protein structure prediction, and drug discovery. Before that, he worked at Spotify managing storage solutions at scale.
Malcolm Matalka, Reddit
Terrateam
Terrateam, LinkedIn
Links to interesting things from this episode:
Erlang
Riak
Mnesia
OCaml
Puppet
Tokio