

Platform Engineering Podcast
Cory O'Daniel, CEO of Massdriver
The Platform Engineering Podcast is a show about the real work of building and running internal platforms — hosted by Cory O’Daniel, longtime infrastructure and software engineer, and CEO/cofounder of Massdriver.
Each episode features candid conversations with the engineers, leads, and builders shaping platform engineering today. Topics range from org structure and team ownership to infrastructure design, developer experience, and the tradeoffs behind every “it depends.”
Cory brings two decades of experience building platforms — and now spends his time thinking about how teams scale infrastructure without creating bottlenecks or burning out ops. This podcast isn’t about trends. It’s about how platform engineering actually works inside real companies.
Whether you're deep into Terraform/OpenTofu modules, building golden paths, or just trying to keep your platform from becoming a dumpster fire — you’ll probably find something useful here.
Episodes

Oct 8, 2025 • 40min
Guest Host: Kelsey Hightower — Why IaC Alone Isn’t Enough
Ever wonder why strong Terraform modules still lead to long review queues and fragile pipelines? From hand-built scripts and early data center migrations to cloud sprawl and Kubernetes, configuration management has changed a lot - but the core struggle remains: too many decisions, not enough guardrails. Guest host Kelsey Hightower sits down with Cory O’Daniel to unpack where Infrastructure as Code succeeds and where teams get stuck.
What you’ll learn:
- How to avoid “choice overload” in cloud configs by moving decisions upstream
- Practical ways to pair IaC with UX, policies, and SLAs to reduce toil
- When click-ops is a symptom, not the problem - and how to replace it safely
- Patterns for scaling platform practices beyond a handful of experts
- A simple mental model for mapping workflows across serverless, containers, and VMs
Guest Host: Kelsey Hightower
Kelsey has worn every hat possible throughout his career in tech and enjoys leadership roles focused on making things happen and shipping software. Prior to his retirement, he was a Distinguished Engineer at Google, where he worked on Google Cloud Platform. He is a strong open source advocate with a focus on building great software as well as great communities around it. He is also an accomplished author and keynote speaker with a knack for demystifying complex topics, doing live demos and enabling others to succeed. When he is not writing code, you can catch him giving technical workshops covering everything from programming to system administration.
Guest: Cory O'Daniel, CEO and Co-Founder of Massdriver and Co-Founder of OpenTofu
Cory has been a software architect and engineer for 20 years, leading up to the founding of Massdriver. He's also a husband and the father of two kids.
- Cory O'Daniel, X
- Cory O'Daniel, Medium
- Massdriver, website
- Massdriver, GitHub
- Massdriver, YouTube
- OpenTofu
Links to interesting things from this episode:
- "The Phoenix Project: A Novel about IT, DevOps, and Helping Your Business Win" by Gene Kim
- "15 Years of Duct Tape - Why IaC Adoption Stalled at 30"

Sep 24, 2025 • 44min
How to Ship Faster with Feature Flags: Insights from Unleash
Still freezing code before Black Friday and hoping nothing breaks? Feature flags can help you ship smaller, safer changes continuously - without the “big bang” risk or painful rollbacks.
Cory O’Daniel talks with Unleash VP of Marketing Michael Ferranti about how modern teams use flags as a core delivery primitive alongside CI/CD and trunk-based development. They dig into kill switches for instant mitigation, progressive rollouts tied to real metrics, and why homegrown “if-statement” systems turn into hidden platforms you didn’t mean to build. They also cover the rising volume of AI‑assisted code and how flags provide the control layer to move faster while protecting reliability.
What you’ll learn:
- How feature flags reduce risk for high-stakes periods like Black Friday by avoiding code freezes
- When to replace staging queues with progressive delivery and experiment-driven rollouts
- Practical uses: kill switches, trunk-based development, targeting, and cleanup strategies to manage flag debt
- Build vs. buy: why DIY flag systems become costly and how Unleash’s open source and on-prem options fit regulated or air‑gapped needs
- Using business, engineering, and customer signals to automate safe ramp-ups and ramp-backs
- Why AI increases code throughput, how it affects reliability, and how flags create the safety rails for agentic workflows
Guest: Michael Ferranti, VP of Marketing at Unleash
Michael Ferranti has held leadership roles at Teleport, Portworx, ClusterHQ, and Rackspace Technology, with a focus on go-to-market strategy in open-source and enterprise software. At Teleport he focused on shifting from legacy security models to developer-first, identity-driven access. At Portworx, he was building new GTM strategies for Kubernetes-native storage when everyone was still figuring out containers, and he helped scale the company from under $500K in revenue to a $370M acquisition by Pure Storage. His work has centered on supporting engineering leaders in delivering features, scaling infrastructure, and improving security without adding unnecessary blockers. Michael has spoken at industry events like KubeCon and theCUBE, sharing insights on platform org design, category creation, and growing open-source adoption.
- Unleash, website
- Unleash, GitHub
- Unleash, LinkedIn
- Unleash, X
- Unleash, Slack
- Unleash, YouTube
- UnleashCon 2025
Links to interesting things from this episode:
- React
- Bitbucket
- LaunchDarkly
- ServiceNow
- CockroachDB
- Red Hat OpenShift
- State of DevOps Report (DORA)
- "How to Win Friends & Influence People"
- Grafana
**REMINDER** - Apollo GraphQL has kindly offered us a few free passes to join them at the GraphQL Summit in San Francisco, October 6-8, 2025. If you are interested in going, the code is: PodcastSummit25

Sep 10, 2025 • 43min
GraphQL, MCP, and the Future of APIs with Apollo CEO Matt DeBergalis
**UPDATE** - Apollo GraphQL has kindly offered us a few free passes to join them at the GraphQL Summit in San Francisco, October 6-8, 2025. If you are interested in going, the code is: PodcastSummit25
What if your API layer could help you ship faster today and make tomorrow’s AI workflows safer and easier to build?
Apollo CEO Matt DeBergalis explains how GraphQL became a practical standard for unifying messy backends, why declarative schemas and strong types are the “bedrock” for agentic systems, and where MCP fits when you want agents to call business data safely. You’ll hear real examples of speeding up frontends, tightening observability, and running focused personalization without “fat” APIs.
What you’ll learn:
- A plain-language model for GraphQL and why it decouples frontend needs from backend services
- How typing, schema docs, and field-level telemetry reduce risk and enable LLM-driven tooling
- Practical ways to expose queries as MCP tools and start with internal “agentic DevOps”
- Tactics for experiments and personalization that stay fast and measurable at scale
- Why an end-to-end approach (client and server) matters for reliability and speed
Guest: Matt DeBergalis, CEO and Co-Founder of Apollo GraphQL
Matt DeBergalis is the Chief Executive Officer and Co-Founder of Apollo GraphQL, focused on bringing the popular GraphQL technology to the enterprise. He previously served as Apollo's CTO, leading product and engineering. Matt's longtime focus has been in open source and platforms: he co-founded Meteor.js, which grew to become one of the most popular open-source projects in the world for developing full-stack web apps with JavaScript, as well as ActBlue, the American political fundraising platform that revolutionized grassroots political giving. He attended the Massachusetts Institute of Technology and resides in the San Francisco Bay Area with his family. In his spare time, Matt enjoys taking to the air and flying his 1966 Beechcraft Baron.
- Apollo GraphQL, website
- Apollo GraphQL, GitHub
- Apollo GraphQL, LinkedIn
- Apollo GraphQL, X
- Apollo GraphQL, YouTube
Links to interesting things from this episode:
- Free Software Foundation
- Cursor
- Motley Fool podcast
- GraphQL Summit

Aug 20, 2025 • 1h 9min
Beyond Cracking the Coding Interview with Mike Mroczka
Ever wondered how many “perfect” candidates simply learned the test - or how many great engineers get filtered out by bad interview design? Mike Mroczka, interview coach and ex-Googler, shares what really goes on behind technical hiring and how to navigate it to your advantage.
What you’ll learn:
- How leaked question banks and standardized puzzles can distort hiring signals - and where they still help
- Practical ways companies can make interviews fairer and harder to game, both on-site and remote
- A balanced take on data structures and algorithms: when they’re useful and when they’re noise
- Tactics to spot and reduce cheating without turning interviews into surveillance
- How to structure interviews for different seniority levels so you measure the right skills
- Salary negotiation playbook: timing, leverage, and common pitfalls that cost candidates real money
- Getting past the application black hole: skipping recruiters, networking that works, and coordinating offers
Who this helps:
- Engineers tired of grinding puzzles who want a smarter prep plan
- Hiring managers looking to improve signal and reduce false negatives
- Anyone preparing to negotiate an offer with confidence
Guest: Mike Mroczka, Primary author of Beyond Cracking the Coding Interview, Ex-Google
Mike Mroczka, a former senior SWE (Google, Salesforce, GE), is now a tech consultant with a decade of experience helping engineers land their dream jobs. He’s a top-rated mentor (interviewing.io, Karat, Pathrise, Skilledinc) and the author of viral technical content on system design and technical interview strategies featured on Hacker News, Business Insider, and Wired.
- Mike Mroczka, website
- Beyond Cracking the Coding Interview
Links to interesting things from this episode:
- Cracking the Coding Interview by Gayle Laakmann McDowell
- HackerOne
- Interviewing.io
- Cluely
- Google Glass
- Ray-Ban
- HackerRank
- CodeSignal

Jul 30, 2025 • 50min
From React to Dagster: Pete Hunt on Data, Infra, and AI-Ready Platforms
Is Postgres actually a better message queue than Kafka? This provocative question is just one of many insights Pete Hunt shares in this conversation about data orchestration, platform engineering, and the evolution of infrastructure.
Pete Hunt, CEO of Dagster Labs and former React co-founder at Facebook, brings his unique perspective from working at tech giants like Instagram and Twitter to discuss how different platform team approaches impact product development. Having witnessed both Facebook's clear delineation between product and infrastructure teams and Twitter's DevOps-style ownership model, Pete offers valuable comparisons of these contrasting philosophies.
The conversation explores:
- How Dagster provides a higher-level abstraction for data teams, making it easier to track and debug data assets rather than just managing workflows
- The challenges of modern data platforms and why many organizations struggle with complex, distributed systems that could be simplified
- A practical approach to migrating from Airflow to Dagster with their "Airlift" toolkit that allows for incremental, low-risk transitions
- How AI development is fueling demand for better data orchestration as companies build applications that rely on properly managed data pipelines
Pete also shares his thoughtful approach to balancing technical debt and product development with a "quarter on, quarter off" cadence that allows teams to both ship features and clean up the inevitable corners that get cut under deadline pressure.
For platform engineers, data teams, and technical leaders navigating the intersection of infrastructure and AI, this episode provides practical insights on creating abstractions that deliver real operational value without unnecessary complexity.
Guest: Pete Hunt, CEO of Dagster
Pete is the CEO of Dagster Labs, where he first joined as Head of Engineering in early 2022 and transitioned into the CEO role later that same year. Before Dagster, Pete co-founded Smyte, an anti-abuse startup acquired by Twitter, where he continued as a senior staff engineer.
Earlier in his career, Pete was one of the first engineers to work on Instagram after its acquisition by Facebook in 2012. There, he led development on Instagram’s web and analytics teams and became a co-founder of the React.js project, helping transform an internal experiment into one of the most widely used front-end frameworks in the world. He was also part of the early community around GraphQL and has remained deeply engaged in open source and developer tooling.
Pete brings a pragmatic, hands-on perspective to modern data infrastructure. Having been both a founder and an engineer, he focuses on reducing complexity and fatigue in data teams by building tools that actually work together. At Dagster, he remains close to the code and actively involved in technical decisions, combining leadership with deep technical fluency.
- Pete Hunt, X
- Dagster
- Dagster Pipes
- Dagster Airlift
Links to interesting things from this episode:
- React
- “Postgres: a Better Message Queue than Kafka?”
- Airflow
- Kubeflow
- CAPES
- Fargate

Jul 16, 2025 • 49min
Building Better Platforms with Dapr: Abstractions, Portability, and Durable Systems with Mark Fussell
Cloud lock-in isn't just about where your data lives - it's about how deeply cloud-specific code permeates your applications. Mark Fussell, co-creator of Dapr and CEO of Diagrid, joins Cory O'Daniel to explore how Dapr provides clean abstractions for common distributed system patterns, enabling teams to build portable applications without sacrificing cloud-native capabilities.
The conversation covers:
- How Dapr creates a clean separation between application code and underlying infrastructure services like messaging, state management, and secrets
- Why platform teams struggle with tight coupling between applications and infrastructure, and how Dapr solves this problem
- The benefits of Dapr's sidecar architecture for local development, testing, and production environments
- How Dapr automatically handles cross-cutting concerns like security, observability, and resiliency without boilerplate code
- Introduction to Dapr's workflow engine for durable execution and the emerging world of stateful AI agents
Whether you're a platform engineer struggling with cloud lock-in or a developer tired of rewriting code for different infrastructures, this conversation demonstrates how Dapr can simplify your distributed systems while maintaining access to the unique capabilities of each cloud provider.
Guest: Mark Fussell, Co-founder of Dapr and CEO of Diagrid
Mark Fussell is the CEO of Diagrid, a cutting-edge company that simplifies building and scaling cloud-native applications. As the co-founder of Dapr (Distributed Application Runtime), Mark has played a pivotal role in shaping the future of modern application development by empowering developers to build resilient, distributed systems with ease. With decades of experience in the software industry, Mark has been a driving force behind innovative solutions that bridge the gap between developers and complex infrastructure.
- Diagrid
- Dapr
Links to interesting things from this episode:
- "XML Bible" by Elliotte Rusty Harold
- OpenTelemetry
- SPIFFE
- DataGalaxy case study
- Cloud Native Computing Foundation

Jul 2, 2025 • 48min
What CVEs Did for Security, CREs Are Doing for Reliability
Did you know that software engineers often "learn things the hard way" because they lack a standardized system to share knowledge about reliability issues? While security professionals have CVEs to catalog vulnerabilities, reliability engineers have been left to reinvent the wheel with each new bug or outage.
Tony Meehan, co-founder and CTO of Prequel, introduces us to Common Reliability Enumerations (CREs) - an open-source approach that's doing for reliability what CVEs did for security. After spending a decade at the NSA hunting vulnerabilities, Tony recognized that the same community-driven approach could revolutionize how we handle reliability issues.
This conversation covers:
- How CREs help developers detect and mitigate reliability issues before they cause outages
- The open-source tools Preq and CRE that allow teams to leverage community knowledge
- Practical ways to implement these tools in your development workflow (locally, in CI/CD, and production)
- How this approach can reduce cloud costs by identifying issues rather than over-provisioning
- Tips for debugging mysterious production issues when no CRE exists yet
Guest: Tony Meehan, CTO at Prequel
Tony is an engineering leader obsessed with bugs. He dedicated a decade to vulnerability and exploit development at the National Security Agency (NSA) before leading Engineering at Endgame and Elastic. In 2023, Tony co-founded Prequel to change the way application failure is detected and resolved.
- Tony Meehan, X
- prequel.dev
- github.com/prequel-dev
- Prequel, X
Links to interesting things from this episode:
- Blog post about the partial outage at Endgame
- Common Reliability Enumeration (CRE)
- Preq
- XKCD: Standards
- Episode on security with Danny Allan from Snyk
- Brendan Gregg's blog

May 28, 2025 • 57min
From DevOps to 'Vibe Coding': Gene Kim on AI-Assisted Development and Platform Engineering
Gene Kim, co-founder of IT Revolution and author of The Phoenix Project, discusses the revolutionary concept of Vibe Coding in software development. He reveals how AI is making once-daunting projects possible in mere weeks and enabling developers to write thousands of lines of code daily. Kim debunks myths about AI replacing developers, emphasizing its role in enhancing creativity and ambition. He also shares insights on avoiding common pitfalls in AI implementation, the importance of feedback loops, and the future of AI-assisted coding for tech leaders.

Apr 30, 2025 • 45min
Snyk’s Danny Allan on Making Security Developer-Friendly
Security often feels like a roadblock to developers, but what if it could be seamlessly integrated into the development process? As software delivery becomes increasingly automated and self-service, the traditional approach to security needs a major overhaul.
Danny Allan, CTO at Snyk, shares practical insights on transforming security from a bottleneck into an enabler of developer productivity. Drawing from his extensive experience at IBM, VMware, and Veeam, Allan discusses how security teams can shift left effectively without creating friction.
Key topics covered:
- Building successful security champions programs that cultivate curiosity rather than relying solely on senior developers
- Practical approaches to embedding security controls into development pipelines, from IDE integration to PR checks
- Strategies for measuring security team success beyond just vulnerability counts
- The role of pre-hardened containers and infrastructure-as-code scanning in platform security
- How AI is transforming both code generation and security tooling, including Snyk's approach to vulnerability detection
Guest: Danny Allan, Chief Technology Officer at Snyk
As CTO, Danny leads end-to-end ownership of Snyk’s current core offerings and roadmap, as well as the company’s near-term platform vision. Before joining Snyk, he was CTO at Veeam and Desktone (acquired by VMware) and Director of Security Research at IBM. In his free time, he loves scuba diving, cycling, and hockey (like a true Canadian!).
- Snyk, website
- Snyk, X
- Snyk, YouTube
- Snyk, GitHub
- Snyk, Discord
- The Secure Developer Podcast by Snyk
Links to interesting things from this episode:
- DistroList
- Chainguard
- Verizon Data Breach Investigations Report
- Hack This Site
- Model Context Protocol

Apr 16, 2025 • 41min
vCluster with Lukas Gentele: Rethinking Kubernetes Multi-Tenancy
Are your platform teams constantly saying "no" to requests for new Kubernetes clusters? The traditional approach to Kubernetes multi-tenancy forces organizations to choose between cluster sprawl or restrictive namespaces - neither of which fully meets the needs of modern development teams.
Lukas Gentele, CEO and co-founder of Loft Labs, shares how vCluster is transforming the way organizations handle multi-tenancy in Kubernetes. By running virtual Kubernetes control planes inside namespaces, vCluster enables teams to experiment with different versions, operators, and configurations while maintaining efficient resource usage.
Key topics covered:
- How vCluster solves the limitations of namespace-based multi-tenancy
- Running multiple Kubernetes versions in the same cluster for testing and gradual upgrades
- Managing bare metal GPU resources efficiently for AI/ML workloads
- Balancing standardization with developer autonomy in platform engineering
- Using virtual clusters for cost-effective testing across multiple Kubernetes versions
Whether you're a platform engineer looking to say "yes" more often or a development team seeking greater autonomy within Kubernetes, this discussion offers practical insights into modern multi-tenancy approaches.
Guest: Lukas Gentele, CEO & Co-Founder at Loft Labs
Lukas Gentele is the CEO and Co-founder of Loft Labs, which delivers Kubernetes-native tools, functionality and frameworks purpose-built for platform engineers to manage, activate and optimize their platform stack. Gentele is a dynamic leader with wide-ranging expertise in enterprise architecture, distributed systems, and developer productivity solutions. Prior to Loft, Gentele served as the co-founder and CEO at covexo GmbH and Webmans. Gentele often speaks at conferences such as KubeCon, writes articles for leading industry journals, and likes to share his experiences at meetups. Gentele holds a Bachelor of Science in Computer Science and Information Systems, and a Master of Science in Computer Science & Management of Enterprise Information Systems, both from the University of Mannheim.
- Lukas Gentele, LinkedIn
- Loft Labs
- vCluster
- Slack channel
Links to interesting things from this episode:
- "Kubernetes the Hard Way" by Kelsey Hightower
- DevSpace
- “Inception”