Adventures in DevOps

Will Button, Warren Parad

Join us in listening to the experienced experts discuss cutting edge challenges in the world of DevOps. From applying the mindset at your company, to career growth and leadership challenges within engineering teams, and avoiding the common antipatterns. Every episode you'll meet a new industry veteran guest with their own unique story.

Episodes

Mentioned books

Jul 17, 2025 • 53min

The Auth Showdown: Single tenant versus Multitenant Architectures

Get ready for a lively debate on this episode of Adventures in DevOps. We're joined by Brian Pontarelli, founder of FusionAuth and CleanSpeak. Warren and Brian face off by diving into the controversial topic of multitenant versus single-tenant architecture. Expert co-host Aimee Knight joins to moderate the discussion. Ever wondered how someone becomes an "auth expert"? Warren spills the beans on his journey, explaining it's less about a direct path and more about figuring out what it means for yourself. Brian chimes in with his own "random chance" story, revealing how they fell into it after their forum-based product didn't pan out.Aimee confesses her "alarm bells" start ringing whenever multitenant architecture is mentioned, jokingly demanding "details" and admitting her preference for more separation when it comes to reliability. Brian makes a compelling case for his company's chosen path, explaining how their high-performance, downloadable single-tenant profanity filter, CleanSpeak, handles billions of chat messages a month with extreme low latency. This architectural choice became a competitive advantage, attracting companies that couldn't use cloud-based multitenant competitors due to their need to run solutions in their own data centers.We critique cloud providers' tendency to push users towards their most profitable services, citing AWS Cognito as an example of a cost-effective solution for small-scale use that becomes cost-prohibitive with scaling and feature enablement. The challenges of integrating with Cognito, including its reliance on numerous other AWS services and the need for custom Lambda functions for configuration, are also a point of contention. The conversation extends to the frustrations of managing upgrades and breaking changes in both multitenant and single-tenant systems and the inherent difficulties of ensuring compatibility across different software versions and integrations. The episode concludes with a humorous take on the current state and perceived limitations of AI in software development, particularly concerning security.PicksWarren - Scarpa Hiking shoes - Planet Mojito SuadeAimee - Peloton TreadBrian - Searchcraft and Fight or Flight

Jun 24, 2025 • 1h 7min

Should We Be Using Kubernetes: Did the Best Product Win?

Omer Hamerman, an architect at Zesty specializing in Kubernetes and AI, joins the discussion to delve into whether Kubernetes won the infrastructure race based on merit or other factors. They analyze its surprising adoption rate and the idea that human preference for gradual improvements may explain its popularity despite perceived complexity. Omer also considers the merits of serverless solutions like AWS Fargate and the balance between control and efficiency, alongside the environmental challenges posed by Kubernetes deployment and AI.

Jun 21, 2025 • 1h 18min

Mastering SRE: Insights in Scale and at Capacity with Aimee Knight

In this episode, Aimee Knight, an expert in Site Reliability Engineering (SRE) whose experience hails from Paramount and NPM, joins the podcast to discuss her journey into SRE, the challenges she faced, and the strategies she employed to succeed. Aimee shares her transition from a non-traditional background in JavaScript development to SRE, highlighting the importance of understanding both the programming and infrastructure sides of engineering. She also delves into the complexities of SRE at different scales, the role of playbooks in incident management, and the balance between speed and quality in software development.Aimee discusses the impact of AI and machine learning on SRE, emphasizing the need for responsible use of these tools. She touches on the importance of understanding business needs and how it affects decision-making in SRE roles. The conversation also covers the trade-offs in system design, the challenges of scaling applications, and the importance of resilience in distributed systems. Aimee provides valuable insights into the pros and cons of a career in SRE, including the importance of self-care and the satisfaction of mentoring others.The episode concludes with us discussing some of the hard problems such as the on-call burden for large teams, and the technical expertise an org needs to maintain higher complexity systems. Is the average tenure in tech decreasing, we discuss it and do a deep dive on the consequences in the SRE world.PicksThe Adventures In DevOps: SurveyWarren's Technical BlogWarren: The Fifth Discipline by Peter SengeAimee: Sleep Token (Band) - Caramel, GraniteWill: The Bear Grylls Celebrity Hunt on NetflixJillian: Horizon Zero Dawn Video Game

Jun 14, 2025 • 1h 5min

Exploring MCP Servers and Agent Interactions with Gil Feig

Gil Feig, Co-founder and CTO of Merge, discusses the transformative role of MCP (Machine Control Protocol) servers in API interactions, emphasizing their efficiency and security benefits. He highlights real-world challenges and the necessity of thorough testing. The conversation delves into the delicate balance between innovation and stability in tech. Additionally, they explore the historical intricacies of watchmaking and the fascinating world of nuclear safety, blending technical insights with engaging anecdotes.

Jun 9, 2025 • 1h 1min

No Lag: Building the Future of High-Performance Cloud with Nathan Goulding

Warren talks with Nathan Goulding, SVP of Engineering at Vultr, about what it actually takes to run a high-performance cloud platform. They cover everything from global game server latency and hybrid models to bare metal provisioning and the power/cooling constraints that come with modern GPU clusters.The discussion gets into real-world deployment challenges like scaling across 32 data centers, edge use cases that actually matter, and how to design systems for location-sensitive customers—whether that’s due to regulation or performance. Additionally, there's talk about where the hyperscalers have overcomplicated pricing and where simplicity in a flatter pricing model and optimized defaults are better for everyone.There’s a section on nuclear energy (yes, really), including SMRs, power procurement, and what it means to keep scaling compute with limited resources. If you're wondering whether your app actually needs high-performance compute or just better visibility into your costs, this is the episode.PicksThe Adventures In DevOps: SurveyWarren: Jetlag: The GameNathan: Money Heist (La Casa de Papel)

Jun 4, 2025 • 53min

Ground Truth & Guided Journeys: Rethinking Data for AI with Inna Tokarev Sela

Inna Tokarev Sela, CEO and founder of Illumex, joins the crew to break down what it really means to make your data “AI-ready.” This isn’t just about clean tables—it’s about semantic fabric, business ontologies, and grounding agents in your company's context to prevent the dreaded LLM hallucination. We dive into how modern enterprises just cannot build a single source of truth, not matter how hard they try. All the while knowing that it's required to build effected agents utilizing the available knowledge graphs and.The conversation unpacks democratizing data access and avoiding analytics anarchy. Inna explains how automation and graph modeling are used to extract semantic meaning from disconnected data stores, and how to resolve conflicting definitions. And yes, Warren finally coughs up what's so wrong with most dashboards.Lastly, we quickly get to the core philosophical questions of agentic systems and AGI, including why intuition is the real differentiator between humans and machines. Plus: storage cost regrets, spiritual journeys disguised as inference pipelines, and a very healthy fear of subscription-based sleep wearables.PicksThe Adventures In DevOps: SurveyWarren: The Non-Computability of IntuitionWill: The Arc BrowserInna: Healthy GenAI skepticism

May 29, 2025 • 1h 10min

Incident Vibing: The Self-Healing System - DevOps 242

Sylvain Kalache, Head of Developer Relations at Rootly joins us to explore the new frontier of incident response powered by large language models. We dive into the evolution of DevRel and how we meet the new challenges impacting our systems.We explore Sylvain's origin story in self-healing systems, dating back to his SlideShare and LinkedIn days. From ingesting logs via Fluentd to building early ML-driven RCA tools, he shares a vision of self-healing infrastructure that targets root causes rather than just restarting boxes. Plus, we trace the historical arc of deterministic and non-deterministic tools.The conversation shifts toward real-world applications, where we're combining logs, metrics, transcripts, and postmortems to give SREs superpowers. We get tactical on integrating LLMs, why fine-tuning isn't always worth it, and how the Model Context Protocol (MCP) could be the USB of AI ops, but how it is still insecure. We wrap by facing the harsh reality of "incident vibing" in a world increasingly built by prompts, not people—and how to prepare for it.PicksWarren: There is no AI RevolutionSylvain: Incident Vibing and Rootly Labs SRE event on April 24th

May 22, 2025 • 1h 16min

Decentralized Chaos: Web3 Infra, NodeOps, and the Art of Blockchain Load Balancing - DevOps 241

This week, Paul Marston from Ankr joins the crew to unpack the madness that is modern blockchain infrastructure. From his wild career transition out of financial services into 24/7 node ops for Web3, Paul shares the brutal truth about uptime expectations, decentralization challenges, and why hard forks are more like enterprise schema upgrades with a community twist. If you’ve ever wondered why managing a blockchain node is like owning a temperamental pet server, this one’s for you.The team goes deep on the nitty-gritty of load balancing across dozens of chains, explaining why routing traffic to the “wrong” archive node could ruin your day—and how Ankr’s custom load balancer is basically magic for JSON-RPC calls. Warren tosses out wild scenarios about encrypted data smuggling via blockchain, while Will confesses his angry typing habit (yes, it’s back). The discussion gets even more fun with debates on innovation vs. rigor, Web2's forgotten best practices, and why testing in prod might not be such a dirty word after all.But don’t think it’s all crypto and code. Paul shares battle-won wisdom from running over 100 chains across bare metal, giving us a peek at the operational sophistication and automation involved. From Terraform templates to Docker configs, he walks through the process of onboarding new chains and tuning for performance. The episode also touches on emerging risks like data exfiltration via public blockchains, and why AI (used wisely) might just be the sidekick DevOps always needed.And of course memes, we talk a bit about this one: Tree Swing Product DevelopmentPicksWarren: Dvorak Keyboard Setup and Logitech K295Will: Quirky Record Player from MiniotPaul: Super Whisper - Voice Transcription Tool

May 15, 2025 • 1h 21min

Observability in the CI/CD Pipeline with Adriana Villela - DevOps 240

In this episode, Will and Warren welcome Adriana Villela — CNCF ambassador, Dynatrace advocate, and host of the Geeking Out podcast — for a wide-ranging conversation on observability in CI/CD pipelines. Adriana shares her journey from “On Call Me Maybe” to her own podcast, her work with OpenTelemetry, and why observability isn’t just for SREs anymore.The crew digs into how telemetry should be integrated across the software development lifecycle — from development to QA to production — and what that really looks like in modern teams. Adriana drops knowledge on CI/CD failures, distributed traces, and even how to bring observability to other parts of the business like recruiting and onboarding. She also explains how she got involved in the OpenTelemetry end-user SIG and what’s next for the observability movement.Things get persona as we trade war stories about SVN, terrible version control systems, reusable grocery bags, and the ethics of AI log parsers. Adriana closes with a powerful take: observability is a team sport, and the better we play it, the more effective — and environmentally conscious — our systems can become.PicksWarren: Adventures In DevOps survey - How can we make it better for you?Adriana: Bouldering — she recommends it both as a physical activity and a therapeutic mental reset, especially when travelingJillian: Expeditionary ForceWill: Iron Neck and Purpose & Prophet

May 8, 2025 • 51min

Building Engineering Excellence with Ganesh Datta of Cortex - DevOps 239

In this episode, I (flying solo today!) sat down with Ganesh Datta, the CTO and co-founder of Cortex, to explore what it really means to drive engineering excellence at scale. And spoiler: it’s not just about better dashboards or fancy developer tools—it’s about treating software development like the competitive advantage it is.We went deep into the why behind internal developer portals (IDPs) and how they’re transforming platform engineering, developer experience, and organizational maturity. Ganesh shares how Cortex came to life—from being paged at 2am for a mystery Game of Thrones-named microservice (yep, we've all been there), to realizing that every other business function had a system of record—except engineering.Key Takeaways:IDPs are like CRMs for Engineering: Just as sales teams wouldn’t function without a CRM, modern engineering orgs shouldn’t be flying blind without a structured, centralized developer portal.Engineering Excellence = Business Outcomes: Whether it’s reliability, security, or platform efficiency, IDPs help codify best practices and align teams toward measurable goals.Start Small to Win Big: You don’t need to overhaul everything on day one. Start with a pain point you already know—like production readiness—and improve that incrementally.SREs and Platform Engineers Love IDPs: Because it gives them the data, ownership visibility, and real-time checks they need, without the honor-system chaos.Developer Experience is Just the Beginning: Tools like Cortex aren’t just about dev productivity—they’re about creating resilient, aligned, scalable engineering orgs.We also geeked out about everything from naming services (“Brewer” for a feature extraction tool? Chef’s kiss.) to the surprising power of reading 15 minutes before bed to improve sleep quality—yep, we went there!If you’re part of an engineering team (or leading one) and want to know how to move faster and smarter, this is the episode for you.

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

App store banner

Play store banner