

PurePerformance
The brutal truth about digital performance engineering and operations.

Andreas (aka Andi) Grabner and Brian Wilson are veterans of the digital performance world. Combined, they have seen too many applications not scaling and performing up to expectations. With more rapid deployment models made possible through continuous delivery and a mentality shift sparked by DevOps, they feel it’s time to share their stories. In each episode, they and their guests discuss different topics concerning performance, ranging from common performance problems for specific technology platforms to best practices in development, testing, deploying and monitoring software performance and user experience. Be prepared to learn a lot about metrics.

Andi & Brian both work at Dynatrace, where they get to witness more real world customer performance issues than they can TPS report at.
Episodes

Apr 28, 2025 • 47min
Run Towards the Fire: Why we should love incidents with Lisa Karlin Curtis
Do you plan for incidents? Do you have a time and cost budget for them in your sprint or quarterly planning? Do you have engineers that are "interruptible"?

We discussed those questions and more with Lisa Karlin Curtis, Founding Engineer at incident.io, who teaches us why we need to think differently about dealing with incidents!

In our discussion we learn why modern incident management embraces more incidents that are publicly shared within an organization to foster learning. We learn how to train more people to become incident responders, how to triage and categorize incidents, how to better plan for them and how to best report on them.

We also touch on AI, and how AI-generated code will eventually result in more incidents, which we should use as an opportunity to learn and improve our engineering process.

P.S.: This was our 10th-anniversary podcast episode!!

Here are the links we discussed in the podcast:
Lisa's LinkedIn: https://www.linkedin.com/in/lisa-karlin-curtis-a4563920/
Her talk at ELC Prague: https://docs.google.com/presentation/d/18536WBHBcPEppEeXXP7o5UQOX2XfWoGmfds2CHegHq4/edit?slide=id.g3434e0cba65_0_0#slide=id.g3434e0cba65_0_0
Incident Playbook: https://incident.io/guide

Apr 14, 2025 • 47min
MCPs (Model Context Protocol) are not that magic, but they enable magic things with Dana Harrison
Dana Harrison, a Staff Site Reliability Engineer at Telus, shares insights on Model Context Protocols (MCPs) and their transformative potential for engineers. He explains how MCPs enhance API interactions, streamlining data retrieval and increasing efficiency. The discussion touches on the critical differences between local and remote MCPs, resilience in API connections, and the importance of observing interactions for better management. With a focus on rapidly evolving software development, Dana sheds light on the exciting challenges and opportunities faced in the AI landscape.

Mar 31, 2025 • 56min
The History & Power of Distributed Tracing with Christoph Neumueller & Thomas Rothschaedl
So you think Distributed Tracing is the new thing? Well - it's not! But it's never been as exciting as today!

In this episode we combine 50 years of Distributed Tracing experience across our guests and hosts. We invited Christoph Neumueller and Thomas Rothschaedl, who have seen the early days of agent-based instrumentation, how global standards like the W3C Trace Context allowed tracing to connect large enterprise systems and how OpenTelemetry is commoditizing data collection across all tech stacks.

Tune in and learn about the difference between spans and traces, why collecting the data is only part of the story, how to tackle the challenge of dealing with too much data and how traces relate and connect to logs, metrics and events.

Links we discussed:
YouTube with Christoph: LINK WILL FOLLOW ONCE VIDEO IS POSTED
Christoph's LinkedIn: https://www.linkedin.com/in/christophneumueller/
Thomas's LinkedIn: https://www.linkedin.com/in/rothschaedl/

Mar 17, 2025 • 59min
An Inside Look into Platform Engineering for Architects with the authors Max, Hilliary & Andi
Max Körbächer, founder of Liquid Reply and co-author of 'Platform Engineering for Architects', and Hilliary Lipsig, Senior Principal SRE at Red Hat, delve into the world of platform engineering. They discuss the importance of building user-centered platforms that evolve with feedback from engineering teams. The conversation highlights the cyclical nature of technology trends and the significance of timeless content in a fast-paced industry. Expect insights on decision-making strategies, managing technical debt, and the thrill of sharing their collaborative writing journey.

Mar 3, 2025 • 39min
How CERN analyzed 1 PetaByte per second using K8s with Ricardo Rocha
One petabyte is the equivalent of 11,000 4K movies. And CERN's Large Hadron Collider (LHC) generates this every single second. Only a fraction of this data (~1 GB/s) is stored and analyzed using a multicluster batch job dispatcher with Kueue running on Kubernetes.

In this episode we have Ricardo Rocha, Platform Engineering Lead at CERN and CNCF Advocate, explaining why after 20 years at CERN he is still excited about the work he and his colleagues are doing. To kick things off we learn about the impact that the CNCF has on the scientific community and how to best balance an implementation of that scale between "ease of use" and "optimized for throughput". Tune in and learn about custom hardware being built 20 years ago and how the advent of the latest chip generation has impacted the evolution of data scientists around the globe.

Links we discussed:
Ricardo's LinkedIn: https://www.linkedin.com/in/ricardo-rocha-739aa718/
KubeCon SLC Keynote: https://www.youtube.com/watch?v=xMmskWIlktA&list=PLj6h78yzYM2Pw4mRw4S-1p_xLARMqPkA7&index=5
Kueue CNCF Project: https://kubernetes.io/blog/2022/10/04/introducing-kueue/

Feb 17, 2025 • 51min
Why Compliance is Important and not Boring with Michiel de Lepper
Michiel de Lepper, a seasoned Security and Compliance expert with experience at McAfee and Dynatrace, shares his insights on compliance's vital role in IT security. He redefines compliance from being boring to a dynamic necessity, integrated into modern tech practices. Michiel emphasizes using data to enhance security and discusses the collaboration between SecOps and DevOps for better outcomes. With a humorous nod to nostalgia, he reveals how compliance can be both exciting and essential, debunking myths surrounding mandatory training and audits.

Feb 3, 2025 • 51min
What's next for Feature Flagging and OpenFeature with Ben Rometsch
Ben Rometsch, Co-founder of Flagsmith and a leading voice in feature flagging, shares his insights on the evolution and future of this essential development practice. He dives into the concept of Feature Flag-Driven Development and the significance of accurate record-keeping for compliance. The conversation highlights the role of the CNCF project OpenFeature in shaping best practices and discusses how AI could revolutionize feature flag management. Ben also touches on the challenges faced by smaller teams and the necessity of effective implementation.

Jan 20, 2025 • 48min
Observability Predictions 2025 Under the Covers with Bernd Greifeneder
Bernd Greifeneder, Founder and CTO of Dynatrace, has been a trailblazer in observability for over 20 years. He shares insights on the shift from reactive to preventive operations in observability, stressing the key role of AI. Greifeneder discusses the vital convergence of observability and security, especially with new regulations like DORA. He also highlights the importance of automation for sustainability and the evolving landscape of AIOps, aiming to help organizations maximize efficiency while addressing environmental concerns.

Jan 6, 2025 • 51min
From Infra to Services to Happy End Users: The role of SLOs at Uber with Vishnu Acharya
Vishnu Acharya, Head of Network Infrastructure EMEA and Platform Engineering at Uber, shares his remarkable journey from eBay and Yahoo to a decade at Uber. He discusses scaling the company to 4,000 engineers and the crucial role of Service Level Objectives (SLOs) in ensuring platform reliability. Insights on navigating complex partnerships, optimizing cloud provider relationships, and the importance of understanding end-user journeys are highlighted. Discover how observability enhances innovation and capacity in Uber's fast-paced tech landscape.

Dec 23, 2024 • 54min
The Road to OpenTelemetry Adoption at Booking with Anton Timofieiev
For the past 10 years Anton has been working at Booking.com, one of the leading digital travel companies, based out of Amsterdam. A journey that started as a System Administrator has led Anton to become an Engineering Manager for Site Reliability, where over the past 3 years he has led the rollout and adoption of OpenTelemetry as the standard for getting observability into new cloud native deployments.

Tune in and learn how Anton saw R&D grow from 300 to 2000, why they replaced their home-grown Perl-based Observability Framework with OpenTelemetry, how they tackle adoption challenges and how they extend and contribute back to the open source community.

Links we discussed:
Anton's LinkedIn Profile: https://www.linkedin.com/in/antontimofieiev/
Observability & SRE Summit: https://www.iqpc.com/events-observability-sre-summit/speakers/anton-timofieiev
OpenTelemetry: https://opentelemetry.io/