
Software Misadventures

Latest episodes

Feb 6, 2021 • 1h 3min

Ryan Underwood - On debugging the Linux kernel - #4

Ryan Underwood is a Staff SRE and tech lead on the Helix and Zookeeper SRE team at LinkedIn. Prior to LinkedIn, he was an SRE at Machine Zone and Google. Apart from his regular responsibilities, Ryan’s interests and expertise include debugging production kernel, I/O, and containerization issues. His view that software should not be treated as a black box and his persistent approach to debugging complex problems are truly inspiring. On several occasions, Ryan’s colleagues have leaned on him to solve an esoteric problem that everyone thought was insurmountable. Our main focus today is one such problem, which Ryan and his team ran into while upgrading machines to a 4.x kernel and which resulted in elevated 99th percentile latencies. We dive into what the problem was, how it was identified, and how it was fixed. We discuss some of the tools and practices that are helpful in debugging system performance issues. We also talk about Ryan’s background and how his curiosity landed him a career in Site Reliability Engineering. Please enjoy this deeply technical and highly educational conversation with Ryan Underwood. Website link: https://softwaremisadventures.com/ryan Music Credits: Vlad Gluschenko - Forest. License: Creative Commons Attribution 3.0 Unported: https://creativecommons.org/licenses/by/3.0/deed.en
Jan 23, 2021 • 58min

David Henke - On building a culture of "Site Up" at LinkedIn and Yahoo! - #3

David is LinkedIn’s former SVP of Engineering and Operations. He came out of retirement to join LinkedIn in 2009 during a time of rapid growth. After 4 years at LinkedIn, he retired in 2013. Throughout his career, David has held multiple leadership positions and has been recognized as one of the best Operations Executives. This was an extremely fascinating conversation. David shares insightful stories from the early days at LinkedIn and what it took to develop the culture of “Site Up and Secure”. He recounts one of the most severe outages of his career, an incident at Yahoo! that he calls the 10g massacre. We talk about David’s 3 retirements, his advice on developing operational excellence, and lessons on being an effective leader. Throughout this conversation you’ll also hear various nuggets of wisdom from David, better known as Henkeisms. Please enjoy this highly entertaining and deeply insightful conversation with David Henke. Website link: https://softwaremisadventures.com/henke Music Credits: Vlad Gluschenko - Forest. License: Creative Commons Attribution 3.0 Unported: https://creativecommons.org/licenses/by/3.0/deed.en
Jan 6, 2021 • 46min

Julia Evans - On kubernetes scheduler bugs, TCP performance regressions and debugging tips - #2

In this episode, we speak with Julia Evans. Julia runs a programming zines business called Wizard Zines (https://wizardzines.com/), where she creates comics about various programming concepts. She began creating zines while she was still a software engineer at Stripe. Her zines are extremely approachable and highly educational. In addition to creating zines, Julia is a prolific blogger and has around 500 posts on her blog at jvns.ca. Her blog posts are another great resource for learning about fundamental programming concepts. We had a lot of fun speaking with Julia for this episode. We discuss two bugs she came across at Stripe: how she identified and fixed a bug in the Kubernetes scheduler, and how her understanding of TCP helped her fix a performance regression. We also cover other topics like blogging, zines, debugging, and learning new things. Please enjoy this fun conversation with the amazing Julia Evans! Website link: https://softwaremisadventures.com/julia Links: https://jvns.ca https://wizardzines.com/ https://twitter.com/b0rk https://github.com/jvns Music Credits: Vlad Gluschenko - Forest. License: Creative Commons Attribution 3.0 Unported: https://creativecommons.org/licenses/by/3.0/deed.en
Dec 4, 2020 • 1h 1min

Kelsey Hightower - On ways kubernetes can break, being an effective leader and much more - #1

Principal Developer Advocate at Google, Kelsey Hightower, discusses Kubernetes breaking in production, leadership tips, and demystifying complex topics in an engaging conversation on the podcast.
Nov 28, 2020 • 4min

Introducing Software Misadventures Podcast - #0

In this episode, Ronak, Austin, and Guang share the origin story: who they are, what this podcast is about, and why they are doing this. They've seen firsthand how stressful it is when something breaks in production, but they have also found it to be the best opportunity to learn about a system more deeply. They started this podcast to have in-depth conversations with software and DevOps experts and hear their stories from the trenches about how software breaks in production. In upcoming conversations, they discuss the principles and practical tips for building resilient software, as well as advice on growing as technical leaders. Learn more at https://softwaremisadventures.com. Music Credits: Vlad Gluschenko - Forest. License: Creative Commons Attribution 3.0 Unported: https://creativecommons.org/licenses/by/3.0/deed.en
