Reliability Enablers

Latest episodes

Mar 26, 2024 • 23min

#34 From Cloud to Concrete: Should You Return to On-Prem?

This episode continues our coverage of Chapter 2 of the Site Reliability Engineering book (2016). We talk about the age-old debate of cloud vs on-prem, which is analogous to that other technology debate of build vs buy. Here are key takeaways from our conversation:
Adapt your storage solutions to business needs: Understand the diverse storage options available and tailor them to specific business needs, considering factors like data type, access patterns, and scalability requirements.
Optimize your load balancing: Implement global load balancing strategies that direct traffic to the nearest data center, to minimize latency and maximize resource utilization.
Don't hesitate to continuously evaluate your cloud: Assess the suitability of cloud solutions against your organization's needs, considering factors like cost, control, scalability, and security, and be open to reevaluating decisions based on evolving requirements.
Make strategic decisions for your operations footprint: Lean on decisions based on thorough analysis that considers:
Encourage objective evaluation and formal planning processes in decision-making: Avoid emotional reactions or being swayed by external influences, so that decisions are based on sound analysis and truly aligned with organizational goals.
This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit read.srepath.com
Mar 19, 2024 • 23min

#33 Inside Google's Data Center Design

This episode covers Chapter 2 of the Site Reliability Engineering book (2016). In this first part, we talk about the intricacies of data center design outlined in the book. One thing is for sure: building a data center for your own needs is HARD work, with many considerations you must make. Here are key takeaways from our conversation:
Importance of understanding data center fundamentals: Even if you're not operating at the scale of companies like Google, understanding the fundamentals behind data center infrastructure can help. This knowledge can inform decisions on cloud services, high availability strategies, and the architectural design of systems to ensure resilience and scalability.
The impetus to leverage cloud infrastructure: The transition from traditional on-premises infrastructure to cloud-based solutions is a critical trend. Organizations can learn from how tech giants manage resources efficiently at scale to improve their own resource allocation.
Cyclical trends in technology adoption: Trends in technology are cyclical, and that can inform strategic decisions. With the current discussion around moving from cloud-centric models back to more traditional data center approaches, understanding the history and evolution of tech infrastructure can prepare organizations to anticipate and adapt to future shifts in the technological landscape.
Mar 14, 2024 • 17min

#32 Clarifying Platform Engineering's Role (with Ajay Chankramath) BONUS EP

Will Platform Engineering replace DevOps or SRE or both? I don’t think this is the case at all. Neither does Ajay Chankramath. He is the Head of Platform Engineering at ThoughtWorks North America, an innovator consulting group. I’d take his word for it since he’s held senior leadership roles in release engineering and more since 2002. In this bonus episode of the SREpath podcast, Ajay shared his perspective on the debate about SRE vs DevOps vs Platform Engineering.
Mar 12, 2024 • 27min

#31 Introduction to FinOps (with Ajay Chankramath)

FinOps is on the tip of many tongues in the software space right now, as we try to curb our cloud costs. Ajay Chankramath has given talks on FinOps at conferences like the DevOps Enterprise Summit (DOES), among others. He is the Head of Platform Engineering at ThoughtWorks North America, an innovator consulting group. His peers like Martin Fowler and Neal Ford have originated ideas like refactoring, microservices, and more. He shared practical advice for avoiding a harsh, restrictive cost control approach and instead taking a holistic financial view of your software operations.
Mar 7, 2024 • 37min

#30 Clearing Delusions in Observability (with David Caudill)

Observability is going through interesting times. David Caudill believes that delusions are getting in the way of our success in this area. He's a senior engineering manager at Capital One, a US-based bank.
Feb 27, 2024 • 31min

#29 - Reacting to Google's SRE book 2016 (Chapter 1 Part 2)

Sebastian and I continue our breakdown of notable passages from Chapter 1 of Google's Site Reliability Engineering (2016) book by Betsy Beyer, Jennifer Petoff, Niall Murphy, et al. We covered passages like:
Monitoring is one of the primary means by which service owners keep track of a system's health and availability.
Efficient use of resources is important anytime a service cares about money.
Humans add latency. Even if a given system experiences more actual failures, a system that can avoid emergencies that require human intervention will have higher availability than a system that requires hands-on intervention.
SRE has found that roughly 70 percent of outages are due to changes in a live system. Best practices in this domain use automation to implement progressive rollouts.
Demand forecasting and capacity planning can be viewed as ensuring that there is sufficient capacity and redundancy to serve projected future demand with the required availability.
Feb 20, 2024 • 26min

#28 - Reacting to Google's SRE Book 2016 (Chapter 1 Part 1)

Sebastian and I got together to react to and discuss 5 passages from Chapter 1 of Google's Site Reliability Engineering book (2016) by Betsy Beyer, Jennifer Petoff, Niall Murphy, et al. We covered passages like:
The sysadmin approach and the accompanying development/ops split have a number of disadvantages and pitfalls.
Google has chosen to run our systems with a different approach. Our Site Reliability Engineering teams focus on hiring software engineers to run our products.
The term DevOps emerged in industry. One could equivalently view SRE as a specific implementation of DevOps with some idiosyncratic extensions.
Google caps operational work for SREs at 50 percent of their time. Their remaining time should be spent using their coding skills on project work.
Product development and SRE teams can enjoy a productive working relationship by eliminating the structural conflict in their respective goals.
Feb 13, 2024 • 16min

#27 - Growing as a Site Reliability Engineer (Part 3)

Third and final instalment of the Growing as an SRE series, covering practical ideas for planning your career progression.
Feb 8, 2024 • 19min

#26 - Growing as a Site Reliability Engineer (Part 2)

In part 1, we covered the first truth: that you don't grow in your career merely through tenure. That was a simple one. Let's explore 2 more truths that are somewhat trickier...
Background music credit: Luna by KaizanBlue
Jan 30, 2024 • 38min

#25 - DORA and the Pursuit of Engineering Excellence (with Tim Wheeler)

DORA metrics are a hot topic among technology executives in all kinds of enterprises. But there's more to engineering culture than relying solely on the numbers they give you. We have a rare treat for you because Ash got Tim Wheeler on the pod. He doesn't do much social media or many podcast episodes. Tim is Director of Engineering Excellence at SquaredUp, where he follows the DORA metrics but emphasizes starting conversations around them rather than setting directives.
