Vanessa Huerta Granda, a technology manager passionate about resilience engineering, shares her insights on navigating metrics in incident management. She discusses the challenges of code freezes and the importance of adaptable metrics. Vanessa emphasizes the significance of context when analyzing Mean Time to Recovery (MTTR) and how it can lead to meaningful insights. The conversation also highlights the necessity for better communication between tech teams and executives to ensure effective decision-making based on accurate data.
The podcast highlights the ongoing debate on code freezes in software development, emphasizing their potential impact on both stability and productivity.
Effective presentation of metrics on resilience dashboards is crucial, as context around data like MTTR can significantly influence customer experience and decision-making.
Deep dives
Navigating Code Freezes in Software Development
The discussion highlights the ongoing debate surrounding code freezes in software development. While some professionals argue that code freezes are beneficial for stability and focus, many others view them as detrimental to productivity and innovation. The speaker expresses their personal stance against code freezes, indicating that this perspective resonates with a significant portion of the resilience engineering community. This dialogue emphasizes the necessity for further exploration of the advantages and drawbacks of code freezes, encouraging listeners to share their thoughts on the matter.
Metrics that Matter on Resilience Dashboards
The importance of effectively presenting metrics on resilience dashboards is underscored, particularly when management demands specific data. Metrics such as Mean Time to Recovery (MTTR) and incident counts can provide foundational insights, but they must be put in context before any meaningful conclusions are drawn. The discussion suggests engaging stakeholders across product lines to understand the impact on customer experience, such as how system failures affect ticket purchases or service accessibility. It also suggests that the metrics displayed on dashboards should evolve to reflect changing business needs and priorities over time.
The Role of Context in Incident Metrics
Context plays a pivotal role in interpreting incident metrics, particularly MTTR, which is often criticized as misleading when reported without accompanying details. The dialogue proposes that, rather than focusing solely on resolution times, teams should analyze customer impact and engineer workload in a way that informs future improvements. By examining incidents holistically and considering environmental factors, teams can derive actionable insights that lead to better process adjustments and incident handling strategies. Ultimately, the conversation advocates for a nuanced understanding of metrics that balances quantitative data with qualitative context, so that they serve strategic business goals.
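To illustrate why a single MTTR figure can mislead, here is a minimal Python sketch that reports resolution time alongside customer impact and engineer workload. It is not taken from the episode: the `Incident` fields, the `summarize` helper, and the sample numbers are all hypothetical, chosen only to show how one long but low-impact incident can dominate the mean.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class Incident:
    # All fields are illustrative assumptions, not a schema from the episode.
    minutes_to_resolve: int
    customers_affected: int
    responders: int


def summarize(incidents):
    """Report MTTR together with context about impact and workload."""
    durations = [i.minutes_to_resolve for i in incidents]
    return {
        "count": len(incidents),
        "mttr_mean": mean(durations),
        "mttr_median": median(durations),
        # Customer impact: duration weighted by how many customers felt it.
        "customer_minutes": sum(
            i.minutes_to_resolve * i.customers_affected for i in incidents
        ),
        # Engineer workload: duration weighted by how many people responded.
        "responder_minutes": sum(
            i.minutes_to_resolve * i.responders for i in incidents
        ),
    }


# Made-up sample data: three short incidents and one long, low-impact one.
incidents = [
    Incident(minutes_to_resolve=10, customers_affected=50, responders=2),
    Incident(minutes_to_resolve=20, customers_affected=0, responders=1),
    Incident(minutes_to_resolve=30, customers_affected=1000, responders=5),
    Incident(minutes_to_resolve=300, customers_affected=5, responders=3),
]

summary = summarize(incidents)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this made-up data the mean is 90 minutes while the median is 25, and nearly all of the customer impact comes from the 30-minute incident rather than the 300-minute one, which is the kind of detail a bare MTTR number hides.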
We answered a set of questions about how to deal with dashboards and MTTR and how to make the best of the situation with the help of special guest Vanessa Huerta Granda.