The common refrain after an incident is “We could and should learn from this”.
To me, that alludes to the need for a robust learning culture.
We might think we already have a good learning culture because we talk about problems and dig into them in retrospectives.
But how often do we explore the nuances of how we are learning?
Sorrel Harriet is an expert in helping software engineering teams develop a stronger learning culture. She was a “Continuous Learning Lead” at Armakuni, a software consultancy, and now does the same work under her own banner.
Her work ties in well with the ideas shared by Manuel Pais in episode #45 about how enabling teams can support a continuous learning culture.
We tackled issues like the value of certifications, comparing technical with non-technical skills, and more.
You can connect with Sorrel via LinkedIn
Learn more about what Sorrel does via LaaS.consulting
Here’s a bonus section because you read this far. It covers 5 public outages and how the affected teams could improve their learning culture:
1. Slack Outage (February 2023)
Slack experienced a global outage disrupting communication for hours due to backend infrastructure issues. Perhaps the team could focus their learning on more robust infrastructure management and resilience improvement.
2. Twitter Algorithm Glitch (April 2023)
A glitch in Twitter's algorithm caused timeline issues, stemming from a problematic software update. Perhaps the team could focus their learning on thorough testing and game days to rectify critical system errors swiftly.
3. Microsoft Azure AD Outage (March 2023)
Azure Active Directory faced a significant outage due to an internal configuration change. Perhaps the team could focus their learning on the importance of rigorous change management and how to address misconfigurations quickly.
4. Google Cloud Platform Networking Issue (May 2023)
Google Cloud Platform experienced widespread service disruptions from a software bug in its networking infrastructure. Perhaps the team could focus their learning on comprehensive testing and ways to prevent such disruptions.
5. GitHub Outage (June 2023)
GitHub suffered a major outage caused by a cascading failure in its storage infrastructure. Perhaps the team could focus their learning on robust fault-tolerance mechanisms and ways to address the root causes of failures.