

#726: Single region, zero excuses: Mastering AWS resilience
Jun 23, 2025
AWS experts Tarik Makota and John Formento dive into single-region resilience. They debunk myths about multi-AZ setups, emphasizing that deploying across multiple Availability Zones is not sufficient on its own. The duo covers the AWS services that matter most for application robustness and offers actionable advice on spotting and addressing potential failure modes. Listeners will also learn why proactive design and deliberate trade-offs are necessary for achieving high availability and durability in their cloud architectures.
AI Snips
AWS Fault Isolation Layers
- AWS regions are like bulkheads isolating failures, protecting other parts of the system from impairments.
- Availability Zones act as internal bulkheads, creating additional layers of fault isolation within a region.
Misconceptions About Resilience
- Multi-region doesn't always equal higher resilience; synchronous replication can introduce failure dependencies.
- Crossing Availability Zone boundaries without handling impairments in the other zone reduces application availability.
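The point about failure dependencies can be made concrete with a little availability arithmetic. The sketch below is illustrative only: the 99.9% figure is a hypothetical number rather than one quoted in the episode, the function names are invented for this example, and the formulas assume failures are independent.

```python
from math import prod

def serial_availability(components):
    """Availability when every component must be up (hard dependencies)."""
    return prod(components)

def redundant_availability(components):
    """Availability when any single component being up is enough."""
    return 1 - prod(1 - a for a in components)

# Hypothetical per-region availability, chosen only for illustration.
region = 0.999

# A write that must commit synchronously in both regions depends on both.
sync_write = serial_availability([region, region])

# Two regions that can each serve requests without the other.
independent = redundant_availability([region, region])

print(f"synchronous dependency: {sync_write:.6f}")
print(f"independent regions:    {independent:.6f}")
```

Under these assumptions the synchronously coupled design is less available than a single region, while the independent design is more available. The same reasoning applies to calls that cross Availability Zone boundaries with no fallback.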
Test Recovery Procedures Regularly
- Separate recovery operations from normal operations in critical applications.
- Regularly test recovery procedures to avoid failures caused by untested recovery paths.