

The Next Iteration of Gremlin – with Kolton Andrus, CEO
Jul 23, 2025
Kolton Andrus, CEO and founder of Gremlin, draws on his experience at Amazon and Netflix to explain why chaos engineering matters for system reliability. He discusses Gremlin's evolution and the launch of Gremlin 3.0, which is designed to improve the user experience by identifying system failures and guiding engineers on how to fix them. Kolton also urges caution about the role of AI in chaos engineering and offers advice to aspiring founders, highlighting the importance of self-trust in the startup journey. Exciting times lie ahead for Gremlin and the broader SRE community!
AI Snips
Kolton's Early Chaos Engineering Story
- Kolton Andrus built an early platform for fault injection at Amazon to proactively test system failures.
- This platform was used by hundreds of teams and showed significant success in preventing outages.
From Chaos to Reliability Management
- The first arc of Gremlin was about making chaos engineering accessible, but results were uneven across teams.
- Adding reliability management with scoring, test suites, and dependency discovery helped standardize and scale these efforts.
Gremlin 3.0 Focuses on Fixing Failures
- Gremlin's next iteration aims not only to detect system failures but also to guide engineers on how to fix them.
- The product will provide actionable recommendations based on test results and customer data patterns.