Code Story: Insights from Startup Tech Leaders

The Next Iteration of Gremlin – with Kolton Andrus, CEO

Jul 23, 2025
Kolton Andrus, CEO and founder of Gremlin, draws on his experience at Amazon and Netflix to explain why chaos engineering matters for system reliability. He traces Gremlin's evolution and the launch of Gremlin 3.0, which is designed to improve the user experience by identifying system failures and guiding engineers toward fixes. Kolton also discusses the role of AI in chaos engineering and why it calls for caution. He closes with advice for aspiring founders, stressing the importance of trusting yourself on the startup journey. Exciting times lie ahead for Gremlin and the broader SRE community!
ANECDOTE

Kolton's Early Chaos Engineering Story

  • Kolton Andrus built an early fault-injection platform at Amazon to proactively test how systems behave when components fail.
  • Hundreds of teams used the platform, and it proved notably successful at preventing outages.
INSIGHT

From Chaos to Reliability Management

  • The first arc of Gremlin was about making chaos engineering accessible, but results varied widely from team to team.
  • Adding reliability management with scoring, test suites, and dependency discovery helped standardize and scale these efforts.
INSIGHT

Gremlin 3.0 Focuses on Fixing Failures

  • Gremlin's next iteration aims to not only detect system failures but also guide engineers on how to fix them.
  • The product will provide actionable recommendations based on test results and on patterns observed across customer data.