Anish Agarwal and Raj Agrawal, co-founders of Traversal, are transforming how enterprises handle critical system failures. Their AI agents can perform root cause analysis in 2-4 minutes instead of the hours typically spent by teams of engineers scrambling in Slack channels. Drawing from their academic research in causal inference and gene regulatory networks, they’ve built agents that systematically traverse complex dependency maps to identify the smoking gun logs and problematic code changes. As AI-generated code becomes more prevalent, Traversal addresses a growing challenge: debugging systems where humans didn’t write the original code, making AI-powered troubleshooting essential for maintaining reliable software at scale.
Hosted by Sonya Huang and Bogomil Balkansky, Sequoia Capital
Mentioned in this episode:
- SRE: Site reliability engineering. The function within engineering teams that monitors and improves the availability and performance of software systems and services.
- Golden signals: Four key metrics used by site reliability engineers (SREs) to monitor the health and performance of IT systems: latency, traffic, errors, and saturation.
- MELT data: Metrics, events, logs, and traces. A framework for observability.
- The Bitter Lesson: Another mention of Turing Award winner Rich Sutton’s influential post.