
Screaming in the Cloud: Is It Broken Everywhere or Just for Me? with Omri Sass
Jan 22, 2026
Omri Sass, Director of Product Management at Datadog, discusses updog.ai, a tool that detects service outages using real-time data. He explains why it matters to distinguish a local issue from a global outage, especially at moments like 3 AM. Omri covers the limits of synthetic testing and the role of aggregate telemetry in spotting provider problems. He also shares how the industry has reacted to outage trackers and describes the engineering hurdles the team faced while building updog.ai.
AI Snips
Real-User Telemetry Beats Synthetic Tests
- updog.ai detects service outages from real-user telemetry rather than from synthetic tests or user reports.
- Aggregating customer telemetry reveals when common SaaS providers are actually down across many users.
First Check If The Cloud Is At Fault
- When your site fails, first determine if the issue is your code or a major provider outage.
- If it's a provider outage, avoid risky local code changes that could make recovery harder.
Asymmetry Of Outage Data
- Outages vary widely across regions and customers; aggregate telemetry helps map that asymmetry.
- Datadog can combine many customers' signals to produce a clearer picture than any single customer could see alone.
