

D2DO252: (Re)Building Cloudflare’s Millions-of-Logs-Per-Second Logging Pipeline
Oct 2, 2024
Colin Douch, Observability Tech Lead at Cloudflare, and Jayson Cena, SRE at Cloudflare, dive into the complexities of migrating from Syslog-NG to OpenTelemetry. They discuss the motivations for this shift, such as scalability and memory safety, while tackling challenges like maintaining uninterrupted customer traffic. The duo also highlights the importance of redundancy in logging systems and shares insights on logging protocols, illustrating the balance between resource usage and operational speed in a high-performance environment.
Chapters
Intro
00:00 • 3min
Navigating Cloudflare's Resilient Network
02:50 • 4min
Navigating the Logging Transition
06:49 • 15min
Navigating Data Formats: The Trade-offs of Logging Protocols
21:30 • 2min
Challenges of Using Protobufs in Logging Systems
23:04 • 2min
Scaling Logging Systems with OpenTelemetry
24:46 • 4min
Navigating Logging Pipeline Upgrades
28:54 • 7min
Importance of Redundancy in Logging Systems
35:30 • 2min