

D2DO252: (Re)Building Cloudflare’s Millions-of-Logs-Per-Second Logging Pipeline
Oct 2, 2024
Colin Douch and Jayson Cena from Cloudflare discuss their migration from syslog-ng to OpenTelemetry, a move driven by the need for scalability and maintainability. They describe the challenges of processing millions of logs per second without interrupting customer service. The conversation covers the complexities of logging infrastructure, including performance optimization and the trade-offs between Protocol Buffers and JSON. They also touch on the role of redundancy and team empowerment in improving system reliability.
Chapters
Intro
00:00 • 3min
Cloudflare's Resilient Network
02:50 • 4min
Transitioning to OpenTelemetry in Logging Systems
06:49 • 15min
Navigating OpenTelemetry's Protobuf Challenges
21:30 • 2min
Understanding Protocol Buffers and Efficient Logging
23:04 • 2min
Optimizing Logging Infrastructure
24:45 • 4min
Navigating Logging Pipeline Upgrades
28:52 • 7min
Enhancing Reliability and Engagement in Tech Systems
35:30 • 2min