Colin Douch and Jayson Cena from Cloudflare share insights on their transition from syslog-ng to OpenTelemetry, a move driven by the need for scalability and maintainability. They discuss the challenge of processing millions of logs per second without interrupting customer service. Their conversation covers the complexities of logging infrastructure, including performance optimization and the trade-offs between Protocol Buffers and JSON. They also touch on the importance of redundancy and team empowerment in improving system reliability.
Cloudflare's migration to OpenTelemetry significantly enhances log observability, scalability, and team collaboration through a modern, Go-based architecture.
Meticulous planning for system changes at Cloudflare, including phased deployment, is critical to avoid disruption of customer services during transitions.
Deep dives
Importance of Data Security in Diverse Environments
Securing company data is a critical challenge, especially when users work on unmanaged devices and applications. Traditional identity and access management (IAM) solutions often fall short in these scenarios, prompting the need for additional security measures. 1Password's Extended Access Management addresses this by providing control over user sign-ins across all devices and applications. This approach addresses security concerns while adapting to the realities of modern work environments, where flexibility and accessibility are paramount.
Cloudflare's Logging Transformation
Cloudflare has migrated its logging infrastructure from syslog-ng to OpenTelemetry, allowing it to manage millions of logs per second across a vast global network. The switch improves the observability of its systems and the overall reliability of log processing. The transition tackled long-standing issues with syslog-ng, particularly its C codebase, which posed challenges for memory safety and ease of management. By adopting OpenTelemetry, Cloudflare also gains a more modern, Go-based architecture that scales better and is easier for a broader team of engineers to contribute to.
Implementing Changes at Scale
When deploying system changes at Cloudflare's scale, meticulous planning and execution are crucial to avoid any disruption to customer services. The deployment of OpenTelemetry was carried out in phases, starting with internal sites, so the team could monitor performance metrics before wider implementation. This approach caught issues early and prevented potential impact on customer traffic during the transition. Using configuration management tooling streamlined these changes while minimizing the risk of downtime.
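The phased approach described above can be illustrated with a minimal sketch. This is not Cloudflare's tooling: the `Stage` type, the `rollout` function, and the `deploy` and `healthy` callbacks are all hypothetical names introduced here for illustration. The idea is only that each stage is deployed in order, a health check runs after each stage, and the rollout halts before reaching later stages if the check fails.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One phase of a rollout (hypothetical type, for illustration only)."""
    name: str
    hosts: list


def rollout(stages, deploy, healthy):
    """Deploy stage by stage and stop at the first unhealthy stage.

    deploy(host) applies the change to one host.
    healthy(stage) returns True if the stage's metrics look acceptable.
    Returns (names of completed stages, name of the halted stage or None).
    """
    completed = []
    for stage in stages:
        for host in stage.hosts:
            deploy(host)
        if not healthy(stage):
            # Halt here so later (customer-facing) stages are never touched.
            return completed, stage.name
        completed.append(stage.name)
    return completed, None
```

Ordering the stages so that internal sites come first means a failed health check stops the rollout before any customer-facing stage is changed.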
Observability versus Monitoring: The Key Distinction
Observability extends beyond traditional monitoring: it focuses on whether the collected data can answer unforeseen questions about system performance. This shift emphasizes visibility that lets teams identify and troubleshoot issues without knowing in advance which metrics will matter. The migration to OpenTelemetry has enabled Cloudflare to produce more metrics, improving the team's capacity to debug and keep operations running smoothly. It helps the observability team build a more robust infrastructure and also makes it easier for other engineering teams to contribute.
Cloudflare’s transition from syslog-ng to OpenTelemetry is the topic of discussion on this episode of Day Two DevOps. Guests Colin Douch and Jayson Cena from Cloudflare explain the reasons behind the migration, including the need for better scalability, memory safety, and maintainability. They delve into challenges such as ensuring uninterrupted customer traffic and optimizing performance...