Dominik Schmidle, a Product Manager at Giant Swarm, dives into the world of cloud-native monitoring and observability. He explains why these concepts are essential for DevOps, shifting focus from traditional monitoring to a more holistic approach. The conversation touches on Grafana 12's exciting new features, including improved alerting and enhanced drill-down capabilities. Dominik also discusses the significance of OpenTelemetry in shaping observability tools and emphasizes the challenges of developing a comprehensive cloud-native observability platform.
Episode length: 45:09
INSIGHT
Observability vs Monitoring
Observability uses metrics, logs, and traces to provide a comprehensive view of system health.
Monitoring focuses mainly on metrics to understand system health at specific points in time.
ADVICE
Get Metrics and Logs Right
Use Prometheus to scrape metrics like CPU usage from your app endpoints.
Employ log streams to capture events, adapting your approach based on the data type.
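To make the advice above concrete, here is a minimal sketch that uses only the Python standard library. It exposes a CPU counter in the Prometheus text exposition format for a scrape job to pull, and it writes log events as one JSON object per line so a log shipper can parse them. The metric name `app_cpu_seconds_total`, port 8000, the `/metrics` path and the JSON field names are illustrative assumptions. A real service would more likely use an official Prometheus client library and whatever log pipeline the platform provides.

```python
import json
import logging
import os
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(cpu_seconds: float) -> str:
    """Render one counter in the Prometheus text exposition format."""
    return (
        "# HELP app_cpu_seconds_total Total CPU time used by this process.\n"
        "# TYPE app_cpu_seconds_total counter\n"
        f"app_cpu_seconds_total {cpu_seconds}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves GET /metrics so a Prometheus scrape job can pull it."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        t = os.times()  # user + system CPU seconds consumed by this process
        body = render_metrics(t.user + t.system).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line so a log shipper can parse the fields."""

    def format(self, record):
        event = {"ts": record.created, "level": record.levelname, "msg": record.getMessage()}
        event.update(getattr(record, "fields", {}))  # extra structured fields, if any
        return json.dumps(event)


def main(port: int = 8000) -> None:
    """Log a sample event to stdout, then serve /metrics until interrupted."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    log = logging.getLogger("app")
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    log.info("service started", extra={"fields": {"port": port}})
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `main()` starts the endpoint. A Prometheus scrape config that targets `host:8000` would then collect the counter, while a collector tailing stdout picks up the JSON log lines. This illustrates the split between metrics, which are pulled and sampled, and logs, which are pushed as discrete events.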
INSIGHT
Interpreting Metrics Spikes
Spikes in metrics like CPU usage don't always need action; they can be expected and manageable.
Robust systems scale and self-heal to handle these typical spikes safely.
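One common way to avoid reacting to every expected spike is to alert only on a sustained breach. Prometheus alerting rules express this with a `for` duration. The Python sketch below approximates the same idea by counting consecutive samples instead of measuring time. The threshold, the window length and the sample data are made-up assumptions for illustration.

```python
from typing import Sequence


def should_alert(samples: Sequence[float], threshold: float, for_samples: int) -> bool:
    """Return True only if the last `for_samples` samples all exceed `threshold`.

    A short spike that recovers returns False; a sustained breach returns True.
    """
    for_samples = max(for_samples, 1)  # at least the latest sample must breach
    if len(samples) < for_samples:
        return False  # not enough history yet to call it sustained
    return all(s > threshold for s in samples[-for_samples:])


# CPU usage sampled at a fixed interval (fraction of one core); made-up data.
spike = [0.30, 0.35, 0.97, 0.40, 0.33]
sustained = [0.30, 0.91, 0.93, 0.95, 0.96]
print(should_alert(spike, 0.9, 3), should_alert(sustained, 0.9, 3))  # False True
```

The single 0.97 reading never pages anyone, because the system recovered on its own. Only the run of high readings, which autoscaling or self-healing did not absorb, would trigger an alert.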
OpenTelemetry shipped stable, turn-key Spring Boot instrumentation and added resource-aware Prometheus export, so labels like k8s.node.name flow straight into Prometheus dashboards.
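Under the classic Prometheus naming rules, label names may only contain letters, digits and underscores. A dotted OpenTelemetry resource attribute such as `k8s.node.name` therefore typically shows up in PromQL as `k8s_node_name`. The sketch below approximates that sanitization. It is not the exporter's actual code: exact translation rules vary by exporter and version, and Prometheus 3 can keep UTF-8 names as-is. The `resource_labels` helper and its `promote` set are hypothetical.

```python
import re

_INVALID = re.compile(r"[^a-zA-Z0-9_]")


def to_prometheus_label(key: str) -> str:
    """Approximate the classic OTel-to-Prometheus label-name sanitization."""
    name = _INVALID.sub("_", key)  # dots, dashes, etc. become underscores
    if name and name[0].isdigit():
        name = "_" + name  # label names must not start with a digit
    return name


def resource_labels(resource_attrs: dict[str, str], promote: set[str]) -> dict[str, str]:
    """Hypothetical helper: copy selected resource attributes onto metric labels."""
    return {to_prometheus_label(k): v for k, v in resource_attrs.items() if k in promote}


attrs = {"k8s.node.name": "node-a", "service.name": "shop"}
print(resource_labels(attrs, {"k8s.node.name"}))  # {'k8s_node_name': 'node-a'}
```

This is why a dashboard query filters on `k8s_node_name` rather than on the dotted attribute key when classic naming is in effect.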
Monitoring and Observability at Giant Swarm
Gave our first talk at the "Mastering Observability" conference, about how we created the product!
The topic was "Observability Platform @ Scale" and the challenges we faced when scaling it.
Interesting: most of the problems we hit when scaling were organizational rather than technical (observability as a product).
e.g. one problem: what actually is our product, and what is our interface to the customer?
e.g. a second problem: the vision for the observability platform, i.e. where do we want to take it? -> long-term roadmap
Now we're working on the roadmap! What progress have we made?
We worked a lot on our multi-tenancy concept, based on Grafana Organizations, which allows us to onboard a customer's marketing team into the observability platform!
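Grafana's HTTP API has endpoints for this kind of tenant onboarding: `POST /api/orgs` creates an organization, and `POST /api/orgs/:orgId/users` adds an existing user to it with a role. The sketch below only builds those requests with the standard library. The Grafana URL, the org and user names, and the basic-auth credentials are hypothetical, and the notes do not say how Giant Swarm actually provisions tenants. Creating orgs usually requires Grafana server-admin credentials.

```python
import base64
import json
import urllib.request

GRAFANA_URL = "http://grafana.example:3000"  # hypothetical address


def _post(path: str, payload: dict, user: str, password: str) -> urllib.request.Request:
    """Build a JSON POST request with HTTP basic auth against the Grafana API."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        GRAFANA_URL + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


def create_org_request(name: str, user: str, password: str) -> urllib.request.Request:
    """POST /api/orgs: one Grafana organization per tenant (e.g. a marketing team)."""
    return _post("/api/orgs", {"name": name}, user, password)


def add_user_request(org_id: int, login_or_email: str, role: str,
                     user: str, password: str) -> urllib.request.Request:
    """POST /api/orgs/:orgId/users: add an existing user with Viewer/Editor/Admin."""
    payload = {"loginOrEmail": login_or_email, "role": role}
    return _post(f"/api/orgs/{org_id}/users", payload, user, password)
```

To send a request, pass it to `urllib.request.urlopen(...)`. The JSON response to the org-creation call includes the new `orgId`, which then feeds `add_user_request`. Keeping each tenant in its own organization isolates its dashboards and data sources from other teams.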
We connect GitOps and ClickOps in Grafana: we manage a persisted Grafana into which GitOps resources (dashboards and orgs) are loaded, so people can create those resources in whichever way they like best.
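The notes do not describe the mechanism. One plausible reconciliation rule for mixing GitOps and ClickOps is sketched below and is purely hypothetical. Dashboards defined in Git are re-applied by `uid` and win on conflict. Dashboards previously applied from Git but since removed from it are dropped. Dashboards created in the UI stay in the persisted Grafana. The `Dashboard` type and its `source` tag are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dashboard:
    uid: str
    title: str
    source: str  # "git" if applied from the repository, "ui" if created by hand


def reconcile(git_dashboards: list[Dashboard], live: list[Dashboard]) -> list[Dashboard]:
    """Hypothetical reconcile step for a persisted Grafana with GitOps resources."""
    desired = {d.uid: d for d in git_dashboards}
    # Keep hand-made (ClickOps) dashboards unless Git now claims the same uid;
    # live "git" dashboards missing from the repo are dropped (deletion via Git).
    result = {d.uid: d for d in live if d.source == "ui" and d.uid not in desired}
    result.update(desired)  # Git-defined dashboards always win
    return sorted(result.values(), key=lambda d: d.uid)


git = [Dashboard("a", "API latency v2", "git")]
live = [
    Dashboard("a", "API latency v1", "git"),
    Dashboard("b", "Campaign funnel", "ui"),
    Dashboard("c", "Old overview", "git"),
]
print([d.title for d in reconcile(git, live)])  # ['API latency v2', 'Campaign funnel']
```

The design choice here is that Git is authoritative only for what it declares. Anything a user clicks together survives reconciliation untouched.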
Getting AI into the observability platform: directions and reviews!
Next Podcast
AI Event: AI for Infrastructure - How Will AI Change the Life of a Platform Engineer? Giant Swarm CTO Timo Derstappen will walk us through the event while going over questions from our audience.