
OpenObservability Talks
On OpenObservability Talks we discuss harnessing the power of open source to advance observability initiatives for developers, DevOps and SRE practitioners around the world.
We live-stream the episodes on Twitch and YouTube Live - tune in to see us live, and chime in with your comments and questions on the live chat.
https://www.youtube.com/@openobservabilitytalks
You can find us on X (Twitter) @openobserv and BlueSky @openobservability.bsky.social
Latest episodes

Jan 31, 2022 • 58min
SaaS Observability Done Right - OpenObservability Talks S2E08
SaaS (software as a service) is a popular model for many businesses today. SaaS businesses need agility to move fast and remain competitive. This means agility in the software IT stack, but also agility in the business models and product-led growth (PLG). Observability plays a key role in enabling SaaS organizations to move fast.
Achieving this agility, however, raises specific observability requirements. On this episode of OpenObservability Talks we’ll host Aviad Mizrachi, the CTO and Co-Founder of Frontegg, to help us map these requirements. Having escorted dozens of SaaS businesses across many verticals, Aviad brings a wealth of experience in how today’s SaaS is built and operated, and will share his insights and best practices on how to design and build the observability stack right.
Aviad has been a developer for the last 20 years. He held a few management and architecture positions at startups such as Vicon and HTS as well as in larger companies such as NICE and CheckPoint. Today at Frontegg Aviad works closely with many customers to help them build their SaaS solutions.
The episode was live-streamed on YouTube Live and Twitch on 11 Jan 2022 and the video is available at https://www.youtube.com/watch?v=ZcneTMeBPeg
OpenObservability Talks episodes are released monthly, on the last Thursday of each month, and are available for listening on your favorite podcast app and on YouTube.
Show Notes:
What characteristics in today’s SaaS businesses dictate/influence the tech choices
How are SaaS systems built? Tech stack and architecture
Which observability is needed for SaaS?
Kubernetes & infra observability
Availability, responsiveness, low latency are critical in SaaS
product and business observability
Observability has many stakeholders
Recommended tooling for SaaS
Correlating different data signals
Persistence and the cost of storage
Final tips for SaaS observability
AWS recent outages and learnings
Log4j recent CVEs
Resources:
AWS outages and learnings: https://horovits.medium.com/retrospect-on-the-aws-outage-and-resilient-cloud-based-architecture-cc513a32747
Socials:
Twitter: https://twitter.com/OpenObserv
Twitch: https://www.twitch.tv/openobservability
YouTube: https://www.youtube.com/channel/UCLKOtaBdQAJVRJqhJDuOlPg

Dec 21, 2021 • 1h 2min
Prometheus Pitfalls and the Rise of Continuous Profiling - OpenObservability Talks S2E07
We’ve grown to rely on “the three pillars” for observability - logs, metrics and traces. Popular frameworks such as Prometheus have helped popularize these practices. But now people are starting to realize that it’s not enough.
On this episode Dotan Horovits will host Frederic Branczyk for a discussion about the unspoken pitfalls of Prometheus and the challenges of current observability coverage. We will also discuss the rise of Continuous Profiling as a new observability signal, what it’s about and where it can help. We’ll also review the recent launch of Parca, an open source project for continuous profiling that traces its roots to Red Hat’s internal ConProf open source tool.
Frederic is the founder and CEO of Polar Signals. Before founding Polar Signals he was a senior principal engineer and the main architect for all things Observability at Red Hat, which he joined through the CoreOS acquisition. Frederic is a Prometheus and Thanos maintainer as well as the tech lead for the special interest group for instrumentation in Kubernetes. In a previous life, he was a security researcher working on key management solutions as well as intrusion detection systems. When not working on software Frederic enjoys obsessing over brewing a perfect cup of coffee.
The episode was live-streamed on 16 December 2021 and the video is available at https://www.youtube.com/watch?v=G02g63oI0IA
OpenObservability Talks episodes are released monthly, on the last Thursday of each month. The episodes are also live-streamed on Twitch and YouTube Live - tune in to see us live, and pitch in with your comments and questions on the live chat.
Show Notes:
The limitations of the three pillars model of observability
Prometheus strengths and pitfalls
how to start with continuous profiling
how to correlate between different telemetry
Parca OSS intro
eBPF turned out perfect for instrumenting continuous profiling
Parca OSS future plan
how is the performance penalty of continuous profiling kept low
what's the solution for high cardinality in Prometheus?
will Parca OSS be contributed to an established OSS foundation?
Prometheus Agent mode released
OTEL operator now has an instrumentation CR
continuous profiling support for interpreted languages
Resources:
https://www.parca.dev/
https://github.com/google/pprof
https://increment.com/containers/observing-containers-pillars-of-observability/
https://ebpf.io/
https://research.google/pubs/pub36575/
Social:
Twitter: https://twitter.com/OpenObserv
YouTube: https://www.youtube.com/@openobservabilitytalks

Nov 23, 2021 • 1h 2min
BPF origin story and the future of telemetry analytics - OpenObservability Talks S2E06
OpenObservability Talks S2E06: Hosting Steve McCanne
We hear a lot about BPF in the industry today, with this flexible technology being applied to so many problems, from routing and proxying to, of course, observability. Correlating events and data from the operating system level across distributed systems is a key problem for the industry and community to solve. I am thrilled to announce Steve McCanne joining us for this episode. I have been lucky enough to spend time with Steve in my career and am delighted to have him join us to discuss the origin stories and where these foundational technologies might be applied in the future. Steve’s bio and background speak for themselves.
Steve McCanne is the "Coding CEO" at Brim, a small startup working on the open-source Zed Project and a new application called "Brim" that leverages Zed. Back in the days before the Web, Steve worked at the Lawrence Berkeley National Laboratory where he developed BPF, libpcap, the PCAP file format, and the tcpdump language and compiler, while also working on the Real-time Transport Protocol (RTP) for Internet video when the telcos claimed that real-time Internet communication was impossible without end-to-end virtual-circuit guarantees. (Guess who was right?) After a brief stint in academia in the late '90s, Steve crossed over to the dark side, became a tech entrepreneur, and never looked back. He has founded several startups and took his '02 company and Sharkfest's sponsor, Riverbed, public in '06.
Resources
The USENIX paper from 1993 on BPF architecture: https://www.usenix.org/legacy/publications/library/proceedings/sd93/mccanne.pdf
Open source tools Steve shared in the podcast: https://github.com/brimdata/zed
https://github.com/brimdata/brim
Steve's GitHub BPF repo: https://github.com/brimdata/zbpf
Socials:
Twitter: https://twitter.com/OpenObserv
YouTube: https://www.youtube.com/@openobservabilitytalks

Oct 27, 2021 • 60min
SRE at Google: Planet-scale observability - OpenObservability Talks S2E05
Have you ever wondered how services are operated at Google’s scale? Here’s your opportunity to find out. Ramón will share how his SRE team runs Google’s identity services, and the elaborate end-to-end observability they use to achieve it under a strict SLA. We’ll also get a glimpse at the birthplace of Kubernetes, OpenCensus, Dapper, Monarch and other cornerstones of today’s cloud-native DevOps and observability.
Ramón Medrano Llamas (@rmedranollamas) is a staff site reliability engineer at Google, focused on user identity and authentication. He concentrates on the reliability aspects of new Google products and new features of existing products, ensuring that they meet the same high bar as every other Google service. Before joining Google in 2013, he worked at CERN developing and designing distributed systems for physics. He holds a master’s degree in computer science and is pursuing a PhD on distributed systems.
The episode was live-streamed on 26 October 2021 and the video is available at https://youtube.com/live/jVTZf1SXZrg
Show Notes:
scale and size of Google Identity services operation
evolution from monitoring to observability
telemetry collection
SRE job description is changing
Google Dapper
Google Census
operating end-to-end observability at scale
flexibility vs. runbook in SRE
how SRE at Google is different
transition from monolith to MSA
Linux Foundation launching a DevOps bootcamp
Parca OSS launched
how to intro SRE culture
Resources:
Dapper paper: Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
Borg paper: Large-scale cluster management at Google with Borg
Monarch paper: Monarch: Google’s Planet-Scale In-Memory Time Series Database
SRE books
Systemantics

Sep 19, 2021 • 1h 7min
Observability Into Your Business And FinOps - OpenObservability Talks S2E04
Observability is becoming a common practice for DevOps teams monitoring and troubleshooting IT systems. But Observability can offer much more than that. More advanced usage of telemetry, and in particular distributed tracing and its context propagation mechanism, can uncover insights into your business performance and can help solve business and FinOps problems.
On this episode of OpenObservability Talks I hosted Yuri Shkuro, creator of the Jaeger project and a champion of Distributed Tracing, to discuss how tracing and observability can help beyond DevOps, whether in business cases, FinOps or even software development. We also caught up on the latest updates from Jaeger, the CNCF’s distributed tracing OSS project, its synergy with OpenTelemetry and more topics.
Yuri is a software engineer who works on distributed tracing, observability, reliability, and performance problems; author of the book "Mastering Distributed Tracing"; creator of Jaeger, an open source distributed tracing platform and a graduated CNCF project; co-founder of the OpenTracing and OpenTelemetry CNCF projects; member of the W3C Distributed Tracing Working Group.
The episode was live-streamed on 14 September 2021 and the video is available at https://youtube.com/live/YSOyTagKGtM
Show Notes:
why distributed tracing?
tracing through async flows
why is the adoption of tracing slow?
instrumentation challenge for tracing adoption
using context propagation for business use cases
observability tooling maturity
Jaeger project updates
OpenTelemetry accepted to CNCF incubation
Cortex and Thanos accepted to CNCF incubation
Google contributing SQLCommenter project to OTel
K8s v1.22 releases API Server tracing in alpha
Resources:
Great addition in Kubernetes v1.22 release: API Server Tracing, based on OpenTelemetry
Tracing at Uber and the beginning of Jaeger project: Distributed Tracing at Uber-Scale episode
OpenTelemetry becomes a CNCF incubating project
Cortex accepted to CNCF incubation in August
Thanos accepted to CNCF incubation in August
Google Donates Sqlcommenter to OpenTelemetry Project
From Distributed Tracing to APM: Taking OpenTelemetry and Jaeger Up a Level
Mastering Distributed Tracing by Yuri Shkuro

Aug 26, 2021 • 46min
Fluentd for logging and metrics and path forward - OpenObservability Talks S2E03
In this episode, we’ll talk with industry veteran and product manager Anurag Gupta, who has been working in open source observability for over 4 years. We will go into depth on his background, and how he views the ecosystem of open source. Then we will dig into the Fluentd and Fluent Bit projects and discuss some of the amazing innovations coming from these projects. Learn what’s next for logging, and how a consolidated data collection plane is being driven by the Fluentd project.

Jul 25, 2021 • 1h 10min
Prometheus, OpenMetrics, and the CNCF Observability Ecosystem - OpenObservability Talks S2E02
The CNCF has a rich suite to address monitoring Kubernetes and cloud-native workloads. First among these is Prometheus, which is widely adopted, with great out-of-the-box compatibility with Kubernetes. But under the CNCF you can also find OpenMetrics, which offers standardization of the metrics format, Thanos and Cortex, which offer long-term storage for Prometheus, and other complementary solutions and integrations.
On this episode of OpenObservability Talks we’ll host “RichiH” Hartmann and discuss the different OSS projects, the synergy between them, and the future roadmap in building the community and making CNCF a leading offering.
Richard "RichiH" Hartmann is Director of Community at Grafana Labs, Prometheus team member, OpenMetrics founder, CNCF SIG Observability chair, and other things. He also organizes various conferences, including FOSDEM, DENOG, DebConf, and Chaos Communication Congress. In the past, he made mainframe databases work, ISP backbones run, and built a datacenter from scratch.
The episode was live-streamed on 02 July 2021 and the video is available at https://youtube.com/live/j3nFFHSosnI
Show Notes:
OpenTelemetry accepted to CNCF incubation
OpenTelemetry structure
OpenTelemetry community adoption
OpenMetrics and Open* confusion
OpenMetrics and OpenTelemetry synergy
OpenMetrics updates
CNCF’s Observability TAG (Technical Advisory Group)
How to sync between projects on CNCF
Prometheus state and roadmap
Prometheus conformance program
Thanos and Cortex projects
how the tech stack benefits humans
Grafana, Loki and Tempo projects
Resources:
OpenTelemetry.io
OpenTelemetry status page
Guide to OpenTelemetry
CNCF TAG Observability
Open* Explainer by RichiH
OpenMetrics

Jun 30, 2021 • 57min
Codeless Kubernetes Observability with eBPF - OpenObservability Talks S2E01
Current observability practice is largely based on manual instrumentation, which creates a barrier to entry for many wishing to implement observability in their environment. This is especially true in Kubernetes environments and microservices architecture.
eBPF (extended Berkeley Packet Filter) is an exciting new technology for Linux kernel level instrumentation, which bears the promise of no-code instrumentation and easier observability into Kubernetes environments (alongside other benefits for networking and security).
On this episode of OpenObservability Talks we’ll host Natalie Serrino, Principal Engineer at Pixie Labs, which was recently acquired by New Relic. We’ll talk about observability in Kubernetes environments, eBPF and its use cases for observability.
We’ll also talk about Pixie, the Kubernetes-native in-cluster observability platform, and the exciting news of it being open sourced and currently being contributed to the CNCF under the Apache 2.0 license.
Natalie is a Principal Engineer and Tech Lead at New Relic. She works on the Pixie auto-telemetry observability platform, which was acquired and open sourced by New Relic. She focuses primarily on Pixie’s data layer, including its query language, compiler, and query execution engine.
The episode was live-streamed on 20 June 2021 and the video is available at https://youtube.com/live/NYDBj5ctKaw
Show Notes:
challenges in k8s observability
state of instrumentation
automatic instrumentation
eBPF overview
eBPF vs. service mesh side cars
Pixie project overview
Pixie’s roadmap and integration plans with CNCF ecosystem
Netflix engineering sharing use case of eBPF
instrumenting with Istio
OpenSearch RC1 released
K8s unpredictable spend
logs aren't enough, need tracing - recommended article
Resources:
http://www.brendangregg.com/ebpf.html
https://blog.px.dev/
https://docs.px.dev/about-pixie/roadmap/
https://www.businesswire.com/news/home/20210504005480/en/New-Relic-Joins-Cloud-Native-Computing-Foundation-Governing-Board-and-is-in-the-Process-of-Contributing-Pixie-Open-Source-for-Kubernetes-Native-Observability
https://netflixtechblog.com/how-netflix-uses-ebpf-flow-logs-at-scale-for-network-insight-e3ea997dca96
https://logz.io/blog/istio-instrumenting-microservices-distributed-tracing/
https://opensearch.org/blog/update/2021/06/opensearch-release-candidate-announcement/
https://thenewstack.io/tracing-why-logs-arent-enough-to-debug-your-microservices/
https://www.theregister.com/2021/06/29/kubernetes_spend_report/
Socials:
Twitter: https://twitter.com/OpenObserv
YouTube: https://www.youtube.com/@openobservabilitytalks

May 27, 2021 • 1h 1min
OpenSearch: The Open Source Successor of Elasticsearch? - OpenObservability Talks S1E12
The OpenSearch project was born out of the passion for Elasticsearch and Kibana and the desire to keep them open source in the face of Elastic’s decision to close-source them. After a couple of months of hard work led by AWS, the Beta release was announced earlier this month under the Apache 2.0 license.
On this episode of OpenObservability Talks I hosted Kyle Davis, Senior Developer Advocate for OpenSearch at AWS. We talked about how OpenSearch came to be, what it took to fork Elasticsearch and Kibana, what the engineers discovered when they dug into the code, what’s planned ahead, and much more.
About Kyle Davis: While a relative newcomer to Amazon, Kyle has a long history with software development and databases. When not working, Kyle enjoys 3D printing and getting his hands dirty in his Edmonton, Alberta-based home garden.
The episode was live-streamed on 27 May 2021 and the video is available at https://youtube.com/live/UDvWdTeH5V4
Resources:
https://github.com/opensearch-project
Beta announcement
Roadmap available
Put the OPEN in Observability: Elasticsearch and Kibana relicensing and community chat - OpenObservability Talks S1E08
Socials:
Twitter: https://twitter.com/OpenObserv
YouTube: https://www.youtube.com/@openobservabilitytalks

Apr 30, 2021 • 56min
Diving deep into Jaeger and OpenTelemetry with Juraci Paixão Kröhling - OpenObservability Talks S1E11
We are thrilled to have Juraci Kröhling, a Software Engineer at Red Hat and a maintainer of the CNCF's Jaeger and OpenTelemetry projects. He will be live and in person this month on the podcast in a discussion with Jonah Kowall, who is the CTO at logz.io and a contributor to Jaeger, OpenTelemetry, and OpenSearch.