
OpenObservability Talks

Latest episodes

Jun 28, 2022 • 1h 1min

OpenTelemetry and the Vision for Unified Open Observability - OpenObservability Talks S3E01

OpenTelemetry is one of the most fascinating and ambitious open source projects of this era. It’s currently the second most active project in the CNCF (the Cloud Native Computing Foundation), with only Kubernetes being more active. The entire industry is aligning behind this project, including incumbent monitoring vendors that were deeply invested in their own proprietary, closed-source agents. In this episode of OpenObservability Talks I’ll host Alolita Sharma to discuss OpenTelemetry, its origins and mission statement, as well as updates hot off the press from the recent KubeCon conference in Valencia about releases and future plans.

Alolita is co-chair of the CNCF Technical Advisory Group for Observability, a member of the OpenTelemetry Governance Committee and a board director of the Unicode Consortium. She has served on the boards of the OSI and SFLC.in. Alolita has led engineering teams at Wikipedia, Twitter, PayPal, IBM and AWS. Two decades of doing open source continue to inspire her.

The episode was live-streamed on 15 June 2022 and the video is available at https://youtu.be/IK2TWOzDUBI

OpenObservability Talks episodes are released monthly, on the last Thursday of each month, and are available for listening on your favorite podcast app and on YouTube. We live-stream the episodes on Twitch and YouTube Live - tune in to see us live, and pitch in with your comments and questions on the live chat.
https://www.twitch.tv/openobservability
https://www.youtube.com/channel/UCLKOtaBdQAJVRJqhJDuOlPg

You can read the recap post: https://logz.io/blog/opentelemetry-roadmap-and-latest-updates/?utm_source=devrel&utm_medium=devrel

Show Notes:
* Hot updates from KubeCon EMEA 2022
* Alolita Sharma introduction
* The state of OpenTelemetry
* When OpenTelemetry Logging is expected to reach GA
* The onboarding challenge of instrumentation
* Client side instrumentation and real user monitoring
* Adding continuous profiling telemetry to OpenTelemetry
* Interoperability between OpenTelemetry and Prometheus
* Challenges in OpenTelemetry and observability
* Where OpenTelemetry is heading next
* Jaeger OSS now accepts OTLP (OpenTelemetry protocol)

Resources:
* OpenTelemetry Metrics reaches RC: https://opentelemetry.io/blog/2022/metrics-announcement/
* OpenTelemetry guide: https://logz.io/learn/opentelemetry-guide/
* CI/CD Observability: https://horovits.medium.com/fighting-slow-and-flaky-ci-cd-pipelines-starts-with-observability-19da2ac94677
* Jaeger can now accept OpenTelemetry protocol: https://medium.com/jaegertracing/introducing-native-support-for-opentelemetry-in-jaeger-eb661be8183c
* OTel Community Day summary: http://paulsbruce.io/blog/2022/06/opentelemetry-community-day-austin-2022
* Contextual Logging in Kubernetes 1.24: https://kubernetes.io/blog/2022/05/25/contextual-logging/
* PolarSignals announced FrostDB: https://www.polarsignals.com/blog/posts/2022/05/04/introducing-arcticdb/

Socials:
* Twitter: https://twitter.com/OpenObserv
* Twitch: https://www.twitch.tv/openobservability
* YouTube: https://www.youtube.com/channel/UCLKOtaBdQAJVRJqhJDuOlPg

Dotan Horovits
============
Twitter: @horovits
LinkedIn: in/horovits
Mastodon: @horovits@fosstodon

Alolita Sharma
============
Twitter: @alolita
LinkedIn: https://www.linkedin.com/in/alolita/

May 26, 2022 • 59min

Observability for Developers Demystified - OpenObservability Talks S2E12

Developers hate monitoring, but we need it. We need it at many points of the software development lifecycle: before deprecating an API, before launching a new feature, after launching the feature, and more. In fact, monitoring needs can vary much more than in classic Ops monitoring. In this episode I’ll host Liran Haimovitch to discuss how to determine what developers should be monitoring, the difference between observability for Dev and for Ops, and how observability fits into our current dev tools, dev stack and dev processes.

Liran is the Co-Founder and CTO of Rookout. He’s an Observability and Instrumentation expert with a deep understanding of Java, Python, Node, and C++. Liran has broad experience in cybersecurity and compliance from his past roles. When not coding, you can find Liran hosting his podcast, speaking at conferences, writing about his tech adventures, and trying out the local cuisine when traveling.

The episode was live-streamed on 10 May 2022 and the video is available at https://youtu.be/OaHQp-qnVN0

OpenObservability Talks episodes are released monthly, on the last Thursday of each month, and are available for listening on your favorite podcast app and on YouTube. We live-stream the episodes on Twitch and YouTube Live - tune in to see us live, and pitch in with your comments and questions on the live chat.
https://www.twitch.tv/openobservability
https://www.youtube.com/channel/UCLKOtaBdQAJVRJqhJDuOlPg

Have you got an interesting topic you'd like to share in an episode? Reach out to us and submit your proposal at https://openobservability.io/

Show Notes:
* Which data do we need to collect for our observability
* How is observability for dev different from ops
* How does observability fit into dev tool stack
* Snapshots provide deep-dive telemetry signal
* Dynamic instrumentation
* Snapshots support in programming languages and runtimes
* Open source standardization around snapshots
* The cost associated with observability
* Google is applying to contribute Istio to the CNCF
* Shopify case study for observability team

Resources:
* Istio applying to the CNCF: https://istio.io/latest/blog/2022/istio-has-applied-to-join-the-cncf/
* Shopify case study for Observability team: https://ericmustin.substack.com/p/notes-on-an-observability-team?s=r

Socials:
* Twitter: https://twitter.com/OpenObserv
* Twitch: https://www.twitch.tv/openobservability
* YouTube: https://www.youtube.com/channel/UCLKOtaBdQAJVRJqhJDuOlPg

Apr 28, 2022 • 1h 1min

OpenSearch 2.0 and beyond with Eli - OpenObservability Talks S2E11

OpenSearch is a community-driven, open-source search and analytics suite derived from Apache 2.0 licensed Elasticsearch 7.10.2 & Kibana 7.10.2. The OpenSearch project started just over a year ago and is now the open-source alternative to ELK, which is no longer open source. The team has spent much of the last year getting the project going, but there was innovation as well. We will cover and discuss what OpenSearch has accomplished, but more importantly what’s coming next, including a big 2.0 release.

We are joined in this episode by Eli Fisher, who is the product lead at AWS, working on the OpenSearch project. He’ll dive into recent launches, including several observability features, and innovations planned for 2.0 and beyond.

The podcast episodes are available for listening on your favorite podcast app and on this YouTube channel.

We live-stream the episodes, and you’re welcome to join the stream here on YouTube Live or at https://www.twitch.tv/openobservability

Mar 30, 2022 • 60min

SLO Driven Engineering: from Dev to Prod - OpenObservability Talks S2E10

Google’s SRE Book popularized the concept of Service Level Objective (SLO) and the SLO-driven approach. But what does it really mean to make SLO-driven decisions? How can we generate observability and synchronize teams around joint SLOs? And how can we automate SLOs and integrate them into the software release pipeline? In this episode I’ll host Andreas Grabner. We’ll discuss SRE practices, and how to automate SLOs from dev all the way to prod. We’ll talk about the open source efforts to standardize the process under the Continuous Delivery Foundation, and about Keptn, the new CNCF open source project that promises to help with this automation.

Andreas Grabner (@grabnerandi) has 20+ years of experience as a software developer, tester and architect and is an advocate for high-performing cloud scale applications. He is a contributor and DevRel for the CNCF open source project Keptn (www.keptn.sh). Andreas is also a regular contributor to the DevOps community, a frequent speaker at technology conferences and regularly publishes articles on blog.dynatrace.com or Medium. In his spare time you can most likely find him on one of the salsa dancefloors of the world.

The episode was live-streamed on 15 March 2022 and the video is available at https://youtu.be/J81byOpVqrk

OpenObservability Talks episodes are released monthly, on the last Thursday of each month, and are available for listening on your favorite podcast app and on YouTube. We live-stream the episodes on Twitch and YouTube Live - tune in to see us live, and pitch in with your comments and questions on the live chat.
https://www.twitch.tv/openobservability
https://www.youtube.com/channel/UCLKOtaBdQAJVRJqhJDuOlPg

Show Notes:
* What’s SRE
* Where is SRE placed in the organization
* SRE vs. DevOps
* Good and bad SLOs
* How to define SLOs top-down
* Who owns SLO definition, monitoring, remediation
* Where is SRE within less mature organizations
* Keptn OSS project background
* Who uses and contributes to Keptn project
* What’s the CDF (Continuous Delivery Foundation)
* Creating a standard CD event format under the CDF (CDF Events SIG)
* Cloud Native Observability survey by the CNCF

Resources:
* SLO in the age of microservices
* Keptn OSS project: https://keptn.sh/
* Keptn 0.14.0 major release
* TechWorld with Nana on Keptn
* CD Foundation - SIG Events: https://github.com/cdfoundation/sig-events
* PurePerformance podcast
* Cloud Native Observability survey by the CNCF

Socials:
* Twitter: https://twitter.com/OpenObserv
* Twitch: https://www.twitch.tv/openobservability
* YouTube: https://www.youtube.com/channel/UCLKOtaBdQAJVR

Feb 27, 2022 • 59min

Building web-scale observability at Slack, Pinterest & Twitter - OpenObservability Talks S2E09

What does it take to build observability in a web-scale company such as Slack, Pinterest and Twitter? On this episode of OpenObservability Talks I'll host Suman Karumuri to hear how he built these systems from the ground up at these #BigTech companies, about his recent research papers and more.

Suman Karumuri is a Sr. Staff Software Engineer and the tech lead for Observability at Slack. He is an expert in distributed tracing and was a tech lead of Zipkin and a co-author of the OpenTracing standard, a Linux Foundation project via the CNCF. Previously, Suman spent several years building and operating petabyte scale log search, distributed tracing and metrics systems at Pinterest, Twitter and Amazon. In his spare time, he enjoys board games, hiking and playing with his kids.

The episode was live-streamed on 16 February 2022 and the video is available at https://youtu.be/IvidkV3TfYg

OpenObservability Talks episodes are released monthly, on the last Thursday of each month, and are available for listening on your favorite podcast app and on YouTube. We live-stream the episodes on Twitch and YouTube Live - tune in to see us live, and pitch in with your comments and questions on the live chat.
https://www.twitch.tv/openobservability
https://www.youtube.com/channel/UCLKOtaBdQAJVRJqhJDuOlPg

Show Notes:
* Who owns observability in large organizations?
* The gaps in the current way of handling metrics
* MACH research paper for metrics storage engine
* The gaps in the current way of handling logs
* Slack KalDB
* SlackTrace - Slack in-house tracing system

Resources:
* Research paper: building Observability Data Management Systems
* CIDR paper: Video
* SlackTrace blog post, talk
* Logging at Twitter
* Pintrace: A Distributed Tracing Pipeline talk by Suman at LISA
* Observability Engineering book
* Observability Trends for 2022
* Yelp engineering with Elasticsearch and Lucene

Socials:
* Twitter: https://twitter.com/OpenObserv
* Twitch: https://www.twitch.tv/openobservability
* YouTube: https://www.youtube.com/channel/UCLKOtaBdQAJVRJqhJDuOlPg

Jan 31, 2022 • 58min

SaaS Observability Done Right - OpenObservability Talks S2E08

SaaS (software as a service) is a popular model for many businesses today. SaaS businesses need agility to move fast and remain competitive. This means agility in the software IT stack, but also agility in the business models and product-led growth (PLG). Observability plays a key role in enabling SaaS organizations to move fast. Achieving this agility, however, raises specific observability requirements. On this episode of OpenObservability Talks we’ll host Aviad Mizrachi, the CTO and Co-Founder of Frontegg, to help us map these requirements. Having worked with dozens of SaaS businesses across many verticals, Aviad brings a wealth of experience in how today’s SaaS is built and operated, and will share his insights and best practices on how to design and build the observability stack right.

Aviad has been a developer for the last 20 years. He held a few management and architecture positions at startups such as Vicon and HTS as well as in larger companies such as NICE and CheckPoint. Today at Frontegg Aviad works closely with many customers to help them build their SaaS solutions.

The episode was live-streamed on YouTube Live and Twitch on 11 Jan 2022 and the video is available at https://www.youtube.com/watch?v=ZcneTMeBPeg

OpenObservability Talks episodes are released monthly, on the last Thursday of each month, and are available for listening on your favorite podcast app and on YouTube. We live-stream the episodes on Twitch and YouTube Live - tune in to see us live, and pitch in with your comments and questions on the live chat.
https://www.twitch.tv/openobservability
https://www.youtube.com/channel/UCLKOtaBdQAJVRJqhJDuOlPg

Show Notes:
* What characteristics in today’s SaaS businesses dictate/influence the tech choices
* How are SaaS systems built? Tech stack and architecture
* Which observability is needed for SaaS?
* Kubernetes & infra observability
* Availability, responsiveness, low latency are critical in SaaS
* product and business observability
* Observability has many stakeholders
* Recommended tooling for SaaS
* Correlating different data signals
* Persistence and the cost of storage
* Final tips for SaaS observability
* AWS recent outages and learnings
* Log4j recent CVEs

Resources:
* AWS outages and learnings: https://horovits.medium.com/retrospect-on-the-aws-outage-and-resilient-cloud-based-architecture-cc513a32747

Socials:
* Twitter: https://twitter.com/OpenObserv
* Twitch: https://www.twitch.tv/openobservability
* YouTube: https://www.youtube.com/channel/UCLKOtaBdQAJVRJqhJDuOlPg

Dec 21, 2021 • 1h 2min

Prometheus Pitfalls and the Rise of Continuous Profiling - OpenObservability Talks S2E07

We’ve grown to rely on “the three pillars” for observability - logs, metrics and traces. Popular frameworks such as Prometheus have helped popularize these practices. But now people are starting to realize that it’s not enough. On this episode Dotan Horovits will host Frederic Branczyk for a discussion about the unspoken pitfalls of Prometheus and the challenges of current observability coverage. We will also discuss the rise of Continuous Profiling as a new observability signal, what it’s about and where it can help. We’ll also review the recent launch of Parca, an open source project for continuous profiling that traces its roots to Red Hat’s internal ConProf open source tool.

Frederic is the founder and CEO of Polar Signals. Before founding Polar Signals he was a senior principal engineer and the main architect for all things Observability at Red Hat, which he joined through the CoreOS acquisition. Frederic is a Prometheus and Thanos maintainer as well as the tech lead for the special interest group for instrumentation in Kubernetes. In a previous life, he was a security researcher working on key management solutions as well as intrusion detection systems. When not working on software Frederic enjoys obsessing over brewing a perfect cup of coffee.

The episode was live-streamed on 16 December 2021 and the video is available at https://www.youtube.com/watch?v=G02g63oI0IA

OpenObservability Talks episodes are released monthly, on the last Thursday of each month. The episodes are also live-streamed on Twitch and YouTube Live - tune in to see us live, and pitch in with your comments and questions on the live chat.

Show Notes:
* The limitations of the three pillars model of observability
* Prometheus strengths and pitfalls
* how to start with continuous profiling
* how to correlate between different telemetry
* Parca OSS intro
* eBPF turned out perfect for instrumenting continuous profiling
* Parca OSS future plan
* how is the performance penalty of continuous profiling kept low
* what's the solution for high cardinality in Prometheus?
* will Parca OSS be contributed to an established OSS foundation?
* Prometheus Agent mode released
* OTEL operator now has an instrumentation CR
* continuous profiling support for interpreted languages

Resources:
* https://www.parca.dev/
* https://github.com/google/pprof
* https://increment.com/containers/observing-containers-pillars-of-observability/
* https://ebpf.io/
* https://research.google/pubs/pub36575/

Social:
* Twitter: https://twitter.com/OpenObserv
* YouTube: https://www.youtube.com/@openobservabilitytalks

Nov 23, 2021 • 1h 2min

BPF origin story and the future of telemetry analytics - OpenObservability Talks S2E06

OpenObservability Talks S2E06: Hosting Steve McCanne

We hear a lot about BPF in the industry today, with this flexible technology being applied to so many problems, from routing and proxying to, of course, observability. Correlating events and data from the operating system level across distributed systems is a key problem for the industry and community to solve. I am thrilled to announce Steve McCanne joining us for this episode. I have been lucky enough to spend time with Steve in my career and am delighted to have him join us to discuss the origin stories and where these foundational technologies might be applied in the future. Steve’s bio and background speak for themselves.

Steve McCanne is the "Coding CEO" at Brim, a small startup working on the open-source Zed Project and a new application called "Brim" that leverages Zed. Back in the days before the Web, Steve worked at the Lawrence Berkeley National Laboratory where he developed BPF, libpcap, the PCAP file format, and the tcpdump language and compiler, while also working on the Real-time Transport Protocol (RTP) for Internet video when the telcos claimed that real-time Internet communication was impossible without end-to-end virtual-circuit guarantees. (Guess who was right?) After a brief stint in academia in the late '90s, Steve crossed over to the dark side, became a tech entrepreneur, and never looked back. He has founded several startups and took his '02 company and Sharkfest's sponsor, Riverbed, public in '06.

Resources:
* The USENIX paper from 1993 on BPF architecture: https://www.usenix.org/legacy/publications/library/proceedings/sd93/mccanne.pdf
* Open source tools Steve shared in the podcast: https://github.com/brimdata/zed https://github.com/brimdata/brim
* Steve's GitHub BPF repo: https://github.com/brimdata/zbpf

Socials:
* Twitter: https://twitter.com/OpenObserv
* YouTube: https://www.youtube.com/@openobservabilitytalks

Oct 27, 2021 • 60min

SRE at Google: Planet-scale observability - OpenObservability Talks S2E05

Have you ever wondered how services are operated at Google’s scale? Here’s your opportunity to find out. Ramón will share how his SRE team runs Google’s identity services, and the elaborate end-to-end observability they use to achieve it under a strict SLA. We’ll also get a glimpse at the birthplace of Kubernetes, OpenCensus, Dapper, Monarch and other cornerstones of today’s cloud-native DevOps and observability.

Ramón Medrano Llamas (@rmedranollamas) is a staff site reliability engineer at Google, focused on user identity and authentication. He concentrates on the reliability aspects of new Google products and new features of existing products, ensuring that they meet the same high bar as every other Google service. Before joining Google in 2013, he worked at CERN developing and designing distributed systems for physics. He holds a master’s degree in computer science and is pursuing a PhD on distributed systems.

The episode was live-streamed on 26 October 2021 and the video is available at https://youtube.com/live/jVTZf1SXZrg

Show Notes:
* scale and size of Google Identity services operation
* evolution from monitoring to observability
* telemetry collection
* SRE job description is changing
* Google Dapper
* Google Census
* operating end-to-end observability at scale
* flexibility vs. runbook in SRE
* how SRE at Google is different
* transition from monolith to MSA
* Linux Foundation launching a DevOps bootcamp
* Parca OSS launched
* how to intro SRE culture

Resources:
* Dapper paper: Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
* Borg paper: Large-scale cluster management at Google with Borg
* Monarch paper: Monarch: Google’s Planet-Scale In-Memory Time Series Database
* SRE books
* Systemantics

Sep 19, 2021 • 1h 7min

Observability Into Your Business And FinOps - OpenObservability Talks S2E04

Observability is becoming a common practice for DevOps teams monitoring and troubleshooting IT systems. But Observability can offer much more than that. More advanced usage of telemetry, and in particular distributed tracing and its context propagation mechanism, can uncover insights into your business performance and can help solve business and FinOps problems. On this episode of OpenObservability Talks I hosted Yuri Shkuro, creator of the Jaeger project and a champion of Distributed Tracing, to discuss how tracing and observability can help beyond DevOps, whether on business cases, FinOps or even software development. We also caught up on the latest updates from Jaeger, the CNCF’s distributed tracing OSS project, its synergy with OpenTelemetry and more topics.

Yuri is a software engineer who works on distributed tracing, observability, reliability, and performance problems; author of the book "Mastering Distributed Tracing"; creator of Jaeger, an open source distributed tracing platform and a graduated CNCF project; co-founder of the OpenTracing and OpenTelemetry CNCF projects; member of the W3C Distributed Tracing Working Group.

The episode was live-streamed on 14 September 2021 and the video is available at https://youtube.com/live/YSOyTagKGtM

Show Notes:
* why distributed tracing?
* tracing through async flows
* why is the adoption of tracing slow?
* instrumentation challenge for tracing adoption
* using context propagation for business use cases
* observability tooling maturity
* Jaeger project updates
* OpenTelemetry accepted to CNCF incubation
* Cortex and Thanos accepted to CNCF incubation
* Google contributing SQLCommenter project to OTel
* K8s v1.22 releases API Server tracing in alpha

Resources:
* Great addition in Kubernetes v1.22 release: API Server Tracing, based on OpenTelemetry
* Tracing at Uber and the beginning of Jaeger project: Distributed Tracing at Uber-Scale episode
* OpenTelemetry becomes a CNCF incubating project
* Cortex accepted to CNCF incubation in August
* Thanos accepted to CNCF incubation in August
* Google Donates Sqlcommenter to OpenTelemetry Project
* From Distributed Tracing to APM: Taking OpenTelemetry and Jaeger Up a Level
* Mastering Distributed Tracing by Yuri Shkuro
