Confluent Developer ft. Tim Berglund, Adi Polak & Viktor Gamov

Confluent
undefined
Mar 22, 2022 • 43min

Building Real-Time Data Governance at Scale with Apache Kafka ft. Tushar Thole

Data availability, usability, integrity, and security are words that we sometimes hear a lot. But what do they actually look like when put into practice? That’s where data governance comes in. This becomes especially tricky when working with real-time data architectures.Tushar Thole (Senior Manager, Engineering, Trust & Security, Confluent) focuses on delivering features for software-defined storage, software-defined networking (SD-WAN), security, and cloud-native domains. In this episode, he shares the importance of real-time data governance and the product portfolio—Stream Governance, which his team has been building to fostering the collaboration and knowledge sharing necessary to become an event-centric business while remaining compliant within an ever-evolving landscape of data regulations. With the increase of data volume, variety, and velocity, data governance is mandatory for trustworthy, usable, accurate, and accessible data across organizations, especially with distributed data in motion. When it comes to choosing a tool to govern real-time distributed data, there is often a paradox of choice. Some tools are built for handling data at rest, while open source alternatives lack features and are not managed services that can be integrated with the Apache Kafka® ecosystem natively. To solve governance use cases by delivering high-quality data assets, Tushar and his team have been taking Confluent Schema Registry, considered the de facto metadata management standard for the ecosystem, to the next level. This approach to governance allows organizations to scale Kafka operations for real-time observability with security and quality. The fully managed, cloud-native Stream Governance framework is based on three key workflows: Stream catalog: Search and discover data in a self-service fashionStream lineage: Understand the complex data relationships with interactive, end-to-end maps of event streams Stream quality: Deliver trusted, high-quality event streams to the organization Tushar also shares use cases around data governance and sheds light on the Stream Governance roadmap. EPISODE LINKSStream Governance – How it WorksData Mess to Data Mesh | Jay KrepsDemo: Stream GovernanceData Governance for Real Time DataWatch the video version of this podcastKris Jenkins TwitterStreaming Audio Playlist Join the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveSEASON 2 Hosted by Tim Berglund, Adi Polak and Viktor Gamov Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed Music by Coastal Kites Artwork by Phil Vo 🎧 Subscribe to Confluent Developer wherever you listen to podcasts. ▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes. 👍 If you enjoyed this, please leave us a rating. 🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.
undefined
Mar 15, 2022 • 42min

Handling 2 Million Apache Kafka Messages Per Second at Honeycomb

How many messages can Apache Kafka® process per second? At Honeycomb, it's easily over one million messages.  In this episode,  get a taste of how Honeycomb uses Kafka on massive scale. Liz Fong-Jones (Principal Developer Advocate, Honeycomb) explains how Honeycomb manages Kafka-based telemetry ingestion pipelines and scales Kafka clusters. And what is Honeycomb? Honeycomb is an observability platform that helps you visualize, analyze, and improve cloud application quality and performance. Their data volume has grown by a factor of 10 throughout the pandemic, while the total cost of ownership has only gone up by 20%. But how, you ask? As a developer advocate for site reliability engineering (SRE) and observability, Liz works alongside the platform engineering team on optimizing infrastructure for reliability and cost. Two years ago, the team was facing the prospect of growing from 20 Kafka brokers to 200 Kafka brokers as data volume increased. The challenge was to scale and shuffle data between the number of brokers while maintaining cost efficiency.The Honeycomb engineering team has experimented with using sc1 or st1 EBS hard disks to store the majority of longer-term archives and keep only the latest hours of data on NVMe instance storage. However, this approach to cost reduction was not ideal, which resulted in needing to keep data that is older than 24 hours on SSD. The team began to explore and adopt Zstandard compression to decrease bandwidth and disk size; however, the clusters were still struggling to keep up. When Confluent Platform 6.0 rolled out Tiered Storage, the team saw it as a feature to help them break away from being storage bound. Before bringing the feature into production, the team did a proof of concept, which helped them gain confidence as they watched Kafka tolerate broker death and reduce latencies in fetching historical data. Tiered Storage now shrinks their clusters significantly so that they can hold on to local NVMe SSD and the tiered data is only stored once on Amazon S3, rather than consuming SSD on all replicas. In combination with the AWS Im4gn instance, Tiered Storage allows the team to scale for long-term growth. Honeycomb also saved 87% on the cost per megabyte of Kafka throughput by optimizing their Kafka clusters.EPISODE LINKSTiered StorageIntroducing Confluent Platform 6.0Scaling Kafka at HoneycombWatch the video version of this podcastKris Jenkins TwitterStreaming Audio Playlist Join the ConSEASON 2 Hosted by Tim Berglund, Adi Polak and Viktor Gamov Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed Music by Coastal Kites Artwork by Phil Vo 🎧 Subscribe to Confluent Developer wherever you listen to podcasts. ▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes. 👍 If you enjoyed this, please leave us a rating. 🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.
undefined
7 snips
Mar 10, 2022 • 45min

Why Data Mesh? ft. Ben Stopford

With experience in data infrastructure and distributed data technologies, author of the book “Designing Event-Driven Systems” Ben Stopford (Lead Technologist, Office of the CTO, Confluent) explains the data mesh paradigm, differences between traditional data warehouses and microservices, as well as how you can get started with data mesh.   Unlike standard data architecture, data mesh is about moving data away from a monolithic data warehouse into distributed data systems. Doing so will allow data to be available as a product—this is also one of the four principles of data mesh: Data ownership by domainData as a productData available everywhere for self-serviceData governed wherever it isThese four principles are technology agnostic, which means that they don’t restrict you to a programming language, Apache Kafka®, or other databases. Data mesh is all about building point-to-point architecture that lets you evolve and accommodate real-time data needs with governance tools.Fundamentally, data mesh is more than a technological shift. It’s a mindset shift that requires cultural adaptation of product thinking—treating data as a product instead of data as an asset or resource. Data mesh invests ownership of data by the people who create it with requirements that ensure quality and governance. Because data mesh consists of a map of interconnections, it’s important to have governance tools in place to identify data sources and provide data discovery capabilities. There are many ways to implement data mesh, event streaming being one of them. You can ingest data sets from across organizations and sources into your own data system. Then you can use stream processing to trigger an application response to the data set. By representing each data product as a data stream, you can tag it with sub-elements and secondary dimensions to enable data searchability. If you are using a managed service like Confluent Cloud for data mesh, you can visualize how data flows inside the mesh through a stream lineage graph. Ben also discusses the importance of keeping data architecture as simple as you can to avoid derivatives of data products.EPISODE LINKSData Mesh 101 courseData Mesh 101 with Live Walkthrough ExerciseIntroduction and Guide to Data MeshThe Definitive Guide to Building a Data Mesh with Event StreamsWhat is Data Mesh, and How Does it Work? ft. Zhamak DehghaniDesigning Event-Driven SystemsSEASON 2 Hosted by Tim Berglund, Adi Polak and Viktor Gamov Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed Music by Coastal Kites Artwork by Phil Vo 🎧 Subscribe to Confluent Developer wherever you listen to podcasts. ▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes. 👍 If you enjoyed this, please leave us a rating. 🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.
undefined
Mar 3, 2022 • 42min

Serverless Stream Processing with Apache Kafka ft. Bill Bejeck

What is serverless?Having worked as a software engineer for over 15 years and as a regular contributor to Kafka Streams, Bill Bejeck (Integration Architect, Confluent) is an Apache Kafka® committer and author of “Kafka Streams in Action.” In today’s episode, he explains what serverless and the architectural concepts behind it are. To clarify, serverless doesn’t mean you can run an application without a server—there are still servers in the architecture, but they are abstracted away from your application development. In other words, you can focus on building and running applications and services without any concerns over infrastructure management. Using a cloud provider such as Amazon Web Services (AWS) enables you to allocate machine resources on demand while handling provisioning, maintenance, and scaling of the server infrastructure. There are a few important terms to know when implementing serverless functions with event stream processors: Functions as a service (FaaS)Stateless stream processingStateful stream processingServerless commonly falls into the FaaS cloud computing service category—for example, AWS Lambda is the classic definition of a FaaS offering. You have a greater degree of control to run a discrete chunk of code in response to certain events, and it lets you write code to solve a specific issue or use case. Stateless processing is simpler in comparison to stateful processing, which is more complex as it involves keeping the state of an event stream and needs a key-value store. ksqlDB allows you to perform both stateless and stateful processing, but its strength lies in stateful processing to answer complex questions while AWS Lambda is better suited for stateless processing tasks. By integrating ksqlDB with AWS Lambda together, they deliver serverless event streaming and analytics at scale.EPISODE LINKSWhat is Serverless?Serverless Stream Processing with Apache Kafka, AWS Lambda, and ksqlDBStateful Serverless Architectures with ksqlDB and AWS Lambda Serverless GitHub repositoryKafka Streams in ActionWatch the video version of this podcastJoin the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Intro to Event-Driven Microservices with ConfluentSEASON 2 Hosted by Tim Berglund, Adi Polak and Viktor Gamov Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed Music by Coastal Kites Artwork by Phil Vo 🎧 Subscribe to Confluent Developer wherever you listen to podcasts. ▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes. 👍 If you enjoyed this, please leave us a rating. 🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.
undefined
Feb 24, 2022 • 47min

The Evolution of Apache Kafka: From In-House Infrastructure to Managed Cloud Service ft. Jay Kreps

When it comes to Apache Kafka®, there’s no one better to tell the story than Jay Kreps (Co-Founder and CEO, Confluent), one of the original creators of Kafka. In this episode, he talks about the evolution of Kafka from in-house infrastructure to a managed cloud service and discusses what’s next for infrastructure engineers who used to self-manage the workload. Kafka started out at LinkedIn as a distributed stream processing framework and was core to their central data pipeline. At the time, the challenge was to address scalability for real-time data feeds. The social media platform’s initial data system was built on Apache™Hadoop®, but the team later realized that operationalizing and scaling the system required a considerable amount of work. When they started re-engineering the infrastructure, Jay observed a big gap in data streaming—on one end, data was being looked at constantly for analytics, while on the other end, data was being looked at once a day—missing real-time data interconnection. This ushered in efforts to build a distributed system that connects applications, data systems, and organizations for real-time data. That goal led to the birth of Kafka and eventually a company around it—Confluent.Over time, Confluent progressed from focussing solely on Kafka as a software product to a more holistic view—Kafka as a complete central nervous system for data, integrating connectors and stream processing with a fully-managed cloud service.Now as organizations make a similar shift from in-house infrastructure to fully-managed services, Jay outlines five guiding points to keep in mind: Cloud-native systems abstract away operational efforts for you without infrastructure concernsIt’s important to have a complete ecosystem for Kafka, including connectors, a SQL layer, and data governanceA distributed system should allow data to be accessible everywhere and across organizationsIdentifying a reliable storage infrastructure layer that is dependable, such as Amazon S3 is criticalCost-effective models mean sustainability and systems that are easy to build aroundEPISODE LINKSBuilding Real-Time Data Systems the Hard WayKris Jenkins TwitterThe Hitchhiker’s Guide to the GalaxyHedonic treadmillWatch the video version of this podcastJoin the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Intro to Event-Driven MicroserviceSEASON 2 Hosted by Tim Berglund, Adi Polak and Viktor Gamov Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed Music by Coastal Kites Artwork by Phil Vo 🎧 Subscribe to Confluent Developer wherever you listen to podcasts. ▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes. 👍 If you enjoyed this, please leave us a rating. 🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.
undefined
Feb 16, 2022 • 3min

What’s Next for the Streaming Audio Podcast ft. Kris Jenkins

Meet your new host of the Streaming Audio podcast: Kris Jenkins (Senior Developer Advocate, Confluent)! In this preview, Kris shares a few highlights from forthcoming episodes to look forward to, spanning topics from data mesh, cloud-native technologies, and serverless Apache Kafka®, to data modeling. As a developer advocate, Kris is endlessly fascinated about software design, functional programming, real-time systems, and electronic music. He is a veteran software developer and engineer, with a broad background from roles such as CTO of a Java/Oracle gold exchange and contract developer of several Haskell/PureScript-based event systems.There is still a raft of data streaming narratives to tell and many community experts to feature. We’ll cover what’s new and emerging, real-life Kafka use cases, and how people are currently using managed Kafka as a service, as well as the latest in the data streaming spaceIf there’s a subject you’d like to see covered on the show or if you know someone who should be featured, let us know via the Confluent Community engagement form. EPISODE LINKSGet involved in the Confluent CommunitySubscribe on Apple PodcastSubscribe on SpotifySubscribe on AndroidListen and Subscribe on PodLinkWatch the video version of this podcastJoin the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Intro to Event-Driven Microservices with ConfluentUse PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)SEASON 2 Hosted by Tim Berglund, Adi Polak and Viktor Gamov Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed Music by Coastal Kites Artwork by Phil Vo 🎧 Subscribe to Confluent Developer wherever you listen to podcasts. ▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes. 👍 If you enjoyed this, please leave us a rating. 🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.
undefined
Feb 3, 2022 • 7min

On to the Next Chapter ft. Tim Berglund

After nearly 200 podcast episodes of Streaming Audio, Tim Berglund bids farewell in his last episode as host of the show. Tim reflects on the many great memories with guests who have appeared on the segment—and each for its own reasons. He has covered a wide variety of topics, ranging from Apache Kafka® fundamentals, microservices, event stream processing, use cases, to cloud-native Kafka, data mesh, and more. As Tim mentions, the Streaming Audio podcast will continue on to explore all things about Kafka and the cloud while featuring new voices and topics. You can subscribe to the Streaming Audio podcast on your podcast platform of choice to get the latest updates and news. Thank you for listening and stay tuned. EPISODE LINKSI Interviewed Nearly 200 Apache Kafka Experts and I learned These 10 ThingsWatch the video version of this podcastJoin the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Intro to Event-Driven Microservices with ConfluentUse PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)SEASON 2 Hosted by Tim Berglund, Adi Polak and Viktor Gamov Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed Music by Coastal Kites Artwork by Phil Vo 🎧 Subscribe to Confluent Developer wherever you listen to podcasts. ▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes. 👍 If you enjoyed this, please leave us a rating. 🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.
undefined
Feb 1, 2022 • 30min

Intro to Event Sourcing with Apache Kafka ft. Anna McDonald

What is event sourcing and how does it work?Event sourcing is often used interchangeably with event-driven architecture and event stream processing. However, Anna McDonald (Principal Customer Success Technical Architect, Confluent) explains it's a specific category of its own—an event streaming pattern. Anna is passionate about event-driven architectures and event patterns. She’s a tour de force in the Apache Kafka® community and is the presenter of the Event Sourcing and Event Storage with Apache Kafka course on Confluent Developer. In this episode, she previews the course by providing an overview of what event sourcing is and what you need to know in order to build event-driven systems. Event sourcing is an architectural design pattern, which defines the approach to handling data operations that are driven by a sequence of events. The pattern ensures that all changes to an application state are captured and stored as an immutable sequence of events, known as a log of events. The events are persisted in an event store, which acts as the system of record. Unlike traditional databases where only the latest status is saved, an event-based system saves all events into a database in sequential order. If you find a past event is incorrect, you can replay each event from a certain timestamp up to the present to recreate the latest status of data. Event sourcing is commonly implemented with a command query responsibility segregation (CQRS) system to perform data computation tasks in response to events. To implement CQRS with Kafka, you can use Kafka Connect, along with a database, or alternatively use Kafka with the streaming database ksqlDB.In addition, Anna also shares about:Data at rest and data in motion techniques for event modelingThe differences between event streaming and event sourcingHow CQRS, change data capture (CDC), and event streaming help you leverage event-driven systemsThe primary qualities and advantages of an event-based storage systemUse cases for event sourcing and how it integrates with your systemsEPISODE LINKSEvent Sourcing courseEvent Streaming in 3 MinutesIntroducing Derivative Event SourcingMeetup: Event Sourcing and Apache KafkaWatch the video version of this podcastJoin the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Intro to Event-DrivSEASON 2 Hosted by Tim Berglund, Adi Polak and Viktor Gamov Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed Music by Coastal Kites Artwork by Phil Vo 🎧 Subscribe to Confluent Developer wherever you listen to podcasts. ▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes. 👍 If you enjoyed this, please leave us a rating. 🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.
undefined
Jan 27, 2022 • 31min

Expanding Apache Kafka Multi-Tenancy for Cloud-Native Systems ft. Anna Povzner and Anastasia Vela

In an effort to make Apache Kafka® cloud native, Anna Povzener (Principal Engineer, Confluent) and Anastasia Vela (Software Engineer I, Confluent) have been working to expand multi-tenancy to cloud-native systems with automated capacity planning and scaling in Confluent Cloud. They explain how cloud-native data systems are different from legacy databases and share the technical requirements needed to create multi-tenancy for managed Kafka as a service. As a distributed system, Kafka is designed to support multi-tenant systems by: Isolating data with authentication, authorization, and encryptionIsolating user namespacesIsolating performance with quotasTraditionally, Kafka’s multi-tenant capabilities are used in on-premises data centers to make data available and accessible across the company—a single company would run a multi-tenant Kafka cluster with all its workloads to stream data across organizations. Some processes behind setting up multi-tenant Kafka clusters are manual with the requirement to over-provision resources and capacity in order to protect the cluster from unplanned demand increases. When Kafka is on cloud instances, you have the ability to scale cloud resources on the fly for any unplanned workloads to meet expectations instantaneously. To shift multi-tenancy to the cloud, Anna and Anastasia identify the following as essential for the architectural design: Abstractions: requires minimal operational complexity of a cloud servicePay-per-use model: requires the system to use only the minimum required resources until additional is necessary Uptime and performance SLA/SLO: requires support for unknown and unpredictable workloads with minimal operational workload while protecting the cluster from distributed denial-of-service (DDoS) attacksCost-efficiency: requires a lower cost of ownership You can also read more about the shift from on-premises to cloud-native, multi-tenant services in Anna and Anastasia’s publication on the Confluent blog. EPISODE LINKSFrom On-Prem to Cloud-Native: Multi-Tenancy in Confluent CloudCloud-Native Apache KafkaJoin the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Intro to Event-Driven Microservices with ConfluentUse PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)Watch the video version of this pSEASON 2 Hosted by Tim Berglund, Adi Polak and Viktor Gamov Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed Music by Coastal Kites Artwork by Phil Vo 🎧 Subscribe to Confluent Developer wherever you listen to podcasts. ▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes. 👍 If you enjoyed this, please leave us a rating. 🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.
undefined
Jan 24, 2022 • 5min

Apache Kafka 3.1 - Overview of Latest Features, Updates, and KIPs

Apache Kafka® 3.1 is here with exciting new features and improvements! On behalf of the Kafka community, Danica Fine (Senior Developer Advocate, Confluent) shares release highlights that you won’t want to miss, including foreign-key joins in Kafka Streams and improvements that will provide consistency for Kafka latency metrics. KAFKA-13439 deprecates the eager protocol, which has been the default since Kafka 2.4—it’s advised to upgrade your applications to the cooperative protocol as the eager protocol will no longer be supported in future releases. Previously, foreign-key joins in Kafka Streams only worked if both primary and foreign-key tables were joined. This release adds support for foreign-key joins on tables with custom partitioners, which will be passed in as part of a new `TableJoined` object, comparable to the existing `Joined` and `StreamJoined` objects. With the goal of making Kafka more intuitive, KIP-773 enhances naming consistency for three new client metrics with millis and nanos. For example, `io-waittime-total` is reintroduced as `io-wait-time-ns-total`. The previously introduced metrics without ns will be deprecated but available for backward compatibility. KIP-768 continues the work started in KIP-255 to implement the necessary interfaces for a production-grade way to connect to an OpenID identity provider for authentication and token retrieval. This update provides an out-of-the-box implementation of an `AuthenticateCallbackHandler` that can be used to communicate with OAuth/OIDC. Additionally, this Kafka release introduces two new metrics for active brokers specifically, `ActiveBrokerCount` and `FenceBrokerCount`. These two metrics expose the number of active brokers in the cluster known by the controller and the number of fenced brokers known by the controller. Tune in to learn more about the Apache Kafka 3.1 release! EPISODE LINKSApache Kafka 3.1 release notes Read the blog to learn moreDownload Apache Kafka 3.1Watch the video version of this podcastSEASON 2 Hosted by Tim Berglund, Adi Polak and Viktor Gamov Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed Music by Coastal Kites Artwork by Phil Vo 🎧 Subscribe to Confluent Developer wherever you listen to podcasts. ▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes. 👍 If you enjoyed this, please leave us a rating. 🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app