

Confluent Developer ft. Tim Berglund, Adi Polak & Viktor Gamov
Confluent
Hi, we’re Tim Berglund, Adi Polak, and Viktor Gamov and we’re excited to bring you the Confluent Developer podcast (formerly “Streaming Audio”). Our hand-crafted weekly episodes feature in-depth interviews with our community of software developers (actual human beings - not AI) talking about some of the most interesting challenges they’ve faced in their careers. We aim to explore the conditions that gave rise to each person’s technical hurdles, as well as how their experiences transformed their understanding and approach to building systems. Whether you’re a seasoned open source data streaming engineer, or just someone who’s interested in learning more about Apache Kafka®, Apache Flink® and real-time data, we hope you’ll appreciate the stories, the discussion, and our effort to bring you a high-quality show worth your time.
Episodes

Apr 14, 2021 • 32min
Connecting Azure Cosmos DB with Apache Kafka - Better Together ft. Ryan CrawCour
When building solutions for customers in Microsoft Azure, it is not uncommon to come across customers who are deeply entrenched in the Apache Kafka® ecosystem and want to continue expanding within it. Thus, figuring out how to connect Azure first-party services to this ecosystem is of the utmost importance.

Ryan CrawCour is a Microsoft engineer who has been working on all things data and analytics for the past 10+ years, including building out services like Azure Cosmos DB, which is used by millions of people around the globe. More recently, Ryan has taken a customer-facing role where he gets to help customers build the best solutions possible using Microsoft Azure’s cloud platform and development tools. In one case, Ryan helped a customer leverage their existing Kafka investments and persist event messages in a durable managed database system in Azure. They chose Azure Cosmos DB, a fully managed, distributed, modern NoSQL database service, as their preferred database, but the question remained: how would they feed events from their Kafka infrastructure into Azure Cosmos DB, and how could they get changes from their database system back into their Kafka topics?

Although integration is in his blood, Ryan confesses that he is relatively new to the world of Kafka and has learned to adjust to what he finds in his customers’ environments. Oftentimes this is Kafka, and for many good reasons, customers don’t want to change this core part of their solution infrastructure. This has led him to embrace Kafka and the ecosystem around it, enabling him to better serve customers. He’s been closely tracking the development and progress of Kafka Connect. To him, it is the natural step from Kafka as a messaging infrastructure to Kafka as a key pillar in an integration scenario. Kafka Connect can be thought of as a piece of middleware that connects a variety of systems to Kafka in a bidirectional manner: getting data from Kafka into your downstream systems, often databases, and also taking changes that occur in those systems and publishing them back to Kafka, where other systems can then react.

One day, a customer asked him how to connect Azure Cosmos DB to Kafka. There wasn’t a connector at the time, so he helped build two with the Confluent team: a sink connector, where data flows from Kafka topics into Azure Cosmos DB, and a source connector, where Azure Cosmos DB is the source of data, pushing changes that occur in the database into Kafka topics.

EPISODE LINKS
Integrating Azure and Confluent: Ingesting Data to Azure Cosmos DB through Apache Kafka
Download the Azure Cosmos DB Connector (Source and Sink)
Join the Confluent Community
GitHub: Kafka Connect for Azure Cosmos DB

SEASON 2
Hosted by Tim Berglund, Adi Polak and Viktor Gamov
Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed
Music by Coastal Kites
Artwork by Phil Vo

🎧 Subscribe to Confluent Developer wherever you listen to podcasts.
▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes.
👍 If you enjoyed this, please leave us a rating.
🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.
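The sink connector described in the episode is deployed like any other Kafka Connect plugin, by POSTing a JSON configuration to the Connect REST API. The sketch below does that from Java. The connector class and the connect.cosmos.* property names are recalled from the connector project’s documentation and should be verified against the GitHub repository linked above; the endpoint, key, database, and topic map are placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: register the Cosmos DB sink connector via the Kafka Connect REST API.
// The connector class and connect.cosmos.* property names follow the project's
// documentation as best recalled; verify them against the repo linked above.
// Endpoint, key, database, and topic map are placeholders.
public class RegisterCosmosSink {
    public static void main(String[] args) throws Exception {
        String connectorJson = """
            {
              "name": "cosmosdb-sink",
              "config": {
                "connector.class": "com.azure.cosmos.kafka.connect.sink.CosmosDBSinkConnector",
                "tasks.max": "1",
                "topics": "orders",
                "connect.cosmos.connection.endpoint": "https://<ACCOUNT>.documents.azure.com:443/",
                "connect.cosmos.master.key": "<COSMOS_KEY>",
                "connect.cosmos.databasename": "kafkadata",
                "connect.cosmos.containers.topicmap": "orders#orders"
              }
            }
            """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors"))   // Connect worker REST endpoint
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(connectorJson))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```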

Apr 12, 2021 • 25min
Automated Cluster Operations in the Cloud ft. Rashmi Prabhu
If you’ve heard the term “clusters,” then you might know it refers to the Confluent components and features that we run in all three major cloud providers today, including an event streaming platform based on Apache Kafka®, ksqlDB, Kafka Connect, the Kafka API, data balancers, and Kafka API services. Rashmi Prabhu, a software engineer on the Control Plane team at Confluent, has the opportunity to help govern the data plane that comprises all these clusters and enables API-driven operations on them. But running operations on the cloud in a scaling organization can be time consuming, error prone, and tedious.

This episode addresses manual upgrades and rolling restarts of Confluent Cloud clusters during releases, fixes, experiments, and the like, and more importantly, the progress that’s been made to switch from manual operations to an almost fully automated process. You’ll get a sneak peek into upcoming plans to make cluster operations a fully automated process using the Cluster Upgrader, a new Java microservice built with Vert.x. This service runs as part of the control plane and exposes an API that lets users submit their workflows and target a set of clusters. It performs state management on the workflow in the backend using Postgres.

So what’s next? Looking forward, the selection phase will be improved to support policy-based deployment strategies that let you plan ahead and choose how you want to phase your deployments (e.g., first Azure, followed by part of Amazon Web Services and then Google Cloud, or maybe Confluent internal clusters on all cloud providers followed by customer clusters on Google Cloud, Azure, and finally AWS)—the possibilities are endless! The process will become more flexible, more configurable, and more error tolerant so that you can take measured risks and experience a standardized way of operating in the cloud. In addition, expanding operational automation to internal application deployments and other kinds of fleet management operations that fit the “Select/Apply/Monitor” paradigm is in the works.

EPISODE LINKS
Watch Project Metamorphosis videos
Learn about elastic scaling with Apache Kafka
Nick Carr: The Many Ways Cloud Computing Will Disrupt IT
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Kafka streaming in 10 minutes on Confluent Cloud
Use 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)
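The Cluster Upgrader itself is internal to Confluent, but the “Select/Apply/Monitor” paradigm can be pictured as a simple control loop. Everything in the sketch below is hypothetical and for illustration only; none of these types or method names exist in a real Confluent API.

```java
import java.util.List;

// Hypothetical sketch of a "Select/Apply/Monitor" control loop. The types and
// methods here are invented purely to illustrate the shape of the workflow.
public class SelectApplyMonitorSketch {

    record Cluster(String id, String cloudProvider) {}

    interface ClusterSelector {          // SELECT: pick the clusters in scope for this rollout
        List<Cluster> select(String deploymentPolicy);
    }

    interface OperationApplier {         // APPLY: run the upgrade or rolling restart on a cluster
        void apply(Cluster cluster, String targetVersion);
    }

    interface HealthMonitor {            // MONITOR: verify the cluster is healthy before moving on
        boolean isHealthy(Cluster cluster);
    }

    static void runWorkflow(ClusterSelector selector, OperationApplier applier,
                            HealthMonitor monitor, String policy, String targetVersion) {
        for (Cluster cluster : selector.select(policy)) {
            applier.apply(cluster, targetVersion);
            // Pause the rollout if the cluster does not come back healthy.
            if (!monitor.isHealthy(cluster)) {
                throw new IllegalStateException("Halting rollout: " + cluster.id() + " is unhealthy");
            }
        }
    }
}
```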

Apr 7, 2021 • 25min
Resurrecting In-Sync Replicas with Automatic Observer Promotion ft. Anna McDonald
As most developers and architects know, data always needs to be accessible no matter what happens outside of the system. This week, Tim Berglund virtually sits down with Anna McDonald (Principal Customer Success Technical Architect, Confluent) to discuss how Automatic Observer Promotion (AOP), a feature available in Confluent Platform 6.1 and above, can help solve the Apache Kafka® 2.5 datacenter dilemma.

Many industries must have a backup plan, not only to do the right thing by the data that they collect but because they are regulated by law to do so. Anna has a knack for designing operations that make it possible to replicate data both synchronously and asynchronously. To avoid roadblocks in stretch clusters, she’s found that you need both a replication factor and a minimum in-sync replica (ISR) count: protecting your data means keeping not just one copy but multiple copies. Failing to keep the right number of replicas in a datacenter can mean that your application is down, with no way to retrieve vital information during the outage.

Observers are asynchronous replicas that don’t count towards the minimum ISR, which lets them help recover data without weakening those guarantees. Architects should aim to maintain topic availability during an event in a two-zone configuration, ensuring that writes go to both zones during normal operation without compromise. With the newest version of Confluent Platform, you can get data back in sync and keep the partition within its minimum ISR. AOP is an excellent solution for developers who want to prepare for the unexpected and maintain accessibility across zones. When you can avoid manual intervention, you’re more likely to avoid errors and tedious operations, which would otherwise lead to a higher probability of data loss. In other exciting news, Anna shares how she is discovering patterns in order to make the entire Confluent ecosystem more automatic.

EPISODE LINKS
Automatic Observer Promotion Brings Fast and Safe Multi-Datacenter Failover with Confluent Platform 6.1
Amusing Ourselves to Death
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Kafka streaming in 10 minutes on Confluent Cloud
Use 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)
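The two settings Anna calls out, replication factor and minimum ISR, are ordinary topic-level configuration. Here is a minimal sketch that creates a topic with both, using the Kafka AdminClient; the broker address, topic name, and values are placeholders, and the observer placement and promotion policy themselves are configured separately through Confluent Platform’s replica placement constraints (not shown).

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Map;
import java.util.Properties;

// Minimal sketch: create a topic with both a replication factor and a minimum
// in-sync replica count, the two settings discussed in the episode. Observer
// placement and Automatic Observer Promotion are configured via Confluent
// Platform's replica placement constraints, which are not shown here.
public class CreateDurableTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");  // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic topic = new NewTopic("payments", 6, (short) 3)            // 6 partitions, RF = 3
                    .configs(Map.of("min.insync.replicas", "2"));               // writes need 2 in-sync acks
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```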

Mar 31, 2021 • 31min
Building Real-Time Data Pipelines with Microsoft Azure, Databricks, and Confluent
Processing data in real time is a process, as some might say. Angela Chu (Solution Architect, Databricks) and Caio Moreno (Senior Cloud Solution Architect, Microsoft) explain how to integrate Azure, Databricks, and Confluent to build real-time data pipelines that enable you to ingest data, perform analytics, and extract insights from the data at hand. They share where to start within the Apache Kafka® ecosystem and how to maximize the tools and components that it offers using fully managed services like Confluent Cloud for data in motion.

EPISODE LINKS
Consuming Avro Data from Apache Kafka Topics and Schema Registry with Databricks and Confluent Cloud on Azure
Azure Data Lake Storage Gen2 introduction
Best practices for using Azure Data Lake Storage Gen2
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Kafka streaming in 10 minutes on Confluent Cloud
Use 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)
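As a rough idea of what the ingestion side of such a pipeline looks like, here is a minimal sketch of reading a Confluent Cloud topic from Spark Structured Streaming (as you might run on Databricks). The bootstrap servers, API key and secret, and topic name are placeholders, and the Avro/Schema Registry decoding covered in the linked blog post is omitted.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Minimal sketch of ingesting a Kafka topic with Spark Structured Streaming,
// roughly as you would on Databricks. Bootstrap servers, API key, secret, and
// topic are placeholders; Avro decoding via Schema Registry is omitted.
public class KafkaToSpark {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder().appName("kafka-ingest").getOrCreate();

        Dataset<Row> stream = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "<BOOTSTRAP_SERVERS>")
                .option("kafka.security.protocol", "SASL_SSL")
                .option("kafka.sasl.mechanism", "PLAIN")
                .option("kafka.sasl.jaas.config",
                        "org.apache.kafka.common.security.plain.PlainLoginModule required "
                        + "username=\"<API_KEY>\" password=\"<API_SECRET>\";")
                .option("subscribe", "clickstream")
                .load();

        // Keys and values arrive as bytes; cast them to strings for inspection.
        stream.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
              .writeStream()
              .format("console")
              .start()
              .awaitTermination();
    }
}
```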

Mar 24, 2021 • 51min
Smooth Scaling and Uninterrupted Processing with Apache Kafka ft. Sophie Blee-Goldman
Availability in Kafka Streams is hard, especially in the face of any changes. Any change to topic metadata or group membership triggers a rebalance. But Kafka Streams struggles even after this stop-the-world rebalance has finished. According to Apache Kafka® Committer and Confluent Software Engineer Sophie Blee-Goldman, this is because a Streams app will generally have some state associated with a given partition, and to move this state from one consumer instance to another requires rebuilding this state from a special backing topic called a changelog, the source of truth for a partition’s state. Restoring the changelog can take hours, and until the state is ready, Streams can’t do any further processing on that partition. Furthermore, it can’t serve any requests for local state until the local state is “caught up” with the changelog. So scaling out your Streams application results in pretty significant downtime—which is a bummer, especially if the reason for scaling out in the first place was to handle a particularly heavy workload.

To solve the stop-the-world rebalance, we have to find a way to safely assign partitions so we can be confident that they’ve been revoked from their previous owner before being given to a new consumer. To solve the scaling out problem in Kafka Streams, we go a step further. When you add a new instance to your Streams application, we won’t immediately assign any stateful partitions to it. Instead, we’ll leave them assigned to their current owner to continue processing and serving queries as usual. During this time, the new instance will start to “warm up” the local state in the background; it starts consuming from the changelog and building up the local state. We then follow a similar pattern as in cooperative rebalancing and issue a follow-up rebalance. In KIP-441, we call these probing rebalances. Every so often (i.e., 10 minutes by default), we trigger a rebalance. In the subscription metadata that it sends to the group leader, each member encodes the current status of its local state. We use the changelog lag as a measure of how “caught up” a partition is. During a rebalance, only instances that are completely caught up are allowed to own stateful tasks; everything else must first warm up the state. So long as there is some task still warming up on a node, we will “probe” with rebalances until it’s ready.

EPISODE LINKS
From Eager to Smarter in Apache Kafka Consumer Rebalances
KIP-429: Kafka Consumer Incremental Rebalance Protocol
KIP-441: Smooth Scaling Out for Kafka Streams
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
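The warm-up and probing behavior from KIP-441 is tunable through a few Streams configs (available since Apache Kafka 2.6). A minimal sketch follows; the application id, bootstrap address, and the values themselves are placeholders to adjust for your own workload.

```java
import org.apache.kafka.streams.StreamsConfig;

import java.util.Properties;

// Minimal sketch of the KIP-441 settings discussed above (Apache Kafka 2.6+).
// Application id, bootstrap servers, and the tuned values are placeholders.
public class WarmupConfig {
    public static Properties streamsProperties() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "orders-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");

        // How far behind the changelog a replica may be and still count as "caught up".
        props.put("acceptable.recovery.lag", 10_000L);
        // How many extra warm-up replicas may restore state in the background at once.
        props.put("max.warmup.replicas", 2);
        // How often to trigger a probing rebalance to check whether warm-ups have caught up
        // (the 10-minute default mentioned in the episode).
        props.put("probing.rebalance.interval.ms", 10 * 60 * 1000L);
        return props;
    }
}
```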

Mar 17, 2021 • 43min
Event-Driven Architecture - Common Mistakes and Valuable Lessons ft. Simon Aubury
Event-driven architecture has taken on numerous meanings over the years—from event notification to event-carried state transfer, to event sourcing, and CQRS. Why has event-driven programming become so popular, and why is it such a topic of interest? For the first time, Simon Aubury (Principal Data Engineer, ThoughtWorks) joins Tim Berglund on the Streaming Audio podcast to tell all, including his own experiences adopting event-driven technologies and common blunders when working in this area.

Simon admits that he’s made some mistakes and learned some valuable lessons that can benefit others. Among these are accidentally building a message bus, the idea that messages are not events, realizing that getting too fixated on the size of a microservice is the wrong problem, the importance of understanding events and boundaries, defining choreography vs. orchestration, and dealing with passive-aggressive events. This brings Simon to where he is today, as he advocates for Apache Kafka® as a foundation for building a scalable, event-driven architecture and data-intensive applications.

EPISODE LINKS
Should You Put Several Event Types in the Same Kafka Topic?
Meetup Recording: Event-Driven Architecture Mistakes – I’ve Made a Few
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Kafka streaming in 10 minutes on Confluent Cloud
Use 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)
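One of the lessons above, that messages are not events, can be illustrated with a small sketch. The class names below are invented purely for illustration: a command is an instruction aimed at one recipient, while an event is an immutable fact about something that already happened, which any number of consumers may react to.

```java
import java.time.Instant;

// Illustrative only: invented types that contrast a command (imperative,
// addressed to a specific service, may be rejected) with an event (a past-tense
// fact published for whoever cares to consume it).
public class MessagesVsEvents {

    record ReserveStockCommand(String orderId, String sku, int quantity) {}

    record StockReservedEvent(String orderId, String sku, int quantity, Instant occurredAt) {}

    public static void main(String[] args) {
        var command = new ReserveStockCommand("o-42", "sku-1", 3);
        var event = new StockReservedEvent(command.orderId(), command.sku(),
                command.quantity(), Instant.now());
        System.out.println("Command sent to the inventory service: " + command);
        System.out.println("Event published to a topic for any consumer: " + event);
    }
}
```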

Mar 8, 2021 • 45min
The Human Side of Apache Kafka and Microservices ft. SPOUD
Many industries depend on real-time data, presenting a range of problems that Apache Kafka® can help solve. Samuel Benz (CTO) and Patrick Bönzli (Product Owner) explain how their company, SPOUD, has fully embraced Kafka for data delivery, which has proven successful for SPOUD since 2016 across various industries and use cases.

The four Kafka use cases that Sam and Patrick see most often are microservices, event processing, event sourcing/the data lake, and integration architecture. But implementing streaming software for each of these areas is not without its challenges. It’s easy to become frustrated by trivial problems that arise when integrating Kafka into the enterprise, because it’s not just about technology but also about people and how they react to a new technology that they are not yet familiar with. Should enterprises be scared of Kafka? Why can it be hard to adopt Kafka? How do you drive Kafka adoption internally? All good questions.

When adopting Kafka into a new data service, there will be challenges from a data sharing perspective, but with the right architecture, the possibilities are endless. Kafka enables collaboration on previously siloed data in a controlled and layered way. Sam and Patrick’s goal today is to educate others on Kafka and show what success looks like from a data-driven point of view. It’s not always easy, but in the end, event streaming is more than worth it.

EPISODE LINKS
Read blog posts from SPOUD
Improve the Quality of Breaks with Kafka
Apache Kafka Pyramid
Ready, Steady, Connect. Help Your Organization to Appreciate Kafka
AGOORA by SPOUD
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Kafka streaming in 10 minutes on Confluent Cloud
Use 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)

Mar 3, 2021 • 34min
Gamified Fitness at Synthesis Software Technologies Using Apache Kafka and IoT
Synthesis Software Technologies, a Confluent partner, is migrating an existing behavioral IoT framework into Kafka to streamline and normalize vendor information. The legacy messaging technology that they currently use has altered the behavioral IoT data space, and now Apache Kafka® will allow them to take that to the next level. New ways of normalizing the data will increase efficiency for vendors, users, and manufacturers, and will also enable scaling the IoT technology going forward.

Nick Walker (Principal of Streaming) and Yoni Lew (DevOps Developer) of Synthesis discuss how they utilize Confluent Platform in a personal behavior data pipeline provided by Vitality Group. Vitality Group promotes a shared-value insurance model, which sources behavioral change information and transforms it into personal incentives and rewards for members associated with their global partners.

Yoni shares the motivations for moving their data from an existing product over to Kafka. The decision was made for two reasons: taking the different forms and features of existing vendor data and streamlining them, and addressing how quickly users of the system want the processed data. Kafka is the best choice for Synthesis because it can stream messages through various topics and workflows while storing them appropriately. It is especially important for Synthesis to be able to replay data as needed without losing its integrity. Yoni explains how Kafka gives them the opportunity to—even if something goes wrong downstream and someone doesn’t process something correctly—process the data on their own timeline and at their own rate, because they have the data.

The implementation of Kafka into Synthesis’ current workflow has allowed them to create new functionality for assisting various groups that use the data in different ways. This has furthermore opened up new options for the company to build up its framework using Kafka features that lead to creative reactive applications. With Kafka, Synthesis sees endless opportunities to integrate the data that they collect into usable, historical pipelines for long-term models.

EPISODE LINKS
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Kafka streaming in 10 minutes on Confluent Cloud
Use 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)
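The replay capability Yoni describes is a standard Kafka consumer feature: because the broker retains the data, a consumer can rewind to the beginning of its assigned partitions and reprocess everything at its own pace. A minimal sketch follows; the topic name, group id, and bootstrap address are placeholders.

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collection;
import java.util.List;
import java.util.Properties;

// Minimal sketch of replaying retained data with a Kafka consumer. The topic
// name, group id, and bootstrap address are placeholders.
public class ReplayConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "behavioral-iot-replay");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("vendor-activity"), new ConsumerRebalanceListener() {
                @Override
                public void onPartitionsRevoked(Collection<TopicPartition> partitions) { }

                @Override
                public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                    consumer.seekToBeginning(partitions);   // rewind whenever partitions are assigned
                }
            });

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        }
    }
}
```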

Feb 22, 2021 • 48min
Becoming Data Driven with Apache Kafka and Stream Processing ft. Daniel Jagielski
When it comes to adopting event-driven architectures, a couple of key considerations often arise: the way that an asynchronous core interacts with external synchronous systems, and the question of “how do I refactor my monolith into services?” Daniel Jagielski, a consultant working as a tech lead/dev manager at VirtusLab for Tesco, recounts how these very themes emerged in his work with European clients. Through observing organizations as they pivot toward becoming real time and event driven, Daniel identifies the benefits of using Apache Kafka® and stream processing for auditing, integration, pub/sub, and event streaming.

He describes the differences between a provisioned cluster and a managed cluster and the importance of this distinction within the Kafka ecosystem. Daniel also dives into the risk detection platform used by Tesco, which he helped build as a VirtusLab consultant and which marries the asynchronous and synchronous worlds. As Tesco migrated from a legacy platform to event streaming, determining risk and anomaly detection patterns has become more important than ever, and they need the flexibility to adjust to changing usage patterns with COVID-19. In this episode, Daniel talks integrations with third parties, push-based actions, and materialized views/projections for APIs.

Daniel is a tech lead/dev manager, but he’s also an individual contributor for the Apollo project (an ICE organization) focused on online music usage processing. This means working with data in motion: breaking the monolith (starting with a proof of concept), migrating ETL to stream processing, and ingesting via multiple processes that run in parallel with record-level processing.

EPISODE LINKS
Building an Apache Kafka Center of Excellence Within Your Organization ft. Neil Buesing
Risk Management in Retail with Stream Processing
Event Sourcing, Stream Processing and Serverless
It’s Time for Streaming to Have a Maturity Model ft. Nick Dearden
Read Daniel Jagielski's articles on the Confluent blog
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Kafka streaming in 10 minutes on Confluent Cloud
Use 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)
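The “materialized views/projections for APIs” pattern Daniel mentions can be sketched with Kafka Streams: aggregate a topic into a named state store, then let a synchronous API read that store. The topic name, store name, and key below are placeholders, and this is not Tesco’s actual implementation, just an illustration of the pattern.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

import java.util.Properties;

// Sketch of the "materialized view for an API" pattern: count events per key
// into a named state store, then serve reads from the store. Topic, store
// name, and key are placeholders; this is illustrative, not Tesco's system.
public class RiskEventsView {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "risk-events-view");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("risk-events", Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey()
               .count(Materialized.as("risk-event-counts"));   // the materialized view

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        // A synchronous API handler can read the local store directly.
        // (In a real app, wait until the Streams instance is RUNNING before querying.)
        ReadOnlyKeyValueStore<String, Long> view = streams.store(
                StoreQueryParameters.fromNameAndType("risk-event-counts",
                        QueryableStoreTypes.keyValueStore()));
        System.out.println("events for customer-123: " + view.get("customer-123"));
    }
}
```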

Feb 17, 2021 • 45min
Integrating Spring Boot with Apache Kafka ft. Viktor Gamov
Viktor Gamov (Developer Advocate, Confluent) joins Tim Berglund on this episode to talk all about Spring and Apache Kafka®. Viktor’s main focus lately has been helping developers build their apps with stream processing, and helping them do this effectively in different languages. Viktor recently hosted an online Spring Boot workshop that turned out to be a lot of fun, so it was time to get him back on the show to talk about this all-important framework and how it integrates with Kafka and Kafka Streams.

Spring Boot enables you to do more with less. Its features make it easy to create standalone, production-grade Spring-based applications that you can just run, and many of the underlying patterns have existed inside the Spring Framework for a long time. The Spring Integration framework implements many enterprise integration patterns and also has a pre-built Kafka connector. Spring Boot was highly inspired by the 12-factor app manifesto: it lets you write portable apps and externalize the configuration, providing different profiles for you to customize your deployment. This is a critical part of the Kafka client infrastructure. Even though the Kafka client is Java-based, Confluent offers native Spring for Apache Kafka integration, and Confluent Cloud provides the configuration as a springboard: when you connect your application, you can copy a snippet and place it directly in your Spring application, and it works with Confluent Cloud. Now, Viktor is working on bringing Spring Cloud Stream-style, YAML-based configuration into Confluent Cloud too, so folks can easily copy and paste it and have things work out of the box.

To close, Viktor shares an interesting new project that the Confluent Developer Relations team is working on. Stick around to hear all about it and learn how Spring and Kafka work together.

EPISODE LINKS
Spring example in streaming-ops
Avro, Protobuf, Spring Boot, Kafka Streams and Confluent Cloud | Livestreams
Event-Driven Microservices with Spring Boot and Confluent Cloud | Livestreams
Choosing Christmas Movies with Kubernetes, Spring Boot, and Apache Kafka | Livestreams 015
Spring Cloud Stream and Confluent Cloud | Livestreams 018
Joining Forces with Spring Boot, Apache Kafka, and Kotlin ft. Josh Long
Mastering DevOps with Apache Kafka, Kubernetes, and Confluent Cloud ft. Rick Spurgeon and Allison Walther
Follow Viktor Gamov on Twitter
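For a sense of how little code Spring Boot needs to produce and consume with Kafka, here is a minimal sketch using Spring for Apache Kafka. The topic name and group id are placeholders; broker addresses and Confluent Cloud credentials would live in the application’s spring.kafka.* properties (for example, the snippet copied from Confluent Cloud), which Spring Boot’s auto-configuration picks up.

```java
import org.springframework.boot.ApplicationRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;

// Minimal sketch of a Spring Boot app using Spring for Apache Kafka. The topic
// name and group id are placeholders; broker addresses and Confluent Cloud
// credentials belong in application.properties/application.yaml (spring.kafka.*),
// which Spring Boot's auto-configuration reads.
@SpringBootApplication
public class GreetingsApp {

    public static void main(String[] args) {
        SpringApplication.run(GreetingsApp.class, args);
    }

    // Produce a message on startup using the auto-configured KafkaTemplate.
    @Bean
    ApplicationRunner producer(KafkaTemplate<String, String> template) {
        return args -> template.send("greetings", "hello", "Hello, Confluent Developer!");
    }

    // Consume messages from the same topic.
    @KafkaListener(topics = "greetings", groupId = "greetings-app")
    void listen(String message) {
        System.out.println("Received: " + message);
    }
}
```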


