Confluent Developer ft. Tim Berglund, Adi Polak & Viktor Gamov

Confluent
undefined
Apr 22, 2021 • 31min

Powering Microservices Using Apache Kafka on Node.js with KafkaJS at Klarna ft. Tommy Brunn

At Klarna, Lead Engineer Tommy Brunn is building a runtime platform for developers. But outside of his professional role, he is also one of the authors of the JavaScript client for Apache Kafka® called KafkaJS, which has grown from being a niche open source project to the most downloaded Kafka client for Node.js since 2018.Using Kafka in Node.js has previously meant relying on community-contributed bindings to librdkafka, which required you to spend more of your time debugging failed builds than working on your application. With the original authors moving away from supporting the bindings, and the community only partially picking up the slack, using Kafka on NodeJS was a painful proposition.Kafka is a core part of Klarna’s microservice architecture, with hundreds of services using it to communicate among themselves. In 2017, as their engineering team was building the ecosystem of Node.js services powering the Klarna app, it was clear that the experience of working with any of the available Kafka clients was not good enough, so they decided to perform something similar for the Erlang client, Brod, and build their own. Rather than wrapping librdkafka, their client is a complete reimplementation in native JavaScript, allowing for a far superior user experience at the cost of being a lot more work to implement. Towards the end of 2017, KafkaJS 0.1.0 was released.Tommy has also used KafkaJS to build several Kafka-powered services at Klarna, as well as worked on supporting libraries such as integrations with Confluent Schema Registry and Zstandard compression.Since KafkaJS is written entirely in JavaScript, there is no build step required. It will work 100% of the time in any version of Node.js and evolve together with the platform with no effort required from the end user. It also unlocks some creative use cases. For example, Klarna once did an experiment where they got it to run in a browser. KafkaJS will also run on any platform that’s supported by Node.js, such as ARM. Klarna’s “no dependencies” policy also means that the deployment footprint is small, which makes it a perfect fit for serverless environments.EPISODE LINKSNode.js ❤️ Apache Kafka – Getting Started with KafkaJS Watch the video version of this podcastJoin the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Kafka streaming in 10 minutes on Confluent CloudUse 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)SEASON 2 Hosted by Tim Berglund, Adi Polak and Viktor Gamov Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed Music by Coastal Kites Artwork by Phil Vo 🎧 Subscribe to Confluent Developer wherever you listen to podcasts. ▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes. 👍 If you enjoyed this, please leave us a rating. 🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.
undefined
Apr 19, 2021 • 11min

Apache Kafka 2.8 - ZooKeeper Removal Update (KIP-500) and Overview of Latest Features

Apache Kafka 2.8 is out! This release includes early access to the long-anticipated ZooKeeper removal encapsulated in KIP-500, as well as other key updates, including the addition of a Describe Cluster API, support for mutual TLS authentication on SASL_SSL listeners, exposed task configurations in the Kafka Connect REST API, the removal of a properties argument for the TopologyTestDriver, the introduction of a Kafka Streams specific uncaught exception handler, improved handling of window size in Streams, and more.EPISODE LINKSRead about what’s new in Apache Kafka 2.8Check out the Apache Kafka 2.8 release notesWatch the video version of this podcastJoin the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Kafka streaming in 10 minutes on Confluent CloudUse 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)SEASON 2 Hosted by Tim Berglund, Adi Polak and Viktor Gamov Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed Music by Coastal Kites Artwork by Phil Vo 🎧 Subscribe to Confluent Developer wherever you listen to podcasts. ▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes. 👍 If you enjoyed this, please leave us a rating. 🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.
undefined
Apr 14, 2021 • 32min

Connecting Azure Cosmos DB with Apache Kafka - Better Together ft. Ryan CrawCour

When building solutions for customers in Microsoft Azure, it is not uncommon to come across customers who are deeply entrenched in the Apache Kafka® ecosystem and want to continue expanding within it. Thus, figuring out how to connect Azure first-party services to this ecosystem is of the utmost importance.Ryan CrawCour is a Microsoft engineer who has been working on all things data and analytics for the past 10+ years, including building out services like Azure Cosmos DB, which is used by millions of people around the globe. More recently, Ryan has taken a customer-facing role where he gets to help customers build the best solutions possible using Microsoft Azure’s cloud platform and development tools. In one case, Ryan helped a customer leverage their existing Kafka investments and persist event messages in a durable managed database system in Azure. They chose Azure Cosmos DB, a fully managed, distributed, modern NoSQL database service as their preferred database, but the question remained as to how they would feed events from their Kafka infrastructure into Azure Cosmos DB, as well as how they could get changes from their database system back into their Kafka topics. Although integration is in his blood, Ryan confesses that he is relatively new to the world of Kafka and has learned to adjust to what he finds in his customers’ environments. Oftentimes this is Kafka, and for many good reasons, customers don’t want to change this core part of their solution infrastructure. This has led him to embrace Kafka and the ecosystem around it, enabling him to better serve customers. He’s been closely tracking the development and progress of Kafka Connect. To him, it is the natural step from Kafka as a messaging infrastructure to Kafka as a key pillar in an integration scenario. Kafka Connect can be thought of as a piece of middleware that can be used to connect a variety of systems to Kafka in a bidirectional manner. This means getting data from Kafka into your downstream systems, often databases, and also taking changes that occur in these systems and publishing them back to Kafka where other systems can then react. One day, a customer asked him how to connect Azure Cosmos DB to Kafka. There wasn’t a connector at the time, so he helped build two with the Confluent team: a sink connector, where data flows from Kafka topics into Azure Cosmos DB, as well as a source connector, where Azure Cosmos DB is the source of data pushing changes that occur in the database into Kafka topics.EPISODE LINKSIntegrating Azure and Confluent: Ingesting Data to Azure Cosmos DB through Apache Kafka Download the Azure Cosmos DB Connector (Source and Sink) Join the Confluent CommunityGitHub: Kafka Connect for Azure Cosmos DBSEASON 2 Hosted by Tim Berglund, Adi Polak and Viktor Gamov Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed Music by Coastal Kites Artwork by Phil Vo 🎧 Subscribe to Confluent Developer wherever you listen to podcasts. ▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes. 👍 If you enjoyed this, please leave us a rating. 🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.
undefined
Apr 12, 2021 • 25min

Automated Cluster Operations in the Cloud ft. Rashmi Prabhu

If you’ve heard the term “clusters,” then you might know it refers to Confluent components and features that we run in all three major cloud providers today, including an event streaming platform based on Apache Kafka®, ksqlDB, Kafka Connect, the Kafka API, databalancers, and Kafka API services. Rashmi Prabhu, a software engineer on the Control Plane team at Confluent, has the opportunity to help govern the data plane that comprises all these clusters and enables API-driven operations on these clusters. But running operations on the cloud in a scaling organization can be time consuming, error prone, and tedious. This episode addresses manual upgrades and rolling restarts of Confluent Cloud clusters during releases, fixes, experiments, and the like, and more importantly, the progress that’s been made to switch from manual operations to an almost fully automated process. You’ll get a sneak peek into what upcoming plans to make cluster operations a fully automated process using the Cluster Upgrader, a new microservice in Java built with Vertx. This service runs as part of the control plane and exposes an API to the user to submit their workflows and target a set of clusters. It performs statement management on the workflow in the backend using Postgres.So what’s next? Looking forward, there will be the selection phase will be improved to support policy-based deployment strategies that enable you to plan ahead and choose how you want to phase your deployments (e.g., first Azure followed by part of Amazon Web Services and then Google Cloud, or maybe Confluent internal clusters on all cloud providers followed by customer clusters on Google Cloud, Azure, and finally AWS)—the possibilities are endless! The process will become more flexible, more configurable, and more error tolerant so that you can take measured risks and experience a standardized way of operating Cloud. In addition, expanding operation automations to internal application deployments and other kinds of fleet management operations that fit the “Select/Apply/Monitor” paradigm are in the works.EPISODE LINKSWatch Project Metamorphosis videos Learn about elastic scaling with Apache KafkaNick Carr: The Many Ways Cloud Computing Will Disrupt IT Join the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Kafka streaming in 10 minutes on Confluent CloudUse 60PDCAST to get an additional $60 of free Confluent Cloud usage (detailsSEASON 2 Hosted by Tim Berglund, Adi Polak and Viktor Gamov Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed Music by Coastal Kites Artwork by Phil Vo 🎧 Subscribe to Confluent Developer wherever you listen to podcasts. ▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes. 👍 If you enjoyed this, please leave us a rating. 🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.
undefined
Apr 7, 2021 • 25min

Resurrecting In-Sync Replicas with Automatic Observer Promotion ft. Anna McDonald

As most developers and architects know, data always needs to be accessible no matter what happens outside of the system. This week, Tim Berglund virtually sits down with Anna McDonald (Principal Customer Success Technical Architect, Confluent) to discuss how Automatic Observer Promotion (AOP) can help solve the Apache Kafka® 2.5 datacenter dilemma as a feature now available in Confluent Platform 6.1 and above. Many industries must have a backup plan not only to do the right thing by the data that they collect but because they are regulated by law to do so. Anna has a knack for preparing operations that makes replication of data possible both synchronously and asynchronously. To avoid roadblocks in stretch clusters, she’s found that you need both a replication factor and a minimum in-sync replica (ISR). There needs to be a consideration for not just one but multiple copies for the protection of your data criteria. Not replicating the correct number on the datacenter can mean that your application is down, and there’s no way to retrieve vital information during this outage. The presence of observers enables asynchronous replicas that don’t count towards that minimum ISR. These ISRs work because they help recover data without invalidating any other standards. Architects should try to maintain topic availability during an event in a two-zone configuration. This assures that the writes go to both zones during normal operation without compromise. With the newest version of Confluent, you can get data in sync and within the minimum ISR. AOP is an excellent solution for developers who want to prepare for the unexpected and maintain accessibility across zones. When you can avoid manual interruption, you’re more likely to avoid errors and tedious operations, which would otherwise lead to a higher probability of data loss. In other exciting news, Anna shares about discovering patterns in order to make the entire Confluent ecosystem more automatic. EPISODE LINKSAutomatic Observer Promotion Brings Fast and Safe Multi-Datacenter Failover with Confluent Platform 6.1Amusing Ourselves to DeathJoin the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Kafka streaming in 10 minutes on Confluent CloudUse 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)SEASON 2 Hosted by Tim Berglund, Adi Polak and Viktor Gamov Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed Music by Coastal Kites Artwork by Phil Vo 🎧 Subscribe to Confluent Developer wherever you listen to podcasts. ▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes. 👍 If you enjoyed this, please leave us a rating. 🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.
undefined
Mar 31, 2021 • 31min

Building Real-Time Data Pipelines with Microsoft Azure, Databricks, and Confluent

Processing data in real time is a process, as some might say. Angela Chu (Solution Architect, Databricks) and Caio Moreno (Senior Cloud Solution Architect, Microsoft) explain how to integrate Azure, Databricks, and Confluent to build real-time data pipelines that enable you to ingest data, perform analytics, and extract insights from data at hand. They share about where to start within the Apache Kafka® ecosystem and how to maximize the tools and components that it offers using fully managed services like Confluent Cloud for data in motion.EPISODE LINKSConsuming Avro Data from Apache Kafka Topics and Schema Registry with Databricks and Confluent Cloud on Azure Azure Data Lake Storage Gen2 introductionBest practices for using Azure Data Lake Storage Gen2Join the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Kafka streaming in 10 minutes on Confluent CloudUse 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)SEASON 2 Hosted by Tim Berglund, Adi Polak and Viktor Gamov Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed Music by Coastal Kites Artwork by Phil Vo 🎧 Subscribe to Confluent Developer wherever you listen to podcasts. ▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes. 👍 If you enjoyed this, please leave us a rating. 🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.
undefined
Mar 24, 2021 • 51min

Smooth Scaling and Uninterrupted Processing with Apache Kafka ft. Sophie Blee-Goldman

Availability in Kafka Streams is hard, especially in the face of any changes. Any change to topic metadata or group membership triggers a rebalance. But Kafka Streams struggles even after this stop-the-world rebalance has finished. According to Apache Kafka® Committer and Confluent Software Engineer Sophie Blee-Goldman, this is because a Streams app will generally have some state associated with a given partition, and to move this state from one consumer instance to another requires rebuilding this state from a special backing topic called a changelog, the source of truth for a partition’s state. Restoring the changelog can take hours, and until the state is ready, Streams can’t do any further processing on that partition. Furthermore, it can’t serve any requests for local state until the local state is “caught up” with the changelog. So scaling out your Streams application results in pretty significant downtime—which is a bummer, especially if the reason for scaling out in the first place was to handle a particularly heavy workload.To solve the stop-the-world rebalance, we have to find a way to safely assign partitions so we can be confident that they’ve been revoked from their previous owner before being given to a new consumer. To solve the scaling out problem in Kafka Streams, we go a step further. When you add a new instance to your Streams application, we won’t immediately assign any stateful partitions to it. Instead, we’ll leave them assigned to their current owner to continue processing and serving queries as usual. During this time, the new instance will start to “warm up” the local state in the background; it starts consuming from the changelog and building up the local state. We then follow a similar pattern as in cooperative rebalancing, and issue a follow-up rebalance. In KIP-441, we call these probing rebalances. Every so often (i.e., 10 minutes by default), we trigger a rebalance. In the member’s subscription metadata that it sends to the group leader, each member encodes the current status of its local state. We use the changelog lag as a measure of how “caught up” a partition is. During a rebalance, only instances that are completely caught up are allowed to own stateful tasks; everything else must first warm up the state. So long as there is some task still warming up on a node, we will “probe” with rebalances until it’s ready.EPISODE LINKSFrom Eager to Smarter in Apache Kafka Consumer RebalancesKIP-429: Kafka Consumer Incremental Rebalance Protocol KIP-441: Smooth Scaling Out for Kafka StreamsJoin the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperSEASON 2 Hosted by Tim Berglund, Adi Polak and Viktor Gamov Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed Music by Coastal Kites Artwork by Phil Vo 🎧 Subscribe to Confluent Developer wherever you listen to podcasts. ▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes. 👍 If you enjoyed this, please leave us a rating. 🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.
undefined
Mar 17, 2021 • 43min

Event-Driven Architecture - Common Mistakes and Valuable Lessons ft. Simon Aubury

Event-driven architecture has taken on numerous meanings over the years—from event notification to event-carried state transfer, to event sourcing, and CQRS. Why has event-driven programming become so popular, and why is it such a topic of interest? For the first time, Simon Aubury (Principal Data Engineer, ThoughtWorks) joins Tim Berglund on the Streaming Audio podcast to tell all, including his own experiences adopting event-driven technologies and common blunders when working in this area.Simon admits that he’s made some mistakes and learned some valuable lessons that can benefit others. Among these are accidentally building a message bus, the idea that messages are not events, realizing that getting too fixated on the size of a microservice is the wrong problem, the importance of understanding events and boundaries, defining choreography vs. orchestration, and dealing with passive-aggressive events.This brings Simon to where he is today, as he advocates for Apache Kafka® as a foundation for building a scalable, event-driven architecture and data-intensive applications. EPISODE LINKSShould You Put Several Event Types in the Same Kafka Topic? Meetup Recording: Event-Driven Architecture Mistakes – I’ve Made a FewJoin the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Kafka streaming in 10 minutes on Confluent CloudUse 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)SEASON 2 Hosted by Tim Berglund, Adi Polak and Viktor Gamov Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed Music by Coastal Kites Artwork by Phil Vo 🎧 Subscribe to Confluent Developer wherever you listen to podcasts. ▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes. 👍 If you enjoyed this, please leave us a rating. 🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.
undefined
Mar 8, 2021 • 45min

The Human Side of Apache Kafka and Microservices ft. SPOUD

Many industries depend on real-time data, requiring a range of solutions that Apache Kafka® can help solve. Samuel Benz (CTO) and Patrick Bönzli (Product Owner) explain how their company, SPOUD, has fully embraced Kafka for data delivery, which has proven to be successful for SPOUD since 2016 across various industries and use cases. The four Kafka use cases that Sam and Patrick see most often are microservices, event processing, event sourcing/the data lake, and integration architecture. But implementing streaming software for each of these areas is not without its challenges. It’s easy to become frustrated by trivial problems that arise when integrating Kafka into the enterprise, because it’s not just about technology but also people and how they react to a new technology that they are not yet familiar with. Should enterprises be scared of Kafka? Why can it be hard to adopt Kafka? How do you drive Kafka adoption internally? All good questions.When adopting Kafka into a new data service, there will be challenges from a data sharing perspective, but with the right architecture, the possibilities are endless. Kafka enables collaboration on previously siloed data in a controlled and layered way. Sam and Patrick’s goal today is to educate others on Kafka and show what success looks like from a data-driven point of view. It’s not always easy, but in the end, event streaming is more than worth it. EPISODE LINKSRead blog posts from SPOUDImprove the Quality of Breaks with KafkaApache Kafka PyramidReady, Steady, Connect. Help Your Organization to Appreciate KafkaAGOORA by SPOUDJoin the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Kafka streaming in 10 minutes on Confluent CloudUse 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)SEASON 2 Hosted by Tim Berglund, Adi Polak and Viktor Gamov Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed Music by Coastal Kites Artwork by Phil Vo 🎧 Subscribe to Confluent Developer wherever you listen to podcasts. ▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes. 👍 If you enjoyed this, please leave us a rating. 🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.
undefined
Mar 3, 2021 • 34min

Gamified Fitness at Synthesis Software Technologies Using Apache Kafka and IoT

Synthesis Software Technologies, a Confluent partner, is migrating an existing behavioral IoT framework into Kafka to streamline and normalize vendor information. The legacy messaging technology that they currently use has altered the behavioral IoT data space, and now Apache Kafka® will allow them to take that to the next level. New ways of normalizing the data will allow for increased efficiency for vendors, users, and manufacturers. It will also enable the scaling IoT technology going forward. Nick Walker (Principle of Streaming) and Yoni Lew (DevOps Developer) of Synthesis discuss how they utilize Confluent Platform in a personal behavior data pipeline provided by Vitality Group. Vitality Group promotes a shared-value insurance model, which sources behavioral change information and transforms it into personal incentives and rewards to members associated with their global partners.Yoni shares about the motivators of moving their data from an existing product over to Kafka. The decision was made for two reasons: taking different forms and features of existing data from vendors and streamlining it, and addressing how quickly users of the system want the processed data from the system. Kafka is the best choice for Synthesis because it can stream messages through various topics and workflows while storing them appropriately. It is especially important for Synthesis to be able to replay data as needed without losing its integrity. Yoni explains how Kafka gives them the opportunity to—even if something goes wrong downstream and someone doesn’t process something correctly—process the data on their own timeline and at their rate, because they have the data.The implementation of Kafka into Synthesis’ current workflow has allowed them to create new functionality for assisting various groups that use the data in different ways. This has furthermore opened up new options for the company to build up its framework using Kafka features that lead to creative reactive applications. With Kafka, Synthesis sees endless opportunities to integrate the data that they collect into usable, historical pipelines for long-term models. EPISODE LINKSJoin the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Kafka streaming in 10 minutes on Confluent CloudUse 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)SEASON 2 Hosted by Tim Berglund, Adi Polak and Viktor Gamov Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed Music by Coastal Kites Artwork by Phil Vo 🎧 Subscribe to Confluent Developer wherever you listen to podcasts. ▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes. 👍 If you enjoyed this, please leave us a rating. 🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app