Confluent Developer ft. Tim Berglund, Adi Polak & Viktor Gamov

Confluent
Jun 15, 2021 • 26min

Boosting Security for Apache Kafka with Confluent Cloud Private Link ft. Dan LaMotte

Confluent Cloud isn’t just for public access anymore. As security requirements increase across sectors, so does the need for virtual private cloud (VPC) connectivity, and it is becoming more common to come across Apache Kafka® deployments that use the latest private link connectivity option. In the past, most Confluent Cloud users were satisfied with public connectivity paths and VPC peering. However, enabling private links in the cloud is increasingly important for security across networks and even for the reliability of stream processing.

Dan LaMotte, who has since this recording become a staff software engineer II, and his team are focused on making customers’ connections to Confluent Cloud secure. They do this by allowing two VPCs to connect without sharing their private IP address space. There is no crossover between them, which lends itself to entirely secure, unidirectional connectivity from customer to service provider without sharing IPs.

But why do clients still want to peer if they have the option to interface privately? Dan explains that peering has long been the base architecture for this type of connection. At its core, peering is just a point-to-point cloud connection between two VPCs. With global connectivity becoming more commonplace and the rise of globally distributed teams, networks are often not even based in the same region. Regardless of region, however, organizations must take their required level of security into account. At a high level, peering and transit gateways are the baseline for these use cases, and this is where private links come in handy.

Private links allow team members to connect to Confluent Cloud without depending on the public internet. You can directly connect all of your multi-cloud options and microservices within your own secure space, private to the company and to specific IP addresses. As an added security measure, the connection must be initiated on the client side. (A sketch of what that looks like from the client’s perspective follows the links below.) With the option of private links, you can now also build microservices that use functionality that wasn’t available in the past, such as:
- Multi-cloud clusters
- Product enhancements with IP ranges
- Unlimited IP space
- Scalability
- Load balancing

You no longer need to segment your workflow, thanks to completely secure connections between teams that are otherwise disconnected from one another.

EPISODE LINKS
- Securing the Cloud with VPC Peering ft. Daniel LaMotte
- Software Engineer, Cloud Networking [Remote – AMER]
- Software Engineer, Cloud Networking [Remote – USA]
- eBPF documentation
- Watch the video version
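To make the client side concrete, here is a minimal sketch of a Kafka application connecting over a private endpoint, written with the KafkaJS client. The endpoint hostname, credentials, and topic are hypothetical (the hostname format is illustrative, not Confluent’s documented scheme); the point is that with a private link, the bootstrap address resolves inside your own VPC rather than to a public IP, and the connection is initiated from the client’s side.

```typescript
import { Kafka } from "kafkajs";

// Hypothetical private endpoint DNS name: it resolves to an endpoint in
// your own VPC, not a public IP, so traffic never touches the internet.
const kafka = new Kafka({
  clientId: "orders-service",
  brokers: ["lkc-abc123.us-east-1.private.example.cloud:9092"],
  ssl: true, // TLS is still required; private networking is not a substitute
  sasl: {
    mechanism: "plain",
    username: process.env.CLUSTER_API_KEY!,
    password: process.env.CLUSTER_API_SECRET!,
  },
});

async function main() {
  const producer = kafka.producer();
  await producer.connect(); // only succeeds when run from the connected VPC
  await producer.send({
    topic: "orders",
    messages: [{ key: "order-1", value: "created" }],
  });
  await producer.disconnect();
}

main().catch(console.error);
```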
Jun 10, 2021 • 9min

Confluent Platform 6.2 | What’s New in This Release + Updates

Based on Apache Kafka® 2.8, Confluent Platform 6.2 introduces Health+, which offers intelligent alerting, cloud-based monitoring tools, and accelerated support so that you can get notified of potential issues before they manifest as critical problems that lead to downtime and business disruption.

Health+ provides ongoing, real-time analysis of performance and cluster metadata for your Confluent Platform deployment. It collects only metadata, so you can continue managing your deployment as you see fit, with complete control. With cluster metadata continuously analyzed through an extensive library of expert-tested rules and algorithms, you can quickly get insights into cluster performance and spot potential problems before they occur. To ensure complete visibility, organizations can customize the types of notifications they receive and choose to receive them via Slack, email, or webhook. Each notification is aimed at avoiding larger downtime or data loss by helping identify smaller issues before they become bigger problems.

In today’s episode, Tim Berglund (Senior Director of Developer Experience, Confluent) highlights everything that’s new in Confluent Platform 6.2 and all the latest updates.

EPISODE LINKS
- Check out the release notes
- Read the blog post: Introducing Health+ with Confluent Platform 6.2
- Download Confluent Platform 6.2
- Watch the video version of this podcast
- Join the Confluent Community
- Learn more with Kafka tutorials, resources, and guides at Confluent Developer
- Live demo: Kafka streaming in 10 minutes on Confluent Cloud
- Use 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)
Jun 8, 2021 • 33min

Adopting OpenTelemetry in Confluent and Beyond ft. Xavier Léauté

Collecting internal, operational telemetry from Confluent Cloud services and thousands of clusters is no small feat. Stakeholders need to rely on the same data to make operational decisions. Whether it’s metrics from clusters in Confluent Cloud or traces from our internal services, they all provide valuable insights, not only to engineering teams but also to customers for their own operations and business reporting needs. Traditionally, this data needs to be collected in multiple ways to satisfy all the different requirements. We leverage third-party vendors for our operational needs, which usually means deploying vendor agents or libraries in addition to our own, as we also need to collect some of the same data to expose to customers. This sometimes leads to discrepancies between systems, which are often hard to reconcile and make it harder to troubleshoot issues across engineering, data science, and other teams.

One of the earliest software engineers at Confluent, Xavier Léauté is no stranger to this. At Confluent, he leads our observability engineering efforts in Confluent Cloud.

With OpenTelemetry, we can collect data in a vendor-agnostic way. It defines a standard format that all our services can use to expose telemetry, and it provides Go and Java libraries that we can use to instrument our services. Many vendors already integrate with OpenTelemetry, which gives us the flexibility to try out different observability solutions with minimal effort, without the need to rewrite applications or deploy new agents. This means that the same data we send to third parties can also be collected internally, in our own clusters. (A sketch of this style of instrumentation follows the links below.)

The same source of data can then be leveraged in many different ways:
- Using Kafka Connect, we can send this data to our data warehouse and data science teams in real time to derive many of the metrics that we use to track the health of our cloud business
- That very same data also powers our Cloud Metrics API to provide our customers visibility into their infrastructure
- Engineers and support teams can collect more fine-grained data to troubleshoot incidents or understand low-level application behavior

We’ve also adopted the same approach for on-prem customers, which enables us to collect telemetry into our cloud and help them troubleshoot issues, leveraging the same infrastructure that we already built for Cloud. Regarding OpenTelemetry efforts in Apache Kafka®, we’re working on KIP-714, which will allow us to collect Kafka client metrics to help better understand client-side problems without the need to instrument client applications. Our ultimate goal has always been to migrate to OpenTelemetry, and that migration is now underway. We’d also like to pave the way for direct OpenTelemetry integration in Kafka, based on the work that we’ve done at Confluent.

EPISODE LINKS
- OpenTelemetry Twitch channel
- Confluent Cloud Metrics API
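The episode mentions OpenTelemetry’s Go and Java libraries; OpenTelemetry also ships a JavaScript API, which the following minimal TypeScript sketch uses to show what vendor-agnostic instrumentation looks like. The service and metric names are made up, and this assumes a MeterProvider has been registered elsewhere (e.g., via the OpenTelemetry SDK with an OTLP exporter); exact package entry points vary a little across OpenTelemetry JS versions.

```typescript
import { metrics } from "@opentelemetry/api";

// Instrumentation stays vendor-agnostic: where the data goes (third-party
// vendor, internal cluster, or both) is decided by the SDK/exporter config.
const meter = metrics.getMeter("fetch-service");

const requestCounter = meter.createCounter("requests_total", {
  description: "Number of requests handled, by outcome",
});
const latency = meter.createHistogram("request_latency_ms", {
  description: "End-to-end request latency in milliseconds",
});

export async function handle(request: () => Promise<void>): Promise<void> {
  const start = Date.now();
  try {
    await request();
    requestCounter.add(1, { outcome: "ok" });
  } catch (err) {
    requestCounter.add(1, { outcome: "error" });
    throw err;
  } finally {
    latency.record(Date.now() - start);
  }
}
```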
May 25, 2021 • 39min

Running Apache Kafka Efficiently on the Cloud ft. Adithya Chandra

Focused on optimizing Apache Kafka® performance with maximized efficiency, Confluent’s Product Infrastructure team has been actively exploring opportunities for scaling out Kafka clusters. They are able to run Kafka workloads with half the typical memory usage while saving infrastructure costs, which they have tested and now safely rolled out across Confluent Cloud.

After spending seven years at Amazon Web Services (AWS) working on search services and Amazon Aurora as a software engineer, Adithya Chandra decided to apply his expertise in cluster management, load balancing, elasticity, and performance of search and storage clusters to the Confluent team.

Last year, Confluent shipped Tiered Storage, which moves eligible data from the Kafka broker to remote storage. With most data in remote storage, brokers can be upgraded to better storage volumes backed by solid-state drives (SSDs). SSDs are capable of higher throughput than hard disk drives (HDDs) and of fast random I/O, yet they are more expensive per provisioned gigabyte. Given that SSDs handle random I/O well and support higher throughput, Confluent started investigating whether it was possible to run Kafka with less RAM, which is much more expensive per gigabyte than SSD storage. Cloud instance types with the same CPU but half the memory were 20% cheaper. (A back-of-the-envelope sizing sketch follows the links below.)

In this episode, Adithya covers how to run Kafka more efficiently on Confluent Cloud and dives into the following:
- Memory allocation on an instance running Kafka
- What is a JVM heap? Why should it be sized? How much is enough? What are the downsides of a small heap?
- Memory usage of Datadog, Kubernetes, and other processes, and allocating memory correctly
- What is the ideal page cache size? What is the page cache used for? Are there any parameters that can be tuned? How does Kafka use the page cache?
- Testing via the simulation of a variety of workloads using Trogdor
- High-throughput, high-connection, and high-partition tests and their results
- Available cloud hardware and finding the best fit, including choosing the number of instance types, migrating from one instance to another, and using nodepools to migrate brokers safely, one by one
- What do you do when your preferred hardware is not available? Can you run hybrid Kafka clusters if the preferred instance is not widely available?
- Building infrastructure that allows you to perform testing easily and that can support newer hardware faster (ARM processors, SSDs, etc.)

EPISODE LINKS
- Watch the video version of this podcast
- Join the Confluent Community
- Learn more with Kafka tutorials, resources, and guides at Confluent Developer
- Live demo: Kafka streaming in 10 minutes on Confluent Cloud
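To make the sizing question concrete, here is a back-of-the-envelope sketch of a broker’s memory budget. The heuristic (the page cache should roughly cover the data produced during the window in which consumers typically read it back, so that tail reads never hit disk) and all the numbers are illustrative assumptions, not Confluent’s published guidance.

```typescript
// Rough memory budget for a Kafka broker instance (all numbers illustrative).
interface BrokerMemoryPlan {
  heapGiB: number;      // JVM heap: index structures, request handling
  sidecarsGiB: number;  // Datadog agent, Kubernetes overhead, etc.
  pageCacheGiB: number; // what's left over for the OS page cache
}

function planMemory(
  instanceGiB: number,
  heapGiB: number,
  sidecarsGiB: number,
): BrokerMemoryPlan {
  return {
    heapGiB,
    sidecarsGiB,
    pageCacheGiB: instanceGiB - heapGiB - sidecarsGiB,
  };
}

// Page cache needed to absorb all writes produced within the lag window
// that consumers typically read from (so tail reads are served from memory).
function pageCacheNeededGiB(writeMiBPerSec: number, lagWindowSec: number): number {
  return (writeMiBPerSec * lagWindowSec) / 1024;
}

const plan = planMemory(32, 6, 2);            // 32 GiB instance, 6 GiB heap, 2 GiB sidecars
const needed = pageCacheNeededGiB(100, 120);  // 100 MiB/s writes, 2-minute lag window

console.log(`page cache available: ${plan.pageCacheGiB} GiB, needed: ~${needed.toFixed(1)} GiB`);
// If "needed" fits comfortably, a cheaper half-memory instance backed by fast
// SSDs may be viable: the cache misses that do occur hit fast random I/O.
```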
May 20, 2021 • 42min

Engaging Database Partials with Apache Kafka for Distributed System Consistency ft. Pat Helland

When compiling database reports using a variety of data from different systems, obtaining the right data when you need it in real time can be difficult. With cloud connectivity and distributed data pipelines, Pat Helland (Principal Architect, Salesforce) explains how to give educated partial answers when using the Apache Kafka® platform. After all, you can’t get guarantees across a distance, which makes it critical to consider partial results.

Despite best efforts, managing systems from a distance can result in lag time. The secret, according to Helland, is to anticipate these situations and have a plan for when (not if) they happen. Your outputs may be incomplete from time to time, but that doesn’t mean there isn’t valuable information and data to be shared. Although you cannot guarantee that stream data will be available when you need it, you can gather replicas within a batch to obtain a consistent result, also known as convergence.

Distributed systems of all sizes and across large distances rely on reference architectures for database reporting, so plan and anticipate that there will be incomplete inputs at times. Regardless of the types of data you’re using within a distributed database, many inferences can be made from repetitive monitoring over time. There is no reason to throw out data from 19 machines when you’re only waiting on one as a deadline approaches: you can make the most of the sources that are available in the presence of a partition in the overall distributed database. (A small sketch of this deadline-driven pattern follows the links below.)

Confluent Cloud and convergence capabilities have allowed Salesforce to make decisions very quickly, even when only partial data is available, using replicated systems across multiple databases. This analytical approach is vital for consistency in large enterprises, especially those that depend on multi-cloud functionality.

EPISODE LINKS
- Watch the video version of this podcast
- Join the Confluent Community
- Learn more with Kafka tutorials, resources, and guides at Confluent Developer
- Live demo: Kafka streaming in 10 minutes on Confluent Cloud
- Use 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)
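The “19 of 20 machines” idea maps naturally onto a deadline-driven fan-out: query every replica, wait until the deadline, and report whatever arrived, labeled as partial. Here is a minimal TypeScript sketch of that pattern; the replica query functions and the numbers are hypothetical, and this is an illustration of the idea rather than Helland’s specific design.

```typescript
interface PartialResult<T> {
  values: T[];        // answers that arrived before the deadline
  missing: number;    // replicas we stopped waiting for
  complete: boolean;  // true only if every replica answered in time
}

// Query all replicas, but never wait past deadlineMs: report what arrived.
async function gatherWithDeadline<T>(
  queries: Array<() => Promise<T>>,
  deadlineMs: number,
): Promise<PartialResult<T>> {
  const timeout = new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error("deadline")), deadlineMs),
  );
  const settled = await Promise.allSettled(
    queries.map((q) => Promise.race([q(), timeout])),
  );
  const values = settled
    .filter((r): r is PromiseFulfilledResult<T> => r.status === "fulfilled")
    .map((r) => r.value);
  return {
    values,
    missing: queries.length - values.length,
    complete: values.length === queries.length,
  };
}

// Usage: 20 hypothetical replica queries, 200 ms deadline. If one replica is
// slow, we still report 19 answers instead of throwing everything away.
// const report = await gatherWithDeadline(replicas.map((r) => () => r.count()), 200);
```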
May 13, 2021 • 32min

The Truth About ZooKeeper Removal and the KIP-500 Release in Apache Kafka ft. Jason Gustafson and Colin McCabe

Jason Gustafson and Colin McCabe, Apache Kafka® developers, discuss the project to remove ZooKeeper—now known as the KRaft (Kafka on Raft) project. A previous episode of Streaming Audio featured both developers before the release of Apache Kafka 2.8. Now they’re back to share their progress.

The KRaft code has been merged (and continues to be merged) in phases. Both developers talk about the foundational Kafka Improvement Proposals (KIPs), such as KIP-595, a Raft protocol for the metadata quorum, and KIP-631, the quorum-based Kafka controller. The idea going into this new release was to give users a chance to try out no-ZooKeeper mode for themselves.

There are a lot of exciting milestones on the way for KRaft. The next release will feature Raft snapshot support, as well as support for running with security authorizers enabled.

EPISODE LINKS
- KIP-500: Apache Kafka Without ZooKeeper ft. Colin McCabe and Jason Gustafson
- What’s New in Apache Kafka 2.8
- Watch the video version of this podcast
- Join the Confluent Community
- Learn more with Kafka tutorials, resources, and guides at Confluent Developer
- Live demo: Kafka streaming in 10 minutes on Confluent Cloud
- Use 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)
May 4, 2021 • 27min

Resilient Edge Infrastructure for IoT Using Apache Kafka ft. Kai Waehner

What is the internet of things (IoT), and how does it relate to event streaming and Apache Kafka®? Deploying Kafka outside the datacenter creates many new possibilities for processing data in motion and building new business cases.

In this episode, Kai Waehner, field CTO and global technology advisor at Confluent, discusses the intersection of edge data infrastructure, IoT, and cloud services for Kafka. He also details how businesses get into sticky situations when they fail to plan for data that runs dangerously close to the edge. Air-gapped environments and strong security requirements are the norm in many edge deployments.

Defining the edge for your industry depends on what sector you’re in, plus the amount of data and interaction involved with your customers. The edge can lie at various points on the spectrum and carry different meanings for different people. Before you can deploy Kafka to the edge, you must first define where that edge is as it relates to your connectivity needs. Edge resiliency enables your enterprise not only to control your datacenter with ease but also to preserve data without privacy risks or data leaks. If a business does not have the personnel to handle these big IT jobs on their own, or simply does not have an IT department at all, Kafka solutions can come in to fill the gap.

This podcast explores use cases and architectures at the edge (i.e., outside the datacenter) across industries, including manufacturing, energy, retail, restaurants, and banks. The trade-offs of edge deployments are compared to a hybrid integration with Confluent Cloud. (A sketch of a connection-tolerant edge producer follows the links below.)

EPISODE LINKS
- Processing IoT Data End to End with MQTT & Kafka
- End-to-End Integration: IoT Edge to Confluent
- Infrastructure Checklist for Kafka at the Edge
- Use Cases & Architectures for Kafka at the Edge
- Architecture Patterns for Distributed, Hybrid, Edge & Global Kafka Deployments
- Building a Smart Factory with Kafka & 5G Campus Networks
- Kafka Is the New Black at the Edge in Industrial IoT, Logistics & Retailing
- Kafka, KSQL & Apache PLC4X for IIoT Data Integration
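Edge links are slow and flaky, so clients benefit from patient retries, batching, and compression. Here is a minimal sketch of a connection-tolerant edge producer using the KafkaJS client; the broker address, topic name, and retry numbers are illustrative assumptions, not a recommendation from the episode.

```typescript
import { Kafka, CompressionTypes, logLevel } from "kafkajs";

// Hypothetical on-site broker in an edge deployment; the same pattern
// applies when bridging from the edge to a cloud cluster.
const kafka = new Kafka({
  clientId: "sensor-gateway-07",
  brokers: ["edge-broker.factory.local:9092"],
  logLevel: logLevel.WARN,
  retry: {
    initialRetryTime: 500, // back off gently on a flaky link
    retries: 10,
  },
});

const producer = kafka.producer({ allowAutoTopicCreation: false });

// Batch readings and compress them: fewer, denser requests suit
// constrained or intermittent network links.
export async function flushReadings(
  readings: Array<{ sensor: string; value: number }>,
): Promise<void> {
  await producer.connect();
  await producer.send({
    topic: "machine-telemetry",
    compression: CompressionTypes.GZIP,
    messages: readings.map((r) => ({
      key: r.sensor,
      value: JSON.stringify({ ...r, ts: Date.now() }),
    })),
  });
}
```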
Apr 29, 2021 • 28min

Data Management and Digital Transformation with Apache Kafka at Van Oord

Imagine if you could create a better world for future generations simply by delivering marine ingenuity. Van Oord is a Dutch family-owned company that has served as an international marine contractor for over 150 years, focusing on dredging, land infrastructure in the Netherlands, and offshore wind and oil & gas infrastructure.

Real-time insights into costs spent, the progress of projects, and the performance of vessels and equipment are essential for surviving as a business. Becoming a data-driven company requires that all data be connected, synchronized, and visualized—in fact, truly digitized. This requires a central nervous system that supports:
- Legacy (monolith environment) as well as microservices
- ELT/ETL/streaming ETL
- All types of data, including transactional, streaming, geo, machine, and (sea) survey/bathymetry data
- Master data/an enterprise common data model

The need for agility and speed makes it necessary to have a fully integrated DevOps, infrastructure-as-code environment in which data lineage, data governance, and enterprise architecture are holistically embedded. Thousands of topics need to be developed, updated, tested, accepted, and deployed each day. Together with the different scripts for connectors, this requires a holistic data management solution. Thus, Marlon Hiralal (Enterprise/Data Management Architect, Van Oord) and Andreas Wombacher (Data Engineer, Van Oord) turned to Confluent for a three-month proof of concept and explored the pre-prep stage of using Apache Kafka® on Van Oord’s vessels.

Since the application landscape and offered services at Van Oord are dynamic, a stable environment with controlled continuous integration and deployment is essential. Beyond the software components themselves, this also applies to configurations and infrastructure, applying the concept of CI/CD with infrastructure as code. The result: using Terraform and Confluent together.

Published information is treated as a product at Van Oord. An information product is a set of Kafka topics: topics that communicate change (via change data capture) and topics that share the state of a data source (Kafka tables). The set of all information products forms the enterprise data model. (A sketch of provisioning such a product appears below.)

Apache Atlas is used as a data dictionary and governance tool to capture the meaning of the different information products. All changes in the data dictionary are themselves available as an information product in Confluent, allowing consumers of information products to subscribe and be notified about changes.

Van Oord’s enterprise architecture model must remain up to date and aligned with the current implementation. This is achieved by automatically inspecting and analyzing Confluent data flows. Fortunately, Confluent embeds homogeneously in this holistic reference architecture, whose basis is a change data capture (CDC) layer and a persistent layer, making Confluent the core component of the Van Oord architecture.
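To make “an information product is a set of Kafka topics” concrete, here is a hedged sketch that provisions one product as a CDC change topic plus a compacted state topic, using the KafkaJS admin client. The broker address, topic naming convention, partition counts, and product name are illustrative assumptions, not Van Oord’s actual conventions.

```typescript
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "provisioner", brokers: ["broker-1:9092"] });
const admin = kafka.admin();

// One "information product" = a CDC change topic plus a compacted state topic.
async function createInformationProduct(product: string): Promise<void> {
  await admin.connect();
  await admin.createTopics({
    topics: [
      {
        // Change topic: every insert/update/delete captured via CDC.
        topic: `${product}.changes`,
        numPartitions: 6,
        replicationFactor: 3,
      },
      {
        // State topic: compacted so it retains the latest value per key,
        // i.e., a Kafka table of the source's current state.
        topic: `${product}.state`,
        numPartitions: 6,
        replicationFactor: 3,
        configEntries: [{ name: "cleanup.policy", value: "compact" }],
      },
    ],
  });
  await admin.disconnect();
}

createInformationProduct("vessel-positions").catch(console.error);
```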
Apr 22, 2021 • 31min

Powering Microservices Using Apache Kafka on Node.js with KafkaJS at Klarna ft. Tommy Brunn

At Klarna, Lead Engineer Tommy Brunn is building a runtime platform for developers. But outside of his professional role, he is also one of the authors of the JavaScript client for Apache Kafka® called KafkaJS, which since 2018 has grown from a niche open source project into the most downloaded Kafka client for Node.js.

Using Kafka in Node.js previously meant relying on community-contributed bindings to librdkafka, which required you to spend more time debugging failed builds than working on your application. With the original authors moving away from supporting the bindings, and the community only partially picking up the slack, using Kafka on Node.js was a painful proposition.

Kafka is a core part of Klarna’s microservice architecture, with hundreds of services using it to communicate among themselves. In 2017, as their engineering team was building the ecosystem of Node.js services powering the Klarna app, it was clear that the experience of working with any of the available Kafka clients was not good enough, so they decided to do what they had already done with their Erlang client, Brod, and build their own. Rather than wrapping librdkafka, their client is a complete reimplementation in native JavaScript, allowing for a far superior user experience at the cost of being a lot more work to implement. Towards the end of 2017, KafkaJS 0.1.0 was released.

Tommy has also used KafkaJS to build several Kafka-powered services at Klarna, and he has worked on supporting libraries such as integrations with Confluent Schema Registry and Zstandard compression.

Since KafkaJS is written entirely in JavaScript, there is no build step required. It works in any version of Node.js and evolves together with the platform with no effort required from the end user. It also unlocks some creative use cases: for example, Klarna once ran an experiment that got KafkaJS working in a browser. KafkaJS will also run on any platform supported by Node.js, such as ARM. KafkaJS’s “no dependencies” policy also means that the deployment footprint is small, which makes it a perfect fit for serverless environments. (A minimal usage example follows the links below.)

EPISODE LINKS
- Node.js ❤️ Apache Kafka – Getting Started with KafkaJS
- Watch the video version of this podcast
- Join the Confluent Community
- Learn more with Kafka tutorials, resources, and guides at Confluent Developer
- Live demo: Kafka streaming in 10 minutes on Confluent Cloud
- Use 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)
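For a feel of the API, here is a minimal KafkaJS producer and consumer pair, following the library’s documented surface; the broker address, topic, and group ID are made up for the example.

```typescript
import { Kafka } from "kafkajs";

const kafka = new Kafka({
  clientId: "payments-service",
  brokers: ["broker-1:9092"], // hypothetical bootstrap address
});

async function run() {
  // Produce a message.
  const producer = kafka.producer();
  await producer.connect();
  await producer.send({
    topic: "payments",
    messages: [{ key: "user-42", value: JSON.stringify({ amount: 100 }) }],
  });

  // Consume it back within a consumer group.
  const consumer = kafka.consumer({ groupId: "payments-audit" });
  await consumer.connect();
  await consumer.subscribe({ topic: "payments", fromBeginning: true });
  await consumer.run({
    eachMessage: async ({ topic, partition, message }) => {
      console.log(`${topic}[${partition}] ${message.key}: ${message.value}`);
    },
  });
}

run().catch(console.error);
```

Because it is pure JavaScript, this runs unchanged anywhere Node.js does, with no native build step.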
Apr 19, 2021 • 11min

Apache Kafka 2.8 - ZooKeeper Removal Update (KIP-500) and Overview of Latest Features

Apache Kafka 2.8 is out! This release includes early access to the long-anticipated ZooKeeper removal encapsulated in KIP-500, as well as other key updates, including:
- The addition of a Describe Cluster API (see the sketch after the links below)
- Support for mutual TLS authentication on SASL_SSL listeners
- Exposed task configurations in the Kafka Connect REST API
- The removal of a properties argument for the TopologyTestDriver
- The introduction of a Kafka Streams specific uncaught exception handler
- Improved handling of window size in Streams
- And more

EPISODE LINKS
- Read about what’s new in Apache Kafka 2.8
- Check out the Apache Kafka 2.8 release notes
- Watch the video version of this podcast
- Join the Confluent Community
- Learn more with Kafka tutorials, resources, and guides at Confluent Developer
- Live demo: Kafka streaming in 10 minutes on Confluent Cloud
- Use 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)
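The Describe Cluster API gives admin clients a first-class way to ask a cluster about its brokers and controller. The KafkaJS admin client exposes a similar describeCluster call; note that the KafkaJS method predates the new 2.8 wire protocol, so treat this sketch as the general shape of the operation rather than the new RPC itself. The broker address is hypothetical.

```typescript
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "ops-cli", brokers: ["broker-1:9092"] });

async function showCluster() {
  const admin = kafka.admin();
  await admin.connect();
  // Returns the broker list, the current controller id, and the cluster id.
  const { brokers, controller, clusterId } = await admin.describeCluster();
  console.log(`cluster ${clusterId}, controller: ${controller}`);
  for (const b of brokers) {
    console.log(`  broker ${b.nodeId} at ${b.host}:${b.port}`);
  }
  await admin.disconnect();
}

showCluster().catch(console.error);
```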

SEASON 2
Hosted by Tim Berglund, Adi Polak and Viktor Gamov
Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed
Music by Coastal Kites
Artwork by Phil Vo

🎧 Subscribe to Confluent Developer wherever you listen to podcasts.
▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes.
👍 If you enjoyed this, please leave us a rating.
🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.