Confluent Developer ft. Tim Berglund, Adi Polak & Viktor Gamov

Confluent
Jun 15, 2021 • 26min

Boosting Security for Apache Kafka with Confluent Cloud Private Link ft. Dan LaMotte

Confluent Cloud isn’t just for public access anymore. As security requirements increase across sectors, so does the need for virtual private cloud (VPC) connectivity, and it is becoming more common to come across Apache Kafka® deployments that use the latest private link connectivity option. In the past, most Confluent Cloud users were satisfied with public connectivity paths and VPC peering. However, enabling private links in the cloud is increasingly important for security across networks and even for the reliability of stream processing.

Dan LaMotte, who has since this recording become a staff software engineer II, and his team are focused on making customers’ connections to Confluent Cloud secure. They do this by allowing two VPCs to connect without sharing their private IP address space. There is no crossover between them, which lends itself to entirely secure, unidirectional connectivity from customer to service provider without sharing IPs.

But why do clients still want to peer if they have the option to interface privately? Dan explains that peering has long been the base architecture for this type of connection. At its core, peering is just a point-to-point cloud connection between two VPCs. With global connectivity becoming more commonplace and the rise of globally distributed teams, networks are often not even based in the same region. Regardless of region, however, organizations must take their required level of security into account. At a high level, peering and transit gateways are the baseline for these use cases, and this is where private links come in handy.

Private links allow team members to connect to Confluent Cloud without depending on the public internet. You can directly connect all of your multi-cloud options and microservices within your own secure space, private to the company and to specific IP addresses. As an added security measure, the connection must be initiated on the client side. (A sketch of what that looks like from the client’s perspective follows the links below.) With the option of private links, you can now also build microservices that use functionality that wasn’t available in the past, such as:
- Multi-cloud clusters
- Product enhancements with IP ranges
- Unlimited IP space
- Scalability
- Load balancing

You no longer need to segment your workflow, thanks to completely secure connections between teams that are otherwise disconnected from one another.

EPISODE LINKS
- Securing the Cloud with VPC Peering ft. Daniel LaMotte
- Software Engineer, Cloud Networking [Remote – AMER]
- Software Engineer, Cloud Networking [Remote – USA]
- eBPF documentation
- Watch the video version
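To make the client side concrete, here is a minimal sketch of a Kafka application connecting over a private endpoint, written with the KafkaJS client. The endpoint hostname, credentials, and topic are hypothetical (the hostname format is illustrative, not Confluent’s documented scheme); the point is that with a private link, the bootstrap address resolves inside your own VPC rather than to a public IP, and the connection is initiated from the client’s side.

```typescript
import { Kafka } from "kafkajs";

// Hypothetical private endpoint DNS name: it resolves to an endpoint in
// your own VPC, not a public IP, so traffic never touches the internet.
const kafka = new Kafka({
  clientId: "orders-service",
  brokers: ["lkc-abc123.us-east-1.private.example.cloud:9092"],
  ssl: true, // TLS is still required; private networking is not a substitute
  sasl: {
    mechanism: "plain",
    username: process.env.CLUSTER_API_KEY!,
    password: process.env.CLUSTER_API_SECRET!,
  },
});

async function main() {
  const producer = kafka.producer();
  await producer.connect(); // only succeeds when run from the connected VPC
  await producer.send({
    topic: "orders",
    messages: [{ key: "order-1", value: "created" }],
  });
  await producer.disconnect();
}

main().catch(console.error);
```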
Jun 10, 2021 • 9min

Confluent Platform 6.2 | What’s New in This Release + Updates

Based on Apache Kafka® 2.8, Confluent Platform 6.2 introduces Health+, which offers intelligent alerting, cloud-based monitoring tools, and accelerated support so that you can get notified of potential issues before they manifest as critical problems that lead to downtime and business disruption.

Health+ provides ongoing, real-time analysis of performance and cluster metadata for your Confluent Platform deployment. It collects only metadata, so you can continue managing your deployment as you see fit, with complete control. With cluster metadata continuously analyzed through an extensive library of expert-tested rules and algorithms, you can quickly get insights into cluster performance and spot potential problems before they occur. To ensure complete visibility, organizations can customize the types of notifications they receive and choose to receive them via Slack, email, or webhook. Each notification is aimed at avoiding larger downtime or data loss by helping identify smaller issues before they become bigger problems.

In today’s episode, Tim Berglund (Senior Director of Developer Experience, Confluent) highlights everything that’s new in Confluent Platform 6.2 and all the latest updates.

EPISODE LINKS
- Check out the release notes
- Read the blog post: Introducing Health+ with Confluent Platform 6.2
- Download Confluent Platform 6.2
- Watch the video version of this podcast
- Join the Confluent Community
- Learn more with Kafka tutorials, resources, and guides at Confluent Developer
- Live demo: Kafka streaming in 10 minutes on Confluent Cloud
- Use 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)
Jun 8, 2021 • 33min

Adopting OpenTelemetry in Confluent and Beyond ft. Xavier Léauté

Collecting internal, operational telemetry from Confluent Cloud services and thousands of clusters is no small feat. Stakeholders need to rely on the same data to make operational decisions. Whether it’s metrics from clusters in Confluent Cloud or traces from our internal services, they all provide valuable insights, not only to engineering teams but also to customers for their own operations and business reporting needs. Traditionally, this data needs to be collected in multiple ways to satisfy all the different requirements. We leverage third-party vendors for our operational needs, which usually means deploying vendor agents or libraries in addition to our own, as we also need to collect some of the same data to expose to customers. This sometimes leads to discrepancies between systems, which are often hard to reconcile and make it harder to troubleshoot issues across engineering, data science, and other teams.

One of the earliest software engineers at Confluent, Xavier Léauté is no stranger to this. At Confluent, he leads our observability engineering efforts in Confluent Cloud.

With OpenTelemetry, we can collect data in a vendor-agnostic way. It defines a standard format that all our services can use to expose telemetry, and it provides Go and Java libraries that we can use to instrument our services. Many vendors already integrate with OpenTelemetry, which gives us the flexibility to try out different observability solutions with minimal effort, without the need to rewrite applications or deploy new agents. This means that the same data we send to third parties can also be collected internally, in our own clusters. (A sketch of this style of instrumentation follows the links below.)

The same source of data can then be leveraged in many different ways:
- Using Kafka Connect, we can send this data to our data warehouse and data science teams in real time to derive many of the metrics that we use to track the health of our cloud business
- That very same data also powers our Cloud Metrics API to provide our customers visibility into their infrastructure
- Engineers and support teams can collect more fine-grained data to troubleshoot incidents or understand low-level application behavior

We’ve also adopted the same approach for on-prem customers, which enables us to collect telemetry into our cloud and help them troubleshoot issues, leveraging the same infrastructure that we already built for Cloud. Regarding OpenTelemetry efforts in Apache Kafka®, we’re working on KIP-714, which will allow us to collect Kafka client metrics to help better understand client-side problems without the need to instrument client applications. Our ultimate goal has always been to migrate to OpenTelemetry, and that migration is now underway. We’d also like to pave the way for direct OpenTelemetry integration in Kafka, based on the work that we’ve done at Confluent.

EPISODE LINKS
- OpenTelemetry Twitch channel
- Confluent Cloud Metrics API
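The episode mentions OpenTelemetry’s Go and Java libraries; OpenTelemetry also ships a JavaScript API, which the following minimal TypeScript sketch uses to show what vendor-agnostic instrumentation looks like. The service and metric names are made up, and this assumes a MeterProvider has been registered elsewhere (e.g., via the OpenTelemetry SDK with an OTLP exporter); exact package entry points vary a little across OpenTelemetry JS versions.

```typescript
import { metrics } from "@opentelemetry/api";

// Instrumentation stays vendor-agnostic: where the data goes (third-party
// vendor, internal cluster, or both) is decided by the SDK/exporter config.
const meter = metrics.getMeter("fetch-service");

const requestCounter = meter.createCounter("requests_total", {
  description: "Number of requests handled, by outcome",
});
const latency = meter.createHistogram("request_latency_ms", {
  description: "End-to-end request latency in milliseconds",
});

export async function handle(request: () => Promise<void>): Promise<void> {
  const start = Date.now();
  try {
    await request();
    requestCounter.add(1, { outcome: "ok" });
  } catch (err) {
    requestCounter.add(1, { outcome: "error" });
    throw err;
  } finally {
    latency.record(Date.now() - start);
  }
}
```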
May 25, 2021 • 39min

Running Apache Kafka Efficiently on the Cloud ft. Adithya Chandra

Focused on optimizing Apache Kafka® performance with maximized efficiency, Confluent’s Product Infrastructure team has been actively exploring opportunities for scaling out Kafka clusters. They are able to run Kafka workloads with half the typical memory usage while saving infrastructure costs, which they have tested and now safely rolled out across Confluent Cloud.

After spending seven years at Amazon Web Services (AWS) working on search services and Amazon Aurora as a software engineer, Adithya Chandra decided to apply his expertise in cluster management, load balancing, elasticity, and performance of search and storage clusters to the Confluent team.

Last year, Confluent shipped Tiered Storage, which moves eligible data from the Kafka broker to remote storage. With most data in remote storage, brokers can be upgraded to better storage volumes backed by solid-state drives (SSDs). SSDs are capable of higher throughput than hard disk drives (HDDs) and of fast random I/O, yet they are more expensive per provisioned gigabyte. Given that SSDs handle random I/O well and support higher throughput, Confluent started investigating whether it was possible to run Kafka with less RAM, which is much more expensive per gigabyte than SSD storage. Cloud instance types with the same CPU but half the memory were 20% cheaper. (A back-of-the-envelope sizing sketch follows the links below.)

In this episode, Adithya covers how to run Kafka more efficiently on Confluent Cloud and dives into the following:
- Memory allocation on an instance running Kafka
- What is a JVM heap? Why should it be sized? How much is enough? What are the downsides of a small heap?
- Memory usage of Datadog, Kubernetes, and other processes, and allocating memory correctly
- What is the ideal page cache size? What is the page cache used for? Are there any parameters that can be tuned? How does Kafka use the page cache?
- Testing via the simulation of a variety of workloads using Trogdor
- High-throughput, high-connection, and high-partition tests and their results
- Available cloud hardware and finding the best fit, including choosing the number of instance types, migrating from one instance to another, and using nodepools to migrate brokers safely, one by one
- What do you do when your preferred hardware is not available? Can you run hybrid Kafka clusters if the preferred instance is not widely available?
- Building infrastructure that allows you to perform testing easily and that can support newer hardware faster (ARM processors, SSDs, etc.)

EPISODE LINKS
- Watch the video version of this podcast
- Join the Confluent Community
- Learn more with Kafka tutorials, resources, and guides at Confluent Developer
- Live demo: Kafka streaming in 10 minutes on Confluent Cloud
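To make the sizing question concrete, here is a back-of-the-envelope sketch of a broker’s memory budget. The heuristic (the page cache should roughly cover the data produced during the window in which consumers typically read it back, so that tail reads never hit disk) and all the numbers are illustrative assumptions, not Confluent’s published guidance.

```typescript
// Rough memory budget for a Kafka broker instance (all numbers illustrative).
interface BrokerMemoryPlan {
  heapGiB: number;      // JVM heap: index structures, request handling
  sidecarsGiB: number;  // Datadog agent, Kubernetes overhead, etc.
  pageCacheGiB: number; // what's left over for the OS page cache
}

function planMemory(
  instanceGiB: number,
  heapGiB: number,
  sidecarsGiB: number,
): BrokerMemoryPlan {
  return {
    heapGiB,
    sidecarsGiB,
    pageCacheGiB: instanceGiB - heapGiB - sidecarsGiB,
  };
}

// Page cache needed to absorb all writes produced within the lag window
// that consumers typically read from (so tail reads are served from memory).
function pageCacheNeededGiB(writeMiBPerSec: number, lagWindowSec: number): number {
  return (writeMiBPerSec * lagWindowSec) / 1024;
}

const plan = planMemory(32, 6, 2);            // 32 GiB instance, 6 GiB heap, 2 GiB sidecars
const needed = pageCacheNeededGiB(100, 120);  // 100 MiB/s writes, 2-minute lag window

console.log(`page cache available: ${plan.pageCacheGiB} GiB, needed: ~${needed.toFixed(1)} GiB`);
// If "needed" fits comfortably, a cheaper half-memory instance backed by fast
// SSDs may be viable: the cache misses that do occur hit fast random I/O.
```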
May 20, 2021 • 42min

Engaging Database Partials with Apache Kafka for Distributed System Consistency ft. Pat Helland

When compiling database reports using a variety of data from different systems, obtaining the right data when you need it in real time can be difficult. With cloud connectivity and distributed data pipelines, Pat Helland (Principal Architect, Salesforce) explains how to give educated partial answers when using the Apache Kafka® platform. After all, you can’t get guarantees across a distance, which makes it critical to consider partial results.

Despite best efforts, managing systems from a distance can result in lag time. The secret, according to Helland, is to anticipate these situations and have a plan for when (not if) they happen. Your outputs may be incomplete from time to time, but that doesn’t mean there isn’t valuable information and data to be shared. Although you cannot guarantee that stream data will be available when you need it, you can gather replicas within a batch to obtain a consistent result, also known as convergence.

Distributed systems of all sizes and across large distances rely on reference architectures for database reporting, so plan and anticipate that there will be incomplete inputs at times. Regardless of the types of data you’re using within a distributed database, many inferences can be made from repetitive monitoring over time. There is no reason to throw out data from 19 machines when you’re only waiting on one as a deadline approaches: you can make the most of the sources that are available in the presence of a partition in the overall distributed database. (A small sketch of this deadline-driven pattern follows the links below.)

Confluent Cloud and convergence capabilities have allowed Salesforce to make decisions very quickly, even when only partial data is available, using replicated systems across multiple databases. This analytical approach is vital for consistency in large enterprises, especially those that depend on multi-cloud functionality.

EPISODE LINKS
- Watch the video version of this podcast
- Join the Confluent Community
- Learn more with Kafka tutorials, resources, and guides at Confluent Developer
- Live demo: Kafka streaming in 10 minutes on Confluent Cloud
- Use 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)
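The “19 of 20 machines” idea maps naturally onto a deadline-driven fan-out: query every replica, wait until the deadline, and report whatever arrived, labeled as partial. Here is a minimal TypeScript sketch of that pattern; the replica query functions and the numbers are hypothetical, and this is an illustration of the idea rather than Helland’s specific design.

```typescript
interface PartialResult<T> {
  values: T[];        // answers that arrived before the deadline
  missing: number;    // replicas we stopped waiting for
  complete: boolean;  // true only if every replica answered in time
}

// Query all replicas, but never wait past deadlineMs: report what arrived.
async function gatherWithDeadline<T>(
  queries: Array<() => Promise<T>>,
  deadlineMs: number,
): Promise<PartialResult<T>> {
  const timeout = new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error("deadline")), deadlineMs),
  );
  const settled = await Promise.allSettled(
    queries.map((q) => Promise.race([q(), timeout])),
  );
  const values = settled
    .filter((r): r is PromiseFulfilledResult<T> => r.status === "fulfilled")
    .map((r) => r.value);
  return {
    values,
    missing: queries.length - values.length,
    complete: values.length === queries.length,
  };
}

// Usage: 20 hypothetical replica queries, 200 ms deadline. If one replica is
// slow, we still report 19 answers instead of throwing everything away.
// const report = await gatherWithDeadline(replicas.map((r) => () => r.count()), 200);
```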
May 13, 2021 • 32min

The Truth About ZooKeeper Removal and the KIP-500 Release in Apache Kafka ft. Jason Gustafson and Colin McCabe

Jason Gustafson and Colin McCabe, Apache Kafka® developers, discuss the project to remove ZooKeeper—now known as the KRaft (Kafka on Raft) project. A previous episode of Streaming Audio featured both developers before the release of Apache Kafka 2.8. Now they’re back to share their progress.

The KRaft code has been merged (and continues to be merged) in phases. Both developers talk about the foundational Kafka Improvement Proposals (KIPs), such as KIP-595, a Raft protocol for the metadata quorum, and KIP-631, the quorum-based Kafka controller. The idea going into this new release was to give users a chance to try out no-ZooKeeper mode for themselves.

There are a lot of exciting milestones on the way for KRaft. The next release will feature Raft snapshot support, as well as support for running with security authorizers enabled.

EPISODE LINKS
- KIP-500: Apache Kafka Without ZooKeeper ft. Colin McCabe and Jason Gustafson
- What’s New in Apache Kafka 2.8
- Watch the video version of this podcast
- Join the Confluent Community
- Learn more with Kafka tutorials, resources, and guides at Confluent Developer
- Live demo: Kafka streaming in 10 minutes on Confluent Cloud
- Use 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)
May 4, 2021 • 27min

Resilient Edge Infrastructure for IoT Using Apache Kafka ft. Kai Waehner

What is the internet of things (IoT), and how does it relate to event streaming and Apache Kafka®? Deploying Kafka outside the datacenter creates many new possibilities for processing data in motion and building new business cases.

In this episode, Kai Waehner, field CTO and global technology advisor at Confluent, discusses the intersection of edge data infrastructure, IoT, and cloud services for Kafka. He also details how businesses get into sticky situations when they fail to plan for data that runs dangerously close to the edge. Air-gapped environments and strong security requirements are the norm in many edge deployments.

Defining the edge for your industry depends on what sector you’re in, plus the amount of data and interaction involved with your customers. The edge can lie at various points on the spectrum and carry different meanings for different people. Before you can deploy Kafka to the edge, you must first define where that edge is as it relates to your connectivity needs. Edge resiliency enables your enterprise not only to control your datacenter with ease but also to preserve data without privacy risks or data leaks. If a business does not have the personnel to handle these big IT jobs on their own, or simply does not have an IT department at all, Kafka solutions can come in to fill the gap.

This podcast explores use cases and architectures at the edge (i.e., outside the datacenter) across industries, including manufacturing, energy, retail, restaurants, and banks. The trade-offs of edge deployments are compared to a hybrid integration with Confluent Cloud. (A sketch of a connection-tolerant edge producer follows the links below.)

EPISODE LINKS
- Processing IoT Data End to End with MQTT & Kafka
- End-to-End Integration: IoT Edge to Confluent
- Infrastructure Checklist for Kafka at the Edge
- Use Cases & Architectures for Kafka at the Edge
- Architecture Patterns for Distributed, Hybrid, Edge & Global Kafka Deployments
- Building a Smart Factory with Kafka & 5G Campus Networks
- Kafka Is the New Black at the Edge in Industrial IoT, Logistics & Retailing
- Kafka, KSQL & Apache PLC4X for IIoT Data Integration
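Edge links are slow and flaky, so clients benefit from patient retries, batching, and compression. Here is a minimal sketch of a connection-tolerant edge producer using the KafkaJS client; the broker address, topic name, and retry numbers are illustrative assumptions, not a recommendation from the episode.

```typescript
import { Kafka, CompressionTypes, logLevel } from "kafkajs";

// Hypothetical on-site broker in an edge deployment; the same pattern
// applies when bridging from the edge to a cloud cluster.
const kafka = new Kafka({
  clientId: "sensor-gateway-07",
  brokers: ["edge-broker.factory.local:9092"],
  logLevel: logLevel.WARN,
  retry: {
    initialRetryTime: 500, // back off gently on a flaky link
    retries: 10,
  },
});

const producer = kafka.producer({ allowAutoTopicCreation: false });

// Batch readings and compress them: fewer, denser requests suit
// constrained or intermittent network links.
export async function flushReadings(
  readings: Array<{ sensor: string; value: number }>,
): Promise<void> {
  await producer.connect();
  await producer.send({
    topic: "machine-telemetry",
    compression: CompressionTypes.GZIP,
    messages: readings.map((r) => ({
      key: r.sensor,
      value: JSON.stringify({ ...r, ts: Date.now() }),
    })),
  });
}
```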
Apr 29, 2021 • 28min

Data Management and Digital Transformation with Apache Kafka at Van Oord

Imagine if you could create a better world for future generations simply by delivering marine ingenuity. Van Oord is a Dutch family-owned company that has served as an international marine contractor for over 150 years, focusing on dredging, land infrastructure in the Netherlands, and offshore wind and oil & gas infrastructure.

Real-time insights into costs spent, the progress of projects, and the performance of vessels and equipment are essential for surviving as a business. Becoming a data-driven company requires that all data be connected, synchronized, and visualized—in fact, truly digitized. This requires a central nervous system that supports:
- Legacy (monolith environment) as well as microservices
- ELT/ETL/streaming ETL
- All types of data, including transactional, streaming, geo, machine, and (sea) survey/bathymetry data
- Master data/an enterprise common data model

The need for agility and speed makes it necessary to have a fully integrated DevOps, infrastructure-as-code environment in which data lineage, data governance, and enterprise architecture are holistically embedded. Thousands of topics need to be developed, updated, tested, accepted, and deployed each day. Together with the different scripts for connectors, this requires a holistic data management solution. Thus, Marlon Hiralal (Enterprise/Data Management Architect, Van Oord) and Andreas Wombacher (Data Engineer, Van Oord) turned to Confluent for a three-month proof of concept and explored the pre-prep stage of using Apache Kafka® on Van Oord’s vessels.

Since the application landscape and offered services at Van Oord are dynamic, a stable environment with controlled continuous integration and deployment is essential. Beyond the software components themselves, this also applies to configurations and infrastructure, applying the concept of CI/CD with infrastructure as code. The result: using Terraform and Confluent together.

Published information is treated as a product at Van Oord. An information product is a set of Kafka topics: topics that communicate change (via change data capture) and topics that share the state of a data source (Kafka tables). The set of all information products forms the enterprise data model. (A sketch of provisioning such a product appears below.)

Apache Atlas is used as a data dictionary and governance tool to capture the meaning of the different information products. All changes in the data dictionary are themselves available as an information product in Confluent, allowing consumers of information products to subscribe and be notified about changes.

Van Oord’s enterprise architecture model must remain up to date and aligned with the current implementation. This is achieved by automatically inspecting and analyzing Confluent data flows. Fortunately, Confluent embeds homogeneously in this holistic reference architecture, whose basis is a change data capture (CDC) layer and a persistent layer, making Confluent the core component of the Van Oord architecture.
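To make “an information product is a set of Kafka topics” concrete, here is a hedged sketch that provisions one product as a CDC change topic plus a compacted state topic, using the KafkaJS admin client. The broker address, topic naming convention, partition counts, and product name are illustrative assumptions, not Van Oord’s actual conventions.

```typescript
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "provisioner", brokers: ["broker-1:9092"] });
const admin = kafka.admin();

// One "information product" = a CDC change topic plus a compacted state topic.
async function createInformationProduct(product: string): Promise<void> {
  await admin.connect();
  await admin.createTopics({
    topics: [
      {
        // Change topic: every insert/update/delete captured via CDC.
        topic: `${product}.changes`,
        numPartitions: 6,
        replicationFactor: 3,
      },
      {
        // State topic: compacted so it retains the latest value per key,
        // i.e., a Kafka table of the source's current state.
        topic: `${product}.state`,
        numPartitions: 6,
        replicationFactor: 3,
        configEntries: [{ name: "cleanup.policy", value: "compact" }],
      },
    ],
  });
  await admin.disconnect();
}

createInformationProduct("vessel-positions").catch(console.error);
```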
Apr 22, 2021 • 31min

Powering Microservices Using Apache Kafka on Node.js with KafkaJS at Klarna ft. Tommy Brunn

At Klarna, Lead Engineer Tommy Brunn is building a runtime platform for developers. But outside of his professional role, he is also one of the authors of the JavaScript client for Apache Kafka® called KafkaJS, which since 2018 has grown from a niche open source project into the most downloaded Kafka client for Node.js.

Using Kafka in Node.js previously meant relying on community-contributed bindings to librdkafka, which required you to spend more time debugging failed builds than working on your application. With the original authors moving away from supporting the bindings, and the community only partially picking up the slack, using Kafka on Node.js was a painful proposition.

Kafka is a core part of Klarna’s microservice architecture, with hundreds of services using it to communicate among themselves. In 2017, as their engineering team was building the ecosystem of Node.js services powering the Klarna app, it was clear that the experience of working with any of the available Kafka clients was not good enough, so they decided to do what they had already done with their Erlang client, Brod, and build their own. Rather than wrapping librdkafka, their client is a complete reimplementation in native JavaScript, allowing for a far superior user experience at the cost of being a lot more work to implement. Towards the end of 2017, KafkaJS 0.1.0 was released.

Tommy has also used KafkaJS to build several Kafka-powered services at Klarna, and he has worked on supporting libraries such as integrations with Confluent Schema Registry and Zstandard compression.

Since KafkaJS is written entirely in JavaScript, there is no build step required. It works in any version of Node.js and evolves together with the platform with no effort required from the end user. It also unlocks some creative use cases: for example, Klarna once ran an experiment that got KafkaJS working in a browser. KafkaJS will also run on any platform supported by Node.js, such as ARM. KafkaJS’s “no dependencies” policy also means that the deployment footprint is small, which makes it a perfect fit for serverless environments. (A minimal usage example follows the links below.)

EPISODE LINKS
- Node.js ❤️ Apache Kafka – Getting Started with KafkaJS
- Watch the video version of this podcast
- Join the Confluent Community
- Learn more with Kafka tutorials, resources, and guides at Confluent Developer
- Live demo: Kafka streaming in 10 minutes on Confluent Cloud
- Use 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)
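For a feel of the API, here is a minimal KafkaJS producer and consumer pair, following the library’s documented surface; the broker address, topic, and group ID are made up for the example.

```typescript
import { Kafka } from "kafkajs";

const kafka = new Kafka({
  clientId: "payments-service",
  brokers: ["broker-1:9092"], // hypothetical bootstrap address
});

async function run() {
  // Produce a message.
  const producer = kafka.producer();
  await producer.connect();
  await producer.send({
    topic: "payments",
    messages: [{ key: "user-42", value: JSON.stringify({ amount: 100 }) }],
  });

  // Consume it back within a consumer group.
  const consumer = kafka.consumer({ groupId: "payments-audit" });
  await consumer.connect();
  await consumer.subscribe({ topic: "payments", fromBeginning: true });
  await consumer.run({
    eachMessage: async ({ topic, partition, message }) => {
      console.log(`${topic}[${partition}] ${message.key}: ${message.value}`);
    },
  });
}

run().catch(console.error);
```

Because it is pure JavaScript, this runs unchanged anywhere Node.js does, with no native build step.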
Apr 19, 2021 • 11min

Apache Kafka 2.8 - ZooKeeper Removal Update (KIP-500) and Overview of Latest Features

Apache Kafka 2.8 is out! This release includes early access to the long-anticipated ZooKeeper removal encapsulated in KIP-500, as well as other key updates, including:
- The addition of a Describe Cluster API (see the sketch after the links below)
- Support for mutual TLS authentication on SASL_SSL listeners
- Exposed task configurations in the Kafka Connect REST API
- The removal of a properties argument for the TopologyTestDriver
- The introduction of a Kafka Streams specific uncaught exception handler
- Improved handling of window size in Streams
- And more

EPISODE LINKS
- Read about what’s new in Apache Kafka 2.8
- Check out the Apache Kafka 2.8 release notes
- Watch the video version of this podcast
- Join the Confluent Community
- Learn more with Kafka tutorials, resources, and guides at Confluent Developer
- Live demo: Kafka streaming in 10 minutes on Confluent Cloud
- Use 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)
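The Describe Cluster API gives admin clients a first-class way to ask a cluster about its brokers and controller. The KafkaJS admin client exposes a similar describeCluster call; note that the KafkaJS method predates the new 2.8 wire protocol, so treat this sketch as the general shape of the operation rather than the new RPC itself. The broker address is hypothetical.

```typescript
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "ops-cli", brokers: ["broker-1:9092"] });

async function showCluster() {
  const admin = kafka.admin();
  await admin.connect();
  // Returns the broker list, the current controller id, and the cluster id.
  const { brokers, controller, clusterId } = await admin.describeCluster();
  console.log(`cluster ${clusterId}, controller: ${controller}`);
  for (const b of brokers) {
    console.log(`  broker ${b.nodeId} at ${b.host}:${b.port}`);
  }
  await admin.disconnect();
}

showCluster().catch(console.error);
```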

SEASON 2
Hosted by Tim Berglund, Adi Polak and Viktor Gamov
Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed
Music by Coastal Kites
Artwork by Phil Vo

🎧 Subscribe to Confluent Developer wherever you listen to podcasts.
▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes.
👍 If you enjoyed this, please leave us a rating.
🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.