Confluent Developer ft. Tim Berglund, Adi Polak & Viktor Gamov

Confluent

Hi, we’re Tim Berglund, Adi Polak, and Viktor Gamov and we’re excited to bring you the Confluent Developer podcast (formerly “Streaming Audio.”) Our hand-crafted weekly episodes feature in-depth interviews with our community of software developers (actual human beings - not AI) talking about some of the most interesting challenges they’ve faced in their careers. We aim to explore the conditions that gave rise to each person’s technical hurdles, as well as how their experiences transformed their understanding and approach to building systems. Whether you’re a seasoned open source data streaming engineer, or just someone who’s interested in learning more about Apache Kafka®, Apache Flink® and real-time data, we hope you’ll appreciate the stories, the discussion, and our effort to bring you a high-quality show worth your time.

Episodes

Mentioned books

Sep 15, 2022 • 34min

Real-Time Stream Processing, Monitoring, and Analytics With Apache Kafka

Processing real-time event streams enables countless use cases big and small. With a day job designing and building highly available distributed data systems, Simon Aubury (Principal Data Engineer, Thoughtworks) believes stream-processing thinking can be applied to any stream of events. In this episode, Simon shares his Confluent Hackathon ’22 winning project—a wildlife monitoring system to observe population trends over time using a Raspberry Pi, along with Apache Kafka®, Kafka Connect, ksqlDB, TensorFlow Lite, and Kibana. He used the system to count animals in his Australian backyard and perform trend analysis on the results. Simon also shares ideas on how you can use these same technologies to help with other real-world challenges.Open-source, object detection models for TensorFlow, which appropriately are collected into "model zoos," meant that Simon didn't have to provide his own object identification as part of the project, which would have made it untenable. Instead, he was able to utilize the open-source models, which are essentially neural nets pretrained on relevant data sets—in his case, backyard animals.Simon's system, which consists of around 200 lines of code, employs a Kafka producer running a while loop, which connects to a camera feed using a Python library. For each frame brought down, object masking is applied in order to crop and reduce pixel density, and then the frame is compared to the models mentioned above. A Python dictionary containing probable found objects is sent to a Kafka broker for processing; the images themselves aren't sent. (Note that Simon's system is also capable of alerting if a specific, rare animal is detected.) On the broker, Simon uses ksqlDB and windowing to smooth the data in case the frames were inconsistent for some reason (it may look back over thirty seconds, for example, and find the highest number of animals per type). Finally, the data is sent to a Kibana dashboard for analysis, through a Kafka Connect sink connector. Simon’s system is an extremely low-cost system that can simulate the behaviors of more expensive, proprietary systems. And the concepts can easily be applied to many other use cases. For example, you could use it to estimate traffic at a shopping mall to gauge optimal opening hours, or you could use it to monitor the queue at a coffee shop, counting both queued patrons as well as impatient patrons who decide to leave because the queue is too long.EPISODE LINKSReal-Time Wildlife Monitoring with Apache KafkaWildlife Monitoring GithubksqlDB Fundamentals: How Apache Kafka, SQL, and ksqlDB Work TogetherEvent-Driven Architecture - Common Mistakes and Valuable LessonsSEASON 2 Hosted by Tim Berglund, Adi Polak and Viktor Gamov Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed Music by Coastal Kites Artwork by Phil Vo 🎧 Subscribe to Confluent Developer wherever you listen to podcasts. ▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes. 👍 If you enjoyed this, please leave us a rating. 🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.

Sep 8, 2022 • 35min

Reddit Sentiment Analysis with Apache Kafka-Based Microservices

How do you analyze Reddit sentiment with Apache Kafka® and microservices? Bringing the fresh perspective of someone who is both new to Kafka and the industry, Shufan Liu, nascent Developer Advocate at Confluent, discusses projects he has worked on during his summer internship—a Cluster Linking extension to a conceptual data pipeline project, and a microservice-based Reddit sentiment-analysis project. Shufan demonstrates that it’s possible to quickly get up to speed with the tools in the Kafka ecosystem and to start building something productive early on in your journey.Shufan's Cluster Linking project extends a demo by Danica Fine (Senior Developer Advocate, Confluent) that uses a Kafka-based data pipeline to address the challenge of automatic houseplant watering. He discusses his contribution to the project and shares details in his blog—Data Enrichment in Existing Data Pipelines Using Confluent Cloud.The second project Shufan presents is a sentiment analysis system that gathers data from a given subreddit, then assigns the data a sentiment score. He points out that its results would be hard to duplicate manually by simply reading through a subreddit—you really need the assistance of AI. The project consists of four microservices:A user input service that collects requests in a Kafka topic, which consist of the desired subreddit, along with the dates between which data should be collectedAn API polling service that fetches the requests from the user input service, collects the relevant data from the Reddit API, then appends it to a new topicA sentiment analysis service that analyzes the appended topic from the API polling service using the Python library NLTK; it calculates averages with ksqlDBA results-displaying service that consumes from a topic with the calculationsInteresting subreddits that Shufan has analyzed for sentiment include gaming forums before and after key releases; crypto and stock trading forums at various meaningful points in time; and sports-related forums both before the season and several games into it. EPISODE LINKSData Enrichment in Existing Data Pipelines Using Confluent CloudWatch the video version of this podcastKris Jenkins’ TwitterStreaming Audio Playlist Join the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Intro to Event-Driven Microservices with ConfluentSEASON 2 Hosted by Tim Berglund, Adi Polak and Viktor Gamov Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed Music by Coastal Kites Artwork by Phil Vo 🎧 Subscribe to Confluent Developer wherever you listen to podcasts. ▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes. 👍 If you enjoyed this, please leave us a rating. 🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.

Aug 30, 2022 • 1h 2min

Capacity Planning Your Apache Kafka Cluster

How do you plan Apache Kafka® capacity and Kafka Streams sizing for optimal performance? When Jason Bell (Principal Engineer, Dataworks and founder of Synthetica Data), begins to plan a Kafka cluster, he starts with a deep inspection of the customer's data itself—determining its volume as well as its contents: Is it JSON, straight pieces of text, or images? He then determines if Kafka is a good fit for the project overall, a decision he bases on volume, the desired architecture, as well as potential cost.Next, the cluster is conceived in terms of some rule-of-thumb numbers. For example, Jason's minimum number of brokers for a cluster is three or four. This means he has a leader, a follower and at least one backup. A ZooKeeper quorum is also a set of three. For other elements, he works with pairs, an active and a standby—this applies to Kafka Connect and Schema Registry. Finally, there's Prometheus monitoring and Grafana alerting to add. Jason points out that these numbers are different for multi-data-center architectures.Jason never assumes that everyone knows how Kafka works, because some software teams include specialists working on a producer or a consumer, who don't work directly with Kafka itself. They may not know how to adequately measure their Kafka volume themselves, so he often begins the collaborative process of graphing message volumes. He considers, for example, how many messages there are daily, and whether there is a peak time. Each industry is different, with some focusing on daily batch data (banking), and others fielding incredible amounts of continuous data (IoT data streaming from cars). Extensive testing is necessary to ensure that the data patterns are adequately accommodated. Jason sets up a short-lived system that is identical to the main system. He finds that teams usually have not adequately tested across domain boundaries or the network. Developers tend to think in terms of numbers of messages, but not in terms of overall network traffic, or in how many consumers they'll actually need, for example. Latency must also be considered, for example if the compression on the producer's side doesn't match compression on the consumer's side, it will increase.Kafka Connect sink connectors require special consideration when Jason is establishing a cluster. Failure strategies need to well thought out, including retries and how to deal with the potentially large number of messages that can accumulate in a dead letter queue. He suggests that more attention should generally be paid to the Kafka Connect elements of a cluster, something that can actually be addressed with bash scripts.Finally, Kris and Jason cover his preference for Kafka Streams over ksqlDB from a network perspective. EPISODE LINKSCapacity Planning and Sizing for Kafka StreamsTales from the Frontline of Apache Kafka DevOpsWatch the video version of this podcastSEASON 2 Hosted by Tim Berglund, Adi Polak and Viktor Gamov Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed Music by Coastal Kites Artwork by Phil Vo 🎧 Subscribe to Confluent Developer wherever you listen to podcasts. ▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes. 👍 If you enjoyed this, please leave us a rating. 🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.

Aug 25, 2022 • 34min

Streaming Real-Time Sporting Analytics for World Table Tennis

Reimagining a data architecture to provide real-time data flow for sporting events can be complicated, especially for organizations with as much data as World Table Tennis (WTT). Vatsan Rama (Director of IT, ITTF Group) shares why real-time data is essential in the sporting world and how his team reengineered their data system in 18 months, moving from a solely on-premises infrastructure to a cloud-native data system that uses Confluent Cloud with Apache Kafka® as its central nervous system. World Table Tennis is a business created by the International Table Tennis Federation (ITTF) to manage the official professional Table Tennis series of events and its commercial rights. World Table Tennis is also leading the sport digital transformation and commercializes its software application for real-time event scoring worldwide. Previously, ITTF scoring was processed manually with a desktop-based, on-venue results system (OVR) —an on-premises solution to process match data that calculated rankings and records, then sent event information to other systems, such as scoreboards. To provide match status in real-time, which makes the sport more engaging for fans and adds a competitive edge for players, Vatsan reengineered their OVR system to allow instant data sync between on-premises competition systems with the Cloud. The redesign started by establishing an event-driven architecture with Kafka that consolidates all legacy data sources, including records in Excel along with some handwritten forms (some dating back 90 years, even including records from the 1930 World Championship). To reduce operational overhead and maintenance, the team decided to stream data through fully managed Kafka as a service on Azure, for a scalable, distributed infrastructure. Vatsan shares that multiple table tennis events can run in parallel globally, and every time an umpire marks scores in a table, the data moves from the venue into Confluent Cloud, and then the score and rankings are sent to betting organizations and individuals on their mobile apps. EPISODE LINKSEvent Processing ApplicationFully Managed Apache Kafka on AzureWatch the video version of this podcastKris Jenkins’ TwitterStreaming Audio Playlist Join the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Intro to Event-Driven Microservices with ConfluentUse PODCAST100 to get an additional $100 of free Confluent Cloud usage (SEASON 2 Hosted by Tim Berglund, Adi Polak and Viktor Gamov Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed Music by Coastal Kites Artwork by Phil Vo 🎧 Subscribe to Confluent Developer wherever you listen to podcasts. ▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes. 👍 If you enjoyed this, please leave us a rating. 🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.

Aug 18, 2022 • 49min

Real-Time Event Distribution with Data Mesh

Inheriting software in the banking sector can be challenging. Perhaps the only thing harder is inheriting software built by a committee of banks. How do you keep it running, while improving it, refactoring it, and planning a bigger future for it? In this episode, Jean-Francois Garet (Technical Architect, Symphony) shares his experience at Symphony as he helps it evolve from an inherited, monolithic, single-tenant architecture to an event mesh for seamless event-streaming microservices. He talks about the journey they’ve taken so far, and the foundations they’ve laid for a modern data mesh.Symphony is the leading markets’ infrastructure and technology platform, which provides a full communication stack (chat, voice and video meetings, file and screen sharing) for the financial industry. Jean-Francois shares that its initial system was inherited from one of the founding institutions—and features the highest level of security to ensure confidentiality of business conversations, coupled with compliance with regulations covering financial transactions. However, its stacks are monolithic and single tenant. To modernize Symphony's architecture for real-time data, Jean-Francois and team have been exploring various approaches over the last four years. They started breaking down the monolith into microservices, and also made a move towards multitenancy by setting up an event mesh. However, they experienced a mix of success and failure in both attempts. To continue the evolution of the system, while maintaining business deliveries, the team started to focus on event streaming for asynchronous communications, as well as connecting the microservices for real-time data exchange. As they had prior Apache Kafka® usage in the company, the team decided to go with managed Kafka on the cloud as their streaming platform. The team has a set of principles in mind for the development of their event-streaming functionality: Isolate product domainsReach eventual consistency with event streamingClear contracts for the event streams, for both producers and consumers Multiregion and global data sharingJean-Francois shares that data mesh is ultimately what they are hoping to achieve with their platform—to provide governance around data and make data available as a product for self service. As of now, though, their focus is achieving real-time event streams with event mesh. EPISODE LINKSThe Definitive Guide to Building a Data Mesh with Event StreamsData Mesh 101What is Data Mesh? ft. Zhamak DehghaniData Mesh ArchitectureWatch the video version of this podcastKris JenkinsSEASON 2 Hosted by Tim Berglund, Adi Polak and Viktor Gamov Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed Music by Coastal Kites Artwork by Phil Vo 🎧 Subscribe to Confluent Developer wherever you listen to podcasts. ▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes. 👍 If you enjoyed this, please leave us a rating. 🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.

Aug 11, 2022 • 39min

Apache Kafka Security Best Practices

Security is a primary consideration for any system design, and Apache Kafka® is no exception. Out of the box, Kafka has relatively little security enabled. Rajini Sivaram (Principal Engineer, Confluent, and co-author of “Kafka: The Definitive Guide” ) discusses how Kafka has gone from a system that included no security to providing an extensible and flexible platform for any business to build a secure messaging system. She shares considerations, important best practices, and features Kafka provides to help you design a secure modern data streaming system. In order to build a secure Kafka installation, you need to securely authenticate your users. Whether you are using Kerberos (SASL/GSSAPI), SASL/PLAIN, SCRAM, or OAUTH. Verifying your users can authenticate, and non-users can’t, is a primary requirement for any connected system.But authentication is only one part of the security story. We also need to address other areas. Kafka added support for fine-grained access control using ACLs with a pluggable authorizer several years ago. Over time, this was extended to support prefixed ACLs to make ACLs more manageable in large organizations. Now on its second generation authorizer, Kafka is easily extendable to support other forms of authorization, like integrating with a corporate LDAP server to provide group or role-based access control.Even if you’ve set up your system to use secure authentication and each user is authorized using a series of ACLs if the data is viewable by anyone listening, how secure is your system? That’s where encryption comes in. Using TLS Kafka can encrypt your data-in-transit.Security has gone from a nice-to-have to being a requirement of any modern-day system. Kafka has followed a similar path from zero security to having a flexible and extensible system that helps companies of any size pick the right security path for them. Be sure to also check out the newest Apache Kafka Security course on Confluent Developer for an in-depth explanation along with other recommendations. EPISODE LINKSAn Introduction to Apache Kafka Security: Securing Real-Time Data StreamsKafka Security courseKafka: The Definitive Guide v2Security OverviewWatch the video version of this podcastKris Jenkins’ TwitterStreaming Audio Playlist Join the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DevelopSEASON 2 Hosted by Tim Berglund, Adi Polak and Viktor Gamov Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed Music by Coastal Kites Artwork by Phil Vo 🎧 Subscribe to Confluent Developer wherever you listen to podcasts. ▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes. 👍 If you enjoyed this, please leave us a rating. 🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.

Aug 4, 2022 • 41min

What Could Go Wrong with a Kafka JDBC Connector?

Java Database Connectivity (JDBC) is the Java API used to connect to a database. As one of the most popular Kafka connectors, it's important to prevent issues with your integrations. In this episode, we'll cover how a JDBC connection works, and common issues with your database connection. Why the Kafka JDBC Connector? When it comes to streaming database events into Apache Kafka®, the JDBC connector usually represents the first choice for its flexibility and the ability to support a wide variety of databases without requiring custom code. As an experienced data analyst, Francesco Tisiot (Senior Developer Advocate, Aiven) delves into his experience of streaming Kafka data pipeline with JDBC source connector and explains what could go wrong. He discusses alternative options available to avoid these problems, including the Debezium source connector for real-time change data capture. The JDBC connector is a Java API for Kafka Connect, which streams data between databases and Kafka. If you want to stream data from a rational database into Kafka, once per day or every two hours, the JDBC connector is a simple, batch processing connector to use. You can tell the JDBC connector which query you’d like to execute against the database, and then the connector will take the data into Kafka. The connector works well with out-of-the-box basic data types, however, when it comes to a database-specific data type, such as geometrical columns and array columns in PostgresSQL, these don’t represent well with the JDBC connector. Perhaps, you might not have any results in Kafka because the column is not within the connector’s supporting capability. Francesco shares other cases that would cause the JDBC connector to go wrong, such as: Infrequent snapshot timesOut-of-order eventsNon-incremental sequencesHard deletesTo help avoid these problems and set up a reliable source of events for your real-time streaming pipeline, Francesco suggests other approaches, such as the Debezium source connector for real-time change data capture. The Debezium connector has enhanced metadata, timestamps of the operation, access to all logs, and provides sequence numbers for you to speak the language of a DBA. They also talk about the governance tool, which Francesco has been building, and how streaming Game of Thrones sentiment analysis with Kafka started his current role as a developer advocate. EPISODE LINKSKafka Connect Deep Dive – JDBC Source ConnectorJDBC Source Connector: What could go wrong?Metadata parser Debezium DocumentationDatabase Migration with Apache Kafka and Apache Kafka ConnectSEASON 2 Hosted by Tim Berglund, Adi Polak and Viktor Gamov Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed Music by Coastal Kites Artwork by Phil Vo 🎧 Subscribe to Confluent Developer wherever you listen to podcasts. ▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes. 👍 If you enjoyed this, please leave us a rating. 🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.

Jul 28, 2022 • 37min

Apache Kafka Networking with Confluent Cloud

Setting up a reliable cloud networking for your Apache Kafka® infrastructure can be complex. There are many factors to consider—cost, security, scalability, and availability. With immense experience building cloud-native Kafka solutions on Confluent Cloud, Justin Lee (Principal Solutions Engineer, Enterprise Solutions Engineering, Confluent) and Dennis Wittekind (Customer Success Technical Architect, Customer Success Engineering, Confluent) talk about the different networking options on Confluent Cloud, including AWS Transit Gateway, AWS, and Azure Private Link, and discuss when and why you might choose one over the other. In order to build a secure cloud-native Kafka network, you need to consider information security and compliance requirements. These requirements may vary depending on your industry, location, and regulatory environment. For example, in financial organizations, transaction data or personal identifiable information (PII) may not be accessible over the internet. In this case, your network architecture may require private networking, which means you have to choose between private endpoints or a peering connection between your infrastructure and your Kafka clusters in the cloud.What are the differences between different networking solutions? Dennis and Justin talk about some of the benefits and drawbacks of different network architectures. For example, Transit Gateways offered by AWS are often a good fit for organizations with large, disparate network architectures, while Private Link is sometimes preferred for its security benefits. We also discuss the management overhead involved in administering different network architectures.Dennis and Justin also highlight their recently launched course on Confluent Developer—the Confluent Cloud Networking course. This hands-on course covers basic networking and cloud computing concepts that will offer support for you to get a clearer picture of the configurations and collaborate with the networking teams.EPISODE LINKSCloud Networking courseManage NetworkingWatch the video version of this podcastKris Jenkins’ TwitterStreaming Audio Playlist Join the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Intro to Event-Driven Microservices with ConfluentUse PODCAST100 to get an additional $100 of free Confluent Cloud usage (dSEASON 2 Hosted by Tim Berglund, Adi Polak and Viktor Gamov Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed Music by Coastal Kites Artwork by Phil Vo 🎧 Subscribe to Confluent Developer wherever you listen to podcasts. ▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes. 👍 If you enjoyed this, please leave us a rating. 🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.

Jul 21, 2022 • 53min

Event-Driven Systems and Agile Operations

How do the principles of chaotic, agile operations in the military apply to software development and event-driven systems? As a former Royal Marine, Ben Ford (Founder and CEO, Commando Development) is also a software developer, with many years of experience building event streaming architectures across financial services and startups. He shares principles that the military employs in chaotic conditions as well as how these can be applied to event-streaming and agile development.According to Ben, the operational side of the military is very emergent and reactive based on situations, like real-time, event-driven systems. Having spent the last five years researching, adapting, and applying these principles to technology leadership, he identifies a parallel in these concepts and operations ranging from DevOps to organizational architecture, and even when developing data streaming applications.One of the concepts Ben and Kris talk through is Colonel John Boyd’s OODA loop, which includes four cycles: Observe: the observation of the incoming events and informationOrient: the orientation stage involves reflecting on the events and how they are applied to your current situation Decide: the decision on what is the expected path to take. Then test and identify the potential outcomesAct: the action based on the decision, while also involves testing in generating further observationsThis concept of feedback loop helps to put in context and quickly make the most appropriate decision while understanding that changes can be made as more data becomes available. Ben and Kris also chat through their experience of building an event system together during the early days before the release of Apache Kafka® and more. EPISODE LINKSBuilding Real-Time Data Systems the Hard WayMission CtrlMission Command: The Doctrine of EmpowermentWatch the video version of this podcastKris Jenkins’ TwitterStreaming Audio Playlist Join the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Intro to Event-Driven Microservices with ConfluentUse PODCAST100 to get an additional $100 of free Confluent Cloud usage (details) SEASON 2 Hosted by Tim Berglund, Adi Polak and Viktor Gamov Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed Music by Coastal Kites Artwork by Phil Vo 🎧 Subscribe to Confluent Developer wherever you listen to podcasts. ▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes. 👍 If you enjoyed this, please leave us a rating. 🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.

Jul 14, 2022 • 1h 7min

Streaming Analytics and Real-Time Signal Processing with Apache Kafka

Imagine you can process and analyze real-time event streams for intelligence to mitigate cyber threats or keep soldiers constantly alerted to risks and precautions they should take based on events. In this episode, Jeffrey Needham (Senior Solutions Engineer, Advanced Technology Group, Confluent) shares use cases on how Apache Kafka® can be used for real-time signal processing to mitigate risk before it arises. He also explains the classic Kafka transactional processing defaults and the distinction between transactional and analytic processing. Jeffrey is part of the customer solutions and innovations division (CSID), which involves designing event streaming platforms and innovations to improve productivity for organizations by pushing the envelope of Kafka for real-time signal processing. What is signal intelligence? Jeffrey explains that it's not always affiliated with the military. Signal processing improves your operational or situational awareness by understanding the petabyte datasets of clickstream data, or the telemetry coming in from sensors, which could be the satellite or sensor arrays along a water pipeline. That is, bringing in event data from external sources to analyze, and then finding the pattern in the series of events to make informed decisions. Conventional On-Line Analytical Processing (OLAP) or data warehouse platforms evolved out of the transaction processing model. However, when analytics or even AI processing is applied to any data set, these algorithms never look at a single column or row, but look for patterns within millions of rows of transactionally derived data. Transaction-centric solutions are designed to update and delete specific rows and columns in an “ACID” compliant manner, which makes them inefficient and usually unaffordable at scale because this capability is less critical when the analytic goal is to look for a pattern within millions or even billions of these rows.Kafka was designed as a step forward from classic transaction processing technologies, which can also be configured in a way that’s optimized for signal processing high velocities of noisy or jittery data streams, in order to make sense, in real-time, of a dynamic, non-transactional environment.With its immutable, write-append commit logs, Kafka functions as a flight data recorder, which remains resilient even when network communications, or COMMs, are poor or nonexistent. Jeffrey shares the disconnected edge project he has been working on—smart soldier, which runs Kafka on a Raspberry Pi and x64-based handhelds. These devices are ergonomically integrated on each squad member to provide real-time visibility into the soldiers’ activities or situations. COMMs permitting, the topic data is then mirrored upstream and aggregated at multiple tiers—mobile command post, battalion, HQ—to provide ever-increasing views of the entire battlefield, or whatever the sensor array is monitoring, including the all important supply chain. Jeffrey also shares a couple of other use cases on how Kafka can be used for signal intelligence, including cybersecurity and protecting national critical infrastSEASON 2 Hosted by Tim Berglund, Adi Polak and Viktor Gamov Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed Music by Coastal Kites Artwork by Phil Vo 🎧 Subscribe to Confluent Developer wherever you listen to podcasts. ▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes. 👍 If you enjoyed this, please leave us a rating. 🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

App store banner

Play store banner