David Aronchick, CEO of Expanso and former Kubernetes team member at Google, dives into the importance of processing data geographically to minimize latency and enhance security. He shares insights on Bacalhau, a platform for edge data processing, and on the challenges of log management. Discussing data governance, he emphasizes the need for effective integration in distributed systems. Finally, Aronchick highlights innovative approaches to server job methods, particularly advocating for Docker containers as a flexible alternative to Kubernetes.
Podcast summary created with Snipd AI
Quick takeaways
Processing data geographically improves performance and security while ensuring adherence to regulations like GDPR by minimizing unnecessary data movement.
Expanso's infrastructure significantly lowers operational costs by enabling localized data processing, as demonstrated by a client saving $2 million annually on ingestion fees.
Deep dives
Importance of Geographical Data Processing
Processing large datasets effectively requires substantial computational resources, and the geographical location of data processing can be crucial. Expanso offers infrastructure that allows jobs to be executed where the data resides, thereby minimizing latency and enhancing security. This approach not only streamlines data handling but also addresses regulatory concerns related to data governance, such as GDPR compliance, by ensuring data processing occurs within the required geographical zones. Organizations can reduce data movement, thereby optimizing resource use and maintaining data privacy.
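As a rough illustration of running a job where the data resides while staying inside a required geographical zone, the placement decision can be sketched in a few lines of Python. This is not Expanso's scheduler or API. The `Node`, `Job`, and `place_job` names, the region labels, and the sample dataset are all hypothetical and exist only to show the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    region: str  # hypothetical region label, e.g. "eu" or "us"


@dataclass(frozen=True)
class Job:
    name: str
    dataset: str
    allowed_regions: frozenset  # regions where processing is permitted


def place_job(job, nodes, dataset_locations):
    """Return the nodes that already hold the job's dataset AND sit in an allowed region."""
    holders = dataset_locations.get(job.dataset, set())
    return [
        node
        for node in nodes
        if node.name in holders and node.region in job.allowed_regions
    ]


# Hypothetical cluster: two EU nodes and one US node.
nodes = [Node("fra-1", "eu"), Node("iad-1", "us"), Node("ams-1", "eu")]

# Hypothetical record of which nodes hold a copy of each dataset.
dataset_locations = {"eu-customer-logs": {"fra-1", "iad-1"}}

job = Job("daily-report", "eu-customer-logs", frozenset({"eu"}))
eligible = place_job(job, nodes, dataset_locations)
print([node.name for node in eligible])  # ['fra-1']
```

Only `fra-1` qualifies: `iad-1` holds the data but is outside the allowed region, and `ams-1` is in the region but would require moving the data first.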
Specific Use Cases Highlighting Data Processing Needs
One prominent application of Expanso's technology is in log processing, where organizations typically experience high costs when moving and processing data centrally. Instead of transferring all logs to a centralized location, Expanso allows initial data processing at the point of log creation, significantly cutting down on unnecessary data movement and associated costs. For instance, a client saw their annual ingestion costs drop by $2 million by implementing this solution, thus enhancing both their security and efficiency. Additionally, there are potential applications in video streaming where processing can occur near video data collection points, minimizing the need to send large amounts of data to a central server.
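To make the log-processing idea concrete, here is a minimal sketch of reducing logs at the point of creation before anything is sent to a central system. It assumes a simple `timestamp LEVEL message` line format, and the `reduce_logs` function and sample lines are hypothetical. It is not taken from Expanso's product, only an illustration of forwarding the important lines and summarizing the rest.

```python
from collections import Counter


def reduce_logs(lines, keep_levels=("ERROR", "WARN")):
    """Split local log lines into the ones worth forwarding and a per-level count summary.

    Assumes each line looks like "<timestamp> <LEVEL> <message>".
    """
    forwarded = []
    counts = Counter()
    for line in lines:
        parts = line.split(" ", 2)
        if len(parts) < 3:
            # Lines that do not match the assumed format are only counted.
            counts["MALFORMED"] += 1
            continue
        level = parts[1]
        counts[level] += 1
        if level in keep_levels:
            forwarded.append(line)
    return forwarded, dict(counts)


# Hypothetical log lines produced on one machine.
local_lines = [
    "2024-01-01T00:00:00Z INFO service started",
    "2024-01-01T00:00:01Z INFO request handled",
    "2024-01-01T00:00:02Z ERROR upstream timeout",
    "2024-01-01T00:00:03Z WARN retrying request",
    "garbage",
]

forwarded, summary = reduce_logs(local_lines)
print(len(forwarded), "of", len(local_lines), "lines forwarded")
print(summary)
```

In this sample, two of the five lines leave the machine, along with a small summary of what was dropped. The centralized system still learns how many routine events occurred without ingesting each one.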
Advantages of Running Computation Near Data
There are three primary advantages of running computations close to data: improved latency and performance, enhanced security, and better data governance. By processing data where it is generated, organizations can detect and react to issues in real-time, drastically reducing the time needed for alerting and remediation. Furthermore, this approach helps in managing costs related to data storage and movement, as unnecessary data transmission is minimized. Ensuring compliance with regulations like GDPR also becomes simpler, as sensitive data remains in its respective zone while still being accessible for processing.
Open Source vs. Commercial Offerings
Expanso operates on a blended model with an open source platform, Bacalhau, and a commercial offering that supports it. While the source code for Bacalhau is freely available, the binaries and trademarked components require a business relationship for use. This model not only accommodates diverse customer needs, including those in highly regulated environments, but also ensures that critical security and compliance measures are maintained. The open source basis allows for community collaboration while the commercial aspect guarantees reliability, support, and regular updates.
Large datasets require substantial computational resources to process. Increasingly, where you process that data geographically can be just as important as how you process it.
Expanso provides job execution infrastructure that runs jobs where data resides, to help reduce latency and improve security and data governance.
David Aronchick is the CEO of Expanso. He previously worked at Google on the Kubernetes team, which influenced his decision to start Expanso. David joins the show to talk about his company.
This episode is hosted by Lee Atchison. Lee Atchison is a software architect, author, and thought leader on cloud computing and application modernization. His best-selling book, Architecting for Scale (O’Reilly Media), is an essential resource for technical teams looking to maintain high availability and manage risk in their cloud environments.
Lee is the host of Modern Digital Business, an engaging and informative podcast produced for people looking to build and grow their digital business with the help of modern applications and processes developed for today’s fast-moving business environment. Listen at mdb.fm. Follow Lee at softwarearchitectureinsights.com, and see all his content at leeatchison.com.