Open||Source||Data

Latest episodes

Jun 1, 2022 • 4min

Data Observability with Barr Moses, Einat Orr, and Shinji Kim

This bonus episode features conversations from season 2 of the Open||Source||Data podcast. In this episode, you’ll hear from Barr Moses, Co-founder and CEO at Monte Carlo; Einat Orr, Co-founder and CEO at Treeverse; and Shinji Kim, Founder and CEO at Select Star.

Sam sat down with each guest to discuss data observability. You can listen to the full episodes from Barr Moses, Einat Orr, and Shinji Kim by clicking the links below.

-------------------
Episode Timestamps:
(00:35): Barr Moses
(01:21): Einat Orr
(02:07): Shinji Kim
-------------------
Links:
Listen to Barr’s episode
Listen to Einat’s episode
Listen to Shinji’s episode

May 25, 2022 • 41min

Apache Pinot and Real-Time Analytics with Neha Pawar

This episode features an interview with Neha Pawar, a Founding Engineer at StarTree. StarTree is a software development company that focuses on democratizing data for all users by providing real-time, user-facing analytics.

Prior to her time at StarTree, Neha was a Senior Software Engineer on LinkedIn’s Data Analytics team, where she spent five years working on Apache Pinot. Neha has provided countless contributions to Pinot over the years, focusing on real-time streaming integrations, ingestion, and storage.

In this episode, Sam sits down with Neha to discuss Apache Pinot’s impact on the data community and how LinkedIn popularized real-time analytics.

-------------------
“Many people do think that a batch is good enough, real-time infra is expensive anyway. And what difference is it going to make if the data shown in this application is a day ago or an hour ago, and it's not real-time to the nearest second? And while that is true, in some cases, but in many other cases, not having real-time data can be super expensive and can affect the business badly and also make them irrelevant. You need the real-time data and then you also need to be able to analyze that data at the speed of your thought. For example, if you are having fraudulent activity somewhere, you can't wait for, ‘Hey, my model is going to learn about this.’ And then the next time, be able to tell me that that was a fraudulent activity. You need to be able to analyze all that data right now. So, it's not just a nice-to-have, it's a must-have.” – Neha Pawar
-------------------
Episode Timestamps:
(01:58): What open source data means to Neha
(06:04): Neha’s learnings from the LinkedIn Data Analytics Team
(07:07): What piqued Neha’s interest in real-time data analytics
(08:30): Neha’s first experiences working on Apache Pinot
(11:40): How the work of real-time data spread from LinkedIn to other companies
(17:30): How the Apache community has grown
(24:04): Neha’s focus at StarTree
(30:41): Neha’s motivation for tiered storage at StarTree
(37:07): Neha’s advice for open source data folks
-------------------
Links:
LinkedIn - Connect with Neha
LinkedIn - Connect with StarTree
Twitter - Follow Neha
Twitter - Follow StarTree
Visit StarTree

May 11, 2022 • 40min

Real-Time Data, Enabling Developers, and User Experience with DeVaris Brown

This episode features an interview with DeVaris Brown, CEO and Co-Founder of Meroxa. Meroxa was founded in 2020 and enables teams of any size and any expertise to build real-time data pipelines in minutes.

Previously, DeVaris was a product leader at Twitter, Heroku, and Zendesk. Sam and DeVaris even crossed paths at Microsoft in the aughts.

In this episode, Sam and DeVaris discuss enabling developers, real-time data, and providing the ultimate user experience.

-------------------
"From the beginning we wanted to be system engineer first, software engineer second, and we were happy to stand on the shoulders of giants that built foundational pieces of technology to help us get our job done more efficiently. [...] The one thing I love about my co-founder and he's super humble, Ali, we did billions of events a minute at Heroku on the data platform for tens of thousands of Kafka clusters for thousands of customers. But the team was six and he was a lead on that team. And we had five nines for years. Why? Because automation. And that's really what we built. [...] And so what we said was the experience will be our differentiator, but the components and the architecture which we run on, that can be standard. And that was a real big lesson that I learned at Heroku." – DeVaris Brown
-------------------
Episode Timestamps:
(05:47): What open source data means to DeVaris
(09:08): DeVaris’ inspiration for building a Heroku for data
(14:09): The open source underneath Meroxa
(20:06): What the Meroxa open source community looks like
(25:13): How will data engineering evolve over time?
(28:41): DeVaris breaks down real-time data
(33:40): Where does the name Meroxa come from?
(35:01): DeVaris’ advice for open source data folks
-------------------
Links:
LinkedIn - Connect with DeVaris
LinkedIn - Connect with Meroxa
Twitter - Follow DeVaris
Twitter - Follow Meroxa
Visit Meroxa
Visit Orbit

May 4, 2022 • 4min

Data Meshes, Fabrics, and Discovery with Zhamak Dehghani, David Thomas, and Shirshanka Das

This bonus episode features conversations from seasons 1 and 2 of the Open||Source||Data podcast. In this episode, you’ll hear from Zhamak Dehghani, Director of Emerging Technologies at ThoughtWorks North America; David Thomas, Principal at Deloitte; and Shirshanka Das, Founder of LinkedIn DataHub and Acryl Data.

Sam sat down with each guest to discuss data meshes, fabrics, and discovery. You can listen to the full episodes from Zhamak Dehghani, David Thomas, and Shirshanka Das by clicking the links below.

-------------------
Episode Timestamps:
(00:36): Zhamak Dehghani
(01:41): David Thomas
(02:43): Shirshanka Das
-------------------
Links:
Listen to Zhamak’s episode
Listen to David’s episode
Listen to Shirshanka’s episode

Apr 27, 2022 • 35min

Investing in Communities, Differentiating, and Trusting Your Gut with Erica Brescia

This episode features an interview with Erica Brescia, Managing Director of Redpoint Ventures. At Redpoint, Erica focuses her investing on infrastructure, DevOps, and security.

Erica has over 15 years of experience in the open source community and currently serves on the board of directors of the Linux Foundation. Prior to joining Redpoint, Erica was also an angel investor and advisor to companies such as Netlify, Coda, and Xata.

In this episode, Sam and Erica discuss the evolution of open source data, what’s changed for practitioners, and why you should always listen to your gut.

-------------------
“I think there is just so much good motivation to make the world a better place, especially during my time at GitHub. When you can see what kinds of opportunity open source can bring to people in developing countries, that’s really exciting. You see people whose lives and livelihoods have literally been changed because they were able to participate in a global open source project. And then you can see the way that open source projects, even back when we were packaging things at Bitnami, we’d hear from non-profits in Africa that were never able to use open source until we made it easy to consume. When you feel like you’re really making that kind of a difference and you’re doing it in a community of great people, it’s a really great way to spend your time.” – Erica Brescia
-------------------
Episode Timestamps:
(03:18): What open source data means to Erica
(11:31): What’s changed in open source data in recent years
(18:01): How the journey has evolved for innovators and practitioners
(24:11): What stands out as a venture capitalist to Erica
(30:03): Don’t discount junior investors
(31:17): Erica’s advice: get quiet and listen to your gut
-------------------
Links:
LinkedIn - Connect with Erica
LinkedIn - Connect with Redpoint
Twitter - Follow Erica
Twitter - Follow Redpoint
Visit Redpoint
Xata
Dagger

Apr 20, 2022 • 4min

Data on Kubernetes with Kelsey Hightower, Lachlan Evenson, and Patrick McFadin

This bonus episode features conversations from season 1 of the Open||Source||Data podcast. In this episode, you’ll hear from Kelsey Hightower, Principal Engineer at Google Cloud; Lachlan Evenson, Principal Program Manager at Microsoft Azure; and Patrick McFadin, Head of Developer Relations at DataStax.

Sam sat down with each guest to discuss Data on Kubernetes and how they’re making progress on a stateless infrastructure. You can listen to the full episodes from Kelsey Hightower, Lachlan Evenson, and Patrick McFadin by clicking the links below.

-------------------
Timestamps:
(00:39): Kelsey Hightower
(01:33): Lachlan Evenson
(02:06): Patrick McFadin
-------------------
Links:
Listen to Kelsey’s episode
Listen to Lachlan’s episode
Listen to Patrick’s episode

Apr 13, 2022 • 45min

Deep Fakes, Responsible Data Science, and Trust with David Danks

This episode features an interview with David Danks, Professor of Data Science and Philosophy and affiliate faculty in Computer Science and Engineering at University of California, San Diego. Prior to UCSD, David was the L.L. Thurstone Professor of Philosophy and Psychology at Carnegie Mellon University.

David’s research interests are at the intersection of philosophy, cognitive science, and machine learning. He has also examined the ethics surrounding artificial intelligence in the fields of healthcare, privacy, and security.

In this episode, David and Sam dive into responsible data science, deep fakes, and whether data is to blame for the lack of trust among consumers.

-------------------
"There's a, almost, glorification of the technology that's happening at the moment. And the technology is obviously crucial, but what I really care about in a lot of ways is what are the human beings who build and use that technology doing with it? Because the exact same ones and zeros, the exact same code can lead to enormous social benefit or social harm, depending on what we humans do with it. And so, I think we need to recognize that technology is not this hurricane bearing down on us, it's a thing that people build and use. And how do we influence the people, and the companies is maybe an easier thing to do than trying to focus just on the data and algorithms." – David Danks
-------------------
Episode Timestamps:
(01:41): What open source data means to David
(05:58): David’s transition from philosophy to AI
(09:03): Is data to blame for lack of trust in AI?
(13:40): How to be “future aware”
(16:32): Data science vs responsible data science
(20:20): Deep Fakes
(40:17): Advice for Ethical AI newcomers
-------------------
Links:
Connect with David

Mar 30, 2022 • 37min

Cloud Innovation, Analytics, and Data Transformation with Monica Kumar

This episode features an interview with Monica Kumar, Senior Vice President of Marketing and Cloud Go-To-Market at Nutanix. Nutanix is a data platform that is redefining workloads in cloud environments. Prior to Nutanix, Monica spent two decades at Oracle, where she launched several market solutions.

Monica is passionate about positioning and supporting women in leadership roles. She is a founding limited partner of Neythri Futures Fund, a venture fund dedicated to bringing South Asian women into the investment community. Monica also serves on the board of directors at Watermark, an organization dedicated to women in leadership.

In this episode, Monica and Sam discuss the evolving world of marketing analytics, tech’s biggest innovation to date, and how the data industry can change for the better.

-------------------
“I believe that cloud has now become more of an operating model. It started out in the public cloud, but now organizations have adopted the same philosophy of self-service, metering, chargeback, quick deployment, on-demand deployment, on-premises as well. So, my assertion is that cloud has become more of an operating model than a location. And what we’re going to see going forward more and more is this notion of multi-cloud and hybrid multi-cloud data platforms that would be able to access data from multiple locations and be able to provide on top of that, the analysis that the user is looking for.” – Monica Kumar
-------------------
Episode Timestamps:
(02:23): What open source data means to Monica
(12:37): The evolving world of marketing insight analytics
(16:42): How remote work is changing the industry for the better
(20:25): How Monica supports diverse entrepreneurs
(24:11): The transformation of a database to a data platform
(26:37): Why the cloud is tech’s biggest innovation
(29:23): What’s next for data storage in 5 years?
(29:41): Monica’s analogy for data storage
(33:34): Monica’s advice for newcomers
-------------------
Links:
LinkedIn - Connect with Monica
LinkedIn - Connect with Nutanix
Twitter - Follow Monica
Twitter - Follow Nutanix
Visit Nutanix
The Neythri Futures Fund

Mar 16, 2022 • 30min

Data Lakehouses, Interoperability, and Accessibility with Tomer Shiran

This episode features an interview with Tomer Shiran, Founder and Chief Product Officer at Dremio. Dremio is a high-performance SQL lakehouse platform that helps companies get more from their data in the fastest way possible.

Prior to Dremio, Tomer served as VP of Product at MapR and also held product management and engineering roles at Microsoft and IBM Research. He also has a master’s degree from Carnegie Mellon University as well as a bachelor’s from Technion - Israel Institute of Technology.

In this episode, Tomer and Sam dive into the economics of storing data, how to build an open architecture, and what exactly a data lakehouse is.

-------------------
“I think in the world of data lakes and lakehouses, the model has shifted upside down. Now, instead of bringing the data into the engines, you’re actually bringing the engines to the data. So you have this open data tier built on open source technology. The data is represented in open source formats and stored in the company’s S3 account or Azure storage account. And then you can use a variety of engines. We at Dremio, we take pride in building the best SQL engine to use on the data. There are different streaming engines, like Spark and Flink. There are different batch processing and machine learning engines. Spark is an example of that as well that companies can use on that same data. And I think that’s one of the really important things from a cost standpoint, too, is that this really lowers your overall costs, both today and also in the future as you scale.” – Tomer Shiran
-------------------
Episode Timestamps:
(02:04): What open source data means to Tomer
(03:14): Tomer’s motivation behind Apache Arrow
(06:42): How Tomer solved data accessibility
(08:43): The unit economics of storing data
(14:31): Tomer’s motivations for Iceberg and how it relates to Project Nessie
(17:06): What is a data lakehouse?
(18:31): What gives Dremio its magic?
(23:39): What cloud data architecture will look like in 5 years
(27:19): Advice for building an open data architecture
-------------------
Links:
LinkedIn - Connect with Tomer
LinkedIn - Connect with Dremio
Twitter - Follow Tomer
Twitter - Follow Dremio
Visit Dremio
Get started with Dremio

Mar 2, 2022 • 31min

Interoperability, Governance, and Divergent Teams with Prukalpa Sankar

This episode features an interview with Prukalpa Sankar, Co-Founder of Atlan. Atlan is a venture-backed startup building a modern data workspace. Prukalpa also co-founded SocialCops, a data for good company behind landmark projects such as India’s National Data Platform. Prukalpa is a recognized industry leader, landing on the Forbes 30 Under 30 list and Fortune’s 40 Under 40.

In this episode, Prukalpa and Sam discuss how diversity is a data team’s biggest strength, why governance isn’t always a bad thing, and what they hope the modern data stack will look like in 5 years.

-------------------
“Diversity is our biggest strength but our biggest weakness, because it's really hard to make that team collaborate. Because most of the teams in the world are very uniform. So when every single person in the room is a subject matter expert on something, nobody else actually can have oversight on each other's work because they've never done it before. Then how do you create true trust? How do you create trust when things are breaking? If you're able to create a way for these diverse people to collaborate really effectively, to be a dream team, a dream data team where they trust each other and they can collaborate effectively, then magic can happen.” – Prukalpa Sankar
-------------------
Episode Timestamps:
[01:55]: What open source data means to Prukalpa
[05:38]: Prukalpa’s journey to the data for good movement
[04:51]: How Prukalpa and her team provided gas to 80 million Indian women
[06:33]: How diversity can help a data team succeed
[15:10]: What gives Atlan its magic
[18:58]: How being open by default influenced Atlan’s architecture choices
[22:45]: The reality of the modern data stack in 5 years
[27:36]: Advice for people getting started with DataOps
-------------------
Links:
LinkedIn - Connect with Prukalpa
LinkedIn - Connect with Atlan
Twitter - Follow Prukalpa
Twitter - Follow Atlan
Visit Atlan
