

Open||Source||Data
Charna Parkey
What can we learn from AI-native development through stimulating conversations with developers, regulators, academics and people like you who drive development forward, seek to understand its impact, and are working to mitigate risk in this new world?
Join Charna Parkey and the community shaping the future of open source data, open source software, data in AI, and much more.
Episodes

Jun 29, 2022 • 6min
Democratization and Cognition with Margot Gerritsen, Rachel Chalmers, and Patricia Boswell
This bonus episode features conversations from season 1 of the Open||Source||Data podcast. In this episode, you’ll hear from Margot Gerritsen, Stanford Professor and Co-Founder/Director of WiDS; Rachel Chalmers, Partner at Alchemist Accelerator; and Patricia Boswell, Staff Technical Writer at Google.

Sam sat down with each guest to discuss cognition and democratization in data. You can listen to the full episodes from Margot Gerritsen, Rachel Chalmers, and Patricia Boswell by clicking the links below.
-------------------
Episode Timestamps:
(00:18): Margot Gerritsen
(02:07): Rachel Chalmers
(03:46): Patricia Boswell
-------------------
Links:
Listen to Margot’s episode
Listen to Rachel’s episode
Listen to Patricia's episode

Jun 22, 2022 • 36min
Vector Search, the AI Stack and more with Bob van Luijt
This episode features an interview with Bob van Luijt, CEO and Co-Founder of SeMI Technologies and co-creator of Weaviate, an open source vector search engine. At just 15 years of age, Bob started his own software company in the Netherlands. He went on to study music at ArtEZ University of the Arts and Berklee College of Music, and completed the Harvard Business School Program of Management Excellence. Bob is also a TEDx speaker, discussing the relationship between software and language.

In this episode, Sam sits down with Bob to break down vector search, the AI-first ecosystem, and how music and software relate to one another.
-------------------
“I dare to argue that from the two big waves in database technology that we've seen, so first, in the seventies and eighties with SQL. And then the whole NoSQL wave that we have seen and the big winners that are in there, I dare to argue that we see a third wave coming up. And the third wave, I simply call it AI-first. And what I mean with that is that these models play an important role. So we do it from the perspective of the models first. And in that new segment, you see four niches. So the first niche that we see are what I like to call the embedding providers. The Hugging Faces of this world, the OpenAIs of this world, etc. Those who bring us the embeddings that we need to do the vectorization. Then secondly, we have so-called neural search frameworks. So we see frameworks like Haystack and Jina. Then third, we have the feature stores. So the feature stores take care of storing large chunks of features that we later can use to do vectorization on those kinds of things. And then we have the search engines. And Weaviate is an example of such a search engine that takes care of searching through data on a large scale that is vectorized. It might be a bold statement, but I really believe that we see this third wave of database technology happening.” – Bob van Luijt
-------------------
Episode Timestamps:
(01:45): How Bob defines open source data
(04:09): What is a vector database and why do we need them?
(07:55): How data is different before and after vectorization
(13:58): Orders of magnitude faster or personal
(16:09): How music and software relate to each other for Bob
(19:33): Bob’s inspiration behind Weaviate
(25:02): The AI-first ecosystem
(27:38): The distinction between vector search engines, feature stores, neural search frameworks, and embedding providers
(32:28): Bob’s advice for folks on the OSS startup journey
-------------------
Links:
LinkedIn - Connect with Bob
Twitter - Follow Bob
Twitter - Follow Weaviate
Weaviate
SeMI Technologies
Bob’s TEDx Talk
Bob's Forbes Article on the AI-First Database Ecosystem

Jun 8, 2022 • 40min
Open Source Innovation, The GPL for Data, and The Data In to Data Out Ratio with Larry Augustin
This episode features an interview with Larry Augustin, angel investor and advisor to early-stage technology companies. Larry previously served as the Vice President for Applications at AWS, where he was responsible for application services like Pinpoint, Chime, and WorkSpaces.

Before joining AWS, Larry was the CEO of SugarCRM, an open source CRM vendor. He also was the founder and CEO of VA Linux, where he launched SourceForge. Among the group who coined the term “open source”, Larry has sat on the boards of several open source and Linux organizations.

In this episode, Sam and Larry discuss who owns the rights to data, the data in to data out ratio, and why Larry is an open source titan.
-------------------
“People are willing to give up so much of their personal information because they get an awful lot back. And privacy experts come along and say, ‘Well, you're taking all this personal information’. But then most people look at that and say, ‘But I get a lot of value back out of that.’ And it's this data ratio value question, which is: for a little in, I get a lot back. That becomes a key element in this. And I think there has to be some kind of similar thought process around open source data in general, which is if I contribute some data into this, I'm going to get a lot of value back. So this data in to data out ratio, I think it's an incredibly important one. It's a principle that I drive into application development. If you put a user in front of an app and they start using the app, you're going to ask them for things. And my principle is always, ‘How do you figure out how to never ask them and only give them?’ And you can't get 100% of the way there, but every time it's like, ‘Why did you ask them for that? Couldn't you figure it out?’ And it gets everyone in the mindset of, ‘How do I provide more and more and take less and less?’ It's a principle of application development that I like a lot. And I think there's a similar concept here around open-source data. Are there models or structures that we can come up with where people can contribute small amounts of data and as a result of that, they get back a lot of value?” – Larry Augustin
-------------------
Episode Timestamps:
(02:14): How Larry is spending his time after AWS
(06:01): What drove Larry to open source
(18:04): What is the GPL for data?
(23:51): Areas of progress in open source data
(28:37): The data in to data out ratio
(36:02): Larry’s advice for folks in open source
-------------------
Links:
LinkedIn - Connect with Larry
Twitter - Follow Larry

Jun 1, 2022 • 4min
Data Observability with Barr Moses, Einat Orr, and Shinji Kim
This bonus episode features conversations from season 2 of the Open||Source||Data podcast. In this episode, you’ll hear from Barr Moses, Co-founder and CEO at Monte Carlo; Einat Orr, Co-founder and CEO at Treeverse; and Shinji Kim, Founder and CEO at Select Star.

Sam sat down with each guest to discuss data observability. You can listen to the full episodes from Barr Moses, Einat Orr, and Shinji Kim by clicking the links below.
-------------------
Episode Timestamps:
(00:35): Barr Moses
(01:21): Einat Orr
(02:07): Shinji Kim
-------------------
Links:
Listen to Barr’s episode
Listen to Einat’s episode
Listen to Shinji’s episode

May 25, 2022 • 41min
Apache Pinot and Real-Time Analytics with Neha Pawar
This episode features an interview with Neha Pawar, a Founding Engineer at StarTree. StarTree is a software development company that focuses on democratizing data for all users by providing real-time, user-facing analytics.

Prior to her time at StarTree, Neha was a Senior Software Engineer on LinkedIn’s Data Analytics team where she spent five years working on Apache Pinot. Neha has provided countless contributions to Pinot over the years, focusing on real-time streaming integrations, ingestion, and storage.

In this episode, Sam sits down with Neha to discuss Apache Pinot’s impact on the data community and how LinkedIn popularized real-time analytics.
-------------------
“Many people do think that a batch is good enough, real-time infra is expensive anyway. And what difference is it going to make if the data shown in this application is a day ago or an hour ago, and it's not real-time to the nearest second? And while that is true, in some cases, but in many other cases, not having real-time data can be super expensive and can affect the business badly and also make them irrelevant. You need the real-time data and then you also need to be able to analyze that data at the speed of your thought. For example, if you are having fraudulent activity somewhere, you can't wait for, ‘Hey, my model is going to learn about this.’ And then the next time, be able to tell me that that was a fraudulent activity. You need to be able to analyze all that data right now. So, it's not just a nice-to-have, it's a must-have.” – Neha Pawar
-------------------
Episode Timestamps:
(01:58): What open source data means to Neha
(06:04): Neha’s learnings from the LinkedIn Data Analytics Team
(07:07): What piqued Neha’s interest in real-time data analytics
(08:30): Neha’s first experiences working on Apache Pinot
(11:40): How the work of real-time data spread from LinkedIn to other companies
(17:30): How the Apache community has grown
(24:04): Neha’s focus at StarTree
(30:41): Neha’s motivation for tiered storage at StarTree
(37:07): Neha’s advice for open source data folks
-------------------
Links:
LinkedIn - Connect with Neha
LinkedIn - Connect with StarTree
Twitter - Follow Neha
Twitter - Follow StarTree
Visit StarTree

May 11, 2022 • 40min
Real-Time Data, Enabling Developers, and User Experience with DeVaris Brown
This episode features an interview with DeVaris Brown, CEO and Co-Founder of Meroxa. Meroxa was founded in 2020 and enables teams of any size and any expertise to build real-time data pipelines in minutes.

Previously, DeVaris was a product leader at Twitter, Heroku, and Zendesk. Sam and DeVaris even crossed paths at Microsoft in the aughts.

In this episode, Sam and DeVaris discuss enabling developers, real-time data, and providing the ultimate user experience.
-------------------
"From the beginning we wanted to be system engineer first, software engineer second, and we were happy to stand on the shoulders of giants that built foundational pieces of technology to help us get our job done more efficiently. [...] The one thing I love about my co-founder and he's super humble, Ali, we did billions of events a minute at Heroku on the data platform for tens of thousands of Kafka clusters for thousands of customers. But the team was six and he was a lead on that team. And we had five nines for years. Why? Because automation. And that's really what we built. [...] And so what we said was the experience will be our differentiator, but the components and the architecture which we run on, that can be standard. And that was a real big lesson that I learned at Heroku." – DeVaris Brown
-------------------
Episode Timestamps:
(05:47): What open source data means to DeVaris
(09:08): DeVaris’ inspiration for building a Heroku for data
(14:09): The open source underneath Meroxa
(20:06): What the Meroxa open source community looks like
(25:13): How will data engineering evolve over time?
(28:41): DeVaris breaks down real-time data
(33:40): Where does the name Meroxa come from?
(35:01): DeVaris’ advice for open source data folks
-------------------
Links:
LinkedIn - Connect with DeVaris
LinkedIn - Connect with Meroxa
Twitter - Follow DeVaris
Twitter - Follow Meroxa
Visit Meroxa
Visit Orbit

May 4, 2022 • 4min
Data Meshes, Fabrics, and Discovery with Zhamak Dehghani, David Thomas, and Shirshanka Das
This bonus episode features conversations from seasons 1 and 2 of the Open||Source||Data podcast. In this episode, you’ll hear from Zhamak Dehghani, Director of Emerging Technologies at ThoughtWorks North America; David Thomas, Principal at Deloitte; and Shirshanka Das, Founder of LinkedIn DataHub and Acryl Data.

Sam sat down with each guest to discuss data meshes, fabrics, and discovery. You can listen to the full episodes from Zhamak Dehghani, David Thomas, and Shirshanka Das by clicking the links below.
-------------------
Episode Timestamps:
(00:36): Zhamak Dehghani
(01:41): David Thomas
(02:43): Shirshanka Das
-------------------
Links:
Listen to Zhamak’s episode
Listen to David’s episode
Listen to Shirshanka’s episode

Apr 27, 2022 • 35min
Investing in Communities, Differentiating, and Trusting Your Gut with Erica Brescia
This episode features an interview with Erica Brescia, Managing Director of Redpoint Ventures. At Redpoint, Erica focuses her investing on infrastructure, DevOps, and security.

Erica has over 15 years of experience in the open source community and currently serves on the board of directors of the Linux Foundation. Prior to joining Redpoint, Erica was also an angel investor and advisor to companies such as Netlify, Coda, and Xata.

In this episode, Sam and Erica discuss the evolution of open source data, what’s changed for practitioners, and why you should always listen to your gut.
-------------------
“I think there is just so much good motivation to make the world a better place, especially during my time at GitHub. When you can see what kinds of opportunity open source can bring to people in developing countries, that’s really exciting. You see people whose lives and livelihoods have literally been changed because they were able to participate in a global open source project. And then you can see the way that open source projects, even back when we were packaging things at Bitnami, we’d hear from non-profits in Africa that were never able to use open source until we made it easy to consume. When you feel like you’re really making that kind of a difference and you’re doing it in a community of great people, it’s a really great way to spend your time.” – Erica Brescia
-------------------
Episode Timestamps:
(03:18): What open source data means to Erica
(11:31): What’s changed in open source data in recent years
(18:01): How the journey has evolved for innovators and practitioners
(24:11): What stands out as a venture capitalist to Erica
(30:03): Don’t discount junior investors
(31:17): Erica’s advice: get quiet and listen to your gut
-------------------
Links:
LinkedIn - Connect with Erica
LinkedIn - Connect with Redpoint
Twitter - Follow Erica
Twitter - Follow Redpoint
Visit Redpoint
Xata
Dagger

Apr 20, 2022 • 4min
Data on Kubernetes with Kelsey Hightower, Lachlan Evenson, and Patrick McFadin
This bonus episode features conversations from season 1 of the Open||Source||Data podcast. In this episode, you’ll hear from Kelsey Hightower, Principal Engineer at Google Cloud; Lachlan Evenson, Principal Program Manager at Microsoft Azure; and Patrick McFadin, Head of Developer Relations at DataStax.

Sam sat down with each guest to discuss Data on Kubernetes and how they’re making progress on a stateless infrastructure. You can listen to the full episodes from Kelsey Hightower, Lachlan Evenson, and Patrick McFadin by clicking the links below.
-------------------
Timestamps:
(00:39): Kelsey Hightower
(01:33): Lachlan Evenson
(02:06): Patrick McFadin
-------------------
Links:
Listen to Kelsey’s episode
Listen to Lachlan’s episode
Listen to Patrick’s episode

Apr 13, 2022 • 45min
Deep Fakes, Responsible Data Science, and Trust with David Danks
This episode features an interview with David Danks, Professor of Data Science and Philosophy and affiliate faculty in Computer Science and Engineering at University of California, San Diego. Prior to UCSD, David was the L.L. Thurstone Professor of Philosophy and Psychology at Carnegie Mellon University.

David’s research interests are at the intersection of philosophy, cognitive science, and machine learning. He has also examined the ethics surrounding artificial intelligence in the fields of healthcare, privacy, and security.

In this episode, David and Sam dive into responsible data science, deep fakes, and whether data is to blame for the lack of trust among consumers.
-------------------
"There's a, almost, glorification of the technology that's happening at the moment. And the technology is obviously crucial, but what I really care about in a lot of ways is what are the human beings who build and use that technology doing with it? Because the exact same ones and zeros, the exact same code can lead to enormous social benefit or social harm, depending on what we humans do with it. And so, I think we need to recognize that technology is not this hurricane bearing down on us, it's a thing that people build and use. And how do we influence the people, and the companies is maybe an easier thing to do than trying to focus just on the data and algorithms." – David Danks
-------------------
Episode Timestamps:
(01:41): What open source data means to David
(05:58): David’s transition from philosophy to AI
(09:03): Is data to blame for lack of trust in AI?
(13:40): How to be “future aware”
(16:32): Data science vs responsible data science
(20:20): Deep Fakes
(40:17): Advice for Ethical AI newcomers
-------------------
Links:
Connect with David