Open||Source||Data

Charna Parkey
Apr 13, 2022 • 45min

Deep Fakes, Responsible Data Science, and Trust with David Danks

This episode features an interview with David Danks, Professor of Data Science and Philosophy and affiliate faculty in Computer Science and Engineering at the University of California, San Diego. Prior to UCSD, David was the L.L. Thurstone Professor of Philosophy and Psychology at Carnegie Mellon University. David’s research interests lie at the intersection of philosophy, cognitive science, and machine learning. He has also examined the ethics of artificial intelligence in healthcare, privacy, and security. In this episode, David and Sam dive into responsible data science, deep fakes, and whether data is to blame for consumers' lack of trust in AI.
-------------------
"There's a, almost, glorification of the technology that's happening at the moment. And the technology is obviously crucial, but what I really care about in a lot of ways is what are the human beings who build and use that technology doing with it? Because the exact same ones and zeros, the exact same code can lead to enormous social benefit or social harm, depending on what we humans do with it. And so, I think we need to recognize that technology is not this hurricane bearing down on us, it's a thing that people build and use. And how do we influence the people, and the companies is maybe an easier thing to do than trying to focus just on the data and algorithms." – David Danks
-------------------
Episode Timestamps:
(01:41): What open source data means to David
(05:58): David’s transition from philosophy to AI
(09:03): Is data to blame for lack of trust in AI?
(13:40): How to be “future aware”
(16:32): Data science vs. responsible data science
(20:20): Deep fakes
(40:17): Advice for ethical AI newcomers
-------------------
Links:
Connect with David
Mar 30, 2022 • 37min

Cloud Innovation, Analytics, and Data Transformation with Monica Kumar

This episode features an interview with Monica Kumar, Senior Vice President of Marketing and Cloud Go-To-Market at Nutanix. Nutanix is a data platform that is redefining workloads in cloud environments. Prior to Nutanix, Monica spent two decades at Oracle, where she launched several market solutions. Monica is passionate about positioning and supporting women in leadership roles. She is a founding limited partner of Neythri Futures Fund, a venture fund dedicated to bringing South Asian women into the investment community. Monica also serves on the Board of Directors at Watermark, an organization dedicated to women in leadership. In this episode, Monica and Sam discuss the evolving world of marketing analytics, tech’s biggest innovation to date, and how the data industry can change for the better.
-------------------
“I believe that cloud has now become more of an operating model. It started out in the public cloud, but now organizations have adopted the same philosophy of self-service, metering, chargeback, quick deployment, on-demand deployment, on-premises as well. So, my assertion is that cloud has become more of an operating model than a location. And what we’re going to see going forward more and more is this notion of multi-cloud and hybrid multi-cloud data platforms that would be able to access data from multiple locations and be able to provide on top of that, the analysis that the user is looking for.” – Monica Kumar
-------------------
Episode Timestamps:
(02:23): What open source data means to Monica
(12:37): The evolving world of marketing insight analytics
(16:42): How remote work is changing the industry for the better
(20:25): How Monica supports diverse entrepreneurs
(24:11): The transformation of a database to a data platform
(26:37): Why the cloud is tech’s biggest innovation
(29:23): What’s next for data storage in 5 years?
(29:41): Monica’s analogy for data storage
(33:34): Monica’s advice for newcomers
-------------------
Links:
LinkedIn - Connect with Monica
LinkedIn - Connect with Nutanix
Twitter - Follow Monica
Twitter - Follow Nutanix
Visit Nutanix
The Neythri Futures Fund
Mar 16, 2022 • 30min

Data Lakehouses, Interoperability, and Accessibility with Tomer Shiran

This episode features an interview with Tomer Shiran, Founder and Chief Product Officer at Dremio. Dremio is a high-performance SQL lakehouse platform that helps companies get more from their data in the fastest way possible. Prior to Dremio, Tomer served as VP of Product at MapR and held product management and engineering roles at Microsoft and IBM Research. He holds a master’s degree from Carnegie Mellon University and a bachelor’s from Technion - Israel Institute of Technology. In this episode, Tomer and Sam dive into the economics of storing data, how to build an open architecture, and what exactly a data lakehouse is.
-------------------
“I think in the world of data lakes and lakehouses, the model has shifted upside down. Now, instead of bringing the data into the engines, you’re actually bringing the engines to the data. So you have this open data tier built on open source technology. The data is represented in open source formats and stored in the company’s S3 account or Azure storage account. And then you can use a variety of engines. We at Dremio, we take pride in building the best SQL engine to use on the data. There are different streaming engines, like Spark and Flink. There are different batch processing and machine learning engines. Spark is an example of that as well that companies can use on that same data. And I think that’s one of the really important things from a cost standpoint, too, is that this really lowers your overall costs, both today and also in the future as you scale.” – Tomer Shiran
-------------------
Episode Timestamps:
(02:04): What open source data means to Tomer
(03:14): Tomer’s motivation behind Apache Arrow
(06:42): How Tomer solved data accessibility
(08:43): The unit economics of storing data
(14:31): Tomer’s motivations for Iceberg and how it relates to Project Nessie
(17:06): What is a data lakehouse?
(18:31): What gives Dremio its magic?
(23:39): What cloud data architecture will look like in 5 years
(27:19): Advice for building an open data architecture
-------------------
Links:
LinkedIn - Connect with Tomer
LinkedIn - Connect with Dremio
Twitter - Follow Tomer
Twitter - Follow Dremio
Visit Dremio
Get started with Dremio
Mar 2, 2022 • 31min

Interoperability, Governance, and Divergent Teams with Prukalpa Sankar

This episode features an interview with Prukalpa Sankar, Co-Founder of Atlan. Atlan is a venture-backed startup building a modern data workspace. Prukalpa also co-founded SocialCops, a data for good company behind landmark projects such as India’s National Data Platform. Prukalpa is a recognized industry leader, named to both the Forbes 30 Under 30 and Fortune 40 Under 40 lists. In this episode, Prukalpa and Sam discuss how diversity is a data team’s biggest strength, why governance isn’t always a bad thing, and what they hope the modern data stack will look like in 5 years.
-------------------
“Diversity is our biggest strength but our biggest weakness, because it's really hard to make that team collaborate. Because most of the teams in the world are very uniform. So when every single person in the room is a subject matter expert on something, nobody else actually can have oversight on each other's work because they've never done it before. Then how do you create true trust? How do you create trust when things are breaking? If you're able to create a way for these diverse people to collaborate really effectively, to be a dream team, a dream data team where they trust each other and they can collaborate effectively, then magic can happen.” – Prukalpa Sankar
-------------------
Episode Timestamps:
[01:55]: What open source data means to Prukalpa
[05:38]: Prukalpa’s journey to the data for good movement
[04:51]: How Prukalpa and her team provided gas to 80 million Indian women
[06:33]: How diversity can help a data team succeed
[15:10]: What gives Atlan its magic
[18:58]: How being open by default influenced Atlan’s architecture choices
[22:45]: The reality of the modern data stack in 5 years
[27:36]: Advice for people getting started with DataOps
-------------------
Links:
LinkedIn - Connect with Prukalpa
LinkedIn - Connect with Atlan
Twitter - Follow Prukalpa
Twitter - Follow Atlan
Visit Atlan
Feb 16, 2022 • 37min

Trust, Automation, and Trade-Offs with Joseph Jacks

This episode features an interview with Joseph Jacks, Founder and General Partner of OSS Capital. OSS Capital is the first and only investor dedicated to COSS (Commercial Open Source Software) companies, focusing on supporting early-stage COSS founders. Joseph, also known as JJ, has worked at Mesosphere, TIBCO Software, and Talend in various sales, engineering, and strategy roles. In this episode, JJ and Sam weigh the trade-offs between open core and closed core companies and discuss how each can go public. JJ also dives into the misconception in tech that trust is the same thing as privacy.
Guest Quote [25:14]: “There’s a societal recognition that if you use technology to automate some part of your life and you use that regularly, you have to be able to trust it. And I think gradually, consumers are becoming more and more aware that one of the most effective ways of checking the trust box is answering the question, ‘Is the technology I'm using open source at the core, yes or no?’ And if the answer is no, I think it's very difficult and a lot harder to achieve the levels of trust that you can if the answer is yes.” – Joseph Jacks
Time Stamps
[12:59]: The difference between open and closed core companies
[17:23]: Understanding the trade-off between open and closed source
[18:23]: Trends within open source data companies
[20:21]: Is it possible to go public as a closed source database?
[22:35]: Leveraging the automation opportunity of open source systems
[23:47]: How can consumers trust the technology they’re using?
[34:01]: Advice for those starting open source projects
Links
LinkedIn - Connect with JJ
LinkedIn - Connect with OSS Capital
Twitter - Follow OSS Capital
Visit OSS Capital
See omnystudio.com/listener for privacy information.
Feb 2, 2022 • 47min

Open Source, Adoptability, and Name Changes with Martin Traverso

This episode features an interview with Martin Traverso, CTO at Starburst Data and co-founder of Trino, a lightning-fast distributed SQL query engine. Martin was previously a software engineer at Facebook, where he led the Presto (now Trino) development team. Trino has gained worldwide adoption at companies like Netflix, Amazon, and LinkedIn. In this episode, Martin sits down with Sam to discuss the barriers, advantages, and complications of going open source.
Episode Notes
-Guest Quote [33:55]: “What makes Trino powerful is the ecosystem around it. You have integrations with all sorts of data sources and that’s part of the power and magic of Trino. You can pull data from all these data sources using a single interface. On the other end is the integrations with all the tools that everyone uses. Once you put all those pieces together, that’s what gives Trino the power.”
-Time Stamps
[8:38]: How Martin solved Facebook’s analytics problem
[13:00]: How the team adapted to customers’ needs
[17:07]: What makes Trino stand out among other query engines
[19:42]: Going open-source changes the game
[30:14]: Presto becomes Trino
[33:24]: What gives Trino its magic
[35:19]: What Trino’s community looks like today
[38:34]: Advice for those starting open-source projects
-Links
Blog - Intro to Trino for the Trinewbie
Trino Community Broadcast - Subscribe
GitHub Trino repository - Give Trino a star
LinkedIn - Connect with Martin
Trino Meetup - Join
Play with Trino
Rebrand from Presto to Trino - Learn More
Slack - Join Trino
Trino: The Definitive Guide (Download a free copy)
Twitter - Follow Martin
Twitter - Follow Trino
Oct 29, 2021 • 24min

Season Two Finale and Recap with Open||Source||Data Producer Audra Montenegro

Join Open||Source||Data producer Audra Montenegro as she and Sam cover highlights and takeaways from the ten episodes of season two. And get a sneak peek of what's in store for season three!
Oct 14, 2021 • 31min

Embeddings, Feature Stores, and MLOps with Simba Khadder

Join Featureform CEO Simba Khadder as he talks with Sam about how versioning, immutability, and sharing will accelerate ML workflows. Tune in for a look at state-of-the-art collaboration in data teams and the power of focusing on your north star.
Sep 30, 2021 • 29min

Abundance, Metadata, and Automation with Mark Grover

How can we make data 10X more accessible for data-driven people within data-driven companies? Tune in as Mark and Sam discuss probabilistic product management and the emerging metadata ecosystem.
Sep 16, 2021 • 36min

Metadata, Communities, and Architecture with Shirshanka Das

How can we evolve an expanding ecosystem of data technologies while making sense of the whole? Tune in as Shirshanka Das, founder of LinkedIn DataHub and Acryl Data, and Sam discuss putting metadata at the center and specialization at the edge to scale data governance sustainably.
