#255 Zhamak's Corner 28 - Generative AI and Data Mesh: The Start of a Long Road
Sep 29, 2023
The episode discusses the potential of generative AI and its intersection with data mesh. It highlights the importance of democratizing the ability to leverage these tools and the challenge of providing quality data. Zhamak also introduces Nextdata, her startup focused on easing data product creation.
Generative AI should be approached with caution, focusing on responsible and intentional data sharing for high-quality results.
Data mesh and decentralized data ownership are key to shifting the focus towards intentional and high-quality data sharing, avoiding imbalanced power dynamics.
Deep dives
The Vision of Data Mesh and Nextdata
Zhamak Dehghani, the creator of data mesh, envisions a world where data is treated as a product and can be shared independently. The goal is to empower AI, ML, and analytics through responsibly shared and independently owned data. Nextdata, Zhamak's startup, aims to democratize decentralized data ownership and architectures so that more companies can leverage the power of data, rather than it being monopolized by the few organizations with the resources to collect massive amounts of it. The emphasis is on building a future where technology does not rely on an imbalance of power rooted in centralized data control. Zhamak believes critical thinking is essential when applying generative AI, as responsible and intentional data sharing is crucial for achieving high-quality results.
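To make the "data as a product" idea more concrete, here is a minimal, hypothetical sketch of what a data product contract could look like in code. The class and field names (DataProductContract, freshness_slo_minutes, pii_fields) are illustrative assumptions, not an actual Nextdata or data mesh API.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: a data product contract with a clear owner, a published
# schema, and explicit quality guarantees, so downstream consumers (analytics,
# ML, or generative AI pipelines) can depend on it independently.
@dataclass
class DataProductContract:
    name: str                    # e.g. "customer-orders"
    owner: str                   # the domain team accountable for this data
    schema: dict                 # column name -> type, the published interface
    freshness_slo_minutes: int   # how stale the data is allowed to get
    pii_fields: list = field(default_factory=list)  # flagged for responsible sharing

# Example: an orders product owned and served by the fulfillment domain.
orders = DataProductContract(
    name="customer-orders",
    owner="fulfillment-team",
    schema={
        "order_id": "string",
        "customer_email": "string",
        "amount": "decimal",
        "placed_at": "timestamp",
    },
    freshness_slo_minutes=60,
    pii_fields=["customer_email"],
)
print(f"{orders.name} is owned by {orders.owner}; PII fields: {orders.pii_fields}")
```

The point of a contract like this is that ownership, quality expectations, and sensitive fields travel with the data itself instead of living only in a central team's head.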
The Potential of Generative AI
Generative AI has immense potential but also presents challenges. While Zhamak is optimistic about the positive outcomes it can produce, there is a need for caution. Rushing to feed poorly cleaned, low-quality data into large language models can lead to detrimental outcomes for companies. Instead, a focus on responsible data sharing and high data quality is essential. Data mesh provides building blocks to address these challenges and to ensure that generative AI is applied with intentionality and reliability. The discipline and attention enterprises are giving to data quality are encouraging signs. Zhamak emphasizes that even with advances in generative AI, the importance of timely, high-quality data cannot be overlooked.
Data Quality and the Urgency for Responsible Data Sharing
The seductive nature of generative AI can divert attention and resources from responsible data sharing practices. Zhamak stresses the need to avoid centralized data collection methods and highlights the dangers of the imbalanced power dynamics created when only a few organizations have access to vast amounts of data. It is essential to pair the democratization of technology access with equitable access to data. Data mesh and decentralized data ownership play a pivotal role in shifting the focus towards intentional and high-quality data sharing. The conversation continues around the challenges of implementing generative AI, the importance of clean data, and the ongoing need to explore ethical practices in the application of technology.
This is just scratching the surface of generative AI and data mesh - we will have much deeper discussions in future episodes.
Zhamak believes generative AI has a ton of positive real-world potential, especially in data mesh. Scott is more skeptical. But if tools like GenAI can only be leveraged by a few large companies trying to collect as much information - especially sensitive information - as possible, that could create some big societal issues. We need to democratize the ability to leverage these types of tools.
ChatGPT set off a frenzy. It can be tempting to move incredibly fast towards implementing generative AI. But most companies don't have such a vast amount of data that they can throw moderate - or worse - quality data at a model and still get something useful out. Garbage in, garbage out is a real concern.
Because they have far less data than essentially the sum of the internet that OpenAI used for ChatGPT, companies need to focus on feeding quality data into an LLM (large language model) for it to actually produce good results. Again, otherwise it is garbage in, garbage out.
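To ground the garbage-in, garbage-out point, here is a minimal, hypothetical sketch of a quality gate that filters records before they ever reach an LLM pipeline (a fine-tuning corpus, a retrieval index, etc.). The required fields and the freshness threshold are illustrative assumptions, not a prescribed standard.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical quality gate: only complete, non-empty, reasonably fresh
# records are allowed into the LLM pipeline; everything else is rejected.
REQUIRED_FIELDS = {"id", "text", "updated_at"}
MAX_STALENESS = timedelta(days=30)  # illustrative freshness threshold

def passes_quality_gate(record: dict) -> bool:
    """Return True only if the record is complete, non-empty, and fresh."""
    if not REQUIRED_FIELDS.issubset(record):
        return False  # incomplete: missing required fields
    if not str(record["text"]).strip():
        return False  # empty content adds noise, not signal
    age = datetime.now(timezone.utc) - record["updated_at"]
    return age <= MAX_STALENESS  # stale data degrades the model's answers

records = [
    {"id": 1, "text": "Refund policy: 30 days.", "updated_at": datetime.now(timezone.utc)},
    {"id": 2, "text": "   ", "updated_at": datetime.now(timezone.utc)},  # rejected: empty
    {"id": 3, "text": "Old pricing sheet."},                             # rejected: no timestamp
]
clean = [r for r in records if passes_quality_gate(r)]
print(f"{len(clean)} of {len(records)} records passed the quality gate")
```

A gate this simple obviously won't catch everything, but it illustrates the design choice discussed above: quality checks belong before the model, not after the bad answers show up.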
Sponsored by Nextdata, Zhamak's company that is helping ease data product creation.