
Data Mesh Radio

Latest episodes

Nov 9, 2023 • 20min

#267 The Developer Experience - How Do We Delight Data Developers? - Zhamak's Corner 30

Zhamak and a developer discuss the challenges of data product development and the need for better tooling. They explore the evolution of data tools like Docker and Cloud Foundry, emphasizing the importance of cohesion and simplified management. They also draw lessons from the software industry, highlighting the benefits of abstracting infrastructure and providing opinionated frameworks for developers.
Nov 6, 2023 • 1h 4min

#266 Leveraging Decades of Information Architecture Learnings to Do Data Well - Interview w/ Akins Lawal

Akins Lawal, an Information Architecture expert, shares key takeaways on leveraging decades of information architecture learnings: the importance of good information architecture, prioritizing learning capacity, understanding the mindset of leaders and teams, building trust in data, implementing empathy in data literacy, incentivizing and recognizing data professionals, and focusing on user experience in data products.
Nov 5, 2023 • 12min

Weekly Episode Summaries and Programming Notes – Week of November 5, 2023

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please Rate and Review us on your podcast app of choice!

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
Nov 3, 2023 • 17min

#265 Are You (Even) Doing Data Mesh™? - Mesh Musings 55

The episode discusses the importance of moving towards Data Mesh and explores the difference between actively engaging with it and merely questioning it. It also delves into the semantics and progress of the Data Mesh journey, emphasizing the importance of starting somewhere and applying the pillars. Overall, it highlights the significance of decentralized data and invites more people to join the Data Mesh community.
Oct 30, 2023 • 51min

#264 Will GenAI and Data Mesh Really Mix? - Interview w/ Madhav Srinath

Please Rate and Review us on your podcast app of choice!

Get involved with Data Mesh Understanding's free community roundtables and introductions: https://landing.datameshunderstanding.com/

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here. Episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding. Get in touch with Scott on LinkedIn if you want to chat data mesh.

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.

Madhav's LinkedIn: https://www.linkedin.com/in/madhavsrinath/

In this episode, Scott interviewed Madhav Srinath, CEO at Nexusleap.

Overall, we are super early in the Generative AI cycle and hype is huge. This discussion is one of early impressions, not fully formed answers. It's far too early for that.

Also, FYI, there were some technical difficulties in this episode where the recording kept shutting down and had to be restarted. So thanks to Madhav for sticking through, and hopefully it isn't too noticeable.

Generative AI will mostly be shortened to GenAI throughout these notes. LLM stands for large language model; LLMs power GenAI.

Some key takeaways/thoughts from Madhav's point of view:

- ?Controversial?: An emerging best practice seems to be having layers of LLMs - one model where you might ask it complicated questions, and a second model trained specifically to vet the answers for correctness and governance concerns.
- The cost of running many models in production is typically quite low, at least infrastructure-wise. Instead of an always-on architecture, most organizations leverage a serverless architecture - or leverage APIs from others providing the models - so they essentially only pay a few cents per query.
- ?Controversial?: Use GenAI as a "scalpel, not a broadsword". Many are trying to use it in overly broad ways and getting poor results.
- The ability to take a mountain of data and get something structured out of it isn't a new concept. We've been trying to do that with data mining for years. It's just finally maturing into something more widely useful/usable with GenAI.
- People are generally still only trying to solve pretty shallow problems with GenAI, e.g. writing an article. Scott note: that's probably good, because most aren't ready to do the work necessary to make GenAI usable for much deeper use cases.
- ?Controversial?: We may need human handlers for LLMs to do GenAI well. If we aren't sure of the quality of the answers and we need high-quality answers, there need to be guardrails and probably a human in the loop. Scott note: whether this proves better than having the human just do the analysis remains to be seen.
- If you have the right guardrails in place, there isn't really any harm in starting to work with GenAI. But you have to understand it's early days and there are definite sharp corners to cut yourself on - that human in the loop is important for a myriad of reasons, and you have to be careful around things like privacy.
- ?Controversial?: It's better to start at domain-level questions and focus on domain-specific problems right now with GenAI. That way, you can more easily control the inputs you feed it, and it can help with more specific, targeted questions.
- Look at machine learning use cases: creating a narrow focus for each model has proven a far better strategy than creating one overarching model that tries to solve many problems. Why not try the same with LLMs, creating models specific to topics?
- Relatedly, you can add more focus areas to an LLM as you train it. Trying to get it to understand everything at the start will likely overwhelm the LLM to the point where the quality of your answers falls.
- LLMs can be used to infer relationships between domains or data products. You still have to point them at high-quality data and have someone check their work, but they could be used to more easily find where data products already are - or should be - interoperable.
- A potentially good use case is to have GenAI models focused on finding those potential relationships, then use a second GenAI model more targeted at finding information based on those relationships.
- ?Controversial?: You shouldn't be training GenAI models from scratch. Start with one of the many open source models available and train it on your specifics. Leverage work others have already done for you. You can train models by having your business specialists share information with the LLMs.
- Since everyone essentially has access to the same models, companies will differentiate on the information - and especially the quality of the information - they feed their LLMs.
- ?Controversial?: GenAI may be more useful for data producers than data consumers. Producers still need to focus on the fundamentals, but GenAI can really make them more productive. Scott note: imagine being able to get 5 sample data models, or to ask an LLM to figure out the best way to make your data fit with other data products to be interoperable.

For Madhav, a lot of what Generative AI has become is the concept of data mining with a personable interface. We've been trying for a while to create a way to dig into unstructured data and get insights or information - something structured - out of it. The concept isn't new, but we've finally found something that might actually do it well and make the outputs easy to consume.

Right now, in Madhav's view, most of the emerging GenAI use cases have been pretty shallow, such as helping write an article. It probably can go far deeper, but it's quite early days. However, we probably need a human in the loop to make sure the answers LLMs give are correct. That might be the best option in his view: something like a guide plus guardrails driven by a human to make sure these LLMs don't hallucinate. In a way, this isn't all that different from other machine learning work - black boxes tend to have unexpected consequences.

Madhav's view is that it's totally okay to start working with GenAI at your organization, as long as you understand GenAI/LLMs have quality issues right now and are really only at the MVP stage in many senses - especially if you are using them internally on your own data. There will probably be good ways to put a wrapper around them to prevent improper data usage/leakage and to prevent hallucinations as well.

Starting with domain-specific questions/problems is where Madhav thinks people should focus their GenAI work. If you feed an LLM a ton of information from many sources, you can't really be sure of the logic it uses to generate answers, versus keeping the inputs tighter and asking about more specific business areas. Keeping that tighter focus and having many LLMs across the organization lets you tune each model tightly to specific topics. You can then add additional focus areas to those LLMs once a model performs well on one topic.

While LLMs aren't magic, Madhav is seeing an emerging use case where people point an LLM at maybe two data products and ask it to infer relationships between them. It might discover something people haven't thought of before. You still need a human in the loop, or you end up with something like the Pastafarian 'belief' that global temperature rise since the 1700s is caused by the decreasing number of pirates globally - correlation doesn't equal causation. It's not a magic wand, but it could help people find more places where data is already interoperable or should be. Once those relationships are discovered, he's seeing a differently tuned GenAI model used to actually infer information based on those newly discovered relationships. Again, specialized models.

Madhav doesn't believe most organizations should be training their GenAI models from scratch. Instead, go find the open source models and add your necessary information to the training - basically, why start at zero when you can start at 0.7? Leverage the work others are doing so you don't need super expensive engineers focused on LLM training. By starting with an existing base model, you can tune it based on your own answers instead of trying to feed it very specific data and supervise the base-level initial training. Leverage your business subject matter experts and get them to share their tribal knowledge with the LLMs as well.

Circling back to the idea of layered LLMs, Madhav talked about how some organizations have models specifically tuned to answer questions about data, with a secondary LLM focused on checking the first one's answers for sanity/correctness as well as governance, e.g. security and privacy concerns. Again, that separation of the work. And it's not nearly that expensive when it's all done in a serverless way - if your LLMs are becoming cost prohibitive, you are probably not running them in a cost-effective way.

Madhav has a few strong feelings about what organizations should be doing with LLMs. The first is that most should not be trying to train their own LLMs - the time and cost just don't make sense when open source models are advancing at frankly remarkable speeds. The second is that GenAI is probably more helpful for data producers than data consumers. It really can make producers far more productive, e.g. letting them generate insights on their own data or helping them find good interoperability points between their data and other data products.

Learn more about Data Mesh Understanding: https://datameshunderstanding.com/about

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here.

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
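The layered-LLM pattern from the episode (one model answers domain questions, a second model vets the answers for correctness and governance, with a human in the loop for rejections) can be sketched roughly as follows. This is a minimal illustration with stubbed model calls, not anyone's actual implementation; `answer_model` and `review_model` are hypothetical placeholders for what would, in practice, be serverless or API calls to tuned models.

```python
# Hypothetical sketch of the "layered LLMs" pattern: an answering model plus a
# reviewing model, with rejected answers escalated to a human instead of being
# returned to the user. Both model calls are stubbed with plain functions.

def answer_model(question: str) -> str:
    """Stub for the first-layer LLM, tuned to answer domain questions.
    The stubbed answer deliberately leaks an email to trip the reviewer."""
    return f"Answer to: {question} (contact: jane@example.com)"

def review_model(answer: str) -> dict:
    """Stub for the second-layer LLM that vets answers. Here it only flags
    anything resembling PII; a real reviewer model would also check factual
    consistency against the source data and other governance policies."""
    flagged = "@" in answer  # crude stand-in for a privacy/governance check
    return {"approved": not flagged,
            "reason": "possible PII" if flagged else "ok"}

def ask(question: str) -> str:
    """Layered pipeline: draft an answer, vet it, hold it for a human if
    the reviewer rejects it."""
    draft = answer_model(question)
    verdict = review_model(draft)
    if verdict["approved"]:
        return draft
    # Human-in-the-loop: rejected answers are held for review, not returned.
    return f"[held for human review: {verdict['reason']}]"

print(ask("Which region had the most churn last quarter?"))
```

The design point is the separation of work the episode describes: each model stays narrowly focused (one answers, one vets), and the guardrail layer decides what reaches the consumer rather than the answering model itself.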
Oct 29, 2023 • 15min

Weekly Episode Summaries and Programming Notes – Week of October 29, 2023

The podcast discusses practical applications of LLMs and the concept of Data Mesh. Madhav Srinath talks about the early stage of generative AI and the challenges of using large language models. The chapter also explores GenAI models, layered LLMs, and their cost-effective implementation.
Oct 27, 2023 • 59min

#263 Panel: Applying Site Reliability Engineering Practices to Data - Led by Emily Gorcenski w/ Amy Tobey and Alex Hidalgo

In this panel discussion, Emily Gorcenski, Amy Tobey, and Alex Hidalgo explore the application of reliability engineering practices to data. They discuss the concept of data mesh, the challenges of natural boundaries in data pipelines, the importance of data ownership and cloud principles, and the complexities of bringing Site Reliability Engineering theory to the data space. They also touch on the positive shift in enterprise mindset, the need for change in ticket closure, and the significance of observability in the data space.
Oct 23, 2023 • 1h 7min

#262 Setting the Groundwork to Become Data Driven - Interview w/ Corrin Shlomo Goldenberg

Scott interviews Corrin Shlomo Goldenberg, Senior Product Manager at BigPanda. They discuss the importance of data mesh in application development, challenges of data-driven decision-making, communication in data-driven organizations, generating and utilizing data for data products, and striving for improvement in data analysis.
Oct 22, 2023 • 27min

Weekly Episode Summaries and Programming Notes – Week of October 22, 2023

This weekly summary episode introduces upcoming episodes on becoming data-driven and applying site reliability engineering practices to data. The importance of reliability engineering in data operations is discussed, as well as BigPanda's journey towards becoming data-driven. The top eight takeaways from a panel discussion on applying reliability engineering to data are presented, along with a discussion of legacy systems, interconnectedness, and observability in data.
Oct 20, 2023 • 21min

#261 Just What the Heck is a Data Product Container? - Zhamak's Corner 29

This podcast explores the concept of data product containers and their importance in simplifying the sharing and delivery of data products. It discusses the decentralized architecture in data mesh and features a guest from Next Data, a data mesh technology company. The concept of data containerization and its potential impact on the data industry is also explored, highlighting the need for data product containers as a new atomic unit for data. The speaker discusses the concept of a data product quantum and Next Data's approach, emphasizing the shift away from a centralized approach.
