
Data Mesh Radio
#262 Setting the Groundwork to Become Data Driven - Interview w/ Corrin Shlomo Goldenberg
Podcast summary created with Snipd AI
Quick takeaways
- Recognizing key indicators for investing in data infrastructure can guide the decision-making process.
- Effective communication and collaboration are crucial for a successful data platform.
- Establishing a robust data infrastructure and being mindful in decision-making are essential for effective data utilization and value creation.
Deep dives
Starting the Data Platform Journey
When the podcast guest joined BigPanda, the company had operational databases and some analytics capabilities, but there were many questions that couldn't be answered due to the lack of data. Recognizing the need for a larger and more maintainable data platform, the guest and a task force team embarked on building a centralized data platform. Their first use case was to support a new business model, leveraging the data already in their operational databases for analytics. They started with a basic infrastructure, including Snowflake, Upsolver, and dbt, and focused on democratizing the data while being mindful of costs. As the team grew and defined ownership boundaries, communication became crucial, both within the team and across global locations. Technical working groups, regular presentations, and direct communication channels were established to foster collaboration and ensure everyone had a voice in the data platform journey.
Indicators for Investing in Data
One key indicator for investing in data is when product managers start asking questions about specific data points and there is no easy access to that data. This indicates a need for a data platform to provide answers and insights. Another indicator is when R&D teams are unaware of the scale of data being processed in their systems, signaling a lack of visibility into data flow and the need for better measurement and analysis. Additionally, as a B2B company grows, there comes a point where important questions about the product and its performance cannot be answered without robust data analysis. Deciding the right time to invest in data infrastructure and resources is a challenge, but recognizing these indicators can help guide the decision-making process.
Enabling Communication and Collaboration
Effective communication and collaboration are crucial to the success of the data platform. The podcast guest implemented technical working groups, including representatives from various teams, to facilitate discussions and information exchange. Group presentations and regular updates helped disseminate information and gather feedback. The guest emphasized the importance of clear ownership and accountability for data domains, not only within the data team but also across the organization. Alongside internal communication, the guest highlighted the value of engaging external customers and understanding their needs. Over time, the team learned to adjust the composition of communication groups and involve engineers to ensure a holistic and informed approach to data management and decision-making.
Importance of Building a Strong Data Infrastructure
In this podcast episode, the speaker emphasizes the significance of establishing a robust data infrastructure within a company. They describe the process their team followed, which involved creating a streamlined process for data creation and review. This infrastructure not only allowed for easier data democratization but also ensured that data ownership rested with the respective teams. The speaker acknowledges that not all teams fully owned their objects initially due to readiness factors and challenges in defining the responsible team for certain data objects. Overall, the creation of a solid data infrastructure was seen as a crucial first step in enabling effective data utilization and value creation.
Importance of Mindful Decision-Making and Prioritization in Data Management
Another key point discussed in this podcast episode is the importance of being mindful and intentional in decision-making when it comes to data management. The speaker emphasizes the need to focus on the value and purpose of the data being collected and not simply accumulating data without a clear strategy. They highlight the role of product management in guiding this decision-making process and articulating the reasons behind data initiatives. Additionally, the speaker mentions the challenges of cost management and the need to balance real-time data requirements with the actual value it brings. Overall, the key takeaway is to approach data management with a clear understanding of the goals and value it can provide to both internal and external stakeholders.
Please Rate and Review us on your podcast app of choice!
Get involved with Data Mesh Understanding's free community roundtables and introductions: https://landing.datameshunderstanding.com/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Corrin's LinkedIn: https://www.linkedin.com/in/corrin/
In this episode, Scott interviewed Corrin Shlomo Goldenberg, Senior Product Manager of the Data Platform at BigPanda.
It's important to note that BigPanda is not at the stage yet where data mesh makes sense but this is a story of getting production of data into the heads and hearts of the application development team, which is a crucial aspect to doing data mesh well, whether it's done pre data mesh or as part of the journey.
Some key takeaways/thoughts from Corrin's point of view:
- When doing data work, it's easy to fall into the trap of trying to do everything. Go back to product basics - start from the why, why are you doing this? If not, we are just creating new forms of data swamps.
- It's not uncommon for developers to think of data simply as what's in the database, especially in B2B startups. Sometimes you have to work with them to get them to really understand they need to be creating and storing data to be leveraged for analytics. It's not even data exhaust, sometimes the data doesn't even exist!
- Related, many B2B companies feel they aren't data oriented enough. You can work to change that of course but know that almost everyone else feels the same; we all start our data journey somewhere, so get inspired to go forward.
- It's hard to pinpoint the time for a growing B2B company when it's actually time to start collecting and analyzing a lot of their data versus when it would be overkill/too early. Scott note: for larger organizations, look to have the conversation early in the lifecycle of any product - build a data sourcing strategy even if it's not implemented from day 1.
- Obviously doing data work isn't free - make sure you have the conversation about when to flip the switch. It's often driven by someone wanting a report or information, so prepare ahead of that.
- An indicator you need to be preparing more data is when the product managers are struggling to answer basic questions. Often it's the 'how many' type questions that shouldn't be hard to know.
- ?Controversial?: When getting started with development teams even understanding data work, it's far easier to have that data work centralized in a data team. You can decentralize over time but introducing them to the idea of data work and a data platform in general while trying to hand over ownership might be too much. Scott note: this probably isn't really controversial as much as an inconvenient truth.
- Prioritization is key - not just what you work on but what is the incremental value of different aspects of work. Look to make sure you can justify what work you are doing - circle back to 'the why'.
- Ownership isn't just about who owns the work but who owns the outcomes. Focusing on the work over the target outcome is not likely to end well.
- Similarly, ownership isn't always black and white. While a team owns their domain, a central data team will often own the data related to the domain. Partnership is crucial, teamwork makes the dream work.
- Use good product management practices - just building something won't automatically create usage. Talk to your constituents and help them understand what you've built and why to drive more usage.
Corrin started with the tale of BigPanda and how she started building out their data, ML, and analytics capabilities. When she came in, they didn't have the infrastructure or really the focus on a scalable platform for storing and analyzing their internal data. They were doing a lot of this for external clients but hadn't moved to doing it internally, which is pretty common in B2B startups. But BigPanda wanted to do a data driven transformation of their business model so they had to change the situation around their internal data.
There is always a balance for when you start collecting data at scale in Corrin's mind. At a B2B startup, you need to ask how early it should be for the company but the same is applicable for an early-stage offering at a larger organization. Most development teams aren't tasked with dealing with creating the necessary data until far later in an offering's lifecycle but it would be nice if you could include it at the start. But it definitely isn't free so there is always a balance and the conversations need to happen, hopefully earlier rather than later.
Corrin's tipping point for when you should really start to press development teams on creating necessary data is when it becomes hard to answer simple 'how many' type questions. It is also an easier conversation than a hypothetical one. If it takes more than a day to get basic information on how your customers are using your product, that's obviously an issue that's only going to grow. It's also a pretty tangible place to start.
When they started to build out the data platform, Corrin said it just made sense to start centralized. If the R&D team wasn't really thinking about data, trying to upskill them enough to take over the work entirely was probably a bridge too far. Plus, if your data requirements aren't complex enough to require decentralization, decentralization is often just an extra layer of complexity. So they moved to a high communication model where people can see what data work is happening even if it's controlled by the central team. They can slowly upskill the development teams to understand data instead of trying to hand over ownership prematurely.
Corrin talked about working with the team to understand the product mindset to data. Start from the why - it's easy to fall into the trap of trying to do everything because it might have value. That's what happened with data lakes that became data swamps. Focus people on the why and you can bring them more and more into working with data.
Similarly, while Corrin and team didn't have a lot of pushback on getting things done, she was very cognizant of prioritization and cost/benefit. Again, focusing on 'the why': what is most important and when? Why are the requirements like this? Can we cut the cost down by storing for less time and/or refreshing less often? When you say 'real time', what do you actually mean? Etc.
Corrin has been seeing good results from having strong ownership conversations. While the central team still owns the data, they are partnering with the domains as the domains still need to own the concepts and the understanding of the information. While this might not work at a large scale, it's perfectly normal and functional at a 300 person company. Scott note: centralization isn't the enemy until it becomes a bottleneck 😎
As with all global companies, BigPanda has some challenges around communication, per Corrin. Time zone differences and of course differences in focus are just two of them. So she recommends spending a lot of time to communicate to stakeholders about what you are building and why. It's easy to assume that because you build out a data product, people will use it but you have to work with people to ensure they actually use what you built.
Corrin pointed to the fact that many companies in the B2B space feel they aren't "data oriented" enough. She gave a few tips for how to become more data oriented but also has empathy for people feeling that - it's pretty common, most B2B companies feel they aren't as data oriented as everyone else. Similar to data mesh, where everyone believes all the other companies are far down their path. It's simply optics - companies project a better image than the reality of their situation with data.
Learn more about Data Mesh Understanding: https://datameshunderstanding.com/about
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used in this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf