#262 Setting the Groundwork to Become Data Driven - Interview w/ Corrin Shlomo Goldenberg

Oct 23, 2023

Scott interviews Corrin Shlomo Goldenberg, Senior Product Manager at BigPanda. They discuss the importance of data mesh in application development, challenges of data-driven decision-making, communication in data-driven organizations, generating and utilizing data for data products, and striving for improvement in data analysis.

Chapters

Introduction

00:00 • 2min

The Importance of Data Mesh in Application Development

01:45 • 33min

The Challenges of Data-Driven Decision-Making

34:47 • 12min

Importance of Communication in Data-Driven Organizations

46:48 • 8min

Generating and Utilizing Data for Data Products

54:50 • 8min

Striving for Improvement: The Concept of 'Good Enough'

01:03:03 • 2min

Thanking the Guest and Promoting Data Mesh Understanding

01:04:36 • 2min

Please Rate and Review us on your podcast app of choice!

Get involved with Data Mesh Understanding's free community roundtables and introductions: https://landing.datameshunderstanding.com/

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here

Episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding. Get in touch with Scott on LinkedIn if you want to chat data mesh.

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.

Corrin's LinkedIn: https://www.linkedin.com/in/corrin/

In this episode, Scott interviewed Corrin Shlomo Goldenberg, Senior Product Manager of the Data Platform at BigPanda.

It's important to note that BigPanda is not yet at the stage where data mesh makes sense, but this is a story of getting the production of data into the heads and hearts of the application development team. That is a crucial aspect of doing data mesh well, whether it's done before data mesh or as part of the journey.

Some key takeaways/thoughts from Corrin's point of view:

When doing data work, it's easy to fall into the trap of trying to do everything. Go back to product basics and start from the why: why are you doing this? Otherwise, we are just creating new forms of data swamps.
It's not uncommon for developers to think of data simply as what's in the database, especially in B2B startups. Sometimes you have to work with them to get them to really understand they need to be creating and storing data to be leveraged for analytics. It's not even data exhaust, sometimes the data doesn't even exist!
Relatedly, many B2B companies feel they aren't data oriented enough. You can work to change that of course, but know that almost everyone else feels the same. We all start our data journey somewhere, so get inspired to move forward.
It's hard to pinpoint the time for a growing B2B company when it's actually time to start collecting and analyzing a lot of their data versus when it would be overkill/too early. Scott note: for larger organizations, look to have the conversation early in the lifecycle of any product - build a data sourcing strategy even if it's not implemented from day 1.
Obviously doing data work isn't free - make sure you have the conversation about when to flip the switch. It's often driven by someone wanting a report or information, so prepare ahead of that.
An indicator you need to be preparing more data is when the product managers are struggling to answer basic questions. Often it's the 'how many' type questions that shouldn't be hard to know.
?Controversial?: When getting started with development teams even understanding data work, it's far easier to have that data work centralized in a data team. You can decentralize over time but introducing them to the idea of data work and a data platform in general while trying to hand over ownership might be too much. Scott note: this probably isn't really controversial as much as an inconvenient truth.
Prioritization is key - not just what you work on but what is the incremental value of different aspects of work. Look to make sure you can justify what work you are doing - circle back to 'the why'.
Ownership isn't just about who owns the work but who owns the outcomes. Focusing on the work over the target outcome is not likely to end well.
Similarly, ownership isn't always black and white. While a team owns their domain, a central data team will often own the data related to the domain. Partnership is crucial, teamwork makes the dream work.
Use good product management practices - just building something won't automatically create usage. Talk to your constituents and help them understand what you've built and why to drive more usage.

Corrin started with the tale of BigPanda and how she started building out their data, ML, and analytics capabilities. When she came in, they didn't have the infrastructure or really the focus on a scalable platform for storing and analyzing their internal data. They were doing a lot of this for external clients but hadn't moved to doing it internally, which is pretty common in B2B startups. But BigPanda wanted to do a data driven transformation of their business model so they had to change the situation around their internal data.

In Corrin's mind, there is always a balance for when you start collecting data at scale. At a B2B startup, you need to ask how early is too early for the company, but the same question applies to an early-stage offering at a larger organization. Most development teams aren't tasked with creating the necessary data until far later in an offering's lifecycle, but it would be nice if you could include it at the start. It definitely isn't free, so there is always a balance and the conversations need to happen, hopefully sooner rather than later.

Corrin's tipping point for when you should really start to press development teams on creating necessary data is when it becomes hard to answer simple 'how many' type questions. It is also an easier conversation than a hypothetical one. If it takes more than a day to get basic information on how your customers are using your product, that's obviously an issue that's only going to grow. It's also a pretty tangible place to start.
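To make the 'how many' point concrete, here is a minimal sketch in Python of how quickly such a question can be answered once usage events are actually recorded. Everything in it is an assumption for illustration: the UsageEvent record, its fields, and the sample customers and features are hypothetical and do not come from BigPanda or the episode.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class UsageEvent:
    # Hypothetical event shape: who used which feature on which day.
    customer_id: str
    feature: str
    day: date


def customers_per_feature(events):
    """Count the distinct customers that used each feature at least once."""
    # De-duplicate (customer, feature) pairs so repeat usage counts once.
    seen = {(e.customer_id, e.feature) for e in events}
    return Counter(feature for _, feature in seen)


# Made-up sample data, only to show the shape of the question.
events = [
    UsageEvent("acme", "dashboard", date(2023, 10, 1)),
    UsageEvent("acme", "dashboard", date(2023, 10, 2)),
    UsageEvent("acme", "alerts", date(2023, 10, 2)),
    UsageEvent("globex", "dashboard", date(2023, 10, 3)),
]

counts = customers_per_feature(events)
for feature, n in sorted(counts.items()):
    print(f"{feature}: {n} customer(s)")
```

The counting itself is trivial. The hard part, as described above, is that the events have to exist in the first place, which is the conversation to have with the development teams.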

When they started to build out the data platform, Corrin said it just made sense to start centralized. If the R&D team wasn't really thinking about data, trying to upskill them enough to take over the work entirely was probably a bridge too far. Plus, if your data requirements aren't complex enough to require decentralization, decentralization is often just an extra layer of complexity. So they moved to a high communication model where people can see what data work is happening even if it's controlled by the central team. They can slowly upskill the development teams to understand data instead of trying to hand over ownership prematurely.

Corrin talked about working with the team to understand the product mindset to data. Start from the why - it's easy to fall into the trap of trying to do everything because it might have value. That's what happened with data lakes that became data swamps. Focus people on the why and you can bring them more and more into working with data.

Similarly, while Corrin and team didn't have a lot of pushback on getting things done, she was very cognizant of prioritization and cost/benefit. Again, focusing on 'the why': what is most important and when? Why are the requirements like this? Can we cut the cost down by storing for less time and/or refreshing less often? When you say 'real time', what do you actually mean? Etc.
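The retention and refresh questions can be framed as simple arithmetic. The sketch below is a rough illustration only: the cost model, the prices, and the data volumes are hypothetical placeholders, not figures from the episode or from any vendor.

```python
def monthly_cost_cents(daily_gb, retention_days, refreshes_per_day,
                       storage_cents_per_gb=2, refresh_cents=50,
                       days_in_month=30):
    """Rough monthly cost in cents under an assumed, simplified cost model.

    Storage: data held at steady state (daily volume x retention window)
    times an assumed price per GB per month.
    Compute: number of refreshes per month times an assumed price per refresh.
    """
    storage = daily_gb * retention_days * storage_cents_per_gb
    compute = refreshes_per_day * days_in_month * refresh_cents
    return storage + compute


# "Real time" read as a refresh every 5 minutes, with one year of retention.
near_real_time = monthly_cost_cents(daily_gb=10, retention_days=365,
                                    refreshes_per_day=288)

# Hourly refresh with 90 days of retention.
hourly = monthly_cost_cents(daily_gb=10, retention_days=90,
                            refreshes_per_day=24)

print(f"every 5 minutes, 1 year retention: ${near_real_time / 100:,.2f}")
print(f"hourly, 90 day retention: ${hourly / 100:,.2f}")
```

Even with placeholder numbers, putting the options side by side gives stakeholders something specific to react to when they ask for 'real time'.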

Corrin has been seeing good results from having strong ownership conversations. While the central team still owns the data, they are partnering with the domains as the domains still need to own the concepts and the understanding of the information. While this might not work at a large scale, it's perfectly normal and functional at a 300 person company. Scott note: centralization isn't the enemy until it becomes a bottleneck 😎

As with all global companies, BigPanda has some challenges around communication, per Corrin. Time zone differences and differences in focus are just two of them. So she recommends spending a lot of time communicating with stakeholders about what you are building and why. It's easy to assume that because you built a data product, people will use it, but you have to work with people to ensure they actually use what you built.

Corrin pointed to the fact that many companies in the B2B space feel they aren't "data oriented" enough. She gave a few tips for how to become more data oriented but also has empathy for people feeling that way - it's pretty common, and most B2B companies feel they aren't as data oriented as everyone else. It's similar to data mesh, where everyone believes all the other companies are far down their path. It's simply optics - companies project a better image than the reality of their situation with data.

Learn more about Data Mesh Understanding: https://datameshunderstanding.com/about

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf