
The Engineering Leadership Podcast
Building data engineering teams from scratch & transitioning to a full-scale data function w/ Colleen Tartow #160
Podcast summary created with Snipd AI
Quick takeaways
- Building a data engineering team requires aligning hiring with business problems and finding senior individuals who are comfortable with ambiguity and able to propose solutions.
- Defining the charter, hiring experts in SQL and Python, and starting small with senior team members are crucial in the early days of data engineering.
- When data becomes complex and interpretation matters, a data program and dedicated data engineering team are needed, with an owner for the data and a clear understanding of business pain points.
- Collaboration and empathy between data engineering and software engineering are essential to avoid friction points and align organizational goals.
- The rise of AI brings challenges in handling massive data volumes and rethinking existing data pipelines, requiring collaboration and alignment between data engineering and software engineering.
- Data should be treated as a core product, and onboarding and chartering a data engineering team calls for clear expectations, collaboration, and agile methodologies adapted to data work.
Deep dives
Building a data engineering org from scratch
When building a data engineering team from scratch, it's important to consider the business problems that need to be solved and align the hiring process with those needs. Finding senior individuals who are comfortable with ambiguity and can propose solutions to the business problems is crucial. Start with a small team and iterate based on company goals, fostering data literacy and bringing data into the conversation. Setting clear expectations and milestones is important in the early stages.
Building a data engineering team in the early days
In the early days of data engineering, when the term wasn't even well-defined, it was important to define the charter of the team and understand the business problems that needed to be solved. Hiring experts in SQL and Python and defining the skill sets required were crucial. Starting small with senior team members who can thrive in an ambiguous environment and gradually adding junior members once the mission of the team is defined is effective. It is important to think through the complexity of the data and the skill sets needed to solve the problems.
Threshold moment for building a data program
The threshold moment when a data program needs to be built is when the data can no longer be managed as a side project. This often happens when the complexity of the data and the business requirements grow. When data is changing frequently and the interpretation of the data becomes important, a data program and dedicated data engineering team are needed. It is crucial to have an owner for the data and a clear understanding of the business pain points that data will help solve.
Collaboration between data engineering and software engineering
Collaboration between data engineering and software engineering is essential as data becomes the product and directly impacts revenue streams. It is crucial to foster empathy and create a collaborative environment. Setting clear expectations and communication channels is important to avoid friction points, such as changing the schema without informing the data engineering team. Leadership alignment and a clear understanding of the organizational goals between the two teams are vital.
AI's impact on data engineering
The rise of AI and ML brings new challenges to data engineering: teams must handle massive data volumes and build a specialized AI stack. That means GPUs, specialized storage, and a rethink of existing data pipelines. The complexity and power consumption of AI workloads add further considerations. Before committing, teams need to evaluate the purpose and feasibility of each AI initiative against the data volumes and infrastructure resources available. Collaboration and alignment between data engineering and software engineering in managing data as a product become even more essential.
Data engineering and AI sustainability
AI's energy consumption and sustainability are emerging topics. The power that AI workloads draw is a real constraint and should be factored into planning. Making AI more sustainable is gaining attention and may lead to further discussion and innovation in this area.
Data as a product and ideal collaboration
Data is no longer a side project, but a core product that affects other products and revenue streams. Ideal collaboration involves making data engineering an essential part of the engineering function and aligning goals between software engineering and data engineering. Creating a data contract and fostering communication and empathy are key to avoiding friction. The kindest person in the room is often the smartest.
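One way to picture the data contract mentioned above is as a small, shared description of a table that both teams agree to check before a schema change ships. The sketch below is only an illustration under stated assumptions: the `orders` table, its columns, and the `contract_violations` helper are hypothetical and were not discussed in the episode. It flags removed columns, changed types, and columns that become nullable, and it treats newly added columns as non-breaking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    dtype: str
    nullable: bool = False


# Hypothetical contract for an "orders" table that data engineering depends on.
ORDERS_CONTRACT = (
    Column("order_id", "int"),
    Column("customer_id", "int"),
    Column("total_cents", "int"),
    Column("coupon_code", "str", nullable=True),
)


def contract_violations(contract, proposed):
    """Return a list of breaking changes in `proposed` relative to `contract`.

    Columns that exist only in `proposed` are treated as non-breaking.
    """
    proposed_by_name = {col.name: col for col in proposed}
    problems = []
    for col in contract:
        new = proposed_by_name.get(col.name)
        if new is None:
            problems.append(f"column removed: {col.name}")
        elif new.dtype != col.dtype:
            problems.append(f"type changed: {col.name} {col.dtype} -> {new.dtype}")
        elif new.nullable and not col.nullable:
            problems.append(f"column became nullable: {col.name}")
    return problems


# A proposed schema from the software engineering side (also hypothetical):
# it changes one type, drops one column, and adds a new column.
PROPOSED_ORDERS = (
    Column("order_id", "int"),
    Column("customer_id", "int"),
    Column("total_cents", "float"),
    Column("shipping_region", "str", nullable=True),
)

if __name__ == "__main__":
    for problem in contract_violations(ORDERS_CONTRACT, PROPOSED_ORDERS):
        print(problem)
```

A check like this could run in the software team's CI so that a schema change prompts a conversation with data engineering before it lands, rather than after a pipeline breaks.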
Onboarding and chartering data engineering teams
Onboarding and chartering a data engineering team requires clear expectations and milestones. Understanding the business problems to be solved and hiring individuals who can propose solutions are critical. Defining the team's charter, evaluating existing pain points, and aligning the team's mission with company goals are important steps. It is crucial for data engineering leaders to communicate effectively, foster collaboration, and adapt agile methodologies to data engineering.
The impact of AI on data pipelines
AI workloads require processing massive volumes of data, which calls for rethinking existing data stacks. Transforming existing transactional and BI stacks to support AI workloads is complex. It is important to consider where and how AI needs to be applied and to select a technology stack that can handle those requirements. Minimizing data movement and reevaluating existing processes are essential for managing AI-related data workflows effectively.
Power of communication and collaboration
Overall, building a successful data engineering organization depends on effective communication, collaboration, and alignment between software engineering and data engineering, along with understanding business needs, setting clear expectations, and evolving existing processes. Collaboration should be fostered at all levels, from organizational leadership to individual team members, with a focus on empathy and shared goals.
The future challenges and considerations
The future of data engineering includes addressing emerging challenges in AI, such as power consumption and sustainability. As technology evolves, data engineering must adapt to new requirements and technologies. Constant evaluation, rethinking, and innovation are necessary to meet the evolving demands of data engineering.
As the Field CTO & Head of Strategy @ VAST Data, Colleen Tartow, Ph.D., has a vast resume of building data engineering teams from scratch and beyond. Colleen discusses the necessary components for developing new or reorienting existing data programs, strategies for effective communication & collaboration between data & eng functions, the implications of AI technology on data engineering, and integrating cross-functional partners into the data eng planning process & road map. Plus, Colleen shares what it was like building the hiring process for data eng functions when the “data engineering” term or role didn’t exist yet, and how you can apply that to other emerging or undefined functions!
ABOUT COLLEEN TARTOW
Colleen Tartow, Ph.D. is Field CTO and Head of Strategy at VAST Data and has 20+ years of experience in data, analytics, engineering, and consulting. Adept at assisting organizations in deriving value from a data-driven culture, she has successfully led diverse data, engineering, and analytics teams through the development of complex global data management solutions and architecting enterprise data systems. Her demonstrated excellence in data, engineering, analytics, and diversity leadership makes her a trusted senior advisor among executives. An experienced speaker, author, valued mentor and startup advisor, Colleen holds degrees in astrophysics and lives in Massachusetts.
"Everyone wants to be data driven, right? Like no one's going to say, 'No, we don't want data. We just want to function with opinions.' Like nobody's actually going to say that. But that said, getting started on that can be really challenging...
With anything, you have to go back to what does the business really need. Going back to the revenue drivers and the business pain points that you're going to help solve, whether it's monetizing your data directly or using data as an enablement function to actually help in other areas and so I think getting the organization to understand that data is a product of the business and then sort of working back from there into what does that specifically mean.”
- Colleen Tartow
Interested in joining an ELC Peer Group?
ELC Peer Groups provide a virtual, curated, and ongoing peer learning opportunity to help you navigate the unknown, uncover solutions, and accelerate your learning with a small group of trusted peers.
Apply to join a peer group HERE: sfelc.com/peerGroups
SHOW NOTES:
- Colleen’s experience building a data program from scratch (2:25)
- What it used to be like building a data engineering team (4:43)
- Narrowing to first principles when hiring for / building a data eng team (6:44)
- Frameworks to advocate for more resources to build your org’s data function (7:53)
- Knowing when you need to transition your data side project to a full data program (10:11)
- Building data teams from a zero to one perspective (13:05)
- What “onboarding as discovery” conversations look like (14:38)
- Joining an existing team to implement a defined data-focused function (16:14)
- How to have effective conversations & collaborate with other eng functions (19:19)
- Prioritization strategies when refocusing / creating the data eng org roadmap (21:20)
- How to integrate cross-functional partners into the data eng planning process (22:51)
- The implication of AI on data teams & its intersection with eng teams (24:09)
- Colleen’s decision-making framework (27:54)
- Recommendations for tackling complex data pipelines in different ways (29:27)
- Navigating the paradigm of AI & data eng’s impact on other eng orgs (31:31)
- What the ideal collaboration between data & eng looks like (34:01)
- Recommendations for dealing with points of friction (35:21)
- Steps for aligning data & eng under the same goals (37:16)
- Rapid fire questions (39:04)
LINKS AND RESOURCES
- The Lioness of Boston - Emily Franklin’s deeply evocative novel of the life of Isabella Stewart Gardner, a daring visionary who created an inimitable legacy in American art and transformed the city of Boston itself.
This episode wouldn’t have been possible without the help of our incredible production team:
Patrick Gallagher - Producer & Co-Host
Jerry Li - Co-Host
Noah Olberding - Associate Producer, Audio & Video Editor https://www.linkedin.com/in/noah-olberding/
Dan Overheim - Audio Engineer, Dan’s also an avid 3D printer - https://www.bnd3d.com/
Ellie Coggins Angus - Copywriter, Check out her other work at https://elliecoggins.com/about/