
Data Engineering Podcast How Data Engineering Teams Power Machine Learning With Feature Platforms
01:03:30
Features in ML
- Features, derived from raw data, are crucial for training ML models.
- They represent attributes of entities and can be simple or complex, capturing intricate patterns.
Feature Engineering vs. BI
- Feature engineering transforms raw data into usable features for ML algorithms.
- It differs significantly from traditional BI pipelines, as it's for machines, not human consumption.
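As a rough illustration of what "transforming raw data into features" can look like, the sketch below aggregates raw purchase events into one feature row per customer. The event layout, the feature names, and the `build_features` helper are all hypothetical and chosen only for this example; they are not taken from the episode.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw purchase events: (customer_id, timestamp, amount).
RAW_EVENTS = [
    ("c1", datetime(2023, 5, 1), 20.0),
    ("c1", datetime(2023, 5, 3), 40.0),
    ("c2", datetime(2023, 5, 2), 15.0),
]


def build_features(events, as_of):
    """Aggregate raw events into one feature row per customer (the entity)."""
    grouped = defaultdict(list)
    for customer_id, ts, amount in events:
        # Only use events known at `as_of`, so no future data leaks in.
        if ts <= as_of:
            grouped[customer_id].append((ts, amount))

    features = {}
    for customer_id, rows in grouped.items():
        amounts = [amount for _, amount in rows]
        last_ts = max(ts for ts, _ in rows)
        features[customer_id] = {
            "purchase_count": len(amounts),
            "avg_purchase_amount": sum(amounts) / len(amounts),
            "days_since_last_purchase": (as_of - last_ts).days,
        }
    return features


features = build_features(RAW_EVENTS, as_of=datetime(2023, 5, 10))
```

Unlike a BI report, the output here is a numeric row per entity intended as model input rather than something a person reads directly.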
Building Feature Stores
- Feature stores are crucial for ML architecture, enabling consistent pipeline deployment and low-latency serving.
- Leverage the compute power of the existing modern data stack to build feature stores, which minimizes data inconsistency.
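One way to picture the consistency point above is a toy, in-memory feature store in which each feature is defined once and that single definition feeds both the training set and the low-latency lookup. This is a minimal sketch under stated assumptions: the `FeatureStore` class, its methods, and the example features are hypothetical and do not represent any particular product's API.

```python
from typing import Callable, Dict, List


class FeatureStore:
    """Toy in-memory feature store; every name here is hypothetical."""

    def __init__(self) -> None:
        self._definitions: Dict[str, Callable[[dict], float]] = {}
        self._online: Dict[str, Dict[str, float]] = {}

    def register(self, name: str, fn: Callable[[dict], float]) -> None:
        # A feature is defined exactly once and reused everywhere.
        self._definitions[name] = fn

    def compute(self, raw_row: dict) -> Dict[str, float]:
        return {name: fn(raw_row) for name, fn in self._definitions.items()}

    def build_training_set(self, raw_rows: List[dict]) -> List[Dict[str, float]]:
        # Offline path: feature rows for model training.
        return [self.compute(row) for row in raw_rows]

    def materialize(self, entity_key: str, raw_rows: List[dict]) -> None:
        # Online path: precompute values so serving is a dictionary lookup.
        for row in raw_rows:
            self._online[row[entity_key]] = self.compute(row)

    def get_online_features(self, entity_id: str) -> Dict[str, float]:
        return self._online[entity_id]


store = FeatureStore()
store.register("order_value_per_item", lambda r: r["order_total"] / r["item_count"])
store.register("is_repeat_customer", lambda r: float(r["prior_orders"] > 0))

rows = [
    {"customer_id": "c1", "order_total": 90.0, "item_count": 3, "prior_orders": 2},
    {"customer_id": "c2", "order_total": 10.0, "item_count": 1, "prior_orders": 0},
]
training_rows = store.build_training_set(rows)
store.materialize("customer_id", rows)
online_row = store.get_online_features("c1")
```

Because both paths call the same `compute` method, the values a model sees at training time and at serving time cannot drift apart through duplicated logic. In a real deployment the computation would more plausibly run on the warehouse or stream processor already in place, with the store handling registration and serving.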
Introduction
00:00 • 2min
The Power of Analytics
01:39 • 2min
What Is a Feature?
03:16 • 3min
The Different Types of Data Engineering Pipelines
06:42 • 4min
The Importance of a Feature Store in Machine Learning
10:37 • 6min
The Lifecycle of a Machine Learning Pipeline
16:30 • 4min
The Role of the Data Engineer in the Life Cycle of a Data Scientist
20:31 • 3min
Interfaces for Data Scientists and ML Engineers
24:00 • 3min
The Importance of a Simple Declarative Language for Data Science
27:14 • 3min
The Importance of Self-Service Environments in Enterprise AI
30:03 • 2min
The Foundational Components of Data Discovery for ML Engineers
32:29 • 3min
The Importance of Communication in Feature Engineering
35:14 • 2min
The Not Invented Here Syndrome
37:26 • 3min
How to Maintain a Database of Features
40:24 • 3min
The Emergence of Descriptive Frameworks for Data Science
43:35 • 3min
Building a Tool Chain Around the Development and Serving and Maintenance of Features
46:17 • 2min
The Disparity Between Business Intelligence and More Point in Time Analytics
48:05 • 3min
The Unexpected Challenges of ML Model Development
50:50 • 2min
The Power and Criticality of Feature Engineering
52:36 • 2min
The Importance of Feature Engineering in Deep Learning
54:35 • 4min
The Biggest Gap in Data Management Tooling and Platforms
58:07 • 3min
Data Engineering and the Capabilities It Enables
01:00:54 • 2min
Summary
Feature engineering is a crucial aspect of the machine learning workflow. To make that possible, there are a number of technical and procedural capabilities that must be in place first. In this episode Razi Raziuddin shares how data engineering teams can support the machine learning workflow through the development and support of systems that empower data scientists and ML engineers to build and maintain their own features.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack
- Your host is Tobias Macey and today I'm interviewing Razi Raziuddin about how data engineers can empower data scientists to develop and deploy better ML models through feature engineering
Interview
- Introduction
- How did you get involved in the area of data management?
- What is feature engineering, and why/to whom does it matter?
- A topic that commonly comes up in relation to feature engineering is the importance of a feature store. What are the tradeoffs for that to be a separate infrastructure/architecture component?
- What is the overall lifecycle of a feature, from definition to deployment and maintenance?
- How is this distinct from other forms of data pipeline development and delivery?
- Who are the participants in that workflow?
- What are the sharp edges/roadblocks that typically manifest in that lifecycle?
- What are the interfaces that are needed for data scientists/ML engineers to be able to self-serve their feature management?
- What is the role of the data engineer in supporting those interfaces?
- What are the communication/collaboration channels that are necessary to make the overall process a success?
- From an implementation/architecture perspective, what are the patterns that you have seen teams build around for feature development/serving?
- What are the most interesting, innovative, or unexpected ways that you have seen feature platforms used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on feature engineering?
- What are the resources that you find most helpful in understanding and designing feature platforms?
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers
Links
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
