Nicolas Mauti, an MLOps Engineer from Lyon, shares his expertise in transforming BigQuery into a powerful feature management system for AI/ML applications. He discusses the challenges of feature versioning, monitoring, and data quality that his team overcame at Malt. The conversation explores how separating feature creation from model coding streamlined their workflows and enhanced performance. Nicolas also emphasizes the importance of effective data lineage tracking and retraining models to ensure consistent accuracy across machine learning projects.
Quick takeaways
Nicolas Mauti discusses how utilizing BigQuery as a feature store centralizes feature computation, enhancing consistency and reliability across data science teams.
Using BigQuery as a feature store decouples feature engineering from model training, so data scientists can refine features independently, which improves productivity and workflows.
Deep dives
Implementation of BigQuery as a Feature Store
BigQuery was chosen as a feature store to make model training more efficient at the company. Previously, features were created and computed ad hoc, which led to inconsistencies and made results hard to reproduce across data science teams. With BigQuery, data scientists now have a centralized table that consolidates feature computations, so every team works from the same reliable, consistent values. Features in this table are organized by timestamp, which simplifies dataset creation for model training.
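To illustrate why organizing features by timestamp helps dataset creation, here is a minimal sketch of a point-in-time lookup in plain Python. It is not the team's implementation (the episode describes BigQuery tables, not Python code), and every name in it (`FEATURE_ROWS`, `build_index`, `features_as_of`, `daily_rate`) is a hypothetical chosen for the example. The idea is that a training example dated at some event time should only see the latest feature values computed at or before that time.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical feature rows: (entity_id, computed_at, feature values).
# In the setup described in the episode these would live in a BigQuery table.
FEATURE_ROWS = [
    ("freelancer_1", datetime(2024, 1, 1), {"daily_rate": 400}),
    ("freelancer_1", datetime(2024, 2, 1), {"daily_rate": 450}),
    ("freelancer_2", datetime(2024, 1, 15), {"daily_rate": 300}),
]


def build_index(rows):
    """Group rows by entity, keeping timestamps sorted in ascending order."""
    index = {}
    for entity_id, computed_at, values in sorted(rows, key=lambda r: (r[0], r[1])):
        timestamps, payloads = index.setdefault(entity_id, ([], []))
        timestamps.append(computed_at)
        payloads.append(values)
    return index


def features_as_of(index, entity_id, event_time):
    """Return the latest feature values computed at or before event_time.

    Returns None when the entity is unknown or has no values that early,
    so a training row never sees feature values from its own future.
    """
    if entity_id not in index:
        return None
    timestamps, payloads = index[entity_id]
    pos = bisect_right(timestamps, event_time)
    return payloads[pos - 1] if pos else None
```

Calling `features_as_of(build_index(FEATURE_ROWS), "freelancer_1", datetime(2024, 1, 20))` picks the January 1 row rather than the February 1 row, because the later value did not exist yet at the event time.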
Decoupling of Feature Engineering and Model Training
One significant advantage of using BigQuery this way is that it decouples feature engineering from model training. Data scientists can create and refine features independently of model building, which improves both their productivity and team dynamics. With clear phases for feature creation and model training, teams can set aside dedicated time for feature experimentation without impacting model performance. The separation also makes the distinct tasks involved in machine learning easier to understand.
Challenges and Solutions Faced After Implementation
Adopting BigQuery as a feature store brought many benefits, but it also introduced challenges, especially around adding new features and backfilling historical data. The team developed a custom solution that automatically backfills a feature when it is added, so that its historical context is maintained. This process runs SQL scripts within a structured ingestion and transformation pipeline to populate the feature tables accurately. In addition, versioning and change logs let data scientists track how features change over time, which further supports data integrity.
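The backfilling idea can be sketched in a few lines of Python. This is a stand-in written for illustration only: the episode says the real process runs SQL scripts in a pipeline, and the table layout, the `avg_rating` feature, and the function names below are all assumptions. The sketch shows the core step, which is computing a newly added feature for every historical snapshot using only the raw data available at that snapshot date.

```python
from datetime import date

# Hypothetical feature table: one row per (entity_id, snapshot_date).
feature_table = [
    {"entity_id": "freelancer_1", "snapshot_date": date(2024, 1, 1), "missions_done": 3},
    {"entity_id": "freelancer_1", "snapshot_date": date(2024, 2, 1), "missions_done": 5},
]

# Hypothetical raw source data that the new feature is derived from.
raw_reviews = [
    {"entity_id": "freelancer_1", "created_on": date(2023, 12, 20), "rating": 5},
    {"entity_id": "freelancer_1", "created_on": date(2024, 1, 10), "rating": 3},
]


def avg_rating_as_of(entity_id, snapshot_date, reviews):
    """Average rating using only reviews created on or before snapshot_date."""
    ratings = [
        r["rating"]
        for r in reviews
        if r["entity_id"] == entity_id and r["created_on"] <= snapshot_date
    ]
    return sum(ratings) / len(ratings) if ratings else None


def backfill(table, feature_name, compute, source):
    """Fill feature_name on every historical row that does not have it yet.

    Returns the number of rows that were filled, so a rerun on an
    already backfilled table reports 0 and changes nothing.
    """
    filled = 0
    for row in table:
        if feature_name not in row:
            row[feature_name] = compute(row["entity_id"], row["snapshot_date"], source)
            filled += 1
    return filled
```

Running `backfill(feature_table, "avg_rating", avg_rating_as_of, raw_reviews)` adds the new column to both historical rows. The January snapshot only sees the December review, while the February snapshot sees both, which is the historical context the backfill is meant to preserve.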
Future Considerations and Evolving Needs
Looking ahead, the team recognizes that BigQuery may be limiting for scenarios that require real-time feature computation or the management of very large datasets. As data volume increases, or if the need for live feature updates arises, alternative architectures or databases may become necessary. Using a tool like Kafka for live features is one possibility being explored, although the specific direction remains undecided. The insights gained from the current architecture will guide future adjustments, so the team remains prepared to scale as needs evolve.
Nicolas Mauti is an MLOps Engineer from Lyon (France), working at Malt.
BigQuery Feature Store // MLOps Podcast #255 with Nicolas Mauti, Lead MLOps at Malt.
// Abstract
Need a feature store for your AI/ML applications but overwhelmed by the multitude of options? Think again. In this talk, Nicolas shares how they solved this issue at Malt by leveraging the tools they already had in place. From ingestion to training, Nicolas provides insights on how to transform BigQuery into an effective feature management system.
We cover how Nicolas' team designed their feature tables and addressed challenges such as monitoring, alerting, data quality, point-in-time lookups, and backfilling. If you’re looking for a simpler way to manage your features without the overhead of additional software, this talk is for you. Discover how BigQuery can handle it all!
// Bio
Nicolas Mauti is the go-to guy for all things related to MLOps at Malt. With a knack for turning complex problems into streamlined solutions and over a decade of experience in code, data, and ops, he is a driving force in developing and deploying machine learning models that actually work in production.
When he's not busy optimizing AI workflows, you can find him sharing his knowledge at the university. Whether it's cracking a tough data challenge or cracking a joke, Nicolas knows how to keep things interesting.
// MLOps Jobs board
https://mlops.pallet.xyz/jobs
// MLOps Swag/Merch
https://mlops-community.myshopify.com/
// Related Links
Nicolas' Medium - https://medium.com/@nmauti
Data Engineering for AI/ML Conference: https://home.mlops.community/home/events/dataengforai
--------------- ✌️Connect With Us ✌️ -------------
Join our slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Nicolas on LinkedIn: https://www.linkedin.com/in/nicolasmauti/?locale=en_US
Timestamps:
[00:00] Nicolas' preferred beverage
[00:35] Takeaways
[02:25] Please like, share, leave a review, and subscribe to our MLOps channels!
[02:57] BigQuery end goal
[05:00] BigQuery pain points
[10:14] BigQuery vs Feature Stores
[12:54] Freelancing Rate Matching issues
[16:43] Post-implementation pain points
[19:39] Feature Request Process
[20:45] Feature Naming Consistency
[23:42] Feature Usage Analysis
[26:59] Anomaly detection in data
[28:25] Continuous Model Retraining Process
[30:26] Model misbehavior detection
[33:01] Handling model latency issues
[36:28] Accuracy vs The Business
[38:59] BigQuery cost-benefit analysis
[42:06] Feature stores cost savings
[44:09] When not to use BigQuery
[46:20] Real-time vs Batch Processing
[49:11] Register for the Data Engineering for AI/ML Conference now!
[50:14] Wrap up