📊 Data-Driven Decisions: ML in E-Commerce Forecasting with Federico Bacci
Aug 15, 2024
Federico Bacci, a data scientist and ML engineer at Bol, shares his expertise in deploying machine learning models for e-commerce forecasting. He makes the case that model explainability and feature engineering matter more than sheer model complexity. The discussion highlights the challenges of integrating feedback from stakeholders and the intricacies of demand forecasting. Federico argues that large language models aren't always the answer, advocating instead for tailored solutions that effectively address specific business needs.
Federico Bacci emphasizes the importance of continuous testing and validation of deployed ML models to maintain production readiness and reliability.
He highlights the significance of integrating stakeholder feedback into the ML workflow to enhance model performance and user satisfaction.
Deep dives
The Importance of Machine Learning Production
In machine learning production, the focus is on delivering accurate results to address real business problems. This process involves publishing the outputs of forecasting models, particularly numerical and regression tasks, ensuring that they remain functional and reliable in a dynamic environment. The team emphasizes that maintaining production readiness requires continuous availability and rigorous testing of all deployed models. Any updates or new model deployments are carefully validated to uphold the integrity of the results provided to stakeholders.
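The validation step described above can be sketched as a simple promotion gate: a candidate model only replaces the production model if its error on a holdout window stays within tolerance. This is a minimal illustration with invented function names and thresholds, not Bol's actual pipeline.

```python
# Sketch of a pre-deployment validation gate (hypothetical names and
# thresholds, not the team's actual implementation).

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actuals and forecasts."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def validate_candidate(y_true, prod_preds, candidate_preds, tolerance=0.05):
    """Promote the candidate only if its MAE is no more than `tolerance`
    (relative) worse than the current production model's MAE."""
    prod_mae = mean_absolute_error(y_true, prod_preds)
    cand_mae = mean_absolute_error(y_true, candidate_preds)
    return cand_mae <= prod_mae * (1 + tolerance)

# Example: the candidate's forecasts are closer to actuals, so it passes.
actuals = [100, 120, 90, 110]
prod = [95, 125, 85, 115]
cand = [98, 122, 88, 112]
print(validate_candidate(actuals, prod, cand))  # True
```

The point is less the specific metric than the discipline: every update runs through the same automated check before stakeholders ever see its output.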
Ownership of the Machine Learning Pipeline
The machine learning team at the company maintains full ownership of the entire pipeline, which is perceived by stakeholders as the definitive source of truth. This ownership ensures that input from relevant business teams is seamlessly integrated into the data utilized for model training. The team manages everything from initial discussions with business units to model development and deployment, avoiding the common task segregation seen in other organizations. This integrated approach enables a more streamlined workflow and stronger collaboration across departments, leading to more efficient problem-solving.
Deployment Strategies: Balancing User Volume and Feedback
The team employs specific deployment strategies based on user volume, utilizing shadow runs to evaluate new models without exposing them to all internal users simultaneously. This method enables the team to compare the performance of the new model against existing ones in real-time, while only merging successful changes into the production flow. By monitoring differences in outcomes, the team can catch any significant shifts in performance quickly and act accordingly. This nuanced approach is crucial since they often deal with a limited number of internal users, making traditional A/B testing less viable.
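A shadow run of this kind can be sketched as follows: the candidate model scores every request alongside production, but only production's output is served, and large divergences are flagged for review. The structure and names below are hypothetical, for illustration only.

```python
# Minimal sketch of a shadow deployment: the shadow model's predictions are
# logged but never served. All names and thresholds are invented.

def shadow_run(requests, prod_model, shadow_model, alert_threshold=0.10):
    """Serve production predictions while logging the shadow model's
    relative divergence; return requests where the gap is large."""
    flagged = []
    for req in requests:
        served = prod_model(req)       # what users actually see
        shadowed = shadow_model(req)   # logged only, never served
        divergence = abs(shadowed - served) / max(abs(served), 1e-9)
        if divergence > alert_threshold:
            flagged.append((req, served, shadowed))
    return flagged

# Toy models: the shadow forecasts ~5% higher, except for one input
# where it diverges sharply and gets flagged.
prod_model = lambda demand: demand * 1.0
shadow_model = lambda demand: demand * 1.05 if demand < 500 else demand * 1.5

print(shadow_run([100, 200, 600], prod_model, shadow_model))
# [(600, 600.0, 900.0)]
```

Comparing outputs this way gives a continuous signal on a small internal user base, which is exactly why it can stand in for A/B testing when traffic is too low to split.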
Feedback Loops and Human Insights
Establishing effective feedback loops is vital for enhancing model performance and user satisfaction, emphasizing the value of human input in the machine learning process. The team recognizes that users sometimes spot patterns in data that could easily be missed during analysis, providing insights that can lead to significant improvements in forecasting accuracy. They encourage open communication with stakeholders and have implemented forms to capture feedback efficiently, ensuring users feel involved in refining the outputs. This collaborative culture not only fosters trust but also aids in prioritizing feature requests based on direct user experiences.
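Prioritizing feature requests from captured feedback can be as simple as counting how often stakeholders raise each one. This is a hypothetical sketch; the form fields and feature names are invented for illustration.

```python
# Hypothetical sketch of ranking feature requests from feedback-form
# submissions; field names are invented.
from collections import Counter

def prioritize_requests(feedback_entries):
    """Rank requested features by how often stakeholders mention them."""
    counts = Counter(entry["requested_feature"] for entry in feedback_entries)
    return [feature for feature, _ in counts.most_common()]

feedback = [
    {"user": "planner_a", "requested_feature": "holiday adjustment"},
    {"user": "planner_b", "requested_feature": "promo uplift"},
    {"user": "planner_c", "requested_feature": "holiday adjustment"},
]
print(prioritize_requests(feedback))  # ['holiday adjustment', 'promo uplift']
```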
In this episode, Dean speaks with Federico Bacci, a data scientist and ML engineer at Bol, the largest e-commerce company in the Netherlands and Belgium. Federico shares valuable insights into the intricacies of deploying machine learning models in production, particularly for forecasting problems. He discusses the challenges of model explainability, the importance of feature engineering over model complexity, and the critical role of stakeholder feedback in improving ML systems. Federico also offers a compelling perspective on why LLMs aren't always the answer in AI applications, emphasizing the need for tailored solutions. This conversation provides a wealth of practical knowledge for data scientists and ML engineers looking to enhance their understanding of real-world ML operations and challenges in e-commerce.
Join our Discord community: https://discord.gg/tEYvqxwhah
---
Timestamps:
00:00 Introduction and Background
01:59 Owning the ML Pipeline
02:56 Deployment Process
05:58 Testing and Feedback
07:40 Different Deployment Strategies
11:19 Explainability and Feature Importance
13:46 Challenges in Forecasting
22:33 ML Stack and Tools
26:47 Orchestrating Data Pipelines with Airflow
31:27 Exciting Developments in ML
35:58 Recommendations and Closing
Links
➡️ Dwarkesh podcast with Anthropic and Gemini team members – https://www.dwarkeshpatel.com/p/sholto-douglas-trenton-bricken
➡️ Federico Bacci on LinkedIn – https://www.linkedin.com/in/federico-bacci/
➡️ Federico Bacci on Twitter – https://x.com/fedebyes
🌐 Check Out Our Website! https://dagshub.com
Social Links:
➡️ LinkedIn: https://www.linkedin.com/company/dagshub
➡️ Twitter: https://x.com/TheRealDAGsHub
➡️ Dean Pleban: https://x.com/DeanPlbn