MLOps experts from Tecton, with over 35 years of combined experience, discuss challenges in deploying ML models, evaluating ROI, bridging the gap between batch and streaming systems, maintaining consistency in processing, monitoring feature quality, building recommendation systems, and quantifying the value of ML projects.
Podcast summary created with Snipd AI
Quick takeaways
A centralized data warehouse or data lake is crucial for training accurate models, and appointing a data steward helps ensure data quality and reliability.
Real-time features provide immediate context for ML models, enhancing performance and user experience; serving them requires integrating streaming infrastructure that keeps data up to date.
Data scientists' preference for creating their own features hinders feature sharing, so organizations need standardized processes and tools for feature reuse and sharing.
Deep dives
Importance of Clean Data for ML Models in Production
One of the main challenges in deploying ML models in production is the need for clean data. Having a centralized data warehouse or data lake that preserves the historical data is crucial for training accurate models. The data scientist needs access to this clean data to train their models, and it is essential to establish a data steward to ensure data quality and reliability. Trust in data pipelines and features is crucial, and automated tools for monitoring data quality can help catch any issues or discrepancies that may arise.
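To make the point about automated data-quality monitoring concrete, here is a minimal sketch of such a check in Python with pandas. The table layout, column names (user_id, amount, event_ts), and file path are hypothetical, not from the episode.

```python
# Minimal data-quality check for a feature source table (pandas).
# Column names and the input path are hypothetical.
import pandas as pd

def check_feature_table(df: pd.DataFrame) -> list[str]:
    """Return a list of data-quality issues found in the table."""
    issues = []
    if df["user_id"].isna().any():
        issues.append("null user_id values")
    if (df["amount"] < 0).any():
        issues.append("negative transaction amounts")
    # Stale data is a common silent failure: flag the table if the
    # newest event is more than a day old.
    latest = pd.to_datetime(df["event_ts"]).max()
    if latest < pd.Timestamp.now() - pd.Timedelta(days=1):
        issues.append(f"stale data: latest event at {latest}")
    return issues

df = pd.read_parquet("transactions.parquet")  # hypothetical path
for issue in check_feature_table(df):
    print("data quality alert:", issue)
```

Checks like these can run on a schedule against the warehouse so broken upstream pipelines are caught before they degrade model quality.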
Relevance of Real-Time Features
Real-time features play a vital role in ML applications. These features are data points that are only known at the time of inference, such as real-time transaction details or user context. Real-time data provides immediate context for ML models, enhancing their performance and relevance. Developing pipelines for real-time features requires the integration of streaming infrastructure, such as Kafka or Kinesis, to provide up-to-date data for real-time predictions. While batch features can be valuable, real-time features often have a stronger impact on applications' performance and user experience.
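As an illustration of the streaming integration described above, here is a minimal sketch of a real-time feature pipeline using the kafka-python client. The topic name, payload schema, and in-memory store are assumptions; a production system would use a durable online store (e.g., Redis or DynamoDB) and proper time windowing.

```python
# Sketch: consume transaction events from Kafka and maintain a simple
# per-user transaction count that a model can read at inference time.
import json
from collections import defaultdict

from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "transactions",                      # hypothetical topic name
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

txn_count = defaultdict(int)  # feature store stand-in (in-memory only)

for event in consumer:
    user_id = event.value["user_id"]  # assumed payload field
    txn_count[user_id] += 1           # real pipelines would expire old events
    # At inference time, the model server looks up txn_count[user_id]
    # to get fresh context for the prediction.
```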
Challenges of Feature Reuse and Sharing
Feature reuse and sharing can be challenging due to various factors. Data scientists often prefer to create their own features instead of reusing existing ones, which can hinder feature sharing across the organization. Building trust in shared features is crucial, as data scientists need assurance that the features will remain reliable and available. Organizations need a standardized process and tooling to ensure feature pipelines' reusability, automate dependency management, and monitor feature quality. Establishing data governance, data stewardship, and feature catalogs can facilitate sharing and reuse across teams and applications.
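As a toy illustration of the feature-catalog idea, the following Python sketch registers features with an owner and description so other teams can discover and reuse them. The names and structure are illustrative only, not any vendor's API.

```python
# Toy feature catalog: each feature is registered once, with an owner
# (the data steward accountable for its quality) and a description.
from dataclasses import dataclass

@dataclass
class FeatureDefinition:
    name: str
    owner: str          # team accountable for the feature's reliability
    description: str
    source_table: str

CATALOG: dict[str, FeatureDefinition] = {}

def register(feature: FeatureDefinition) -> None:
    if feature.name in CATALOG:
        raise ValueError(f"{feature.name} already exists; reuse it instead")
    CATALOG[feature.name] = feature

register(FeatureDefinition(
    name="user_txn_count_7d",
    owner="payments-team",
    description="Transactions per user over the trailing 7 days",
    source_table="warehouse.transactions",
))
```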
The Importance of Reproducible Data Pipelines
Reproducible data pipelines are a key factor in scaling ML applications. Having a streamlined process for creating and maintaining data pipelines is essential. A repeatable process ensures that pipelines can be easily scaled for multiple use cases, reducing development time and eliminating redundancies. Organizations should prioritize data pipeline automation, including data gathering, transformation, and validation. A clear understanding of the required features and feature definitions is crucial. Starting with batch pipelines and gradually incorporating real-time capabilities can help organizations scale their ML applications effectively.
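Below is a minimal sketch of the gather, transform, validate pattern in Python with pandas; because every run executes the same code path, backfills and new use cases stay reproducible. The file path and column names are assumptions.

```python
# Reproducible batch feature pipeline: gather -> transform -> validate.
import pandas as pd

def gather(path: str) -> pd.DataFrame:
    return pd.read_parquet(path)

def transform(df: pd.DataFrame) -> pd.DataFrame:
    # Aggregate raw transactions into per-user features.
    return (
        df.groupby("user_id")
          .agg(txn_count=("amount", "size"),
               total_spend=("amount", "sum"))
          .reset_index()
    )

def validate(df: pd.DataFrame) -> pd.DataFrame:
    assert df["user_id"].is_unique, "duplicate user rows"
    assert (df["total_spend"] >= 0).all(), "negative spend"
    return df

features = validate(transform(gather("transactions.parquet")))
```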
Getting Started with a Recommender System
When starting from scratch with a recommender system, it is vital to focus on data preparation and access. Ensuring data quality and having a clean data warehouse or data lake is crucial. Starting with a simple heuristic-based approach, like recommending top items based on past purchases, can provide initial results. Leveraging off-the-shelf recommendation APIs from cloud providers can help in the early stages. As the system grows, incorporating personalized ML models can enhance performance. Real-time capabilities and scale can then be added gradually through managed services like SageMaker model serving.
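The heuristic baseline mentioned above (recommend top items based on past purchases) fits in a few lines of code. Here is a self-contained Python sketch with made-up data:

```python
# Popularity-based recommender: suggest the most-purchased items the
# user has not bought yet. Data is illustrative.
from collections import Counter

purchases = [  # (user_id, item_id) pairs, e.g. pulled from the warehouse
    ("u1", "apples"), ("u2", "apples"), ("u2", "bread"),
    ("u3", "apples"), ("u3", "milk"), ("u1", "milk"),
]

popularity = Counter(item for _, item in purchases)

def recommend(user_id: str, k: int = 3) -> list[str]:
    seen = {item for uid, item in purchases if uid == user_id}
    ranked = [item for item, _ in popularity.most_common() if item not in seen]
    return ranked[:k]

print(recommend("u2"))  # -> ['milk']; apples and bread already purchased
```

A baseline like this also provides a benchmark for quantifying the lift from any personalized model added later.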
MLOps Coffee Sessions special episode with Tecton, "Get Your ML Application Into Production," sponsored by Tecton.
// Abstract
Getting an ML application into production is more difficult than most teams expect, but with the right preparation, it can be done efficiently! Join us for this exclusive roundtable, where four machine learning experts from Tecton will discuss some of the most common challenges and best practices to avoid them.
With over 35 years of combined experience in MLOps at companies like AWS, Google, Lyft, and Uber, and 15 years of experience at Tecton spent helping customers like FanDuel, Plaid, and HelloFresh get ML models into production, the presenters will share how factors like organizational structure, use cases, and tech stack can create different types of bottlenecks. They’ll also share best practices and lessons learned throughout their careers on how to overcome these challenges.
// Bio
Kevin Stumpf
Kevin co-founded Tecton, where he leads a world-class engineering team building a next-generation feature store for operational machine learning. Kevin and his co-founders built deep expertise in operational ML platforms at Uber, where they created the Michelangelo platform that enabled Uber to scale from zero to thousands of ML-driven applications in just a few years. Prior to Uber, Kevin founded Dispatcher, with the vision of building the Uber for long-haul trucking. Kevin holds an MBA from Stanford University and a Bachelor's Degree in Computer and Management Sciences from the University of Hagen. Outside of work, Kevin is a passionate long-distance endurance athlete.
Derek Salama
Derek is currently a Senior Product Manager at Tecton, where he is responsible for security, collaboration experience, and Feature Platform infrastructure. Prior to Tecton, Derek worked at Google and Lyft across both ML infrastructure and ML applications.
Eddie Esquivel
Eddie Esquivel is a Solutions Architect at Tecton, where he helps customers implement feature stores as part of their stack for operational ML. Prior to Tecton, Eddie was a Solutions Architect at AWS. He holds a Bachelor’s Degree in Computer Science & Engineering from the University of California, Los Angeles.
Isaac Cameron
Isaac Cameron is a Consulting Architect at Tecton. Prior to Tecton, he was a Principal Solutions Architect at Slalom Build, focusing on data and machine learning, where he built a feature platform for a large U.S. airline and enabled many organizations to build intelligent products leveraging operational ML.
// MLOps Jobs board
https://mlops.pallet.xyz/jobs
// MLOps Swag/Merch
https://mlops-community.myshopify.com/
--------------- ✌️Connect With Us ✌️ -------------
Join our slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Kevin on LinkedIn: https://www.linkedin.com/in/kevinstumpf/
Connect with Derek on LinkedIn: https://www.linkedin.com/in/dereksalama/
Connect with Eddie on LinkedIn: https://www.linkedin.com/in/eddie-esquivel-2016/
Connect with Isaac on LinkedIn: https://www.linkedin.com/in/isaaccameron/
Timestamps:
[00:00] Introduction to Kevin Stumpf, Derek Salama, Eddie Esquivel, and Isaac Cameron
[02:48] Challenges of getting classical ML into production
[10:21] Infrastructure cost
[16:50] Bridging Business and Tech
[19:23] ML Infrastructure Essentials
[29:38] Integrated Batch and Stream
[35:12] Scaling AI from Zero
[36:23] Tech stack red flags
[45:53] Tecton: Feature Quality Monitoring
[49:06] Building Recommender System Tools
[53:19] Quantifying business value in ML
[54:40] Wrap up