Josh Tobin from Gantry shares insights on improving ML-powered products, discussing vendor relationships, applying software engineering principles to ML, data observability challenges, and reflections on early business development. The conversation also touches on navigating attention and distractions in management.
User interaction is central to model performance in ML-powered products.
Machine learning models need rigorous unit testing and ongoing monitoring.
Data observability in machine learning hinges on detecting changes in data statistics and catching data corruption.
Deep dives
Evolution from Research to Entrepreneurship
Josh Tobin discusses his transition from ML research at OpenAI to founding Gantry, a machine learning tooling company. He describes his work teaching the math and algorithms behind training ML models, as well as the challenges of advancing machine learning for robotics, and emphasizes the need for practical applications in real-world scenarios.
Machine Learning Powered Products
Josh describes machine learning-powered products as applications where ML models interact directly with end users. User interaction matters for model performance: teams need to address performance outliers, and collecting user feedback is a valuable way to keep models learning. He emphasizes that developing these products comes with distinct requirements and opportunities.
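As a rough illustration of the feedback-collection idea, here is a minimal Python sketch that pairs each prediction with an ID so later user feedback (e.g. a thumbs up/down) can be joined back to it. The function names, JSONL files, and event fields are hypothetical, not Gantry's API.

```python
import json
import time
import uuid

def log_prediction(model_version: str, features: dict, prediction) -> str:
    """Record a prediction with a unique ID so later feedback can be joined to it."""
    record_id = str(uuid.uuid4())
    event = {
        "id": record_id,
        "ts": time.time(),
        "model_version": model_version,
        "features": features,
        "prediction": prediction,
    }
    # In production this would go to an event stream or feedback store;
    # appending to a local JSONL file keeps the sketch self-contained.
    with open("predictions.jsonl", "a") as f:
        f.write(json.dumps(event) + "\n")
    return record_id

def log_feedback(record_id: str, accepted: bool) -> None:
    """Attach explicit user feedback to a previously logged prediction."""
    event = {"id": record_id, "ts": time.time(), "accepted": accepted}
    with open("feedback.jsonl", "a") as f:
        f.write(json.dumps(event) + "\n")
```

Joining the two files on `id` yields labeled examples of where the model succeeded or failed, which is the raw material for the continuous improvement loop discussed in the episode.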
Approach to Testing Machine Learning Models
Josh discusses unit testing for machine learning models, emphasizing rigorous testing at every stage of development, including training and inference. He advocates a mix of fast and slow tests to protect the integrity and performance of models, and stresses versioning models and carefully monitoring their behavior so issues can be addressed proactively and applications stay safe and effective.
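To make the fast/slow split concrete, here is a hedged pytest sketch (not from the episode): a fast smoke test that checks inference wiring in milliseconds, and a slower test, tagged with an assumed `slow` marker, that verifies the training path can fit easy synthetic data.

```python
import numpy as np
import pytest
from sklearn.linear_model import LogisticRegression

def make_data(n=200, seed=0):
    # Synthetic, linearly separable data so the slow test has a known outcome.
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    return X, y

def test_prediction_shape_and_range():
    # Fast smoke test: catches inference wiring bugs (shapes, invalid outputs).
    X, y = make_data(n=20)
    model = LogisticRegression().fit(X, y)
    proba = model.predict_proba(X)
    assert proba.shape == (20, 2)
    assert np.allclose(proba.sum(axis=1), 1.0)

@pytest.mark.slow  # assumes a "slow" marker registered in pytest.ini
def test_model_learns_separable_data():
    # Slow test: actually trains and checks the pipeline can fit easy data,
    # a common sanity check before trusting a full training run.
    X, y = make_data(n=500)
    model = LogisticRegression().fit(X, y)
    assert model.score(X, y) > 0.95
```

During development you might run only the fast suite with `pytest -m "not slow"` and leave the slow tests to CI.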
The Challenge of Data Drift in Machine Learning Observability
Monitoring data entropy, detecting changes in data statistics, and catching data corruption are crucial aspects of data observability in machine learning. Data drift, i.e. changes in input distributions, is often the headline concern, but setting alert thresholds on measures like KL divergence is rarely effective on its own. A better strategy is to measure outcomes in the ML-powered product directly and treat drift as a diagnostic tool for explaining degraded model performance.
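For a sense of what such a drift statistic looks like in practice, here is a small numpy sketch of a histogram-based KL divergence between a reference (training-time) sample and a production sample for one numeric feature. All names and numbers are illustrative, not any particular vendor's API.

```python
import numpy as np

def kl_divergence(reference: np.ndarray, production: np.ndarray, bins: int = 20) -> float:
    """Histogram-based KL(reference || production) for one numeric feature."""
    # Shared bin edges so the two histograms are directly comparable.
    edges = np.histogram_bin_edges(np.concatenate([reference, production]), bins=bins)
    p, _ = np.histogram(reference, bins=edges)
    q, _ = np.histogram(production, bins=edges)
    # Smooth to avoid division by zero in empty bins, then normalize.
    p = (p + 1e-6) / (p + 1e-6).sum()
    q = (q + 1e-6) / (q + 1e-6).sum()
    return float(np.sum(p * np.log(p / q)))

# Example: a shifted production distribution produces a clearly nonzero score.
rng = np.random.default_rng(0)
ref = rng.normal(0.0, 1.0, size=5000)   # training-time distribution
prod = rng.normal(0.5, 1.2, size=5000)  # shifted production distribution
print(f"KL divergence: {kl_divergence(ref, prod):.3f}")
```

The point of the episode's advice is the workflow around a number like this: rather than alerting whenever it crosses a fixed threshold, consult it per feature once an outcome metric has degraded, to narrow down which inputs changed.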
Challenges Faced by ML Teams in Delivering Value
Achieving value from machine learning initiatives involves overcoming various obstacles. Unclear goals, inadequate data maturity, and deployment hurdles often stall ML teams early on. Balancing software engineering and ML expertise, building interdisciplinary teams, establishing service-level agreements, and focusing on shared goals can make ML implementations more effective and drive value for organizations.
Josh Tobin (Co-founder, CEO @ Gantry) joins the show to chat about the best ways to continuously improve your ML-powered products. Josh brings a wealth of knowledge and experience in AI and ML, making for a fantastic discussion.