Cody Peterson, Senior Technical Product Manager at Voltron Data, discusses the importance of open standards in MLOps. Topics include challenges with scalability in data tools like Pandas, leveraging the Ibis project for big data processing, and the power of Apache Arrow in data systems. The conversation also covers transitioning between platforms, considerations for data system selection, and future plans for the Ibis project.
Podcast summary created with Snipd AI
Quick takeaways
Open standards ease MLOps challenges with big data scalability.
The Ibis project offers standardized data frame operations across multiple backends.
Deep dives
Cody Peterson's Background and Work at Azure ML
Cody Peterson, a senior technical product manager at Voltron Data, shared that he does not drink coffee and prefers Diet Coke for caffeine. He discussed his time as a product manager at Azure ML, where he worked on several teams including data, ML training, and inferencing. Peterson emphasized the importance of handling end-to-end machine learning systems, particularly MLOps, and said his experience at Azure ML taught him the vital role of data in software systems.
MLOps Standards and Best Practices
Peterson delved into MLOps standards and best practices, highlighting the lack of a standardized approach across MLOps and the traditional ML lifecycle. He mentioned ongoing debates, such as whether notebooks belong in production, and the need to adapt best practices to specific scenarios and verticals. Drawing on his days at Azure ML, he underscored the importance of understanding the data layer and managing data effectively in machine learning systems.
The Significance of Database Technology in MLOps
Peterson discussed the evolving role of database technology, citing the convergence of vector databases and traditional databases. He emphasized learnings from Azure ML about data layer management, versioning, and the auditing requirements of certain industries. Peterson highlighted the complexity of data handling across ML systems, noting the significance of data engineers in addressing advanced use cases and intricate data problems, especially in traditional ML systems.
The Ibis Project and its Evolution
Peterson transitioned to discussing the Ibis project, an open-source initiative focused on providing a standardized API for data frame operations across various backends. He detailed the origins of Ibis, its approach of decoupling the data frame API from execution engines, and its support for multiple backends such as DuckDB, Polars, Snowflake, and more. Peterson highlighted the project's growth under Voltron Data's stewardship, mentioning significant enhancements such as added backends, data frame library capabilities, and stability improvements through recent refactoring efforts.
Cody Peterson has a varied background in product management and engineering. He has been a Technical Product Manager at Voltron Data since May 2023. Previously, he was a Product Manager at dbt Labs from July 2022 to March 2023.
MLOps podcast #234 with Cody Peterson, Senior Technical Product Manager at Voltron Data | Ibis project // Open Standards Make MLOps Easier and Silos Harder.
Huge thank you to Weights & Biases for sponsoring this episode. WandB Free Courses - http://wandb.me/courses_mlops
// Abstract
MLOps is fundamentally a discipline of people working together on a system with data and machine learning models. These systems are already built on open standards we may not notice -- Linux, git, scikit-learn, etc. -- but are increasingly hitting walls with respect to the size and velocity of data.
Pandas, for instance, is the tool of choice for many Python data scientists -- but its scalability is a known issue. Many tools make the assumption of data that fits in memory, but most organizations have data that will never fit in a laptop. What approaches can we take?
One emerging approach with the Ibis project (created by the creator of pandas, Wes McKinney) is to leverage existing "big" data systems to do the heavy lifting on a lightweight Python data frame interface. Alongside other open source standards like Apache Arrow, this can allow data systems to communicate with each other and users of these systems to learn a single data frame API that works across any of them.
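As a rough illustration of that idea, the Python sketch below builds a deferred query once and runs it on two interchangeable backends. It is a toy under stated assumptions, not Ibis's actual API or implementation: the names `Expr`, `SQLiteBackend`, `PythonBackend`, the `orders` table, and its rows are all hypothetical, and SQLite (from the standard library) merely stands in for a "big" data system.

```python
import sqlite3
from collections import defaultdict
from dataclasses import dataclass, replace
from typing import Optional, Tuple

# Hypothetical sample data: (region, amount) rows for an "orders" table.
ROWS = [("east", 10.0), ("west", 5.0), ("east", 2.5), ("west", 20.0), ("north", 1.0)]


@dataclass(frozen=True)
class Expr:
    """Deferred description of a query. Building it executes nothing."""
    table: str
    min_filter: Optional[Tuple[str, float]] = None  # (column, threshold)
    group_sum: Optional[Tuple[str, str]] = None     # (key column, value column)

    def where_greater(self, column, threshold):
        return replace(self, min_filter=(column, threshold))

    def sum_by(self, key, value):
        return replace(self, group_sum=(key, value))


class SQLiteBackend:
    """Compiles the expression to SQL so the database engine does the work."""

    def __init__(self, rows):
        self.con = sqlite3.connect(":memory:")
        self.con.execute("CREATE TABLE orders (region TEXT, amount REAL)")
        self.con.executemany("INSERT INTO orders VALUES (?, ?)", rows)

    def compile(self, expr):
        if expr.group_sum is None:
            raise ValueError("this toy only supports grouped sums")
        key, value = expr.group_sum
        # Column and table names are trusted here; values go in as parameters.
        sql = f"SELECT {key}, SUM({value}) FROM {expr.table}"
        params = []
        if expr.min_filter is not None:
            column, threshold = expr.min_filter
            sql += f" WHERE {column} > ?"
            params.append(threshold)
        sql += f" GROUP BY {key} ORDER BY {key}"
        return sql, params

    def execute(self, expr):
        sql, params = self.compile(expr)
        return self.con.execute(sql, params).fetchall()


class PythonBackend:
    """Evaluates the same expression in local memory, row by row."""

    def __init__(self, rows):
        self.tables = {"orders": [{"region": r, "amount": a} for r, a in rows]}

    def execute(self, expr):
        if expr.group_sum is None:
            raise ValueError("this toy only supports grouped sums")
        key, value = expr.group_sum
        rows = self.tables[expr.table]
        if expr.min_filter is not None:
            column, threshold = expr.min_filter
            rows = [r for r in rows if r[column] > threshold]
        totals = defaultdict(float)
        for r in rows:
            totals[r[key]] += r[value]
        return sorted(totals.items())


# One expression, written once against the lightweight interface...
expr = Expr("orders").where_greater("amount", 2.0).sum_by("region", "amount")

# ...and executed by whichever backend is plugged in.
sqlite_result = SQLiteBackend(ROWS).execute(expr)
python_result = PythonBackend(ROWS).execute(expr)
print(sqlite_result)
print(python_result)
```

The user code never mentions SQL or loops over rows; swapping the backend changes where the computation happens, not how the query is written. That separation is the property the abstract attributes to a single data frame API working across many systems.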
Open standards like Apache Arrow, Ibis, and more in the MLOps tech stack enable freedom for composable data systems, where components can be swapped out allowing engineers to use the right tool for the job to be done. It also helps avoid vendor lock-in and keep costs low.
// Bio
Cody is a Senior Technical Product Manager at Voltron Data, a next-generation data systems builder that recently launched an accelerator-native GPU query engine for petabyte-scale ETL called Theseus. While Theseus is proprietary, Voltron Data takes an open periphery approach -- it is built on and interfaces through open standards like Apache Arrow, Substrait, and Ibis. Cody focuses on the Ibis project, a portable Python dataframe library that aims to be the standard Python interface for any data system, including Theseus and over 20 other backends.
Prior to Voltron Data, Cody was a product manager at dbt Labs focusing on the open source dbt Core and launching Python models (note: models is a confusing term here). Later, he led the Cloud Runtime team and drastically improved the efficiency of engineering execution and product outcomes.
Cody started his career as a Product Manager at Microsoft working on Azure ML. He spent about 2 years on the dedicated MLOps product team, and 2 more years on various teams across the ML lifecycle including data, training, and inferencing.
He is now passionate about using open source standards to break down the silos and challenges facing real-world engineering teams, where engineering increasingly involves data and machine learning.
// MLOps Jobs board
https://mlops.pallet.xyz/jobs
// MLOps Swag/Merch
https://mlops-community.myshopify.com/
// Related Links
Ibis Project: https://ibis-project.org
Apache Arrow and the “10 Things I Hate About pandas”: https://wesmckinney.com/blog/apache-arrow-pandas-internals/
--------------- ✌️Connect With Us ✌️ -------------
Join our slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Cody on LinkedIn: https://linkedin.com/in/codydkdc