scikit-learn & data science you own (Practical AI #296)
Nov 19, 2024
Yann Lechelle, CEO of Probabl, and Guillaume Lemaitre, open-source engineer at Probabl, dive into scikit-learn's impact on data science. They discuss its central role in machine learning across industries and the importance of open-source innovation. The duo shares insights on the challenges data scientists face, new developments around scikit-learn such as certification, and their vision for sustainable open-source technologies. They emphasize community engagement and the delicate balance between corporate interests and community needs.
Probabl aims to advance open-source projects like scikit-learn, keeping them accessible while securing sustainable funding through new business models.
Scikit-learn remains a vital data science tool, adaptable across many industries, and there are plans to integrate newer technologies to improve its usability.
Deep dives
Leveraging Postgres for AI Development
Postgres can serve as a solid foundation for AI application development, letting developers build advanced systems with tools they already know. Timescale builds on PostgreSQL and AI-focused extensions (pgvector, pgvectorscale, and pgai), so developers can build AI applications on their existing Postgres knowledge instead of learning an entirely new stack. This avoids the steep learning curve of many AI frameworks and lets developers take on new kinds of projects. Open-source tools like pgai also come with tutorials and resources that help developers get started.
The Emergence of Probabl and scikit-learn
Probabl emerged from the need to commercialize and advance open-source projects like scikit-learn, which has become a cornerstone for data scientists worldwide. A spinoff from Inria, the company aims to develop a suite of open-source data science technologies, ensuring that the foundational tools used to analyze datasets keep evolving. Scikit-learn's widespread adoption shows its critical role in the field: it has been downloaded billions of times, underscoring its significance in machine learning and predictive analytics. The mission is to reinforce open-source contributions while evolving these tools to meet current data science needs.
Sustaining Open-Source Integrity
Maintaining the integrity of open-source projects like scikit-learn becomes essential when choosing business models and funding sources. Probabl is committed to keeping scikit-learn open source while finding new ways to balance revenue generation with community support. The company relies on sponsorships and public funding, keeping the project accessible to all while providing services to organizations. Governance structures are in place to safeguard the project's mission and prevent any shift toward proprietary solutions.
Future of Data Science and Machine Learning
Looking forward, the relationship between traditional machine learning libraries like scikit-learn and emerging generative AI technologies is expected to keep evolving. Scikit-learn remains a foundational tool for many practical applications, and its simple, transparent approach makes it valuable across industries including finance and healthcare. Initiatives are underway to integrate newer technologies and frameworks to improve the user experience, address current challenges in the AI landscape, and encourage adoption among data scientists. Ultimately, the goal is to stay relevant in a rapidly changing landscape while keeping robust, reliable tools accessible to all.
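To make the point about transparency concrete, here is a minimal sketch, assuming scikit-learn is installed; the bundled breast cancer dataset and the logistic regression model are illustrative choices, not examples from the episode. A linear model fitted on standardized features exposes one weight per feature, so you can read off which inputs push its predictions most strongly.

```python
# Illustrative sketch (not from the episode): inspecting a linear model's weights.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small dataset bundled with scikit-learn (no download needed): 30 numeric features.
data = load_breast_cancer()

# Standardize features so coefficient magnitudes are roughly comparable.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(data.data, data.target)

# Binary problem: coef_ has shape (1, n_features); take the single row.
coefs = model.named_steps["logisticregression"].coef_[0]

# Rank features by the absolute size of their weight.
ranked = sorted(zip(data.feature_names, coefs), key=lambda pair: abs(pair[1]), reverse=True)
for name, weight in ranked[:5]:
    print(f"{name:>25s} {weight:+.2f}")
```

The weights describe how this particular model behaves, not causal effects in the data; they are simply one kind of inspection that simpler models make easy.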
We are at GenAI saturation, so let’s talk about scikit-learn, a long-time favorite of data scientists building classifiers, time series analyzers, dimensionality reducers, and more! Scikit-learn is deployed across industry and drives a significant portion of the “AI” that is actually in production. :probabl is a new kind of company that is stewarding this project along with a variety of other open source projects. Yann Lechelle and Guillaume Lemaitre share some of the vision behind the company and talk about the future of scikit-learn!
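As a rough illustration of the classifiers and dimensionality reducers mentioned above, this minimal sketch (assuming scikit-learn is installed; the Iris dataset and this particular pipeline are illustrative choices, not taken from the episode) chains feature scaling, PCA, and a logistic regression classifier with `make_pipeline`, so every step shares the same `fit`/`predict` interface.

```python
# Illustrative sketch (not from the episode): a dimensionality reducer feeding a classifier.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Bundled toy dataset: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale, project onto 2 principal components, then classify.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

# Mean accuracy on the held-out split.
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Wrapping the steps in a single pipeline means the scaler and PCA are fitted only on the training split, which avoids leaking test data into preprocessing.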
Changelog++ members save 9 minutes on this episode because they made the ads disappear. Join today!
Sponsors:
Timescale – Purpose-built performance for AI. Build RAG, search, and AI agents in the cloud with PostgreSQL and purpose-built extensions for AI: pgvector, pgvectorscale, and pgai.
WorkOS – A platform that gives developers a set of building blocks for quickly adding enterprise-ready features to their application: Single Sign-On (Okta, Azure, Google, Microsoft OAuth), user sync from any SCIM directory, HRIS integration, audit trails (SIEM), and free magic link sign-in. WorkOS is designed for developers and offers a single, elegant interface that abstracts dozens of enterprise integrations. Learn more and get started at WorkOS.com.