Why he created Pandas, the future of data systems, why he left his CTO role to become a chief architect - Wes McKinney - The Data Scientist Show #086
Mar 22, 2024
Wes McKinney, creator of Pandas, shares how it started, benefits of user-friendly data tools, challenges in building products, transitioning to a top architect role, improving open source, and using ChatGPT for learning, with a focus on future impact goals and career excitement.
Pandas was born from a need for better Python data tools after the 2007 financial crisis.
Voltron Data aims to boost innovation in Apache Arrow for hardware acceleration technologies.
Modular and composable data tools enhance collaboration and system building in varied environments.
Deep dives
The Origins and Motivation Behind the Pandas Library
Wes McKinney, the co-creator of the Pandas library, revealed that the inspiration for developing Pandas stemmed from his early career frustrations in quant finance during the 2007 financial crisis. Seeing that Python lacked data analysis tools comparable to those available in R, McKinney set out to build Python tools that made data processing easier. As Pandas gained popularity among his colleagues, he persuaded his company to open source it in 2009.
Transition from Graduate School to Full-Time Work on Pandas
In 2010, Wes McKinney decided to drop out of grad school to focus full-time on improving the Pandas library, a pivotal moment for Python's rise in statistical computing and data science. He dedicated over a year to the project, authored 'Python for Data Analysis,' and expanded the open-source community. By 2013, McKinney and Chang She had established a company around Pandas, but they later refocused on other projects.
Evolution to Voltron Data and Arrow Project
Having recognized early on the need to make Python pivotal in data science, Wes McKinney co-founded Voltron Data in 2021 to drive innovation in Apache Arrow, a language-independent columnar in-memory format that serves as a universal data interchange layer between computing engines. The ambitious vision aimed to support hardware acceleration technologies like GPUs. McKinney emphasized building a modular and composable data stack that can leverage open-source contributions.
Focus on Composable Data Tools and Portability
Making the case for modular and composable data processing tools, Wes McKinney underscores the value of reusability and standardized interfaces. He highlights the importance of collaborating on shared software components, so that a composable data stack makes it easier to build systems across different data processing environments. The emphasis is on open standards and interoperability.
Influence on Python Data Science Ecosystem and Future Ventures
Wes McKinney expresses a strong commitment to shaping the future of the open-source data science ecosystem. With investments in emerging technologies like Rust and involvement in polyglot language support at Posit, McKinney focuses on sustainable open-source funding models. He aspires to leave a lasting impact by fostering sustainable contributions in a collaborative, modular, and open-source-driven environment.
Wes McKinney is the co-creator of the pandas library and a co-founder of Voltron Data. He is currently a Principal Architect at Posit and an investor in data systems.