Guest Catherine Nelson, author of 'Software Engineering for Data Scientists', discusses the importance of data scientists learning software engineering principles. Topics include transitioning to production-ready code, roles in data science, challenges in model evaluation, and the continuous learning journey in data science.
Podcast summary created with Snipd AI
Quick takeaways
Data scientists need to learn software engineering principles for code quality and readability.
Transitioning from data science to ML engineering requires a shift towards efficiency and standardization in coding practices.
Deep dives
Evolving Skill Sets for Data Scientists
Data scientists are urged to look beyond their core tasks and understand how the whole system works, moving from a narrow view to a broader one. The episode highlights areas worth leveling up in, such as APIs, version control with Git, and system security. A wider skill set lets data scientists contribute effectively across the entire project cycle.
Importance of Learning Code Quality
The episode stresses the importance of data scientists grasping software engineering principles, especially code quality and best practices. The guest shares personal experiences to underscore the critical role of clean code in data science projects. Understanding clean coding practices and methods like object-oriented programming helps data scientists produce reliable and scalable code, avoiding technical debt.
Navigating Data Science to ML Engineering Transition
Moving from data science to machine learning engineering requires a shift in mindset, from exploratory work toward reliability and scalability. In practice, that means refining coding skills, understanding scalability requirements, and prioritizing robust production outcomes through efficient, standardized coding practices.
Challenges and Future of Data Science Evaluation
The podcast looks at how evaluation in data science is changing, especially with the rise of large language models (LLMs). LLMs are difficult to evaluate because they are so versatile and because their performance can change when a prompt is altered. Tackling this problem requires a deep understanding of both statistical principles and how machine learning models work. The discussion anticipates that analytical and machine learning skills will converge to handle complex evaluation tasks in data science.
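As a rough illustration of the prompt-sensitivity problem, the sketch below scores the same questions under several prompt wordings and reports how far accuracy moves between them. Everything here is an assumption made for illustration: `toy_model` is a hypothetical stand-in for a real LLM call, and the templates, questions, and exact-match scoring are invented rather than taken from the episode.

```python
# Hypothetical question/answer pairs used only for this illustration.
QUESTIONS = {"2 + 2": "4", "the capital of France": "Paris"}

# Three wordings of the same request.
TEMPLATES = [
    "What is {q}?",
    "Tell me {q}.",
    "What is {q}? Answer in one word.",
]


def toy_model(prompt):
    """Hypothetical stand-in for an LLM call (assumption, not a real API).

    It only answers correctly when the prompt starts with "What",
    which mimics a model that is sensitive to prompt wording.
    """
    if prompt.startswith("What"):
        for question, answer in QUESTIONS.items():
            if question in prompt:
                return answer
    return "unknown"


def accuracy_by_template(model, templates, questions):
    """Return exact-match accuracy for each prompt template."""
    results = {}
    for template in templates:
        correct = sum(
            model(template.format(q=question)) == expected
            for question, expected in questions.items()
        )
        results[template] = correct / len(questions)
    return results


def accuracy_spread(results):
    """Gap between the best and worst template: a simple sensitivity measure."""
    return max(results.values()) - min(results.values())


if __name__ == "__main__":
    scores = accuracy_by_template(toy_model, TEMPLATES, QUESTIONS)
    for template, score in scores.items():
        print(f"{score:.2f}  {template}")
    print("spread:", accuracy_spread(scores))
```

Reporting the spread alongside the per-template scores makes it harder to mistake one lucky wording for the model's real performance.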
Catherine Nelson is a freelance data scientist and writer, and the author of the O'Reilly book "Software Engineering for Data Scientists".
Why All Data Scientists Should Learn Software Engineering Principles // MLOps podcast #245 with Catherine Nelson, a freelance Data Scientist.
A big thank you to LatticeFlow AI for sponsoring this episode! LatticeFlow AI - https://latticeflow.ai/
// Abstract
Data scientists have a reputation for writing bad code. This quote from Reddit sums up how many people feel: “It's honestly unbelievable and frustrating how many Data Scientists suck at writing good code.” But as data science projects grow, and because the job now often includes deploying ML models, it's increasingly important for DSs to learn fundamental SWE principles such as keeping your code modular, making sure your code is readable by other people and so on. The exploratory nature of DS projects means that you can't be sure where you will end up at the start of a project, but there's still a lot you can do to standardize the code you write.
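A minimal sketch of what "modular and readable" can look like for a small analysis, assuming a list of raw ratings that may contain missing values or strings. The function names and the data are hypothetical and are not taken from the book or the episode.

```python
from statistics import mean


def clean_ratings(raw_ratings):
    """Drop missing values and convert the remaining ratings to floats."""
    return [float(rating) for rating in raw_ratings if rating is not None]


def summarize_ratings(ratings):
    """Return the count and mean of a list of numeric ratings."""
    if not ratings:
        return {"count": 0, "mean": None}
    return {"count": len(ratings), "mean": mean(ratings)}


def run_analysis(raw_ratings):
    """Compose the steps so each one can be tested and reused on its own."""
    return summarize_ratings(clean_ratings(raw_ratings))


if __name__ == "__main__":
    # Hypothetical input mixing ints, strings, floats, and a missing value.
    print(run_analysis([4, None, "5", 3.0]))
```

Splitting cleaning from summarizing means the exploratory part of a project can change one step without rewriting the other, and each function has a name and docstring that tell the next reader what it does.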
// Bio
Catherine Nelson is the author of "Software Engineering for Data Scientists", a guide for data scientists who want to level up their coding skills, published by O'Reilly in May 2024. She is currently consulting for GenAI startups and providing mentorship and career coaching to data scientists. Previously, she was a Principal Data Scientist at SAP Concur. She has extensive experience deploying NLP models to production and evaluating ML systems, and she is also co-author of the book "Building Machine Learning Pipelines", published by O'Reilly in 2020. In her previous career as a geophysicist, she studied ancient volcanoes and explored for oil in Greenland. Catherine has a PhD in geophysics from Durham University and a Masters of Earth Sciences from Oxford University.
// MLOps Jobs board
https://mlops.pallet.xyz/jobs
// MLOps Swag/Merch
https://mlops-community.myshopify.com/
// Related Links
Software Engineering for Data Scientists book by Catherine Nelson:
https://learning.oreilly.com/library/view/software-engineering-for/9781098136192/
https://www.amazon.com/Software-Engineering-Data-Scientists-Notebooks/dp/1098136209
--------------- ✌️Connect With Us ✌️ -------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Catherine on LinkedIn: https://www.linkedin.com/in/catherinenelson1/
Timestamps:
[00:00] Catherine's preferred coffee
[00:15] Takeaways
[02:38] Meeting magic: Embracing serenity
[06:23] The Software Engineering for Data Scientists book
[10:41] Exploring ideas rapidly
[12:52] Bridging Data Science gaps
[16:17] Data poisoning concerns
[18:26] Transitioning from a data scientist to a machine learning engineer
[21:53] Rapid Prototyping vs Thorough Development
[23:45] Data scientists take ownership
[25:53] Data scientists' role balance
[30:30] Understanding system design process
[36:00] Data scientists and Kubernetes
[41:33 - 43:03] LatticeFlow AI Ad
[43:05] The Future of Data Science
[45:09] Data scientists analyzing models
[46:46] Tools gaps in prompt tracking
[50:44] Learnings from writing the book