From physics PhD to data science leader, unexpected challenges in survey data, Python vs R, EDA best practices, building MLOps toolkit - Julia Silge - The Data Scientist Show #087
Mar 30, 2024
Julia Silge, former astrophysicist turned data science leader, discusses challenges in survey data, Python vs. R, EDA best practices, and building MLOps tools. Topics include text analysis, balancing data science and engineering, and her journey from physics PhD to engineering manager at Posit PBC.
Julia's transition from astrophysics to data science involved self-study and blogging to showcase her skills.
Balancing data analysis and tool development is crucial for Julia's fulfillment in her leadership role.
Her team's MLOps tools aim to support people's data science work, with a focus on model development and operationalization.
Deep dives
Julia's Transition to Data Science
Julia describes her transition from astrophysics to data science, emphasizing the importance of self-study and blogging to showcase her skills. She highlights how she started in the nonprofit sector, used her data analysis skills to earn her first data science title, and eventually moved into a leadership role at Posit PBC.
Tool Building vs. Data Analysis
Julia discusses her shift from data scientist to engineering manager and her passion for both data analysis and tool building. She explains that she prefers a balance between analyzing data and developing effective tools, and that a role focused solely on one of these would not be as fulfilling for her.
Building MLOps Tools
Julia details her current role building MLOps tools at Posit PBC, emphasizing the shift in focus from data analysis to tool development. She shares insights on the importance of creating tools that support people's data science work, particularly model development and operationalization.
Challenges in Survey Data Analysis
Julia reflects on her time working on the Stack Overflow Developer Survey, highlighting the challenges of working with survey data. She discusses the difficulty of accurately representing diverse voices in the data, especially with respect to gender, and the importance of collecting and analyzing survey data responsibly.
Python vs. R in Data Science
Julia shares her perspective on choosing between Python and R for data science, emphasizing personal productivity and the requirements of the specific task. She notes the strengths of each language, such as R's robust statistical packages and Python's diverse tooling, and encourages people to use the language that fits their skills and project needs.
Julia Silge is an engineering manager at Posit PBC, formerly known as RStudio, where she leads a team of developers building open source software for MLOps. Before Posit, she earned a PhD in astrophysics, worked for several years in the nonprofit space, and was a data scientist at Stack Overflow, where some of her most public work involved the annual Developer Survey. We talked about MLOps tools, challenges in survey data, text analysis, and balancing her interests in data science and engineering.
Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.