This book provides a comprehensive introduction to the tidymodels framework, focusing on building robust models with tidyverse principles. It covers the entire modeling process, from data preparation to model tuning and evaluation, emphasizing good statistical practice and avoiding common pitfalls such as overfitting.
The Programmer's Brain offers techniques rooted in cognitive science to enhance learning and thinking about code. It covers topics such as mastering new programming languages, speed reading code, and creating understandable codebases. The book helps programmers optimize their brain's natural processes to read code more easily, write code faster, and pick up new languages quickly.
This book introduces text mining techniques using the tidytext package, developed by the authors. It teaches how to apply tidy data principles to natural language processing (NLP) tasks, such as sentiment analysis, term frequency, and topic modeling. The book includes practical code examples and case studies to help readers generate insights from various text sources.
This book provides practical guidance on using text data for regression and classification tasks, covering preprocessing steps like tokenization and word embeddings, and applying algorithms such as regularized regression, support vector machines, and deep learning approaches. It is designed for data scientists and analysts looking to leverage text data in their modeling workflows.
Dr. Julia Silge, Engineering Manager at Posit, introduces the brand-new Positron IDE, built for exploratory data analysis and visualization. She also lays out her top picks for LLMs that boost coding efficiency and discusses when traditional NLP methods might be a smarter choice than LLMs. Plus, Julia highlights some must-know open-source libraries that make managing MLOps easier. Tune in for insights that every data scientist, ML engineer, and developer will find useful.
This episode is brought to you by Gurobi, the Decision Intelligence Leader, and by ODSC, the Open Data Science Conference. Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.
In this episode you will learn:
• Overview of Posit and Positron IDE [05:20]
• How the needs of a data scientist differ from those of a software developer [10:54]
• How to contribute to the open-source Positron [19:50]
• MLOps and Vetiver: Tools for deploying and maintaining ML models [37:01]
• Natural Language Processing (NLP) and the Tidyverse approach [50:34]
• The role of AI and LLMs in data science education [1:24:18]
Additional materials: www.superdatascience.com/817