The Effective Data Scientist

Alexander Schacht and Paolo Eusebi

Do you want to boost your career as a data scientist? Our podcast helps you achieve this by teaching you relevant knowledge about the different aspects of becoming a more effective data scientist.

Episodes

Jul 13, 2023 • 46min

A Picture Says More Than 1000 Tables (Episode 14)

Jun 29, 2023 • 17min

Writing Reproducible Reports using Quarto (Episode 13)

Discussion with Paolo and Thomas

Communicating data is so important! Quarto is a fantastic tool for writing reproducible reports using literate programming. Literate programming allows us to incorporate documentation and code in the same program. The data science community has embraced this idea by adopting R Markdown and Jupyter Notebooks. Using Quarto efficiently, you can create parametrized reports, write scientific publications, and build data-driven slides. Enjoy this super-interesting conversation with Thomas Neitmann, and be an effective data scientist!

Jun 15, 2023 • 22min

Sharing your Code with R Packages (Episode 12)

Resources: R Packages (2e)

Jun 1, 2023 • 22min

How to Effectively Structure Data Science Projects in R (Episode 11)

In this episode, Paolo and Thomas dive into the fundamental principles of a well-structured data science project. These include practical advice on:
• organizing files into folders,
• documenting and commenting code,
• using version control systems,
and much more. Although the episode focuses on applying these principles in R projects, you can apply them to any data science project, regardless of the language used.
Further resources:
Advanced R 2nd Ed. (http://adv-r.hadley.nz)
R for Data Science 2nd Ed. (http://r4ds.hadley.nz)

May 18, 2023 • 16min

Dichotomization and Proportional Odds Model (Episode 10)

In this episode, we move from the logistic regression model to the proportional odds model, with emphasis on interpretation and on checking assumptions (visually and analytically). We also speak about the opportunities and challenges of dichotomizing ordinal or continuous variables.
Resources:
● McCullagh, Peter, and John A. Nelder. Generalized linear models. Routledge, 1983.
● Agresti, Alan. Categorical data analysis. John Wiley & Sons, 2003.
● Faraway, Julian J. Extending the linear model with R: generalized linear, mixed effects and nonparametric regression models. Chapman and Hall/CRC, 2016. (https://julianfaraway.github.io/faraway/ELM/)

May 8, 2023 • 16min

Logistic regression (Episode 9)

Logistic regression is a beautiful tool for modeling a binary dependent variable, although many more complex extensions exist. In the show, we will speak about the generalized linear model family, logit and probit functions, interpretations, and practicalities.
Resources:
● McCullagh, Peter, and John A. Nelder. Generalized linear models. Routledge, 1983.
● Faraway, Julian J. Extending the linear model with R: generalized linear, mixed effects and nonparametric regression models. Chapman and Hall/CRC, 2016. (https://julianfaraway.github.io/faraway/ELM/)

Apr 21, 2023 • 34min

3 steps to make your research more reproducible with Heidi Seibold (Episode 8)

Creating reproducible research is crucial for data scientists as it ensures transparency, understanding, and accuracy in the research process. Not only does it help others understand your work, but it also allows for the reproduction and verification of your results in the future. Heidi Seibold, an expert in reproducible research, suggests three steps for achieving reproducibility: document everything, develop reusable code, and share results with others. By following these steps, you can ensure that your research is reproducible and accessible to anyone who needs it. Share this resource with your colleagues who want to enhance the quality of their research!

Apr 5, 2023 • 31min

The art of communicating data with Hana Khan

Alexander interviewed Hana Khan about her path from data analyst to data visualizer. Hana runs Hanalytx, her own company, which specializes in helping others present and visualize data. Hana also runs the Art of Communicating Data podcast. In this episode, Hana and Alexander discussed super interesting topics like sources of inspiration, optimal workflow, and appropriate tools for producing great data visualizations.

Mar 10, 2023 • 50min

Bayesian inference and probabilistic programming

Interview with Alex Andorra

Interviewing Alex Andorra about Bayesian inference, probabilistic programming, and more was a pleasure. Alex is a data scientist and modeler at the PyMC Labs consultancy. He's also an open-source enthusiast and core contributor to the Python packages PyMC and ArviZ. Alex is also a contributor and instructor in the "Intuitive Bayes Introductory Course". This self-paced course is designed for data scientists and developers who want to learn Bayesian modeling with code, not math. Alex also runs the amazing "Learning Bayesian Statistics" podcast! If you love Python and Bayesian inference, catch this episode.
• How Alex began his path in statistical modeling and why he finds it challenging, versatile, and creative
• How Alex and his agency work with clients using multilevel regression and post-stratification and tracking opinion through time
• How the community around PyMC is growing and contributing to development in an open-source format
• Alex's podcast, which covers a wide variety of topics, including political elections, healthcare, neuroscience, and the market
• Alex's suggestions on what books to read and how to contribute effectively to open-source projects and stay active in the community
• The benefits of joining online communities for learning, and more
Listen to this episode now and share it with your friends and colleagues!

Feb 9, 2023 • 22min

Everything to know to write programs like a pro - Principles for good programming (Episode 5)

Interview with Shafi Chowdhury

Shafi Chowdhury has over 20 years of experience as a statistical programmer in the Pharma industry. He worked for Pharma companies and CROs across Europe in many different therapeutic areas and in all phases of clinical trials before setting up his own consultancy firm. He believes knowledge should be shared, and therefore he is a regular presenter at PhUSE conferences and regularly attends many other conferences, including PSI conferences for Statisticians in the Pharmaceutical Industry. He also provides bespoke training and has a website that allows users to learn just the module they need at that time. He specialises in reviewing processes and developing standards, tools, templates and macros to improve the expertise of individuals and the efficiency of processes. As an independent consultant with proven experience behind him, he offers unbiased expert opinions which management can use to make their decisions. His aim is always to drive up Quality by Design.
Specialties: Writing SAS programs to check, modify, analyse and report any kind of data. Developing client-specific template programs and generic macros. Developing bespoke training programs to produce well-rounded programmers within weeks.
