
The Effective Data Scientist
Do you want to boost your career as a data scientist? Our podcast helps you achieve this by covering the many aspects of becoming a more effective data scientist.
Latest episodes

Jul 13, 2023 • 46min
A Picture Says More Than 1000 Tables (Episode 14)

Jun 29, 2023 • 17min
Writing Reproducible Reports using Quarto (Episode 13)
Discussion with Paolo and Thomas
Communicating data is so important! Quarto is a fantastic tool for writing reproducible reports
using literate programming. Literate programming allows us to incorporate documentation and
code in the same program. The data science community has embraced this idea by adopting
R Markdown and Jupyter Notebooks. Using Quarto efficiently, you can create parametrized
reports, write scientific publications, and build data-driven slides.
Enjoy this super-interesting conversation with Thomas Neitmann, and be an effective data
scientist!

Jun 15, 2023 • 22min
Sharing your Code with R Packages (Episode 12)
Resources:
R Packages (2e)

Jun 1, 2023 • 22min
How to Effectively Structure Data Science Projects in R (Episode 11)
In this episode, Paolo and Thomas dive into the fundamental principles for a well-structured data science project. These include practical advice on:
• organizing files into folders,
• documenting and commenting code,
• using version control systems and much more.
Although the episode focuses on applying these fundamental principles in R projects, you can apply the same principles to any data science project, regardless of the language used.
Further resources:
Advanced R 2nd Ed. (http://adv-r.hadley.nz)
R for Data Science 2nd Ed. (http://r4ds.hadley.nz)

May 18, 2023 • 16min
Dichotomization and Proportional Odds Model (Episode 10)
In this episode, we move from the logistic regression model to the proportional odds model, with emphasis on
interpretation and the checking of assumptions (visually and analytically). We also speak about the
opportunities and challenges of dealing with the dichotomization of ordinal or continuous variables.
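As a rough illustration of the proportional odds model mentioned above, the sketch below computes category probabilities from cumulative logits. It assumes one common parameterization, logit P(Y <= j | x) = theta_j - beta * x; the function names, thresholds, and coefficient are hypothetical and are not taken from the episode.

```python
import math


def inverse_logit(eta: float) -> float:
    """Map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def category_probabilities(thresholds, beta, x):
    """Probabilities of each ordered category under a proportional odds model.

    Assumed convention (hypothetical, for illustration):
        logit P(Y <= j | x) = thresholds[j] - beta * x
    thresholds must be strictly increasing; the result has
    len(thresholds) + 1 entries, one per ordered category.
    """
    # Cumulative probabilities P(Y <= j | x); the last category closes at 1.
    cumulative = [inverse_logit(t - beta * x) for t in thresholds] + [1.0]
    probabilities = []
    previous = 0.0
    for c in cumulative:
        probabilities.append(c - previous)  # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        previous = c
    return probabilities


def cumulative_odds(threshold, beta, x):
    """Odds of Y falling at or below the category closed by this threshold."""
    p = inverse_logit(threshold - beta * x)
    return p / (1.0 - p)


# Illustrative values only: three ordered categories, one covariate.
probs = category_probabilities([-1.0, 1.0], beta=0.5, x=2.0)
```

Under this convention, the ratio of cumulative odds between x + 1 and x equals exp(-beta) at every threshold, which is the "proportional odds" assumption the episode refers to checking.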
Resources:
● McCullagh, Peter, and John A. Nelder. Generalized linear models. Routledge, 1983.
● Agresti, Alan. Categorical data analysis. John Wiley & Sons, 2003.
● Faraway, Julian J. Extending the linear model with R: generalized linear, mixed effects and
nonparametric regression models. Chapman and Hall/CRC, 2016. (https://julianfaraway.github.io/faraway/ELM/)

May 8, 2023 • 16min
Logistic regression (Episode 9)
Logistic regression is a beautiful tool for modeling a binary dependent variable, although many more
complex extensions exist. In the show, we will speak about the generalized linear model family, logit and
probit functions, interpretations, and practicalities.
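For readers who want a concrete picture of the logit and probit functions mentioned above, here is a minimal sketch using only the Python standard library. The function names are our own and the coefficient is made up; this is an illustration, not material from the episode.

```python
import math
from statistics import NormalDist


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inverse_logit(eta: float) -> float:
    """Logistic link: linear predictor -> probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def inverse_probit(eta: float) -> float:
    """Probit link: linear predictor -> probability via the standard normal CDF."""
    return NormalDist().cdf(eta)


def odds_ratio(beta: float) -> float:
    """In logistic regression, exp(beta) is the odds ratio for a one-unit increase."""
    return math.exp(beta)


# Hypothetical coefficient, for illustration only.
example_beta = 0.7
example_odds_ratio = odds_ratio(example_beta)
```

Both links map a linear predictor of 0 to a probability of 0.5; they differ in the tails. The odds-ratio reading of a coefficient applies to the logit link only.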
Resources:
● McCullagh, Peter, and John A. Nelder. Generalized linear models. Routledge, 1983.
● Faraway, Julian J. Extending the linear model with R: generalized linear, mixed effects and
nonparametric regression models. Chapman and Hall/CRC, 2016. (https://julianfaraway.github.io/faraway/ELM/)

Apr 21, 2023 • 34min
3 steps to make your research more reproducible with Heidi Seibold (Episode 8)
Creating reproducible research is crucial for data scientists as it ensures transparency, understanding, and accuracy in the research process. Not only does it help others understand your work, but it also allows for the reproduction and verification of your results in the future.
Heidi Seibold, an expert in reproducible research, suggests three steps for achieving reproducibility:
Document everything
Develop reusable code, and
Share results with others.
By following these steps, you can ensure that your research is reproducible and accessible to anyone who needs it. Share this resource with your colleagues who want to enhance the quality of their research!

Apr 5, 2023 • 31min
The art of communicating data with Hana Khan
Alexander interviewed Hana Khan about her path from being a data analyst to a data visualizer. Hana runs Hanalytx, her own company, which specializes in helping others present and visualize data. Hana also runs the Art of Communicating Data podcast.
In this episode, Hana and Alexander discussed super interesting topics like sources of inspiration, optimal workflow, and appropriate tools for producing great data visualizations.

Mar 10, 2023 • 50min
Bayesian inference and probabilistic programming
Interview with Alex Andorra
Interviewing Alex Andorra about Bayesian inference, probabilistic programming, and more was a pleasure. Alex is a data scientist and modeler at the PyMC Labs consultancy. He's also an open-source enthusiast and core contributor to the Python packages PyMC and ArviZ. Alex is also a contributor and instructor in the "Intuitive Bayes Introductory Course". This self-paced course is designed for data scientists and developers, where you'll learn Bayesian modeling with code, not math. Alex also runs the amazing "Learning Bayesian Statistics" podcast! If you love Python and Bayesian inference, catch this episode.
How Alex began his path in statistical modeling and why he finds it challenging, versatile, and creative
How Alex and his agency work with clients using multilevel regression and post-stratification and tracking opinion through time
How the community around PyMC is growing and contributing to development in an open-source format
Alex hosts a podcast that covers a wide variety of topics, including political elections, healthcare, neuroscience, and the market
Alex's suggestions on what books to read and how to contribute effectively to open-source projects and stay active in the community
The benefits of joining online communities for learning
and more.
Listen to this episode now and share this with your friends and colleagues!

Feb 9, 2023 • 22min
Everything to know to write programs like a pro - Principles for good programming (Episode 5)
Interview with Shafi Chowdhury
Shafi Chowdhury
He has over 20 years of experience as a statistical programmer in the Pharma industry. He worked for Pharma companies and CROs across Europe in many different therapeutic areas and in all phases of clinical trials before setting up his own consultancy firm. He believes knowledge should be shared, and therefore he is a regular presenter at PhUSE conferences and regularly attends many other conferences, including PSI conferences for Statisticians in the Pharmaceutical Industry. He also provides bespoke training and has a website that allows users to learn just the module they need at that time.
He specialises in reviewing processes and developing standards, tools, templates and macros to improve the expertise of individuals and efficiency of processes. As an independent consultant with all the proven experience behind him, he offers unbiased expert opinions which can be used by management to make their decisions. His aim is always to drive up Quality by Design.
Specialties: Writing SAS programs to check, modify, analyse and report any kind of data.
Developing client specific template programs and generic macros.
Developing bespoke training programs to produce well rounded programmers within weeks.