

Data Science at Home
Francesco Gadaleta
Technology, AI, machine learning and algorithms. Come join the discussion on Discord!
https://discord.gg/4UNKGf3
Episodes

Apr 13, 2021 • 32min
Learning and training in AI times (Ep. 148)
Is there a gap between life sciences and data science?
What's the situation when it comes to interdisciplinary research?
In this episode I am with Laura Harris, Director of Training for the Institute for Cyber-Enabled Research (ICER) at Michigan State University (MSU), and we try to answer some of those questions.
You can contact Laura at training@msu.edu or on LinkedIn.

Apr 11, 2021 • 45min
You are the product [RB] (Ep. 147)
In this episode I am with George Hosu from Cerebralab, and we speak about how dangerous it is not to pay for the services you use and, as a consequence, how dangerous it is to let an algorithm decide what you like or not.
Our Sponsors
This episode is supported by Chapman’s Schmid College of Science and Technology, where master’s and PhD students join in cutting-edge research as they prepare to take the next big leap in their professional journey.
To learn more about the innovative tools and collaborative approach that distinguish the Chapman program in Computational and Data Sciences, visit chapman.edu/datascience
If building software is your passion, you’ll love ThoughtWorks Technology Podcast. It’s a podcast for techies by techies. Their team of experienced technologists take a deep dive into a tech topic that’s piqued their interest — it could be how machine learning is being used in astrophysics or maybe how to succeed at continuous delivery.
Links
https://cerebralab.com
https://www.eugenewei.com/blog/2019/2/19/status-as-a-service

Apr 8, 2021 • 33min
Polars: the fastest dataframe crate in Rust - with Ritchie Vink (Ep. 146)
Ritchie Vink, author of Polars, the fastest dataframe library in Rust, discusses his background in data science and how he started working on Polars. The conversation also covers the challenges of designing a new data manipulation library, the significance of thread safety and parallelism in Rust, the use of Apache Arrow as a backend for communication, and the importance of SIMD instructions in optimizing operations.

Mar 26, 2021 • 30min
Apache Arrow, Ballista and Big Data in Rust with Andy Grove (Ep. 145)
Do you want to know the latest in big data analytics frameworks? Have you ever heard of Apache Arrow? Rust? Ballista? In this episode I speak with Andy Grove, one of the main authors of Apache Arrow and the Ballista compute engine.
Andy explains some of the challenges he faced while designing the Arrow and Ballista memory models, and he describes some amazing solutions.
References
https://arrow.apache.org/
https://ballistacompute.org/
https://github.com/ballista-compute/ballista

Mar 19, 2021 • 32min
Pandas vs Rust (Ep. 144)
Pandas is the de facto standard for data loading and manipulation. Python is the de facto programming language for such operations. Rust is the underdog. Or is it?
In this episode I show you why that is no longer the case.
Our Sponsors
Amethix uses advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics, and energy. Amethix provides solutions to collect and secure data with higher transparency and disintermediation, and builds the statistical models that will support your business.
Useful Links
https://github.com/haixuanTao/Data-Manipulation-Rust-Pandas
https://github.com/ritchie46/polars
https://github.com/rust-ndarray/ndarray

Mar 13, 2021 • 15min
Concurrent is not parallel - Part 2 (Ep. 143)
In plain English, concurrent and parallel are synonyms. Not for a CPU. And definitely not for programmers. In this episode I summarize the ways to parallelize on different architectures and operating systems.
Rock-star data scientists must know how concurrency works and when to use it IMHO.
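To make the distinction concrete, here is a minimal sketch in Python. It is not taken from the episode, and the names `worker`, `run_concurrently` and `interleaving` are made up for illustration. Two tasks make progress concurrently by taking turns on a single thread, so nothing actually runs in parallel.

```python
import asyncio
import threading


async def worker(name, steps, log):
    """Record each step, then hand control back to the event loop."""
    for step in range(steps):
        log.append((name, step, threading.get_ident()))
        await asyncio.sleep(0)  # cooperative yield: lets the other task run


async def run_concurrently():
    """Run two workers concurrently on one event loop."""
    log = []
    await asyncio.gather(worker("A", 3, log), worker("B", 3, log))
    return log


def interleaving():
    """Return the order in which steps ran and the set of thread ids used."""
    log = asyncio.run(run_concurrently())
    order = [f"{name}{step}" for name, step, _ in log]
    thread_ids = {thread_id for _, _, thread_id in log}
    return order, thread_ids


if __name__ == "__main__":
    order, thread_ids = interleaving()
    print(order)            # the steps of A and B alternate
    print(len(thread_ids))  # a single thread did all the work
```

The two tasks overlap in time, yet only one of them executes at any given instant. True parallelism would need several cores working at once, for example through the standard library's `multiprocessing` module or `concurrent.futures.ProcessPoolExecutor`.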
Useful Links
http://web.mit.edu/6.005/www/fa14/classes/17-concurrency/
https://doc.rust-lang.org/book/ch16-00-concurrency.html
https://urban-institute.medium.com/using-multiprocessing-to-make-python-code-faster-23ea5ef996ba

Mar 10, 2021 • 32min
Concurrent is not parallel - Part 1 (Ep. 142)
In plain English, concurrent and parallel are synonyms. Not for a CPU. And definitely not for programmers. In this episode I summarize the ways to parallelize on different architectures and operating systems.
Rock-star data scientists must know how concurrency works and when to use it IMHO.

Mar 2, 2021 • 25min
Backend technologies for machine learning in production (Ep. 141)
This is one of the most dynamic and fascinating topics: API technologies for machine learning.
It's always fun to build ML models. But how about serving them in the real world? In this episode I speak about three must-know technologies to place your model behind an API.
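As a rough illustration of what "placing a model behind an API" means, here is a minimal sketch that uses only Python's standard library. It is a stand-in, not one of the three technologies discussed in the episode. The `predict` function, its weights, the `/predict` endpoint and the JSON shape are all hypothetical.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical stand-in for a trained model: a fixed linear scorer."""
    weights = [0.5, -0.25, 2.0]  # made-up weights, for illustration only
    return sum(w * x for w, x in zip(weights, features))


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404, "unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            score = predict([float(x) for x in payload["features"]])
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "expected JSON with a features list")
            return
        body = json.dumps({"score": score}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_server(port=0):
    """Serve the model on localhost in a background thread (port 0 = any free port)."""
    server = HTTPServer(("127.0.0.1", port), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_api(server, features):
    """Client side: POST the features as JSON and decode the JSON reply."""
    url = f"http://127.0.0.1:{server.server_port}/predict"
    request = urllib.request.Request(
        url,
        data=json.dumps({"features": features}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    server = start_server()
    print(call_api(server, [1.0, 2.0, 3.0]))
    server.shutdown()
```

The client never imports the model code. It only needs the URL and the JSON contract, which is the main point of serving a model behind an API. A production setup would add things this sketch leaves out, such as input validation, authentication and a proper application server.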

Feb 22, 2021 • 45min
You are the product (Ep. 140)
In this episode I am with George Hosu from Cerebralab, and we speak about how dangerous it is not to pay for the services you use and, as a consequence, how dangerous it is to let an algorithm decide what you like or not.
Links
https://cerebralab.com
https://www.eugenewei.com/blog/2019/2/19/status-as-a-service

Feb 15, 2021 • 37min
How to reinvent banking and finance with data and technology (Ep. 139)
The financial system is changing. It is becoming more efficient and more integrated with many other services, making our lives more... digital. Is the old banking system doomed to fail? Or will it just be disrupted by the smaller players of the fintech industry?
In this episode we answer some of these fundamental questions with Alessandro E. Hatami from Pacemakers.
Subscribe to the Newsletter and come chat with us on the official Discord channel.