Data Science at Home

Francesco Gadaleta

Cutting through AI bullsh*t. Come join the discussion on Discord! https://discord.gg/4UNKGf3

Episodes


Aug 12, 2020 • 19min

Why you care about homomorphic encryption (Ep. 116)

After deep learning, a new entry is about ready to take the stage. The usual journalists are warming up their keyboards for blogs, news feeds and tweets: in one word, hype. This time it's all about privacy and data confidentiality. The new buzzword: homomorphic encryption. Join and chat with us on the official Discord channel.

Sponsors: This episode is supported by Amethix Technologies. Amethix works to create and maximize the impact of the world’s leading corporations, startups, and nonprofits, so they can create a better future for everyone they serve. They are a consulting firm focused on data science, machine learning, and artificial intelligence.

References: Towards a Homomorphic Machine Learning Big Data Pipeline for the Financial Services Sector; IBM Fully Homomorphic Encryption Toolkit for Linux
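
The description only names the technique, so here is a hedged illustration of what "homomorphic" means in practice. It is a toy sketch of the Paillier cryptosystem, a partially homomorphic scheme chosen here for brevity; it is not the method discussed in the episode, nor the API of the IBM toolkit referenced above. Multiplying two Paillier ciphertexts yields an encryption of the sum of their plaintexts, so a server can add values it cannot read. The primes and the names `encrypt`, `decrypt`, `add_encrypted` and `scale_encrypted` are illustrative assumptions, keys this small offer no security at all, and the code assumes Python 3.9+ (for `math.lcm`). Fully homomorphic schemes go further and also support multiplying two encrypted values together.

```python
import math
import secrets

# Toy Paillier key pair. ASSUMPTION: tiny hard-coded primes, insecure, illustration only.
P, Q = 293, 433
N = P * Q
N2 = N * N
G = N + 1                      # common simplified choice of generator
LAM = math.lcm(P - 1, Q - 1)   # Carmichael function lambda(N), part of the private key
MU = pow(LAM, -1, N)           # modular inverse; valid shortcut because G = N + 1


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N using fresh randomness r coprime to N."""
    if not 0 <= m < N:
        raise ValueError("plaintext out of range")
    while True:
        r = secrets.randbelow(N - 1) + 1  # r in [1, N - 1]
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Recover m = L(c^LAM mod N^2) * MU mod N, where L(x) = (x - 1) // N."""
    x = pow(c, LAM, N2)
    return ((x - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the hidden plaintexts (mod N)."""
    return (c1 * c2) % N2


def scale_encrypted(c: int, k: int) -> int:
    """Raising a ciphertext to the power k multiplies the hidden plaintext by k (mod N)."""
    return pow(c, k, N2)


if __name__ == "__main__":
    a, b = encrypt(20), encrypt(22)
    print(decrypt(add_encrypted(a, b)))    # 42
    print(decrypt(scale_encrypted(a, 3)))  # 60
```

The sums and products are computed on ciphertexts only; the private values `LAM` and `MU` are needed just once, at decryption time.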

Aug 3, 2020 • 20min

Test-First machine learning (Ep. 115)

In this episode I talk about a testing methodology for machine learning models that are meant to be integrated into production environments. Don't forget to come chat with us on our Discord channel. Enjoy the show!

This episode is supported by Amethix Technologies. Amethix works to create and maximize the impact of the world’s leading corporations, startups, and nonprofits, so they can create a better future for everyone they serve. They are a consulting firm focused on data science, machine learning, and artificial intelligence.

Jul 26, 2020 • 19min

GPT-3 cannot code (and never will) (Ep. 114)

The hype around GPT-3 is alarming and paints an awful picture of how people misunderstand artificial intelligence. In response to comments claiming that GPT-3 will take developers' jobs, in this episode I share some personal opinions about the state of AI in source code generation (GPT-3 in particular). If you have comments about this episode or just want to chat, come join us on the official Discord channel.

This episode is supported by Amethix Technologies. Amethix works to create and maximize the impact of the world’s leading corporations, startups, and nonprofits, so they can create a better future for everyone they serve. They are a consulting firm focused on data science, machine learning, and artificial intelligence.

Jul 22, 2020 • 21min

Make Stochastic Gradient Descent Fast Again (Ep. 113)

There is definitely room for improvement in the stochastic gradient descent family of algorithms. In this episode I explain a relatively simple method that has been shown to improve on the Adam optimizer. But watch out! This approach does not generalize well. Join our Discord channel and chat with us.

References: More descent, less gradient; Taylor Series

Jul 19, 2020 • 21min

What data transformation library should I use? Pandas vs Dask vs Ray vs Modin vs Rapids (Ep. 112)

In this episode I talk about the data transformation frameworks available to data scientists who write Python code. The usual suspect is clearly Pandas, the most widely used library and the de facto standard. However, when data volumes grow and distributed algorithms come into play (following the map-reduce paradigm of computation), Pandas no longer performs as expected, and other frameworks take over. In this episode I explain which frameworks are the best equivalents to Pandas in big data contexts. Don't forget to join our Discord channel to comment on previous episodes or propose new ones.

This episode is supported by Amethix Technologies. Amethix works to create and maximize the impact of the world’s leading corporations, startups, and nonprofits, so they can create a better future for everyone they serve. Amethix is a consulting firm focused on data science, machine learning, and artificial intelligence.

References:
Pandas: a fast, powerful, flexible and easy-to-use open-source data analysis and manipulation tool (https://pandas.pydata.org/)
Modin: scale your pandas workflows by changing one line of code (https://github.com/modin-project/modin)
Dask: advanced parallelism for analytics (https://dask.org/)
Ray: a fast and simple framework for building and running distributed applications (https://github.com/ray-project/ray)
RAPIDS: GPU data science (https://rapids.ai/)

Jul 3, 2020 • 37min

[RB] It’s cold outside. Let’s speak about AI winter (Ep. 111)

In this episode I speak with Filip Piekniewski about some of the most noteworthy findings in AI and machine learning in 2019. As a matter of fact, the entire field of AI has been inflated by hype and claims that are hard to believe. Many of the promises made a few years ago have turned out to be quite hard to achieve, if not impossible. Let's stay grounded and realistic about the potential of this amazing field of research, so that we don't end up disillusioned in the near future. Join us on our Discord channel to discuss your favorite episode and propose new ones.

This episode is brought to you by Protonmail. Click on the link in the description or go to protonmail.com/datascience and get 20% off their annual subscription.

Jun 29, 2020 • 24min

Rust and machine learning #4: practical tools (Ep. 110)

In this episode I put together a non-exhaustive list of machine learning tools and frameworks written in Rust. Not all of them are mature enough for production environments, but I believe community effort can change this very quickly. To draw a comparison with the Python ecosystem, I cover frameworks for linear algebra (NumPy), dataframes (pandas), off-the-shelf machine learning (scikit-learn), deep learning (TensorFlow) and reinforcement learning (OpenAI). Rust is the language of the future. Happy coding!

References:
BLAS linear algebra (https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms)
Rust dataframe (https://github.com/nevi-me/rust-dataframe)
Rustlearn (https://github.com/maciejkula/rustlearn)
Rusty machine (https://github.com/AtheMathmo/rusty-machine)
Tensorflow bindings (https://lib.rs/crates/tensorflow)
Juice, machine learning for hackers (https://lib.rs/crates/juice)
Rust reinforcement learning (https://lib.rs/crates/rsrl)

Jun 22, 2020 • 24min

Rust and machine learning #3 with Alec Mocatta (Ep. 109)

In the third episode of Rust and machine learning I speak with Alec Mocatta. Alec is a professional programmer with more than 20 years of experience who has been working at the intersection of distributed systems and data analytics. He is the founder of two startups in the distributed systems space and the author of Amadeus, an open-source framework that encourages you to write clean, reusable code that works regardless of data scale, whether locally or distributed across a cluster.

Only on June 24th: LDN *Virtual* Talks June 2020 with Bippit (Alec speaking about Amadeus).

Jun 19, 2020 • 27min

Rust and machine learning #2 with Luca Palmieri (Ep. 108)

In the second episode of Rust and machine learning I speak with Luca Palmieri, who has spent a large part of his career at the intersection of machine learning and data engineering. Luca has also contributed to several projects close to the machine learning community using the Rust programming language. Linfa is an ambitious project that definitely deserves the attention of the data science community (and it's written in Rust, with Python bindings! How cool is that?!).

References:
Series Announcement - Zero to Production in Rust (https://www.lpalmieri.com/posts/2020-05-10-announcement-zero-to-production-in-rust/)
Zero To Production #0: Foreword (https://www.lpalmieri.com/posts/2020-05-24-zero-to-production-0-foreword/)
Taking ML to production with Rust: a 25x speedup (https://www.lpalmieri.com/posts/2019-12-01-taking-ml-to-production-with-rust-a-25x-speedup/)

Jun 17, 2020 • 22min

Rust and machine learning #1 (Ep. 107)

This is the first episode of a series about the Rust programming language and the role it can play in the machine learning field. Rust is one of the most beautiful languages I have studied so far. I personally come from the C programming language, though for my professional work in machine learning I had to switch to the much loved and much hated Python. This episode does not give you an exhaustive list of Rust's benefits or capabilities. For that, check the references and start getting familiar with what I think is going to be the language of the next 20 years.

Sponsored: This episode is supported by Pryml Technologies. Pryml offers secure and cost-effective data privacy solutions for your organisation. It generates a synthetic alternative without disclosing your confidential data.

References: The Rust Programming Language; Cookin' with Rust
