Data Science at Home

Francesco Gadaleta
Sep 4, 2020 • 18min

Testing in machine learning: checking deep learning models (Ep. 118)

In this episode I speak with Adam Leon Smith, CTO at DragonFly and expert in testing strategies for software and machine learning. We cover testing with deep learning (neuron coverage, threshold coverage, sign change coverage, layer coverage, etc.), combinatorial testing, and their practical aspects. On September 15th there will be a live@Manning Rust conference. In one Rust-full day you will attend many talks about what's special about Rust, building high-performance web services or video games, WebAssembly, and much more. If you want to meet the tribe, tune in on September 15th to the live@Manning Rust conference.
Aug 29, 2020 • 20min

Testing in machine learning: generating tests and data (Ep. 117)

In this episode I speak with Adam Leon Smith, CTO at DragonFly and expert in testing strategies for software and machine learning. On September 15th there will be a live@Manning Rust conference. In one Rust-full day you will attend many talks about what's special about Rust, building high-performance web services or video games, WebAssembly, and much more. If you want to meet the tribe, tune in on September 15th to the live@Manning Rust conference.
Aug 12, 2020 • 19min

Why you care about homomorphic encryption (Ep. 116)

After deep learning, a new entry is about ready to go on stage. The usual journalists are warming up their keyboards for blogs, news feeds, tweets: in one word, hype. This time it's all about privacy and data confidentiality. The new words: homomorphic encryption. Join and chat with us on the official Discord channel.
Sponsors: This episode is supported by Amethix Technologies. Amethix works to create and maximize the impact of the world’s leading corporations, startups, and nonprofits, so they can create a better future for everyone they serve. They are a consulting firm focused on data science, machine learning, and artificial intelligence.
References: Towards a Homomorphic Machine Learning Big Data Pipeline for the Financial Services Sector; IBM Fully Homomorphic Encryption Toolkit for Linux
Aug 3, 2020 • 20min

Test-First machine learning (Ep. 115)

In this episode I speak about a testing methodology for machine learning models that are meant to be integrated into production environments. Don't forget to come chat with us in our Discord channel. Enjoy the show!
This episode is supported by Amethix Technologies. Amethix works to create and maximize the impact of the world’s leading corporations, startups, and nonprofits, so they can create a better future for everyone they serve. They are a consulting firm focused on data science, machine learning, and artificial intelligence.
Jul 26, 2020 • 19min

GPT-3 cannot code (and never will) (Ep. 114)

The hype around GPT-3 is alarming and gives us an awful picture of people misunderstanding artificial intelligence. In response to some comments that claim GPT-3 will take developers' jobs, in this episode I express some personal opinions about the state of AI in generating source code (and GPT-3 in particular). If you have comments about this episode or just want to chat, come join us on the official Discord channel.
This episode is supported by Amethix Technologies. Amethix works to create and maximize the impact of the world’s leading corporations, startups, and nonprofits, so they can create a better future for everyone they serve. They are a consulting firm focused on data science, machine learning, and artificial intelligence.
Jul 22, 2020 • 21min

Make Stochastic Gradient Descent Fast Again (Ep. 113)

There is definitely room for improvement in the stochastic gradient descent family of algorithms. In this episode I explain a relatively simple method that has been shown to improve on the Adam optimizer. But watch out! This approach does not generalize well. Join our Discord channel and chat with us.
References: More descent, less gradient; Taylor Series
Jul 19, 2020 • 21min

What data transformation library should I use? Pandas vs Dask vs Ray vs Modin vs Rapids (Ep. 112)

In this episode I speak about the data transformation frameworks available to data scientists who write Python code. The usual suspect is clearly Pandas, the most widely used library and de facto standard. However, when data volumes increase and distributed algorithms are in place (following a map-reduce paradigm of computation), Pandas no longer performs as expected, and other frameworks come into play. I explain which frameworks are the best equivalents to Pandas in big data contexts. Don't forget to join our Discord channel to comment on previous episodes or propose new ones.
This episode is supported by Amethix Technologies. Amethix works to create and maximize the impact of the world’s leading corporations, startups, and nonprofits, so they can create a better future for everyone they serve. Amethix is a consulting firm focused on data science, machine learning, and artificial intelligence.
References:
Pandas, a fast, powerful, flexible and easy to use open source data analysis and manipulation tool: https://pandas.pydata.org/
Modin, scale your pandas workflows by changing one line of code: https://github.com/modin-project/modin
Dask, advanced parallelism for analytics: https://dask.org/
Ray, a fast and simple framework for building and running distributed applications: https://github.com/ray-project/ray
RAPIDS, GPU data science: https://rapids.ai/
Jul 3, 2020 • 37min

[RB] It’s cold outside. Let’s speak about AI winter (Ep. 111)

In this episode I speak with Filip Piekniewski about some of the most noteworthy findings in AI and machine learning in 2019. As a matter of fact, the entire field of AI has been inflated by hype and claims that are hard to believe. Many of the promises made a few years ago have proved quite hard to achieve, if not impossible. Let's stay grounded and realistic about the potential of this amazing field of research, so as not to bring disillusionment in the near future. Join us on our Discord channel to discuss your favorite episode and propose new ones.
This episode is brought to you by Protonmail. Click on the link in the description or go to protonmail.com/datascience and get 20% off their annual subscription.
Jun 29, 2020 • 24min

Rust and machine learning #4: practical tools (Ep. 110)

In this episode I give a non-exhaustive list of machine learning tools and frameworks written in Rust. Not all of them are mature enough for production environments, but I believe that community effort can change this very quickly. To draw a comparison with the Python ecosystem, I cover frameworks for linear algebra (numpy), dataframes (pandas), off-the-shelf machine learning (scikit-learn), deep learning (tensorflow) and reinforcement learning (OpenAI). Rust is the language of the future. Happy coding!
References:
BLAS linear algebra: https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms
Rust dataframe: https://github.com/nevi-me/rust-dataframe
Rustlearn: https://github.com/maciejkula/rustlearn
Rusty machine: https://github.com/AtheMathmo/rusty-machine
Tensorflow bindings: https://lib.rs/crates/tensorflow
Juice (machine learning for hackers): https://lib.rs/crates/juice
Rust reinforcement learning: https://lib.rs/crates/rsrl
Jun 22, 2020 • 24min

Rust and machine learning #3 with Alec Mocatta (Ep. 109)

In the 3rd episode of Rust and machine learning I speak with Alec Mocatta. Alec is a professional programmer with 20+ years of experience who has been spending time at the intersection of distributed systems and data analytics. He's the founder of two startups in the distributed systems space and the author of Amadeus, an open-source framework that encourages you to write clean and reusable code that works regardless of data scale, locally or distributed across a cluster. On June 24th only: LDN *Virtual* Talks June 2020 with Bippit (Alec speaking about Amadeus).
