
The Data Scientist Show

Latest episodes

Jun 16, 2022 • 1h 58min

Weather forecasting with AI, Kaggle tips and tricks, dealing with missing data, deep learning with Jesper Dramsch, The Data Scientist Show #040

Jesper Dramsch is a scientist for machine learning at the European Centre for Medium-Range Weather Forecasts. They have a PhD in machine learning applied to geoscience from the Technical University of Denmark. They are a Kaggle Kernels Expert and TPU star, ranking in the top 81 of 100k worldwide. We talked about weather forecasting, things they learned from Kaggle, how to deal with missing data and outliers, deep learning, Keras vs PyTorch, XGBoost, their struggles as a PhD student, and working in the EU vs the US. Follow @DalianaLiu for more updates on data science and this show.
(00:01:27) how they got into ML
(00:09:10) how they handled missing data
(00:28:34) Transformers are eating the world
(00:49:36) Huber Loss is a fantastic metric to deal with extreme values
(00:54:48) their experience with Kaggle competitions
(01:02:59) Kaggle tricks that helped their models perform better
(01:08:18) PyTorch vs Keras
(01:30:30) working in different countries and cultures
Resources shared by Jesper:
The newsletter with missing data: https://buttondown.email/jesper/archive/towels-have-quite-a-dry-sense-of-humor/
The paper by Gael about missing data: https://academic.oup.com/gigascience/article/doi/10.1093/gigascience/giac013/6568998
The Huber Loss: https://en.wikipedia.org/wiki/Huber_loss
Skill Scores: https://en.wikipedia.org/wiki/Forecast_skill
Brier Skill in Weather: https://www.dwd.de/EN/ourservices/seasonals_forecasts/forecast_reliability.html
CRPS (Continuous Ranked Probability Score): https://datascience.stackexchange.com/questions/63919/what-is-continuous-ranked-probability-score-crps
ConvNext, Convnets for the 2020s: https://arxiv.org/abs/2201.03545
Transformers for ensemble forecasts: https://arxiv.org/abs/2106.13924
Books I recommend: https://www.amazon.com/shop/jesperdramsch/list/2DYS5KVR5TX0E
Blog posts I wrote about these books: https://dramsch.net/tags/books/
Short I made about Test-Time Augmentation: https://www.youtube.com/shorts/w4sAh9lKyls
Their links: https://dramsch.net/links
Their open PhD thesis: https://dramsch.net/phd
Newsletter: https://dramsch.net/newsletter
Twitter: https://dramsch.net/twitter
Youtube: https://dramsch.net/youtube
Linkedin: https://dramsch.net/linkedin
Kaggle: https://dramsch.net/
Jun 8, 2022 • 1h 53min

Reinforcement learning common use cases, recommendation engine, productivity - Susan Shu Chang - The Data Scientist Show #039

Susan Shu Chang is a principal data scientist at Clearco, helping ecommerce founders through machine learning-powered investing. In her previous role, she developed the company’s very first ML-powered website recommender system, deployed to millions of customers, and created a custom OpenAI Gym environment for a reinforcement learning project in production. She is also the founder and developer of Quill Game Studios, selling ~10k copies of the debut game in 6 months. She has given talks at PyCon Canada, Toronto Machine Learning Summit (TMLS), and more. She writes about her career journey and learning on https://www.susanshu.com/
If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/
Daliana's Twitter: https://twitter.com/DalianaLiu
Highlights:
(00:00) Intro
(00:01:29) from economics to data science
(00:07:23) reinforcement learning (RL)
(00:20:00) recent reinforcement learning use cases
(00:27:28) reinforcement learning for social media's recommender system
(01:04:42) common mistakes when productionizing models
(01:08:30) principal data scientist's day-to-day
(01:14:05) what productivity really means
(01:21:04) productivity tips
(01:41:48) books and blogs on productivity
May 31, 2022 • 2h 2min

User-centric data science, design thinking, from UX researcher to data science manager@Visa - Laura Gabrysiak - The Data Scientist Show #038

Laura Gabrysiak is a senior manager of data products and solutions at Visa. Previously, she was a data scientist, building machine learning models and decision tools to enable Visa clients. She has a college degree in computational linguistics and a master's in design thinking. She's building the local data science community in Miami and is a co-founder of R-Ladies.
If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/
Daliana's Twitter: https://twitter.com/DalianaLiu
Laura's Linkedin: https://www.linkedin.com/in/lauragabrysiak/
(00:02:43) her journey into data science
(00:20:28) anecdotes vs big data
(00:27:05) the power of small data
(00:30:41) design thinking key elements
(00:47:25) mindset shift from a user researcher to a data scientist
(01:00:51) how to improve customer engagement
(01:02:10) how to make data visualization effective
(01:27:21) mindset shift from an individual contributor to a manager
(01:40:43) advice for people who are on a PIP
May 24, 2022 • 2h 10min

A/B testing and growth analytics at Airbnb, building data science tools and metrics store with Nick Handel, The Data Scientist Show #037

Nick Handel was a senior data scientist leading the data side of the Airbnb Trips launch, and later built a team that designed Airbnb’s end-to-end machine learning platform, Bighead. Currently, he is the cofounder and CEO of Transform, the first centralized 'metrics store' that empowers data analysts to deliver insights. He was recognized in Forbes 30 Under 30 in 2018.
If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/
Daliana's Twitter: https://twitter.com/DalianaLiu
Nick's Linkedin: https://www.linkedin.com/in/nicholashandel/
Highlights:
(00:00) intro and career journey
(00:10:58) common mistakes in A/B testing
(00:25:48) how to do A/B testing deep dives
(00:27:32) surprising A/B testing results
(00:29:18) facts vs opinions
(00:33:55) A/B testing best practices
(00:55:01) how he built a new data schema for Airbnb Trips
(01:00:43) how to collect data when building data science tools
(01:38:53) trend of data science tools
May 17, 2022 • 1h 51min

Becoming a superforecaster, decision science for better human predictions - Pavel Atanasov - The Data Scientist Show #036

Pavel is a decision scientist and co-founder at Pytho, using decision science to measure and improve human judgment and prediction. He has a PhD in psychology and decision science from the University of Pennsylvania, focusing on crowd predictions.
If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/
Daliana's Twitter: https://twitter.com/DalianaLiu
Pavel's Twitter: https://twitter.com/PavelDAtanasov
Superforecasting book, based on the Good Judgment Project: https://www.amazon.com/Superforecasting-Science-Prediction-Philip-Tetlock/dp/0804136718
Blogs about forecasting:
Vox's Future Perfect series: https://www.vox.com/future-perfect
Astral Codex Ten: https://astralcodexten.substack.com/
Highlights:
(00:01:10) how he got into decision science
(00:14:38) what makes someone a superforecaster
(00:16:20) three elements of becoming a superforecaster
(00:24:37) how to effectively update our opinions
(00:30:05) how he designed experiments to find out what was a better system
(00:48:27) why humans are sometimes better than algorithms
(01:14:50) how to collect data and information better
(01:33:25) why you should quit
(01:42:30) the future of decision science
May 10, 2022 • 1h 36min

Using AI to detect online abuse, from physics PhD to staff ML engineer@LinkedIn, persuasion at work with James Verbus - The Data Scientist Show #035

James Verbus is a Staff Machine Learning Engineer at LinkedIn. He has a PhD in Physics from Brown University. He is the tech lead of the Anti-Scraping and Automation AI Team, which protects LinkedIn's members from bots and abusive scripted behavior, and he pioneered the use of deep learning to detect abusive automated sequences of user activity.
If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/
Daliana's Twitter: https://twitter.com/DalianaLiu
(00:01:14) from physics to data science
(00:16:37) background of online abuse detection
(00:24:40) Isolation Forest Algorithm
(00:42:59) his day-to-day as a staff ML Engineer
(00:52:57) how to persuade stakeholders
(00:58:17) how to build influence at work
(01:00:22) how he grew to staff engineer
(01:13:48) what he learned from his mentor
May 5, 2022 • 2h 46min

The golden age of AI and neuroscience, brain computer interface (BCI), from academia to FAANG with Patrick Mineault - The Data Scientist Show #034

Patrick Mineault is a neural data scientist. He worked at Google and Facebook after doing a postdoc at UCLA. At Facebook Reality Labs, he worked on a Brain Computer Interface (BCI) that allows you to type with your brain. He tweets about neuro-AI @patrickmineault and writes a blog (https://xcorr.net) sharing his career journey and learnings along the way.
If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/
Daliana's Twitter: https://twitter.com/DalianaLiu
How he got into data science (00:02:41)
His work at Google on A/B testing (00:04:17)
How he joined Facebook Reality Labs (00:23:53)
Projects on neuro-AI and brain computer interface (BCI) (00:27:13)
Skills needed for BCI research (00:34:37)
How AI influences neuroscience (01:34:28)
Computer vision vs human vision (01:39:57)
Model vs data, nature vs nurture (01:45:32)
Apr 6, 2022 • 1h 25min

From biostatistician to the 'artist of data science', how he turned his life around, philosophy - Harpreet Sahota - The Data Scientist Show #033

Harpreet Sahota is a data scientist and ML developer advocate. He is also the host of "The Artists of Data Science" podcast and weekly data science happy hours, and the principal data science mentor at Data Science Dream Job. He is also a philosophy nerd. He had some struggles when he tried to get into data science. Today we’ll talk about his experience as a biostatistician and data scientist, the lessons he learned from his journey and from mentoring other people, and how he turned his life around.
If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/
Daliana's Twitter: https://twitter.com/DalianaLiu
Harpreet's Linkedin: https://www.linkedin.com/in/harpreetsahota204/?originalSubdomain=ca
The Artists of Data Science podcast: https://theartistsofdatascience.fireside.fm/
Mar 31, 2022 • 2h 4min

How he built the best Covid forecasting model, lessons learned and how to improve model performance with Youyang Gu - The Data Scientist Show #032

Youyang Gu, creator of covid19-projections.com, shares how he built an accurate Covid forecasting model using the SEIR model. He discusses the challenges of working with Covid data, the process of adjusting and tweaking the model, and the inclusion of additional features like school reopenings. Gu's model gained attention, leading to its inclusion on the CDC's website. He explores the effectiveness of crowdsourcing and the importance of diverse sources in forecasting. Gu is currently working on a project analyzing Covid mortality and inequalities within and between countries.
Mar 24, 2022 • 1h 36min

Feature engineering, ML models in production, new trend for ML tools, day-to-day of a principal engineer with Willem Pienaar - The Data Scientist Show #031

Willem is the creator of Feast, an open-source feature store (feast.dev), and builds tools at the intersection of engineering, data, and ML. Currently, he works as a principal engineer at Tecton, leading the development of Feast. He previously worked in South Africa, Thailand, and Singapore before moving to San Francisco in the US. Today we’ll talk about machine learning in production, cool projects he worked on, machine learning at startups, and how to pick the right data science track for your career.
If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/
Daliana's Twitter: https://twitter.com/DalianaLiu
Willem's Linkedin: https://www.linkedin.com/in/willempienaar/
