
The Data Scientist Show - Daliana Liu
A deep dive into data scientists' day-to-day work, the tools and models they use, how they tackle problems, and their career journeys. This podcast helps you grow a successful career in data science. Listening to an episode is like having lunch with an experienced mentor. Guests are data science practitioners from various industries, AI researchers, economists, and CTOs of AI companies. Host: Daliana Liu, an ex-Amazon senior data scientist with 180k followers on LinkedIn.
Join 20k subscribers at www.dalianaliu.com to learn more about data science, career, and this show. Twitter @DalianaLiu.
Latest episodes

Nov 12, 2023 • 55min
Machine learning in cybersecurity, computer vision in sports, from business analyst to ML engineer - Betty Zhang - The Data Scientist Show #072
Betty Zhang is a data scientist at a cloud security company; previously, she was a data scientist at Amazon Web Services. Today we talk about her computer vision projects in sports, data science use cases in cybersecurity, her path from business major to data scientist, and her experience working at startups vs big tech companies. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.
Betty’s LinkedIn: https://www.linkedin.com/in/betty-zhang-0bb63731/
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana’s LinkedIn: https://www.linkedin.com/in/dalianaliu/
(00:00:00) Introduction
(00:01:21) Computer Vision Project in Sports at AWS
(00:12:28) Challenges in computer vision
(00:14:02) Time allocation for ML projects
(00:15:22) 3 key skills for computer vision
(00:17:20) From business analyst to ML engineer
(00:18:14) How she got her data scientist job through LinkedIn
(00:21:32) How she got into Amazon
(00:22:17) Three tech skills needed during Amazon interviews
(00:26:11) Why she joined a cybersecurity startup
(00:27:22) Three cybersecurity use cases
(00:29:47) Anomaly detection
(00:30:40) ML for cybersecurity
(00:34:43) Tech stacks: Amazon vs startups
(00:39:35) Startups vs big tech
(00:45:56) Balance learning and impact
(00:48:35) Advice for new data scientists

Nov 4, 2023 • 1h 4min
Stop abusing A/B testing, toxic experimentation culture, how to run A/B tests with rigor - Che Sharma - The Data Scientist Show #071
Che Sharma, former data scientist at Airbnb and founder of Eppo, discusses toxic behaviors in experimentation culture, A/B testing best practices, and A/B testing for ML models on The Data Scientist Show. Topics include statistical power, effect size, monitoring metrics, alternatives to A/B testing such as the difference-in-differences method, and A/B testing in ML and AI.

Oct 23, 2023 • 1h 16min
Academia vs. Industry for Machine Learning, Research at Uber AI Labs, ML for Wind Farms - Jason Yosinski - The Data Scientist Show #070
Jason Yosinski, founding member of Uber AI Labs and co-founder of Windscape AI, discusses academia vs. industry in machine learning, the challenges of understanding neural networks, ML for wind farms, and the significance of metrics in evaluating models. They also explore hobbies, personal development retreats, and the power and pitfalls of patterns in behavior.

Sep 14, 2023 • 1h 26min
Ads forecasting at Netflix and Spotify, how to build your personal moat - Jeff Li - The Data Scientist Show #069
Jeff Li, a senior data scientist at Netflix and former data science manager at Spotify, discusses ads forecasting, career paths as a manager vs IC, and the culture differences at Spotify, Netflix, and DoorDash. They also talk about the challenges of forecasting in finance and ads, detecting and accounting for seasonality and black swan events in advertising, transitioning from manager to senior data scientist, changes in tech stacks and data visualization tools, the future of forecasting, the importance of mentors in career growth, and the role of communication skills for data scientists.

Aug 25, 2023 • 1h 14min
A/B testing at Airbnb, building next-gen experimentation platform at Eppo - Che Sharma - The Data Scientist Show #068
Che Sharma, former data scientist at Airbnb and founder of Eppo, talks about A/B testing best practices, A/B testing for ML models, and his career journey. They discuss what makes A/B testing successful, interpreting and communicating test results, centralizing experiment analysis, preparing data scientists for the future, developing communication skills, transitioning to a manager role, and the future of experimentation.

Aug 10, 2023 • 1h 55min
From data scientist@Meta to full-time YouTuber (500k+ subscribers), AI engineering, future of work - Tina Huang - The Data Scientist Show #067
We talked about self-learning, productivity, how Tina navigates her career change and how she thinks AI could change the future of work.
Tina's YouTube: www.youtube.com/@TinaHuang1
Lonely Octopus: www.lonelyoctopus.com
Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.
Tina Huang is a data scientist turned YouTube creator with 500k subscribers. She is the founder of Lonely Octopus, an online program helping people gain data science, AI, and freelancing skills. She originally studied pharmacology before transitioning into tech, completing a master's degree in computer science at UPenn.
(00:02:38) Transitioning from Data Science to Content Creation
(00:06:29) Preparing for Data Science Interviews
(00:10:59) Starting a YouTube Channel
(00:14:18) Building Multiple Income Streams
(00:17:35) Getting Started with AI Skills
(00:29:29) Advice for Starting YouTube
(00:34:47) Improving Storytelling Skills
(00:36:58) Overcoming Procrastination
(00:42:33) The Future of Work
(01:26:49) Income Breakdown
(01:47:08) Looking to the Future

Aug 1, 2023 • 1h 27min
Making LLMs hallucinate less, how to diagnose ML models, from PM in Google AI to CEO of Galileo - Vikram Chatterji - The Data Scientist Show #066
Vikram is the co-founder of Galileo, an AI diagnostics and explainability platform used by data science teams building NLP, LLM, and computer vision models at Fortune 500 companies and high-growth startups.
Prior to Galileo, Vikram led Product Management at Google AI, where his team built models for Fortune 2000 companies across retail, financial services, healthcare, and contact centers. He holds a master's degree from Carnegie Mellon University's School of Computer Science. If you enjoy the show, subscribe to the channel and leave a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.
Resources:
LLM Studio: https://www.rungalileo.io/blog/announcing-llm-studio
Galileo: https://www.rungalileo.io/
Blog on LLM Hallucination: https://thesequence.substack.com/p/guest-post-stop-hallucinations-from
Vikram Chatterji’s LinkedIn: https://www.linkedin.com/in/vikram-chatterji/
"The Mom Test": https://www.amazon.com/The-Mom-Test-Rob-Fitzpatrick-audiobook/dp/B07RJZKZ7F
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu
(00:00:00) Introduction
(00:04:24) How he got into machine learning
(00:06:53) Diagnosing large language models
(00:09:56) Addressing model hallucination
(00:12:46) Metrics for measuring hallucination
(00:17:30) From Google AI to starting Galileo
(00:24:08) Developing LLMs and putting them into production
(00:32:51) Galileo's diagnostics and explainability platform
(00:43:16) Advice for data scientists when joining a startup

Jul 28, 2023 • 1h 53min
Data Science "Mixed Martial Arts", applied reinforcement learning, scaling AI workloads using Ray - Max Pumperla - The Data Scientist Show #065
Max Pumperla designed his own career path in data science. He is a freelance software engineer at Anyscale and a data science professor. We talked about reinforcement learning, open source contributions, Ray for data scientists, and his view on the data scientist's role. If you enjoy the show, subscribe to the channel and leave a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.
Max’s LinkedIn: https://www.linkedin.com/in/max-pumperla-a8099354/
Max's GitHub: https://github.com/maxpumperla
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu
(00:00:00) Introduction
(00:09:19) How he got a remote job through Twitter
(00:14:06) Introduction to Ray
(00:18:52) Reinforcement learning
(00:23:56) Key lessons on integrating customer feedback
(00:35:12) Flaws in data science job titles
(00:45:51) How to be irreplaceable as a data scientist
(00:48:55) An unconventional career path as a data scientist
(01:12:24) Productivity and work-life balance
(01:28:10) Advice for building a personal brand

Jul 4, 2023 • 1h 50min
Uber's ML Systems (Uber Eats, Customer Support), Declarative Machine Learning - Piero Molino - The Data Scientist Show #064
Piero Molino was one of the founding members of Uber AI Labs. He worked on several deployed ML systems, including an NLP model for customer support and the Uber Eats recommender system. He is the author of Ludwig, an open-source declarative deep learning framework. In 2021, he co-founded Predibase, the low-code declarative machine learning platform built on top of Ludwig.
Piero's LinkedIn: https://www.linkedin.com/in/pieromolino
Predibase free access: bit.ly/3PCeqqw
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu
(00:00:00) Introduction
(00:01:54) Journey to machine learning
(00:03:51) Recommender system at Uber Eats
(00:04:13) Projects at Uber AI
(00:09:34) Uber's customer obsession ticket system
(00:16:01) How to evaluate online-offline business and model performance metrics
(00:17:16) Customer Satisfaction
(00:28:38) When do you know whether a project is good enough
(00:41:50) Declarative machine learning and Ludwig
(00:45:32) Ludwig vs AutoML
(00:54:44) Working with Professor Chris Re
(00:58:32) Why he started Predibase
(01:07:56) LLM and GenAI
(01:10:17) Challenges for LLMs
(01:22:36) Advice for data scientists
(01:34:29) Career advice to his younger self

Jun 26, 2023 • 47min
Data science in transportation, the intersection of operations research and ML - Holger Teichgraeber - The Data Scientist Show #063
Holger Teichgraeber is a Data Science Manager at Archer Aviation. Previously, he worked at Convoy as a Research Scientist on their trucking marketplace, and at various companies in the energy space. Holger has a Bachelor's degree in Mechanical Engineering from Aachen, Germany, and a master's degree and Ph.D. from Stanford University, with a research focus on machine learning and optimization applied to energy systems. He regularly writes on LinkedIn about how to build valuable products at the intersection of machine learning and optimization in production. If you enjoy the show, subscribe to the channel and leave a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science and career.
Holger's LinkedIn: https://www.linkedin.com/in/holgerteichgraeber/
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu
(00:00:00) Introduction
(00:01:28) How he got into operations research
(00:02:39) Operations research vs data science
(00:04:37) Trucking optimization at Convoy
(00:08:42) Optimization problem
(00:10:18) Strategic planning on air mobility at Archer
(00:13:50) Using simulation and solving a problem
(00:16:45) Big data science work vs smaller data science work
(00:21:23) Stakeholder management
(00:29:28) IC vs Manager
(00:32:04) Advice on promotion
(00:39:12) Work cultures in Germany and the US
(00:41:16) How to handle tight deadlines
(00:43:21) Important feedback from his work
(00:44:14) How to plan projects
(00:44:45) Next big challenge for data science teams
(00:45:40) Career growth in the next few years
(00:46:01) Connect with Holger