
The Data Scientist Show - Daliana Liu
A deep dive into data scientists' day-to-day work, tools and models they use, how they tackle problems, and their career journeys. This podcast helps you grow a successful career in data science. Listening to an episode is like having lunch with an experienced mentor. Guests are data science practitioners from various industries, AI researchers, economists, and CTOs of AI companies. Host: Daliana Liu, an ex-Amazon senior data scientist with 180k followers on LinkedIn.
Join 20k subscribers at www.dalianaliu.com to learn more about data science, career, and this show. Twitter @DalianaLiu.
Latest episodes

May 18, 2023 • 1h 22min
Tackling data quality issues, 5 pillars of data observability, from management consultant to CEO of Monte Carlo - Barr Moses - The Data Scientist Show #062
Barr Moses is a consultant turned CEO & Co-Founder of Monte Carlo, a data reliability company. She started her career as a management consultant at Bain & Company and a research assistant at the Statistics Department at Stanford University. Later, she became VP of Customer Operations at customer success company Gainsight, where she built the data and analytics team. She also served in the Israeli Air Force as a commander of an intelligence data analyst unit. Barr graduated from Stanford with a B.Sc. in Mathematical and Computational Science. Today, we’ll talk about Barr’s career journey, data reliability and observability, and what it means for data teams. If you enjoy the show, subscribe to the channel and leave a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science.
Barr's LinkedIn: https://www.linkedin.com/in/barrmoses/
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu
(00:00:00) Introduction
(00:01:24) How she got into data science
(00:08:26) Frameworks for data-driven decisions
(00:11:20) Are customer support tickets always bad?
(00:15:20) How to quickly find out what is true
(00:20:17) Struggles in the data team
(00:23:37) Daliana’s story about lineage
(00:28:00) People stressed about data
(00:28:09) Netflix was down because of wrong data
(00:30:40) Common issues with data quality
(00:33:14) 5 pillars of data observability
(00:39:14) How does Monte Carlo help data scientists
(00:43:08) Build in-house vs adopt tools
(00:45:48) How Daliana fixed a data quality issue
(01:02:44) How to measure the impact of the data team
(01:09:09) Mistakes she made
(01:15:28) Beat the odds

Feb 21, 2023 • 1h 27min
Is search dead? Google vs ChatGPT, from Google Search to enterprise search at Glean, machine learning in search, tech layoffs - Deedy Das - The Data Scientist Show #061
Deedy Das is a founding engineer at Glean, an enterprise search startup. Previously, he was a Tech Lead at Google Search working on query understanding and the sports product in New York, Tel Aviv, and Bangalore. Before that, he was an engineer at Facebook New York and graduated from Cornell University. Outside of work, Deedy writes on his blog. He published a viral resume template and work exposing grading flaws in the Indian education system. He also enjoys running marathons, road cycling, and playing cricket. Today we’ll talk about the search projects he worked on at Google, why he left Google, his current work at Glean, and his thoughts on whether Google is doomed because of ChatGPT. If you enjoy the show, subscribe to the channel and leave a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science.
Deedy's Twitter: https://twitter.com/debarghya_das?s=20
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu
(00:00:00) Introduction
(00:01:52) What is search
(00:04:33) Query understanding
(00:12:46) Google vs ChatGPT
(00:18:24) Fixing a bug for Sundar Pichai
(00:27:33) Why he left Google
(00:30:32) How to get into search
(00:34:38) Enterprise search at Glean
(00:46:55) Advice for people who got laid off
(00:48:41) What do search engineers do
(00:51:37) How he evaluates candidates
(00:53:58) Future of search
(00:57:16) Why the web is declining
(00:59:25) Copilot and AI-powered developer tools
(01:03:46) Indian startup ecosystem
(01:07:45) India vs Silicon Valley
(01:09:48) How he gained 30k followers on Twitter
(01:13:28) Daliana and Deedy’s challenge with social media
(01:19:31) Career mistakes he made

Feb 20, 2023 • 1h 43min
The 100-hour workweek of a self-taught machine learning researcher, how he got into Google Brain, why he started Omni - Jeremy Nixon - The Data Scientist Show #060
Jeremy Nixon is a machine learning researcher, software engineer, and startup founder. Previously, he was a software engineer at Google Brain working on deep learning. Now, he is the co-founder and CEO of Omni, building an immersive information retrieval system for you and your team. He studied applied math at Harvard University. Today we’ll talk about how he got into Google Brain, his 3-month plan to teach himself machine learning, his startup, and how he has relentlessly pursued his goal since 2016. If you enjoy the show, subscribe to the channel and leave a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com for more on data science.
Jeremy's Twitter: https://twitter.com/JvNixon
Jeremy's Blog: https://jeremynixon.github.io/
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu
Jeremy's LinkedIn: https://www.linkedin.com/in/jeremyvnixon
(00:00:00) Introduction
(00:01:50) Research in Google Brain
(00:03:37) How he got into Google Brain
(00:07:56) His 3-month plan to learn ML
(00:17:55) The 100-hour workweek
(00:33:26) What if he is tired
(00:39:59) Why he founded Omni
(00:44:24) Data science problems in Omni
(00:54:42) Future of machine learning
(00:57:51) Silicon Valley is very accessible
(00:59:47) The golden handcuffs
(01:06:58) From data scientist to full-stack engineer
(01:09:06) Close-minded data scientists
(01:24:10) Advice to ML learners
(01:29:41) Something he wishes he had done when he was younger
(01:37:25) The future of his career
(01:42:17) Connect with Jeremy

Jan 24, 2023 • 1h 20min
The power of error analysis, tree models for search relevancy, what ChatGPT means for data scientists - Sergey Feldman - The Data Scientist Show #059
Sergey Feldman is the head of AI at Alongside, providing mental health support for students. He is also a Lead Applied Research Scientist at the Allen Institute for AI, where he built an ML model that improved search relevancy for scientific literature. Sergey has a PhD in Electrical and Electronics Engineering from the University of Washington. Today we’ll talk about machine learning for search, his consulting project for the Gates Foundation, AI for mental health, and career lessons. Make sure you listen till the end. If you like the show, subscribe, leave a comment, and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/
Sergey's LinkedIn: https://www.linkedin.com/in/sergey-feldman-6b45074b/
Data Cowboys: http://www.data-cowboys.com/
Sergey Feldman: You Should Probably Be Doing Nested Cross-Validation | PyData Miami 2019: https://www.youtube.com/watch?v=DuDtXtKNpZs
December 4th, 2018 - Breakfast with WACh with Dr. Sergey Feldman, PhD: https://www.youtube.com/watch?v=vA_czRcCpvQ
(00:00:00) Introduction
(00:01:24) Machine learning skeptic
(00:03:02) Tree-based models for search relevance
(00:14:34) How to do error analysis
(00:19:20) Nested cross-validation
(00:21:34) Model evaluation
(00:30:43) Error analysis common mistakes
(00:33:37) How to avoid overfitting
(00:35:56) Consulting project with Gates Foundation
(00:41:16) Tree-based models vs linear models
(00:45:19) Working with non-tech stakeholders
(00:50:20) Chatbot for teens’ mental health
(00:54:32) Can ChatGPT provide therapy?
(00:58:12) How he got into machine learning
(01:02:12) How to not have a boss
(01:03:46) Feelings vs Facts
(01:09:02) Future of machine learning
(01:11:30) How to prepare for the future
(01:13:39) AutoML
(01:17:12) His passion for large language models

Dec 7, 2022 • 1h 9min
How to build data science muscle memory, Deepchecks (an open-source ML testing suite) - Philip Tannor - The Data Scientist Show #058
Philip Tannor is the Co-Founder and CEO of Deepchecks, a Python package for running checks on machine learning models. Previously, he was the head of a data science group in the Israel Defense Forces. He has a master's degree in engineering from Tel Aviv University; his thesis was about a new algorithm that combines neural networks with gradient-boosting decision trees. Today we’ll talk about his career journey, how to build your data science muscle memory, the algorithm he worked on, and how to check ML models. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science and career.
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/
Philip’s LinkedIn: https://www.linkedin.com/in/philip-tannor-a6a910b7/?originalSubdomain=il
Augboost: https://medium.com/@ptannor/augboost-like-xgboost-but-with-few-twists-e4df4017a5c4
(00:00:00) Introduction
(00:01:17) How did he get into ML
(00:02:52) Data science in the military
(00:08:15) How to take feedback
(00:13:24) Handling criticism
(00:15:12) What he worked on
(00:18:18) Testing deployment
(00:21:28) How to build the data science muscle memory
(00:27:09) Improving the skills of data scientists
(00:30:42) His thesis in grad school
(00:36:59) Combine NN and gradient boosting
(00:40:05) AugBoost
(00:41:15) Tools he uses
(00:45:58) Deepchecks
(00:50:46) Most challenging part of building Deepchecks
(00:52:05) How can people contribute
(00:53:40) Behind the scenes
(00:56:09) Deciding how to fix or improve the model
(01:00:49) Advice for those who want to create open-source projects
(01:04:07) Features to add for the enterprise product
(01:06:57) About his life and career right now
(01:08:27) Connect with Philip

Nov 24, 2022 • 1h 15min
The Daliana Special: how I got into data science, 5 things only experienced data scientists know, and why I started "The Data Scientist Show" - Daliana Liu #057
Who is Daliana? This is a conversation I had in 2021 with Harpreet Sahota. I talked about my unexpected journey to data science, starting all the way back in high school; things I wish I had known earlier in my career; the projects I worked on; what it's like to be a quote-unquote influencer on LinkedIn; and more. If you want more content from me, I write about data science, career, and nerdy jokes on my LinkedIn, and you can subscribe to my very infrequent newsletter at dalianaliu.com. I’m curious what you think about this episode, so leave a comment on YouTube or send me a DM on LinkedIn. Hope you enjoy the Daliana special!
Daliana's Newsletter: https://dalianaliu.com
Daliana's Twitter: https://twitter.com/DalianaLiu
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/
Harpreet's LinkedIn: https://www.linkedin.com/in/harpreetsahota204/
The Artists of Data Science podcast: https://theartistsofdatascience.fireside.fm/
(00:00:00) Introduction
(00:02:52) Where did Daliana grow up
(00:05:19) Daliana in high school
(00:07:11) How she got into data science
(00:11:36) Why writing is important for data scientists
(00:15:51) How to write better
(00:20:56) Career lessons you didn't learn in school
(00:27:40) Imposter syndrome
(00:31:29) Day-to-day work as a data scientist
(00:36:16) Most common mistakes data scientists make
(00:39:41) Data Analyst vs. Data Scientist
(00:42:30) What is the science in data science?
(00:44:51) Can everyone be a data scientist?
(00:49:21) LinkedIn profile tips for job search
(00:52:59) How she creates content
(00:54:11) Being a data scientist "influencer"
(00:56:04) Why she started "The Data Scientist Show"
(01:01:16) Women in data science
(01:06:39) What's her legacy
(01:09:43) What is she reading
(01:14:21) Connect with Daliana

Nov 8, 2022 • 1h 8min
How he carved his own path at Airbnb, from data engineer to CEO of Mage - Tommy Dang - The Data Scientist Show #056
Tommy Dang is the Co-founder and CEO of Mage, a data ingestion and transformation pipeline tool for data engineers (https://github.com/mage-ai/mage-ai). Previously, he worked on data engineering and machine learning engineering at Airbnb. He has a Bachelor of Science from UC Berkeley, where he studied economics, history, and sociology. Today we’ll talk about how he learned engineering and machine learning after college, the data and ML tools he built at Airbnb, performance reviews, and how he navigates his career. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science and career.
Tommy’s LinkedIn: https://www.linkedin.com/in/dangtommy/
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/
Daliana's Twitter: https://twitter.com/DalianaLiu
(00:00:00) Introduction
(00:01:28) Getting into computer science from a non-tech background
(00:03:08) How he started his first project
(00:04:07) Projects at Airbnb
(00:06:09) Speed vs Quality when building data pipelines
(00:16:34) How to deal with ad hoc requests
(00:21:00) How did he learn machine learning
(00:24:04) How he convinced data scientists to teach him ML
(00:25:15) Performance review
(00:27:11) Don’t let your job title limit your career
(00:28:29) Why he started his company
(00:31:38) Build your own tool vs use open source solutions
(00:33:12) Transitioning from an engineer to a CEO
(00:34:50) Earn trust from internal stakeholders
(00:36:27) Career advice
(00:41:31) How he carved his own path at Airbnb
(00:46:00) How did he learn to be a good engineer
(00:47:10) Best advice for data scientists or engineers
(00:48:41) Most important quality of data scientists or engineers
(00:51:51) Design principles
(00:58:51) Future of tools
(01:01:00) What does he think about his future career
(01:05:05) Tommy's inspiration

Oct 24, 2022 • 1h 24min
How to effectively test and debug machine learning models, from ML engineer at Apple to startup founder - Gabriel Bayomi - The Data Scientist Show #055
Gabriel Bayomi is the Co-Founder of OpenLayer, a tool that tests and debugs machine learning models. OpenLayer was part of Y Combinator's 2021 batch. Previously, he was a machine learning engineer at Apple working on Siri. He has a master's degree in computer science from Carnegie Mellon. He is passionate about Natural Language Processing, Machine Learning, and Computational Social Science. We talked about how to test and debug machine learning models, his experience at Apple, and career lessons. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science and career.
Gabriel’s LinkedIn: https://www.linkedin.com/in/gbayomi
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/
Daliana's Twitter: https://twitter.com/DalianaLiu
(00:00:00) Intro
(00:01:39) How he got into machine learning
(00:06:43) His experience at Apple, Siri
(00:15:55) How to validate the solution
(00:19:39) Benefits of using an external error analysis framework
(00:21:30) How to build a model evaluation pipeline
(00:28:26) Don’t overfit to a subset of the data
(00:33:19) Your validation set shouldn’t be fixed
(00:41:03) Become one with the data
(00:44:05) Three model interpretability libraries you should use
(00:50:47) Common mistakes people make in model validation
(00:53:33) How to create an adversarial test
(00:55:43) How to check data quality
(01:06:46) Transition from engineer to executive
(01:10:04) Things he learnt from his favorite coworker
(01:17:57) How job roles will evolve

Oct 19, 2022 • 2h 12min
From Amazon research scientist to head of data product at Vestiaire Collective, why data science projects fail, how to be a good communicator - Alisa Kim - The Data Scientist Show #054
Alisa Kim is the head of data product at Vestiaire Collective. Previously, she was a research scientist at Amazon Web Services, where we worked on the same team, the Machine Learning Solutions Lab, and collaborated on projects. Before that, she was a consultant and worked in analytics and investment banking. She has a Ph.D. in Econ AI and has worked across various industries and on multiple continents. She's someone I really enjoyed working with. We talked about her journey, the projects she worked on, and the lessons she learnt. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.
Alisa's LinkedIn: https://de.linkedin.com/in/alisakolesnikova
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/
Daliana's Twitter: https://twitter.com/DalianaLiu
(0:00) Intro
(00:01:38) How she got into data science
(00:04:38) Day-to-day at AWS ML Solutions Lab
(00:08:00) AWS leadership principles
(00:16:34) Challenges consultants face when working with external customers
(00:23:36) From AWS to Vestiaire Collective
(00:37:54) How to build a better data product
(00:44:17) How data scientists can align with business stakeholders
(00:57:52) From tech to business
(01:01:33) How to develop communication skills
(01:09:17) Increasing the visibility of the data science team
(01:17:22) Being proactive vs being passive in chasing opportunities
(01:24:06) Get feedback from your "nearest neighbors"
(01:25:37) How to set boundaries at work
(01:38:48) Mistakes she made in her career
(01:48:25) How to manage disagreement
(01:57:53) Future of data science

Oct 15, 2022 • 1h 33min
The lessons from almost losing a million dollars for his company, how to build good data assets and get buy-in from the leadership - Mark Freeman - The Data Scientist Show #053
Mark Freeman is a community health advocate turned data scientist. His mission is to improve the well-being of people, especially those who are marginalized. He is currently a senior data scientist at Humu, where he builds data tools that drive behavior change to make work better. He has a master's degree from the Stanford School of Medicine in clinical research, experimental design, and statistics. He also has a certificate in entrepreneurship from Stanford's business school. In his free time, he volunteers with a Bay Area Community Health Advisory Council. He also plays Men's Division III Rugby. We talked about building data tools, data engineering skills for data scientists, how to pitch projects, and his career journey. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter on www.dalianaliu.com/ for more on data science.
Daliana's LinkedIn: https://www.linkedin.com/in/dalianaliu/
Daliana's Twitter: https://twitter.com/DalianaLiu
Mark's LinkedIn: https://www.linkedin.com/in/mafreeman2/
Chapters:
(0:00) Intro
(00:03:05) Our experience using R - 1000 lines of code
(00:09:22) Entrepreneurship within a company
(00:16:25) DBT and modern data stack
(00:20:15) Tools don’t matter (in interviews)
(00:21:09) Things DE enjoys but DS doesn’t
(00:24:55) How to work with different stakeholders
(00:30:32) Common SQL mistakes
(00:33:34) SQL vs Python vs R
(00:35:26) T.R.I.B.E framework for projects
(00:40:43) Meet the stakeholders where they are
(00:42:40) Use feedback to get buy-in from collaborators
(00:46:36) How to pitch a new idea
(00:49:45) Don’t lead with solution, lead with the problem
(00:51:03) How to get buy-in from the leadership
(00:57:56) Present an idea as if the audience came up with it
(00:58:41) How to iterate a project
(01:00:27) How he almost lost $1 million for his company
(01:02:07) Things he learned from his manager
(01:04:19) Things that help people make changes effectively
(01:06:05) Things he learned from mentoring
(01:12:19) Mental Health and anxiety
(01:17:12) Web3
(01:20:14) Why he cares about community health
(01:25:40) "Soul-searching" about his future
(01:28:36) Why he writes on LinkedIn
(01:30:04) Future of data science