
DataTalks.Club
DataTalks.Club - the place to talk about data!
Latest episodes

Jul 30, 2021 • 58min
Humans in the Loop - Lina Weichbrodt
We talked about:
Lina’s background
What we need to remember when starting a project (checklists)
Make sure the problem is formalized and close to the core business
Get the buy-in with stakeholders
Building trust with stakeholders
Don’t just focus on upsides – ask about concerns
Turning a concern into a metric
What happens when something goes wrong?
Post mortem reporting
Apply the 5 whys
If a lot of users say it’s a bug – it’s worth investigating
Post mortem format
Action points
Debugging vs explaining the model
Are there online versions of checklists?
Make sure to log your inputs
Talking to end-users and using your own service
Your ideas vs Stakeholder ideas
Should data practitioners educate the team about data?
People skills and ‘dirty’ hacks
Where to find Lina
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Jul 23, 2021 • 1h 12min
Running from Complexity - Ben Wilson
We talked about:
Ben’s background
Building solutions for customers
Why projects don’t make it to production
Why do people choose overcomplicated solutions?
The dangers of isolating data science from the business unit
The importance of being able to explain things
Maximizing the chances of making it into production
The IKEA effect
Risks of implementing novel algorithms
If it can be done simply – do that first
Don’t become the guinea pig for someone’s white paper
The importance of stat skills and coding skills
Structuring an agile team for ML work
Timeboxing research
Mentoring
Ben’s book
‘Uncool techniques’ at AI-First companies
Should managers learn data science?
Do data scientists need to specialize to be successful?
Links:
Ben's book: https://www.manning.com/books/machine-learning-engineering-in-action (get 35% off with code "ctwsummer21")
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Jul 16, 2021 • 58min
I Want to Build a Machine Learning Startup! - Elena Samuylova
We talked about:
Elena’s background
Why do a startup instead of being an employee?
Where to get ideas for your startup
Finding a co-founder
What should you consider before starting a startup?
Vertical startup vs infrastructure startup
‘AI First’ startups
Building tools for engineers
What skills do you need to start a startup?
Startup risks
How to be prepared to fail
Work-life balance
The part-time startup approach
Startup investment models
No resources and no technical expertise – what to do?
Productionizing your services
When to hire an expert
Talking to people with a problem before solving the problem
Starting Elena’s startup, Evidently
Elena’s role at Evidently
Why is Evidently open source?
“People will just copy my open source code. Should I be concerned?”
Bottom-up adoption
Creating value so that clients engage with your product
Is there a difference between countries when creating a startup?
Does open source mean the data is safer?
When should you hire engineers?
Following the market
Startups out of genuine interest vs just for money and for fun
Links:
EvidentlyAI: https://evidentlyai.com/
Elena's LinkedIn: https://www.linkedin.com/in/elenasamuylova/
Elena's Twitter: https://twitter.com/elenasamuylova/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Jul 9, 2021 • 1h 2min
Big Data Engineer vs Data Scientist - Roksolana Diachuk
Links:
Twitter: https://twitter.com/dead_flowers22
LinkedIn: https://www.linkedin.com/in/roksolanadiachuk/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Jul 2, 2021 • 1h 2min
Build Your Own Data Pipeline - Andreas Kretz
We talked about:
Andreas’s background
Why data engineering is becoming more popular
Who to hire first – a data engineer or a data scientist?
How can I, as a data scientist, learn to build pipelines?
Don’t use too many tools
What is a data pipeline and why do we need it?
What is ingestion?
Can just one person build a data pipeline?
Approaches to building data pipelines for data scientists
Processing frameworks
Common setup for data pipelines — car price prediction
Productionizing the model with the help of a data pipeline
Scheduling
Orchestration
Start simple
Learning DevOps to implement data pipelines
How to choose the right tool
Are Hadoop, Docker, Cloud necessary for a first job/internship?
Is Hadoop still relevant or necessary?
Data engineering academy
How to pick up Cloud skills
Avoid huge datasets when learning
Convincing your employer to do data science
How to find Andreas
Links:
LinkedIn: https://www.linkedin.com/in/andreas-kretz
Data engineering cookbook: https://cookbook.learndataengineering.com/
Course: https://learndataengineering.com/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Jun 25, 2021 • 60min
From Software Engineering to Machine Learning - Santiago Valdarrama
We talked about:
Santiago’s background
“Transitioning to ML” vs “Adding ML as a skill”
Getting over the fear of math for software developers
Learning by explaining
Seven lessons I learned about starting a career in machine learning
Lesson 1 – Take the first step
Lesson 2 – Learning is a marathon, not a sprint
Lesson 3 – If you want to go quickly, go alone. If you want to go far, go together.
Lesson 4 – Do something with the knowledge you gain
Lesson 5 – ML is not just math. Math is not scary.
Lesson 6 – Your ability to analyze a problem is the most important skill. Coding is secondary.
Lesson 7 – You don’t need to know every detail
Tools and frameworks needed to transition to machine learning
Problem-based learning vs Top-down learning
Learning resources
Santiago’s favorite books
Santiago’s course on transitioning to machine learning
Improving coding skills
Building solutions without machine learning
Becoming a better engineer
What is the difference between machine learning and data science?
Getting into machine learning - Reiteration
Getting past the math
Links:
Santiago's Twitter: https://twitter.com/svpino
Santiago's course: https://gumroad.com/svpino#kBjbC
Pinned tweet with a roadmap: https://twitter.com/svpino/status/1400798154732212230
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Jun 18, 2021 • 60min
Analytics Engineer: New Role in a Data Team - Victoria Perez Mola
Links:
https://www.notion.so/Analytics-Engineer-New-Role-in-a-Data-Team-9decbf33825c4580967cf3173eb77177
https://www.linkedin.com/in/victoriaperezmola/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Conference: https://datatalks.club/conferences/2021-summer-marathon.html

Jun 11, 2021 • 58min
Data Governance - Jessi Ashdown, Uri Gilad
We talked about:
Jessi’s background
Uri’s background
Data governance
Implementing data governance: policies and processes
Reasons not to have data governance
Start with “why”
Cataloging and classifying our data
Let data work for you
The human component
Data quality
Defining policies
Implementing policies
Shopping-cart experience for requesting data
Proving the value of data catalog
Using data catalog
Data governance = data catalog?
Links:
Book: https://www.oreilly.com/library/view/data-governance-the/9781492063483/
Jessi’s LinkedIn: https://www.linkedin.com/in/jashdown/
Uri’s LinkedIn: https://linkedin.com/in/ugilad
Uri’s Twitter: https://twitter.com/ugilad
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Conference: https://datatalks.club/conferences/2021-summer-marathon.html

Jun 4, 2021 • 60min
What Data Scientists Don’t Mention in Their LinkedIn Profiles - Yury Kashnitsky
We talked about:
Yury’s background
Failing fast: Grammarly for science
Not failing fast: Keyword recommender
Four steps to epiphany
Lesson learned when bringing XGBoost into production
When data scientists try to be engineers
Joining a fintech startup: Doing NLP with thousands of GPUs
Working at a Telco company
Having too much freedom
The importance of digital presence
Work-life balance
Quantifying impact of failing projects on our CVs
Business trips to Perm: don’t work on the weekend
What doesn’t kill you makes you stronger
Links:
Yury's course: https://mlcourse.ai/
Yury's Twitter: https://twitter.com/ykashnitsky
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

May 28, 2021 • 1h
Becoming a Data-led Professional - Arpit Choudhury
We talked about:
Data-led academy
Arpit’s background
Growth marketing
Being data-led
Data-led vs data-driven
Documenting your data: creating a tracking plan
Understanding your data
Tools for creating a tracking plan
Data flow stages
Tracking events — examples
Collecting the data
Storing and analyzing the data
Data activation
Tools for data collection
Data warehouses
Reverse ETL tools
Customer data platforms
Modern data stack for growth
Buy vs build
People we need in the data flow
Data democratization
Motivating people to document data
Product-led vs data-led
Links:
https://dataled.academy/
Join our Slack: https://datatalks.club/slack.html