
DataTalks.Club
DataTalks.Club - the place to talk about data!
Latest episodes

Sep 30, 2022 • 50min
Building Data Science Practice - Andrey Shtylenko
We talked about:
Audience Poll
Andrey’s background
What data science practice is
Best DS practice in a traditional company vs IT-centric companies
Getting started with building data science practice (finding out who you report to)
Who the initiative comes from
Finding out what kind of problems you will be solving (Centralized approach)
Moving to a semi-decentralized approach
Resources to learn about data science practice
Pivoting from the role of a software engineer to data scientist
The most impactful realization from data science practice
Advice for individual growth
Finding Andrey online
Links:
Data Teams book: https://www.amazon.com/Data-Teams-Management-Successful-Data-Focused/dp/1484262271/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Sep 23, 2022 • 17sec
No episode this week
Have a great weekend!

Sep 16, 2022 • 59min
Leading Data Research - David Bader
We talked about:
David’s background
A day in the life of a professor
David’s current projects
Starting a school
The different types of professors
David’s recent papers
Similarities and differences between research labs and startups
Finding (or creating) good datasets
David’s lab
Balancing research and teaching as a professor
David’s most rewarding research project
David’s most underrated research project
David’s virtual data science seminars on YouTube
Teaching at universities without doing research
Staying up-to-date in research
David’s favorite conferences
Selecting topics for research
Convincing students to stay in academia and competing with industry
Finding David online
Links:
David A. Bader: https://davidbader.net/
NJIT Institute for Data Science: https://datascience.njit.edu/
Arkouda: https://github.com/Bears-R-Us/arkouda
NJIT Data Science YouTube Channel: https://www.youtube.com/c/NJITInstituteforDataScience
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Sep 9, 2022 • 56min
Dataset Creation and Curation - Christiaan Swart
We talked about:
Christiaan’s background
Usual ways of collecting and curating data
Getting the buy-in from experts and executives
Starting an annotation booklet
Pre-labeling
Dataset collection
Human level baseline and feedback
Using the annotation booklet to boost annotation productivity
Putting yourself in the shoes of annotators (and measuring performance)
Active learning
Distant supervision
Weak labeling
Dataset collection in career positioning and project portfolios
IPython widgets
GDPR compliance and non-English NLP
Finding Christiaan online
Links:
Christiaan's personal blog: https://useml.net/
Comtura, Christiaan's company: https://comtura.ai/
LinkedIn: https://www.linkedin.com/in/christiaan-swart-51a68967/
Twitter: https://twitter.com/swartchris8/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Sep 2, 2022 • 54min
Data Mesh 101 - Zhamak Dehghani
We talked about:
Zhamak’s background
What is Data Mesh?
Domain ownership
Determining what to optimize for with Data Mesh
Decentralization
Data as a product
Self-serve data platforms
Data governance
Understanding Data Mesh
Adopting Data Mesh
Resources on implementing Data Mesh
Links:
Free 30-day code from O'Reilly: https://learning.oreilly.com/get-learning/?code=DATATALKS22
Data Mesh book: https://learning.oreilly.com/library/view/data-mesh/9781492092384/
LinkedIn: https://www.linkedin.com/in/zhamak-dehghani
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Aug 26, 2022 • 53min
Growing Data Engineering Team in a Scale-Up - Mehdi OUAZZA
We talked about:
Mehdi’s background
The difference between startup, scale-up and enterprise
Hypergrowth
Data platform engineers in a scale-up environment
What a data platform is and who builds it
Managing the fast pace of a scale-up while ensuring personal growth
Should a senior data person consider a scale-up or an enterprise?
Should a junior data person consider a scale-up or an enterprise?
Sourcing talent for hyper-growth companies and developing a community culture
Generating content and getting feedback
Generalization vs specialization for data engineers in a scale-up
The ratio of work between platform building and use case pipelines
Being proactive in order to progress to mid or senior level
Caps and bass guitars
MehdiO DataTV and DataCreators.Club (Mehdi’s YouTube Channel and podcast)
Links:
Mehdi's YouTube channel: https://www.youtube.com/channel/UCiZxJB0xWfPBE2omVZeWPpQ
Mehdi's LinkedIn: https://linkedin.com/in/mehd-io/
Mehdi's Medium Blog: https://medium.com/@mehdio
Mehdi's data creators club: https://datacreators.club/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Aug 19, 2022 • 54min
Lessons Learned About Data & AI at Enterprises - Alexander Hendorf
We talked about:
Alexander’s background
The role of Partner at Königsweg
Being part of the data and AI community
How Alexander became chair at PyData
Alexander’s many talks and advice on giving them
Explaining AI to managers
Why being able to explain machine learning to managers is important
The experimental nature of AI and why it’s not a cure-all
Innovation requires patience
Convincing managers not to use AI or ML when there are better (simpler) solutions
The role of MLOps in enterprises
Thinking about the mid- and long-term when considering solutions
Finding Alexander online
Links:
Alexander's Twitter: https://twitter.com/hendorf
Alexander's LinkedIn: https://www.linkedin.com/in/hendorf/
Königsweg: https://www.koenigsweg.com
PyData Südwest: https://www.meetup.com/pydata-suedwest/
PyData Frankfurt: https://www.meetup.com/pydata-frankfurt/
PyConDE & PyData Berlin: https://pycon.de
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Aug 12, 2022 • 54min
MLOps Architect - Danny Leybzon
We talked about:
Danny’s background
What an MLOps Architect does
The popularity of MLOps Architect as a role
Convincing an employer that you can wear many different hats
Interviewing for the role of an MLOps Architect
How Danny prioritizes work with data scientists
Coming to WhyLabs when you’ve already got something in production vs nothing in production
Market awareness regarding the importance of model monitoring
How Danny (WhyLabs) chooses tools
ONNX
Common trends in tooling setups
The most rewarding thing for Danny in ML and data science
Danny’s secret for staying sane while wearing so many different hats
T-shaped specialist, E-shaped specialist, and the horizontal line
The importance of background for the role of an MLOps Architect
Key differences for WhyLogs free vs paid
Conclusion and where to find Danny online
Links:
Matt Turck: https://mattturck.com/data2021/
AI Observability Platform: https://whylabs.ai/observability
Danny's LinkedIn: https://www.linkedin.com/in/dleybz/
WhyLabs' website: https://whylabs.ai/
AI Infrastructure Alliance: https://ai-infrastructure.org/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Aug 5, 2022 • 49min
Decoding Data Science Job Descriptions - Tereza Iofciu
We talked about:
DataTalks.Club intro
Tereza’s background
Working as a coach
Identifying the mismatches between your needs and those of a company
How to avoid misalignments
Considering what’s mentioned in the job description, what isn’t, and why
Diversity and culture of a company
Lack of a salary in the job description
Ways of researching the company where you might work
How to avoid a mismatch with a company other than learning from your mistakes
Before data, during data, after data (a company’s data maturity level)
The company’s tech stack
Finding Tereza online
Links:
Decoding Data Science Job Descriptions (talk): https://www.youtube.com/watch?v=WAs9vSNTza8
Talk at ConnectForward: https://www.youtube.com/watch?v=WAs9vSNTza8
Slides: https://www.slideshare.net/terezaif/decoding-data-science-job-descriptions-250687704
Talk at DataLift: https://www.youtube.com/watch?v=pCtQ0szJiLA
Slides: https://www.slideshare.net/terezaif/lessons-learned-from-hiring-and-retaining-data-practitioners
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Jul 29, 2022 • 48min
Data Science for Social Impact - Christine Cepelak
We talked about:
Christine’s background
Private sector vs Public sector
Public policy
The challenges of being a community organizer
How public policy relates to political science
Programs that teach data science for public policy
Data science for public policy vs regular data science
The importance of ethical data science in public policy
How data science in social impact projects differs from other projects
Other resources to learn about data science for public policy
Challenges with getting data in data science for public policy
The problems with accessing public datasets about recycling
Christine’s potential projects after her Master’s degree
Gender inequality in STEM fields
Corporate responsibility and why organizations need social impact data scientists
What you need to start making a social impact with data science
80,000 Hours
Other use cases for public policy data science
Coffee, Ethics & AI
Finding Christine online
Links:
Explore some Data Science for Social Good projects: http://www.dssgfellowship.org/projects/
Bi-weekly Ethics in AI Coffee Chat: https://www.meetup.com/coffee-ethics-ai/
Make a Social Impact with your Job: https://tinyurl.com/80khours
Course in Data Ethics: https://ethics.fast.ai/
Data Science for Social Good Berlin: https://dssg-berlin.org/
CorrelAid: https://correlaid.org/
DataKind: https://www.datakind.org/
Christine's LinkedIn: https://www.linkedin.com/in/christinecepelak/
Christine's Twitter: https://twitter.com/CLcep
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html