

DataTalks.Club
DataTalks.Club - the place to talk about data!

Oct 21, 2022 • 51min
From Data Science to DataOps - Tomasz Hinc
We talked about:
Tomasz’s background
What Tomasz did before DataOps (Data Science)
Why Tomasz made the transition from Data Science to DataOps
What is DataOps?
How is DataOps related to infrastructure?
How Tomasz learned the skills necessary to work in DataOps
Becoming comfortable with the terminal
The overlap between DataOps and Data Engineering
Suitable/useful skills for DataOps
Minimal operational skills for DataOps
Similarities between DataOps and Data Science Managers
Tomasz’s interesting projects
Confidence in results and avoiding going too deep with edge cases
Conclusion
Links:
Terminal setup video, 19 minutes long: https://www.youtube.com/watch?v=D2PSsnqgBiw
Command line videos, one and a half hours to become somewhat comfy with the terminal: https://www.youtube.com/playlist?list=PLIhvC56v63IKioClkSNDjW7iz-6TFvLwS
MIT course covering exactly these topics (command line, git, storing secrets): https://missing.csail.mit.edu/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Oct 14, 2022 • 54min
Data Science Career Development - Katie Bauer
We talked about:
Katie’s background
What is a data scientist?
What is a data science manager?
Quality of the craft
How data leaders promote career growth
Supporting senior data professionals
Choosing the IC route vs the management route
Managing junior data professionals
Talking to senior stakeholders and PMs as a junior
The importance of hiring juniors
What skills do data science managers need to get hired?
How juniors that are just starting out can set themselves apart from the competition
Asking senior colleagues for help and the rubber duck channel
The challenges of the head of data
Conclusion
Links:
Jobs at GlossGenius: https://boards.greenhouse.io/glossgenius
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Oct 7, 2022 • 49min
From Testing Phones to Managing NLP Projects - Alvaro Navas Peire
We talked about:
Alvaro’s background
Working as a QA (Quality Assurance) engineer
Transitioning from QA to Machine Learning
Gathering knowledge about the ML field
Searching for an ML job (improving soft skills and CV)
Data science interview skills
Zoomcamp projects
Zoomcamp project deployment
How to not undersell yourself during interviews
Alvaro’s experience with interviews during his transition
Alvaro’s Zoomcamp notes
Alvaro’s coach
The importance of mathematical knowledge for a transition into ML
Preparing for technical interviews
Alvaro’s typical workday
Alvaro’s team’s tech stack
The importance of a technical background for transitioning into ML
Links:
Alvaro's CV: https://www.dropbox.com/s/89hkt3ug0toqa2n/CV%20nou%20-%20angl%C3%A8s.pdf?dl=0
Github profile: https://github.com/ziritrion
LinkedIn profile: https://www.linkedin.com/in/alvaronavas/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Sep 30, 2022 • 53min
Responsible and Explainable AI - Supreet Kaur
We talked about:
Supreet’s background
Responsible AI
Example of explainable AI
Responsible AI vs explainable AI
Explainable AI tools and frameworks (glass box approach)
Checking for bias in data and handling personal data
Understanding whether your company needs a certain type of data
Data quality checks and automation
Responsibility vs profitability
The human touch in AI
The trade-off between model complexity and explainability
Is completely automated AI out of the question?
Detecting model drift and overfitting
How Supreet became interested in explainable AI
Trustworthy AI
Reliability vs fairness
Bias indicators
The future of explainable AI
About DataBuzz
The diversity of data science roles
Ethics in data science
Conclusion
Links:
LinkedIn: https://www.linkedin.com/in/supreet-kaur1995/
Databuzz page: https://www.linkedin.com/company/databuzz-club/
Medium Blog Page: https://medium.com/@supreetkaur_66831
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Sep 30, 2022 • 50min
Building Data Science Practice - Andrey Shtylenko
We talked about:
Audience Poll
Andrey’s background
What data science practice is
Best data science practices in traditional companies vs IT-centric companies
Getting started with building data science practice (finding out who you report to)
Who the initiative comes from
Finding out what kind of problems you will be solving (centralized approach)
Moving to a semi-decentralized approach
Resources to learn about data science practice
Pivoting from the role of a software engineer to that of a data scientist
The most impactful realization from data science practice
Advice for individual growth
Finding Andrey online
Links:
Data Teams book: https://www.amazon.com/Data-Teams-Management-Successful-Data-Focused/dp/1484262271/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Sep 23, 2022 • 17sec
No episode this week
Have a great weekend!

Sep 16, 2022 • 59min
Leading Data Research - David Bader
We talked about:
David’s background
A day in the life of a professor
David’s current projects
Starting a school
The different types of professors
David’s recent papers
Similarities and differences between research labs and startups
Finding (or creating) good datasets
David’s lab
Balancing research and teaching as a professor
David’s most rewarding research project
David’s most underrated research project
David’s virtual data science seminars on YouTube
Teaching at universities without doing research
Staying up-to-date in research
David’s favorite conferences
Selecting topics for research
Convincing students to stay in academia and competing with industry
Finding David online
Links:
David A. Bader: https://davidbader.net/
NJIT Institute for Data Science: https://datascience.njit.edu/
Arkouda: https://github.com/Bears-R-Us/arkouda
NJIT Data Science YouTube Channel: https://www.youtube.com/c/NJITInstituteforDataScience
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Sep 9, 2022 • 56min
Dataset Creation and Curation - Christiaan Swart
We talked about:
Christiaan’s background
Usual ways of collecting and curating data
Getting the buy-in from experts and executives
Starting an annotation booklet
Pre-labeling
Dataset collection
Human level baseline and feedback
Using the annotation booklet to boost annotation productivity
Putting yourself in the shoes of annotators (and measuring performance)
Active learning
Distant supervision
Weak labeling
Dataset collection in career positioning and project portfolios
IPython widgets
GDPR compliance and non-English NLP
Finding Christiaan online
Links:
My personal blog: https://useml.net/
Comtura, my company: https://comtura.ai/
LinkedIn: https://www.linkedin.com/in/christiaan-swart-51a68967/
Twitter: https://twitter.com/swartchris8/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Sep 2, 2022 • 54min
Data Mesh 101 - Zhamak Dehghani
We talked about:
Zhamak’s background
What is Data Mesh?
Domain ownership
Determining what to optimize for with Data Mesh
Decentralization
Data as a product
Self-serve data platforms
Data governance
Understanding Data Mesh
Adopting Data Mesh
Resources on implementing Data Mesh
Links:
Free 30-day code from O'Reilly: https://learning.oreilly.com/get-learning/?code=DATATALKS22
Data Mesh book: https://learning.oreilly.com/library/view/data-mesh/9781492092384/
LinkedIn: https://www.linkedin.com/in/zhamak-dehghani
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Aug 26, 2022 • 53min
Growing Data Engineering Team in a Scale-Up - Mehdi OUAZZA
We talked about:
Mehdi’s background
The difference between startup, scale-up and enterprise
Hypergrowth
Data platform engineers in a scale-up environment
What a data platform is and who builds it
Managing the fast pace of a scale-up while ensuring personal growth
Should a senior data person consider a scale-up or an enterprise?
Should a junior data person consider a scale-up or an enterprise?
Sourcing talent for hypergrowth companies and developing a community culture
Generating content and getting feedback
Generalization vs specialization for data engineers in a scale-up
The ratio of work between platform building and use case pipelines
Being proactive in order to progress to mid or senior level
Caps and bass guitars
MehdiO DataTV and DataCreators.Club (Mehdi’s YouTube Channel and podcast)
Links:
Mehdi's YouTube channel: https://www.youtube.com/channel/UCiZxJB0xWfPBE2omVZeWPpQ
Mehdi's LinkedIn: https://linkedin.com/in/mehd-io/
Mehdi's Medium Blog: https://medium.com/@mehdio
Mehdi's DataCreators.Club: https://datacreators.club/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html