

DataTalks.Club
DataTalks.Club - the place to talk about data!
Episodes

Jan 13, 2023 • 50min
Doing Software Engineering in Academia - Johanna Bayer
We talked about:
Johanna’s background
Open science course and reproducible papers
Research software engineering
Convincing a professor to work on software instead of papers
The importance of reproducible analysis
Why academia is behind on software engineering
The problems with open science publishing in academia
The importance of standard coding practices
How Johanna got into research software engineering
Effective ways of learning software engineering skills
Providing data and analysis for your project
Johanna’s initial experience with software engineering in a project
Working with sensitive data and the nuances of publishing it
How often Johanna does hackathons, open source, and freelancing
Social media as a source of repos and Johanna’s favorite communities
Contributing to Git repos
Publishing in the open in academia vs industry
Johanna’s book and resource recommendations
Conclusion
Links:
The Society of Research Software Engineering, plus regional chapters: https://society-rse.org/
The RSE Association of Australia and New Zealand: https://rse-aunz.github.io/
Research Software Engineers (RSEs) - the people behind research software: https://de-rse.org/en/index.html
The Software Sustainability Institute: https://www.software.ac.uk/
The Carpentries (beginner git and programming courses): https://carpentries.org/
The Turing Way Book of Reproducible Research: https://the-turing-way.netlify.app/welcome
Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Jan 6, 2023 • 53min
Data-Centric AI - Marysia Winkels
We talked about:
Marysia’s background
What data-centric AI is
Data-centric Kaggle competitions
The mindset shift to data-centric AI
Data-centric does not mean you should not iterate on models
How to implement the data-centric approach
Focusing on the data vs focusing on the model
Resources to help implement the data-centric approach
Data-centric AI vs standard data cleaning
Making sure your data is representative
Knowing when your data is good enough
The importance of user feedback
“Shadow Mode” deployment
What to do if you have a lot of bad data or incomplete data
Marysia’s role at PyData
How Marysia joined PyData
The difference between PyData and PyCon
Finding Marysia online
Links:
Embetter & Bulk Demo: https://www.youtube.com/watch?v=L---nvDw9KU
Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Dec 16, 2022 • 54min
Business Skills for Data Professionals - Loris Marini
We talked about:
Loris’ background
Transitioning from physics to data
Aligning people on concepts
Lead indicators and stickiness
Context, semantics, and meaning
Communication and being memorable
Making data digestible for business and building trust
The importance of understanding the language of business
Stakeholder mapping
Attending business meetings as a data professional
Organizing your stakeholder map
Prioritizing
How to support the business strategy
Learning to speak online
Resource recommendations from Loris
Links:
Discovering Data Discord server: https://bit.ly/discovering-data-discord
Loris' LinkedIn: https://www.linkedin.com/in/lorismarini/
Loris' Twitter: https://twitter.com/LorisMarini

Dec 9, 2022 • 53min
From Software Engineer to Data Science Manager - Sadat Anwar
We talked about:
Sadat’s background
Sadat’s backend engineering experience
Sadat’s pivot point as a backend engineer
Sadat’s exposure to ML and Data Science
Sadat’s “Act Before You Think” approach (with safety nets)
Sadat’s street cred and transition into management
The hiring process as an internal candidate
The importance of people management skills
The Brag List
The most difficult part of transitioning to management
Focusing on projects and setting milestones
Sadat’s transition from EM to data science management
How much domain knowledge is needed for management?
The main difference between engineering and management
How being an EM helped Sadat transition to DS management
Transitioning to DS management from other roles
How to feel accomplished as a manager
Sadat’s book recommendations
Sadat’s meetups
Links:
Sadat's Meetup page: https://www.meetup.com/berlin-search-technology-meetup/
Meetup event "Bias in AI: how to measure it and how to fix it": https://www.meetup.com/data-driven-ai-berlin-meetup/events/289927565/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Dec 2, 2022 • 54min
Teaching and Mentoring in Data Analytics - Irina Brudaru
We talked about:
Irina’s background
Irina as a mentor
Designing curriculum and program management at AI Guild
Other things Irina taught at AI Guild
Why Irina likes teaching
Students’ reluctance to learn cloud
Irina as a manager
Cohort analysis in a nutshell
How Irina started teaching formally
Irina’s diversity project in the works
How DataTalks.Club can attract more female students to the Zoomcamps
How to get technical feedback at work
Antipatterns and overrated/overhyped topics in data analytics
Advice for young women who want to get into data science/engineering
Finding Irina online
Fundamentals for data analysts
Suggestions for DataTalks.Club collaborations
Conclusions
Links:
LinkedIn Account: https://www.linkedin.com/in/irinabrudaru/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Nov 25, 2022 • 51min
Technical Writing and Data Journalism - Angelica Lo Duca
We talked about:
Angelica’s background
Angelica’s books
Data journalism
How Angelica got into data journalism
The field of digital humanities and Angelica’s data journalism course
Technical articles vs data journalism articles
Transforming reports into data storytelling
Are reports to stakeholders considered technical writing?
Data visualization in articles
Article length
The process of writing an article
Finding writing topics
How Angelica got into writing a book (communication with publishers)
The process for writing a book
Brainstorming
Reviews and revisions
Conclusion
Links:
Data Journalism examples (FENCED OUT): https://www.washingtonpost.com/graphics/world/border-barriers/europe-refugee-crisis-border-control/??noredirect=on
Data Journalism examples (La tierra esclava): https://latierraesclava.eldiario.es/
Small Medium publication aiming to be the Stack Overflow of Medium: https://medium.com/syntaxerrorpub
Example of a self-published book on Data Visualization: https://www.amazon.com/Introduction-Data-Visualization-Storytelling-Scientist-ebook/dp/B07VYCR3Z6/ref=sr_1_4?crid=4JRJ48O7K8TK&keywords=joses+berengueres&qid=1668270728&sprefix=joses+beremguere%2Caps%2C273&sr=8-4
My novels (in Italian) La bambina e il Clown: https://www.amazon.it/Bambina-Clown-Angelica-Lo-Duca/dp/1500984515/ref=sr_1_9?__mk_it_IT=%C3%85M%C3%85%C5%BD%C3%95%C3%91&crid=2KGK9GMN0FAHI&keywords=la+bambina+e+il+clown&qid=1668270769&sprefix=la+bambina+e+il+clown%2Caps%2C88&sr=8-9
My novels (in Italian) Il Violinista: https://www.amazon.it/Violinista-1-Angelica-Lo-Duca/dp/1501009672/ref=sr_1_1?__mk_it_IT=%C3%85M%C3%85%C5%BD%C3%95%C3%91&crid=12KTF9EF5UKIG&keywords=il+violinista+lo+duca&qid=1668270791&sprefix=il+violinista+lo+duca%2Caps%2C81&sr=8-1
Course on Data Journalism: https://www.coursera.org/learn/visualization-for-data-journalism
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Nov 18, 2022 • 47min
From Digital Marketing to Analytics Engineering - Nikola Maksimovic
We talked about:
Nikola’s background
Making the first steps towards a transition to BI and Analytics Engineering
Learning the skills necessary to transition to Analytics Engineering
The in-between period – from Marketing to Analytics Engineering
Nikola’s current responsibilities
Understanding what a Data Model is
Tools needed to work as an Analytics Engineer
The Analytics Engineering role over time
The importance of dbt for Analytics Engineers
Where can one learn about data modeling theory?
Going from Ancient Greek and Latin to understanding Data (Just-In-Time Learning)
The importance of domain knowledge in analytics engineering
Suggestions for those wishing to transition into analytics engineering
The importance of having a mentor when transitioning
Finding a mentor
Helpful newsletters and blogs
Finding Nikola online
Links:
Nikola's LinkedIn account: https://www.linkedin.com/in/nikola-maksimovic-40188183/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Nov 11, 2022 • 54min
Product Owners in Data Science - Anna Hannemann
We talked about:
About Anna and METRO
Anna’s background
The importance of a technical background for data product owners
What are product owners?
Product owners vs product managers
Anna’s work on recommender systems at METRO
Expanding the data team
Types of algorithms used for recommender systems
What kind of knowledge and skills data product owners need to have
Problems and ideas should come from the business
How Anna handles all her responsibilities
The process for starting work on new domains
Product portfolio management
ProductTank and Anna’s role in it
Anna’s resource recommendations
Links:
Data Science for Business Book: https://www.amazon.de/-/en/Foster-Provost/dp/1449361323/ref=sr_1_1?keywords=data+science+for+business&qid=1666404807&qu=eyJxc2MiOiIxLjg3IiwicXNhIjoiMS41MiIsInFzcCI6IjEuNDYifQ%3D%3D&sr=8-1
Article on Data Science Products: https://www.linkedin.com/pulse/way-create-data-science-products-lessons-learnt-anna-hannemann-phd/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Nov 4, 2022 • 50min
Building Data Science Practice - Andrey Shtylenko
We talked about:
Audience Poll
Andrey’s background
What data science practice is
Best DS practice in a traditional company vs IT-centric companies
Getting started with building data science practice (finding out who you report to)
Who the initiative comes from
Finding out what kind of problems you will be solving (Centralized approach)
Moving to a semi-decentralized approach
Resources to learn about data science practice
Pivoting from the role of a software engineer to data scientist
The most impactful realization from data science practice
Advice for individual growth
Finding Andrey online
Links:
Data Teams book: https://www.amazon.com/Data-Teams-Management-Successful-Data-Focused/dp/1484262271/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Oct 28, 2022 • 53min
Large-Scale Entity Resolution - Sonal Goyal
We talked about:
Sonal’s background
How the idea for Zingg came about
What Zingg is
The difference between entity resolution and identity resolution
How duplicate detection relates to entity resolution
How Sonal decided to start working on Zingg
How Zingg works
What Zingg runs on
Switching from consultancy to working on a new open source solution
Why Zingg is open source
Open source licensing
Working on Zingg initially vs now
Zingg’s current and future team
Sonal’s biggest current challenge
Avoiding problems with entity/identity resolution through database design
Identity resolution vs basic joins, data fusions, and fuzzy joins
Deterministic matching vs probabilistic machine learning
Identity and entity resolution applications for fraud detection
Graph algorithms vs classic ML in entity resolution
Identity resolution success stories
What Sonal would do differently given the chance to start over with Zingg
Advice for those seeking to realize their own solution to a data problem
Reading suggestion from Sonal
Conclusion
Links:
Open-Source Spotlight demo "Zingg": https://www.youtube.com/watch?v=zOabyZxN9b0
Creative Selection: Inside Apple's Design Process During the Golden Age of Steve Jobs book: https://www.amazon.com/Creative-Selection-Inside-Apples-Process/dp/1250194466
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html