

DataTalks.Club
DataTalks.Club - the place to talk about data!

Apr 21, 2023 • 56min
Building an Open-Source NLP Tool - Johannes Hötter
We talked about:
Johannes’s background
Johannes’s Open Source Spotlight demos – Refinery and Bricks
The difficulties of working with natural language processing (NLP)
Incorporating ChatGPT into a process as a heuristic
What is Bricks?
The process of starting a startup – Kern
Making the decision to go with open source
Pros and cons of launching as open source
Kern’s business model
Working with enterprises
Johannes as a salesperson
The team at Kern
Johannes’s role at Kern
How Johannes and Henrik separate responsibilities at Kern
Working with very niche use cases
The short story of how Kern got its funding
Johannes’s resource recommendation
Links:
Refinery's GitHub repo: https://github.com/code-kern-ai/refinery
Bricks' GitHub repo: https://github.com/code-kern-ai/bricks
Bricks Open Source Spotlight demo: https://www.youtube.com/watch?v=r3rXzoLQy2U
Refinery Open Source Spotlight demo: https://www.youtube.com/watch?v=LlMhN2f7YDg
Discord: https://discord.com/invite/qf4rGCEphW
Kern's website: https://www.kern.ai
Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Apr 14, 2023 • 53min
Navigating Industrial Data Challenges - Rosona Eldred
We talked about:
Rosona’s background
How mathematics knowledge helps in industry
What is industrial data?
Setting up an industrial process using blue paint
Internet companies’ data vs industrial data
Explaining industrial processes using packing peanuts
Why productive industry needs data
Measuring product qualities
How data specialists use industrial data
Defining and measuring sustainability
Using data to react to changing regulations
Types of industrial data
Solving problems and optimizing with industrial data
Industrial solvers
Tiny data vs Big data in productive industry
The advantages of coming from academia into productive industry
Materials and resources for industrial data
Women in industry
Why Rosona decided to shift to industrial data
Links:
Kaggle dataset: https://www.kaggle.com/datasets/paresh2047/uci-semcom

Apr 7, 2023 • 51min
Mastering Self-Learning in Machine Learning - Aaisha Muhammad
We talked about:
Aaisha’s background
How homeschooling affects self-study
Deciding on what to learn about
Establishing whether a resource is good
How Aaisha focuses on learning
Deciding on what kind of project to build
Finding research materials
Aaisha’s experience with the DataTalks.Club ML Zoomcamp
ML Zoomcamp projects
Aaisha’s interest in bioinformatics
Keeping motivated with deadlines
Notes and time-tracking tools
Drawbacks to self-studying
Aaisha’s interest in machine learning
Aaisha’s least favorite part of ML Zoomcamp
Helping people as a way to learn
Using ChatGPT as a “study group”
Is it possible to use self-studying to learn high-level topics?
Switching topics to avoid burnout
Aaisha’s resource recommendations
Links:
LinkedIn: https://www.linkedin.com/in/aaisha-muhammad/
Twitter: https://twitter.com/ZealousMushroom
GitHub: https://github.com/AaishaMuhammad
Website: http://www.aaishamuhammad.co.za/
Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Mar 31, 2023 • 49min
The Secret Sauce of Data Science Management - Shir Meir Lador
We talked about:
Shir’s background
Debrief culture
The responsibilities of a group manager
Defining the success of a DS manager
The three pillars of data science management
Managing up
Managing down
Managing across
Managing data science teams vs business teams
Scrum teams, brainstorming, and sprints
The most important skills and strategies for DS and ML managers
Making sure proofs of concept get into production
Links:
The secret sauce of data science management: https://www.youtube.com/watch?v=tbBfVHIh-38
Lessons learned leading AI teams: https://blogs.intuit.com/2020/06/23/lessons-learned-leading-ai-teams/
How to avoid conflicts and delays in the AI development process (Part I): https://blogs.intuit.com/2020/12/08/how-to-avoid-conflicts-and-delays-in-the-ai-development-process-part-i/
How to avoid conflicts and delays in the AI development process (Part II): https://blogs.intuit.com/2021/01/06/how-to-avoid-conflicts-and-delays-in-the-ai-development-process-part-ii/
Leading AI teams deck: https://drive.google.com/drive/folders/1_CnqjugtsEbkIyOUKFHe48BeRttX0uJG
Leading AI teams video: https://www.youtube.com/watch?app=desktop&v=tbBfVHIh-38
Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Mar 24, 2023 • 54min
SE4ML - Software Engineering for Machine Learning - Nadia Nahar
We talked about:
Nadia’s background
Academic research in software engineering
Design patterns
Software engineering for ML systems
Problems that people in industry have with software engineering and ML
Communication issues and setting requirements
Artifact research in open source products
Product vs model
Nadia’s open source product dataset
Failure points in machine learning projects
Finding solutions to issues using Nadia’s dataset and experience
The problem of siloing data scientists and other structure issues
The importance of documentation and checklists
Responsible AI
How data scientists and software engineers can work in an Agile way
Links:
Model Card: https://arxiv.org/abs/1810.03993
Datasheets: https://arxiv.org/abs/1803.09010
Factsheets: https://arxiv.org/abs/1808.07261
Research Paper: https://www.cs.cmu.edu/~ckaestne/pdf/icse22_seai.pdf
Arxiv version: https://arxiv.org/pdf/2110.
Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Mar 17, 2023 • 52min
Starting a Consultancy in the Data Space - Aleksander Kruszelnicki
We talked about:
Aleksander’s background
The difficulty of selling a data stack as a service
How Aleksander got into consulting
The Mom Test – extracting feedback from people
User interviews
Why Aleksander’s data stack as a service startup was not viable
How Aleksander decided to switch to consulting
Finding clients to consult
Figuring out how to position your services
Geographical limitations
Figuring out your target audience
The importance of networking and marketing
Pricing your services
The pitfalls of daily and hourly pricing and how to balance incentives
Is Germany a good place to found a company?
Aleksander’s book recommendations
Links:
LinkedIn: https://www.linkedin.com/in/alkrusz/
Twitter: https://twitter.com/alkrusz
Website: www.leukos.io
Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Mar 10, 2023 • 53min
Biohacking for Data Scientists and ML Engineers - Ruslan Shchuchkin
We talked about:
Ruslan’s background
Fighting procrastination and perfectionism
What is biohacking?
The role of dopamine and other hormones in daily life
How meditation can help
The influence light has on our bodies
Behavioral biohacking
Daylight lamps and using light to wake up
Sleep cycles
How nutrition affects productivity
Measuring productivity
Examples of unsuccessful biohacking attempts
Stoicism, voluntary discomfort, and self-challenges
Biohacking risks and ways to prevent them
Coffee and tea biohacking
Using self-reflection and tracking to measure results
Mindset shifting
Stoicism book recommendation
Work/life balance
Ruslan’s biohacking resource recommendation
Links:
LinkedIn: https://www.linkedin.com/in/ruslanshchuchkin/
Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Mar 3, 2023 • 55min
Analytics for a Better World - Parvathy Krishnan
We talked about:
Parvathy’s background
Brainstorming sessions with nonprofits to establish data maturity
Example of an Analytics for a Better World project
The overall data maturity situation of nonprofits vs private sector
Solving the skill gap
Publicly available content
The Analytics for a Better World Academy
The Academy’s target audience
How researchers can work with Analytics for a Better World
Improving data maturity in nonprofit organizations
People, processes, and technology
Typical tools that Analytics for a Better World recommends to nonprofits
Profiles in nonprofits
Does Analytics for a Better World have a need for data engineers?
The Analytics for a Better World team
Factors that help organizations become more data-driven
Parvathy’s resource recommendations
Links:
LinkedIn: https://www.linkedin.com/in/parvathykrishnank/
Twitter: https://twitter.com/ABWInstitute
GitHub: https://github.com/Analytics-for-a-Better-World
Website: https://analyticsbetterworld.org/
Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Feb 24, 2023 • 57min
Accelerating the Adoption of AI through Diversity - Dânia Meira
We talked about:
Dânia’s background
Founding the AI Guild
Datalift Summit
Coming up with meetup topics
Diversity in Berlin
Other types of diversity besides gender
The pitfalls of lacking diversity
Creating an environment where people can safely share their experiences
How the AI Guild helps organizations become more diverse
How the AI Guild finds women in the fields of AI and data science
Advice for people in underrepresented groups
Organizing a welcoming environment and creating a code of conduct
AI Guild’s consulting work and community
AI Guild team
Dânia’s resource recommendations
Upcoming Datalift Summit
Links:
Call for Speakers for the #datalift summit (Berlin, 14 to 16 June 2023): https://eu1.hubs.ly/H02RXvX0
Coded Bias documentary on Netflix: https://www.netflix.com/de/title/81328723#:~:text=This%20documentary%20investigates%20the%20bias,flaws%20in%20facial%20recognition%20technology.
Book Weapons of Math Destruction by Cathy O'Neil: https://en.wikipedia.org/wiki/Weapons_of_Math_Destruction
Book Lean In by Sheryl Sandberg: https://en.wikipedia.org/wiki/Lean_In
Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Feb 17, 2023 • 55min
Staff AI Engineer - Tatiana Gabruseva
We talked about:
Tatiana’s background
Going from academia to healthcare to the tech industry
What staff engineers do
Transferring skills from academia to industry and learning new ones
The importance of having mentors
Skipping junior and mid-level roles and going straight into a staff role
Convincing employers that you can take on a lead role
Seeing failure as a learning opportunity
Preparing for coding interviews
Preparing for behavioral and system design interviews
The importance of having a network and doing mock interviews
How much do staff engineers work with building pipelines, data science, ETL, MLOps, etc.?
Context switching
Advice for those going from academia to industry
The most exciting thing about working as an AI staff engineer
Tatiana’s book recommendations
Links:
LinkedIn: https://www.linkedin.com/in/tatigabru/
Twitter: https://twitter.com/tatigabru
GitHub: https://github.com/tatigabru
Website: http://tatigabru.com/
Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html


