

DataTalks.Club
DataTalks.Club - the place to talk about data!

Jun 10, 2022 • 48min
Getting a Data Engineering Job (Summary and Q&A) - Jeff Katz
We talked about:
Summary of “Getting a Data Engineering Job” webinar
Python and engineering skills
Interview process
Behavioral interviews
Technical interviews
Learning Python and SQL from scratch
Is having non-coding experience a disadvantage?
Analyst or engineer?
Do you need certificates?
Do I need a master’s degree?
Fully remote data engineering jobs
Should I include teaching on my resume?
Object-oriented programming for data engineering
Python vs Java/Scala
SQL and Python technical interview questions
GCP certificates
Is commercial experience really necessary?
From sales to engineering
Solution engineers
Wrapping up
Links:
Getting a Data Engineering Job (webinar): https://www.youtube.com/watch?v=yvEWG-S1F_M
The Flask Mega-Tutorial Part I - Hello, World! blog: https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-i-hello-world
Mode SQL Tutorial: https://mode.com/sql-tutorial/
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Jun 3, 2022 • 53min
Using Data for Asteroid Mining - Daynan Crull
We talked about:
Daynan’s background
Astronomy vs cosmology
Applications of data science and machine learning in astronomy
Determining signal vs noise
What the data looks like in astronomy
Determining the features of an object in space
Ground truth for space objects
Why water is an important resource in the space economy
Other useful resources that can be found in asteroids
Sources of asteroids
The data team at an asteroid mining company
Open datasets for hobbyists
Mission and hardware design for asteroid mining
Partnerships and hires
Links:
LinkedIn: https://www.linkedin.com/in/daynan/
We're looking for a Sr Data Engineer: https://boards.eu.greenhouse.io/karmanplus/jobs/4027128101?gh_jid=4027128101
Minor Planet Center: https://minorplanetcenter.net/
JPL Horizons has a nice set of APIs for accessing data related to small bodies (including asteroids): https://ssd.jpl.nasa.gov/api.html
ESA has NEODyS: https://newton.spacedys.com/neodys
IRSA catalog that contains image and catalog data related to the WISE/NEOWISE data (and other infrared platforms): https://irsa.ipac.caltech.edu/frontpage/
NASA also has an archive of data collected from their various missions, including a node related to small bodies: https://pds-smallbodies.astro.umd.edu/
Sub-node directly related to asteroids: https://sbn.psi.edu/pds/
Size, Mass, and Density of Asteroids (SiMDA) is a nice catalog of observed asteroid attributes (and an indication of how small our sample size is!): https://astro.kretlow.de/?SiMDA
Source survey data (several surveys are useful for asteroids), such as Pan-STARRS: https://outerspace.stsci.edu/display/PANSTARRS
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

May 27, 2022 • 53min
Machine Learning in Marketing - Juan Orduz
We talked about:
Juan’s background
Typical problems in marketing that are solved with ML
Attribution model
Media Mix Model – detecting uplift and channel saturation
Changes to privacy regulations and their effect on user tracking
User retention and churn prevention
A/B testing to detect uplift
Statistical approach vs machine learning (setting a benchmark)
Does retraining MMM models often improve efficiency?
Attribution model baselines
Choosing a decay rate for channels (Bayesian linear regression)
Learning resource suggestions
Bayesian approach vs Frequentist approach
Suggestions for creating a marketing department
Most challenging problems in marketing
The importance of marketing domain knowledge for data scientists
Juan’s blog and other learning resources
Finding Juan online
Links:
Juan's PyData talk on uplift modeling: https://youtube.com/watch?v=VWjsi-5yc3w
Juan's website: https://juanitorduz.github.io
Introduction to Algorithmic Marketing book: https://algorithmic-marketing.online
Preventing churn like a bandit: https://www.youtube.com/watch?v=n1uqeBNUlRM
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

May 20, 2022 • 49min
From Academia to Data Analytics and Engineering - Gloria Quiceno
We talked about:
Gloria’s background
Working with MATLAB, R, C, Python, and SQL
Working at ICE
Job hunting after the bootcamp
Data engineering vs data science
Using Docker
Keeping track of job applications, employers and questions
Challenges during the job search and transition
Concerns over data privacy
Challenges with salary negotiation
The importance of career coaching and support
Skills learned at Spiced
Retrospective on Gloria’s transition to data and advice
Top skills that helped Gloria get the job
Thoughts on cloud platforms
Thoughts on bootcamps and courses
Spiced graduation project
Standing out in a sea of applicants
The cohorts at Spiced
Conclusion
Links:
LinkedIn: https://www.linkedin.com/in/gloria-quiceno/
GitHub: https://github.com/gdq12
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

May 13, 2022 • 53min
Teaching Data Engineers - Jeff Katz
We talked about:
Jeff’s background
Getting feedback to become a better teacher
Going from engineering to teaching
Jeff on becoming a curriculum writer
Creating a curriculum that reinforces learning
Jeff on starting his own data engineering bootcamp
Shifting from teaching ML and data science to teaching data engineering
Making sure that students get hired
Screening bootcamp applicants
Knowing when it’s time to apply for jobs
The curriculum of JigsawLabs.io
The market demand for Spark, Kafka, and Kubernetes (or lack thereof)
Advice for data analysts that want to move into data engineering
The market demand for ETL/ELT and dbt (or lack thereof)
The importance of Python, SQL, and data modeling for data engineering roles
Interview expectations
How to get started in teaching
The challenges of being a one-person company
Teaching fundamentals vs the “shiny new stuff”
JigsawLabs.io
Finding Jeff online
Links:
Jigsaw Labs: https://www.jigsawlabs.io/free
Teaching my mom to code: https://www.youtube.com/watch?v=OfWwfTXGjBM
Getting a Data Engineering Job Webinar with Jeff Katz: https://www.eventbrite.de/e/getting-a-data-engineering-job-tickets-310270877547
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

May 6, 2022 • 53min
From Roasting Coffee to Backend Development - Jessica Greene
We talked about:
Jessica’s background
Giving a talk at a tech conference about coffee
Jessica’s transition into tech (How to get started)
Going from learning to actually making money
Landing your first job in tech
Does your age matter when you’re trying to get a job?
Challenges that Jessica faced at the beginning of her career
Jessica’s role at PyLadies
Fighting imposter syndrome
Generational differences in digital literacy and how to improve it
Events organized by PyLadies
Jessica’s beginnings at PyLadies (organizing events)
Jessica’s experience with public speaking
The impact of public speaking on your career
Tips for public speaking
Jessica’s work at Ecosia
Discrimination in the tech industry (and in general)
Finding Jessica online
Links:
Ecosia's website: https://www.ecosia.org/
Ecosia's blog: https://blog.ecosia.org/ecosia-financial-reports-tree-planting-receipts/
PyLadies Berlin: https://berlin.pyladies.com/
PyLadies' Meetup: https://meetup.com/PyLadies-Berlin
Codecademy: https://www.codecademy.com/
freeCodeCamp: https://www.freecodecamp.org/
Coursera Machine Learning: https://www.coursera.org/learn/machine-learning
ML Bookcamp code: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Google Summer of Code: https://summerofcode.withgoogle.com/
Outreachy website: https://www.outreachy.org/
Alumni Interview: https://railsgirlssummerofcode.org/blog/2020-03-17-alumni-interview-jessica
Python pizza: https://python.pizza/
PyCon Italia: https://pycon.it/en
PyCon DE 2022: https://2022.pycon.de/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Apr 29, 2022 • 50min
Recruiting Data Engineers - Nicolas Rassam
We talked about:
Nicolas’ background
The tech talent market in different countries
Hiring data scientists vs data engineers
A spike in interest in data engineering roles
The importance of recruiters having technical knowledge
The main challenges of hiring data engineers
The difference in hiring junior, mid, and senior level data engineers
Things recruiters look for in people who switch to a data engineering role
The importance of knowing cloud tools
The importance of knowing infrastructure tools
Preparing for the interview
The importance of a formal education
The importance of having a project portfolio
How your current domain influences the interview
Conclusion
Links:
Nicolas' Twitter: https://twitter.com/n_rassam
Nicolas' LinkedIn: https://www.linkedin.com/in/nicolasrassam/
Onfido is hiring: https://onfido.com/engineering-technology/
Interview with Alicja about recruiting data scientists: https://datatalks.club/podcast/s07e02-recruiting-data-professionals.html
Webinar "Getting a Data Engineering Job" with Jeff Katz: https://eventbrite.com/e/310270877547
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Apr 22, 2022 • 52min
Storytime for DataOps - Christopher Bergh
We talked about:
Christopher’s background
The essence of DataOps
Also known as Agile Analytics Operations or DevOps for Data Science
Defining processes and automating them (defining “done” and “good”)
The balance between heroism and fear (avoiding deferred value)
The Lean approach
Avoiding silos
The 7 steps to DataOps
Wanting to become replaceable
DataOps is doable
Testing tools
DataOps vs MLOps
The Head Chef at DataKitchen
What’s grilling at DataKitchen?
The DataOps Cookbook
Links:
DataOps Manifesto website: https://dataopsmanifesto.org/en/
DataOps Cookbook: https://dataops.datakitchen.io/pf-cookbook
Recipes for DataOps Success: https://dataops.datakitchen.io/pf-recipes-for-dataops-success
DataOps Certification Course: https://info.datakitchen.io/training-certification-dataops-fundamentals
DataOps Blog: https://datakitchen.io/blog/
DataOps Maturity Model: https://datakitchen.io/dataops-maturity-model/
DataOps Webinars: https://datakitchen.io/webinars/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Apr 15, 2022 • 52min
Machine Learning and Personalization in Healthcare - Stefan Gudmundsson
We talked about:
Stefan’s background
Applications of machine learning in healthcare
Sidekick Health – gamified therapeutics
How is working for King different from Sidekick Health?
The rewards systems in gamified apps
The importance of building a strong foundation for a data science team
The challenges of building an app in the healthcare industry
Dealing with ethics issues
Sidekick Health’s personalized recommendations and content
The importance of having the right approach in A/B tests (strong analytics and good data)
The importance of having domain knowledge to work as a data professional in the healthcare industry
Making a data-driven company
Risks for Sidekick Health
Sidekick Health’s growth strategy
Using AI to help people live better lives
Links:
LinkedIn: https://www.linkedin.com/in/stefanfreyrgudmundsson/
Job listings: https://sidekickhealth.bamboohr.com/jobs/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Apr 8, 2022 • 56min
Innovation and Design for Machine Learning - Liesbeth Dingemans
We talked about:
Liesbeth’s background
What is design?
The importance of interaction in design
Design as a process (Double Diamond technique)
How long does it take to go from an idea to finishing the second diamond?
Design thinking (Google’s PAIR)
What is a Design Sprint and who should participate in it?
Why should data specialists care about design?
Challenging your task-giver (asking “why”)
How to avoid the “Chinese whisper game” (reiterating the problem)
Defining the roadmap for data science teams
What is innovation?
Bringing innovation to your management
Task force-team approach to solving problems
Innovation, resource management issues, and using data to back your ideas
Words of advice for those interested in design and innovation
Links:
LinkedIn: https://www.linkedin.com/in/liesbeth-dingemans/
Medium posts on design, innovation, art and AI: https://medium.com/@liesbethmd
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html