
DataTalks.Club
DataTalks.Club - the place to talk about data!
Latest episodes

Jul 22, 2022 • 53min
Hiring Data Science Talent - Olga Ivina
We talked about:
Olga’s career journey
Hiring data scientists now vs 7 years ago
The two qualities of an excellent data scientist
What makes Alexey do this podcast
How Alexey gets the latest information on data science
How Olga checks a candidate’s technical skills
How to make an answer stand out (showing your depth of knowledge)
A strong mathematical background vs a strong engineering background
When AutoML will replace the need for data scientists
Should data scientists transition into management? (the importance of communication in an organization)
Switching from a data analyst role to a data scientist role
Attracting female talent in data science
Changing a job description to find talent
Long gaps in the CV
Eierlegende Wollmilchsau
Links:
Olga's LinkedIn: https://www.linkedin.com/in/olgaivina/
Olga's Twitter: https://twitter.com/olgaivina
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Jul 15, 2022 • 50min
From Open-Source Maintainer to Founder - Will McGugan
We talked about:
Will’s background
Will’s open source projects
S3FS and PyFilesystem
Inspiration for open source projects
Will as a freelancer
Starting a company from a tweet (Rich and Textual)
Building in public (Will’s approach to social media)
The workforce and roadmap of Textualize.io
The importance of working on open source for Textualize employees
The workflow of and contributions to Textualize
Getting your first thousand GitHub Stars (going viral)
Suggestions for those who wish to start in the open-source space
Finding Will online
Links:
Twitter: https://twitter.com/willmcgugan
Textualize website: https://www.textualize.io/
Textualize GitHub: https://github.com/textualize
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Jul 8, 2022 • 51min
Designing a Data Science Organization - Lisa Cohen
We talked about:
Lisa’s background
Centralized org vs decentralized org
Hybrid org (centralized/decentralized)
Reporting your results in a data organization
Planning in a data organization
Having all the moving parts work towards the same goals
Which approach Twitter follows (centralized vs decentralized)
Pros and cons of a decentralized approach
Pros and cons of a centralized approach
Finding a common language with all the functions of an org
Finding the right approach for companies that want to implement data science
How many data scientists does a company need?
Who do data scientists report huge findings to?
The importance of partnering closely with other functions of the org
The role of Product Managers in the org and across functions
Who does analytics at Twitter (analysts vs data scientists)
The importance of goals, objectives and key results
Conflicting objectives
The importance of research
Finding Lisa online
Links:
LinkedIn: https://www.linkedin.com/in/cohenlisa/
Twitter: https://twitter.com/lisafeig
Medium: https://medium.com/@lisa_cohen
Lisa Cohen's YouTube videos: https://www.youtube.com/playlist?list=PLRhmnnfr2bX7-GAPHzvfUeIEt2iYCbI3w
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Jul 1, 2022 • 51min
Developer Advocacy Engineer for Open-Source - Merve Noyan
We talked about:
Merve’s background
Merve’s first contributions to open source
What Merve currently does at Hugging Face (Hub, Spaces)
What it means to be a developer advocacy engineer at Hugging Face
The best way to get open source experience (Google Summer of Code, Hacktoberfest, and sprints)
The peculiarities of hiring as it relates to code contributions
Best resources to learn about NLP besides Hugging Face
Good first projects for NLP
The most important topics in NLP right now
NLP ML Engineer vs NLP Data Scientist
Project recommendations and other advice to catch the eye of recruiters
Merve on Twitch and her podcast
Finding Merve online
Merve and Mario Kart
Links:
Hugging Face Course: https://hf.co/course
Natural Language Processing in TensorFlow: https://www.coursera.org/learn/natural-language-processing-tensorflow
GitHub ML Poetry: https://github.com/merveenoyan/ML-poetry
Tackling multiple tasks with a single visual language model: https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model
Hugging Face bigscience/T0pp: https://huggingface.co/bigscience/T0pp
Pathways Language Model (PaLM) blog: https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Jun 24, 2022 • 58min
Data Scientists at Work - Mısra Turp
We talked about:
Mısra’s background
What data scientists do
Consultant data scientists vs in-house data scientists (and freelancers)
Expectations for data scientists
The importance of keeping up to date with AI developments (FOMA)
How does DALL·E 2 work and should you care?
Going to conferences to stay up to date
The most pressing issue for data scientists
Fighting FOMA and imposter syndrome
Knowing when you have enough knowledge of a framework
The “best” type of data scientist
Being a generalist vs a specialist
Advice for entry-level data scientists entering an oversaturated market
Catching the eye of big AI companies
Choosing a project for your portfolio
The importance of having a Ph.D. or Master’s degree in data science
Finding Mısra online
Links:
Mısra's YouTube channel: https://www.youtube.com/channel/UCpNUYWW0kiqyh0j5Qy3aU7w
Twitter: https://twitter.com/misraturp
Hands-on Data Science: Complete Your First Portfolio Project: https://www.soyouwanttobeadatascientist.com/hods
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Jun 17, 2022 • 52min
Freelancing and Consulting with Data Engineering - Adrian Brudaru
We talked about:
Adrian’s background
Freelancing vs Employment
Risk and occupancy rate in freelancing
The scariest part of freelancing
Adrian’s first projects
Freelancing 5 years later
Pay rates in freelancing
Acquiring skills while freelancing
Working with recruitment agencies and networking
Looking for projects and getting clients
Freelancing vs consulting
Clarity in clients’ expectations (scope of work)
Building your network
Freelancing platforms
Adrian’s data loading prototype
Going from freelancing to making your own product (and other investments)
The usefulness of a portfolio
Introverts in freelancing
Is it possible to work for 3 months a year in freelancing?
Choosing projects and skill-building strategy (focusing on interests)
Freelancing in Berlin
Clients’ expectations for freelancers vs employees
Working with more than one client at the same time
Adrian’s freelance cooperative on Slack
Other advice for novice freelancers (networking)
Finding Adrian online
Links:
GitHub: https://github.com/scale-vector
Slack Community: https://join.slack.com/t/berlindatacol-szn7050/shared_invite/zt-19dp8msp0-pP4Av3_fVFBbsdrzPROEAg
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Jun 10, 2022 • 48min
Getting a Data Engineering Job (Summary and Q&A) - Jeff Katz
We talked about:
Summary of “Getting a Data Engineering Job” webinar
Python and engineering skills
Interview process
Behavioral interviews
Technical interviews
Learning Python and SQL from scratch
Is having non-coding experience a disadvantage?
Analyst or engineer?
Do you need certificates?
Do I need a master’s degree?
Fully remote data engineering jobs
Should I include teaching on my resume?
Object-oriented programming for data engineering
Python vs Java/Scala
SQL and Python technical interview questions
GCP certificates
Is commercial experience really necessary?
From sales to engineering
Solution engineers
Wrapping up
Links:
Getting a Data Engineering Job (webinar): https://www.youtube.com/watch?v=yvEWG-S1F_M
The Flask Mega-Tutorial Part I - Hello, World! blog: https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-i-hello-world
Mode SQL Tutorial: https://mode.com/sql-tutorial/
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Jun 3, 2022 • 53min
Using Data for Asteroid Mining - Daynan Crull
We talked about:
Daynan’s background
Astronomy vs cosmology
Applications of data science and machine learning in astronomy
Determining signal vs noise
What the data looks like in astronomy
Determining the features of an object in space
Ground truth for space objects
Why water is an important resource in the space economy
Other useful resources that can be found in asteroids
Sources of asteroids
The data team at an asteroid mining company
Open datasets for hobbyists
Mission and hardware design for asteroid mining
Partnerships and hires
Links:
LinkedIn: https://www.linkedin.com/in/daynan/
We're looking for a Sr Data Engineer: https://boards.eu.greenhouse.io/karmanplus/jobs/4027128101?gh_jid=4027128101
Minor Planet Center: https://minorplanetcenter.net/
JPL Horizons has a nice set of APIs for accessing data related to small bodies (including asteroids): https://ssd.jpl.nasa.gov/api.html
ESA has NEODyS: https://newton.spacedys.com/neodys
IRSA catalog that contains image and catalog data related to the WISE/NEOWISE data (and other infrared platforms): https://irsa.ipac.caltech.edu/frontpage/
NASA also has an archive of data collected from their various missions, including a node related to small bodies: https://pds-smallbodies.astro.umd.edu/
Sub-node directly related to asteroids: https://sbn.psi.edu/pds/
Size, Mass, and Density of Asteroids (SiMDA) is a nice catalog of observed asteroid attributes (and an indication of how small our sample size is!): https://astro.kretlow.de/?SiMDA
Source survey data (several surveys are useful for asteroids), for example Pan-STARRS: https://outerspace.stsci.edu/display/PANSTARRS
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

May 27, 2022 • 53min
Machine Learning in Marketing - Juan Orduz
We talked about:
Juan’s background
Typical problems in marketing that are solved with ML
Attribution model
Media Mix Model – detecting uplift and channel saturation
Changes to privacy regulations and their effect on user tracking
User retention and churn prevention
A/B testing to detect uplift
Statistical approach vs machine learning (setting a benchmark)
Does retraining MMM models often improve efficiency?
Attribution model baselines
Choosing a decay rate for channels (Bayesian linear regression)
Learning resource suggestions
Bayesian approach vs Frequentist approach
Suggestions for creating a marketing department
Most challenging problems in marketing
The importance of marketing domain knowledge for data scientists
Juan’s blog and other learning resources
Finding Juan online
Links:
Juan's PyData talk on uplift modeling: https://youtube.com/watch?v=VWjsi-5yc3w
Juan's website: https://juanitorduz.github.io
Introduction to Algorithmic Marketing book: https://algorithmic-marketing.online
Preventing churn like a bandit: https://www.youtube.com/watch?v=n1uqeBNUlRM
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

May 20, 2022 • 49min
From Academia to Data Analytics and Engineering - Gloria Quiceno
We talked about:
Gloria’s background
Working with MATLAB, R, C, Python, and SQL
Working at ICE
Job hunting after the bootcamp
Data engineering vs Data science
Using Docker
Keeping track of job applications, employers and questions
Challenges during the job search and transition
Concerns over data privacy
Challenges with salary negotiation
The importance of career coaching and support
Skills learned at Spiced
Retrospective on Gloria’s transition to data and advice
Top skills that helped Gloria get the job
Thoughts on cloud platforms
Thoughts on bootcamps and courses
Spiced graduation project
Standing out in a sea of applicants
The cohorts at Spiced
Conclusion
Links:
LinkedIn: https://www.linkedin.com/in/gloria-quiceno/
GitHub: https://github.com/gdq12
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html