

DataTalks.Club
DataTalks.Club - the place to talk about data!
Episodes

Jul 8, 2022 • 51min
Designing a Data Science Organization - Lisa Cohen
We talked about:
Lisa’s background
Centralized org vs decentralized org
Hybrid org (centralized/decentralized)
Reporting your results in a data organization
Planning in a data organization
Having all the moving parts work towards the same goals
Which approach Twitter follows (centralized vs decentralized)
Pros and cons of a decentralized approach
Pros and cons of a centralized approach
Finding a common language with all the functions of an org
Finding the right approach for companies that want to implement data science
How many data scientists does a company need?
Who do data scientists report huge findings to?
The importance of partnering closely with other functions of the org
The role of Product Managers in the org and across functions
Who does analytics at Twitter (analysts vs data scientists)
The importance of goals, objectives and key results
Conflicting objectives
The importance of research
Finding Lisa online
Links:
LinkedIn: https://www.linkedin.com/in/cohenlisa/
Twitter: https://twitter.com/lisafeig
Medium: https://medium.com/@lisa_cohen
Lisa Cohen's YouTube videos: https://www.youtube.com/playlist?list=PLRhmnnfr2bX7-GAPHzvfUeIEt2iYCbI3w
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Jul 1, 2022 • 51min
Developer Advocacy Engineer for Open-Source - Merve Noyan
We talked about:
Merve’s background
Merve’s first contributions to open source
What Merve currently does at Hugging Face (Hub, Spaces)
What it means to be a developer advocacy engineer at Hugging Face
The best way to get open source experience (Google Summer of Code, Hacktoberfest, and sprints)
The peculiarities of hiring as it relates to code contributions
Best resources to learn about NLP besides Hugging Face
Good first projects for NLP
The most important topics in NLP right now
NLP ML Engineer vs NLP Data Scientist
Project recommendations and other advice to catch the eye of recruiters
Merve on Twitch and her podcast
Finding Merve online
Merve and Mario Kart
Links:
Hugging Face Course: https://hf.co/course
Natural Language Processing in TensorFlow: https://www.coursera.org/learn/natural-language-processing-tensorflow
GitHub ML Poetry: https://github.com/merveenoyan/ML-poetry
Tackling multiple tasks with a single visual language model: https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model
Hugging Face bigscience/T0pp: https://huggingface.co/bigscience/T0pp
Pathways Language Model (PaLM) blog: https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Jun 24, 2022 • 58min
Data Scientists at Work - Mısra Turp
We talked about:
Mısra’s background
What data scientists do
Consultant data scientists vs in-house data scientists (and freelancers)
Expectations for data scientists
The importance of keeping up to date with AI developments (FOMA)
How does DALL·E 2 work and should you care?
Going to conferences to stay up to date
The most pressing issue for data scientists
Fighting FOMA and imposter syndrome
Knowing when you have enough knowledge of a framework
The “best” type of data scientist
Being a generalist vs a specialist
Advice for entry-level data scientists entering an oversaturated market
Catching the eye of big AI companies
Choosing a project for your portfolio
The importance of having a Ph.D. or Master’s degree in data science
Finding Mısra online
Links:
Mısra's YouTube channel: https://www.youtube.com/channel/UCpNUYWW0kiqyh0j5Qy3aU7w
Twitter: https://twitter.com/misraturp
Hands-on Data Science: Complete Your First Portfolio Project: https://www.soyouwanttobeadatascientist.com/hods
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Jun 17, 2022 • 52min
Freelancing and Consulting with Data Engineering - Adrian Brudaru
We talked about:
Adrian’s background
Freelancing vs Employment
Risk and occupancy rate in freelancing
The scariest part of freelancing
Adrian’s first projects
Freelancing 5 years later
Pay rates in freelancing
Acquiring skills while freelancing
Working with recruitment agencies and networking
Looking for projects and getting clients
Freelancing vs consulting
Clarity in clients’ expectations (scope of work)
Building your network
Freelancing platforms
Adrian’s data loading prototype
Going from freelancing to making your own product (and other investments)
The usefulness of a portfolio
Introverts in freelancing
Is it possible to work for 3 months a year in freelancing?
Choosing projects and skill-building strategy (focusing on interests)
Freelancing in Berlin
Clients’ expectations for freelancers vs employees
Working with more than one client at the same time
Adrian’s freelance cooperative on Slack
Other advice for novice freelancers (networking)
Finding Adrian online
Links:
GitHub: https://github.com/scale-vector
Slack Community: https://join.slack.com/t/berlindatacol-szn7050/shared_invite/zt-19dp8msp0-pP4Av3_fVFBbsdrzPROEAg
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Jun 10, 2022 • 48min
Getting a Data Engineering Job (Summary and Q&A) - Jeff Katz
We talked about:
Summary of “Getting a Data Engineering Job” webinar
Python and engineering skills
Interview process
Behavioral interviews
Technical interviews
Learning Python and SQL from scratch
Is having non-coding experience a disadvantage?
Analyst or engineer?
Do you need certificates?
Do I need a master’s degree?
Fully remote data engineering jobs
Should I include teaching on my resume?
Object-oriented programming for data engineering
Python vs Java/Scala
SQL and Python technical interview questions
GCP certificates
Is commercial experience really necessary?
From sales to engineering
Solution engineers
Wrapping up
Links:
Getting a Data Engineering Job (webinar): https://www.youtube.com/watch?v=yvEWG-S1F_M
The Flask Mega-Tutorial Part I - Hello, World! blog: https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-i-hello-world
Mode SQL Tutorial: https://mode.com/sql-tutorial/
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Jun 3, 2022 • 53min
Using Data for Asteroid Mining - Daynan Crull
We talked about:
Daynan’s background
Astronomy vs cosmology
Applications of data science and machine learning in astronomy
Determining signal vs noise
What the data looks like in astronomy
Determining the features of an object in space
Ground truth for space objects
Why water is an important resource in the space economy
Other useful resources that can be found in asteroids
Sources of asteroids
The data team at an asteroid mining company
Open datasets for hobbyists
Mission and hardware design for asteroid mining
Partnerships and hires
Links:
LinkedIn: https://www.linkedin.com/in/daynan/
We're looking for a Sr Data Engineer: https://boards.eu.greenhouse.io/karmanplus/jobs/4027128101?gh_jid=4027128101
Minor Planet Center: https://minorplanetcenter.net/
JPL Horizons has a nice set of APIs for accessing data related to small bodies (including asteroids): https://ssd.jpl.nasa.gov/api.html
ESA has NEODyS: https://newton.spacedys.com/neodys
IRSA catalog that contains image and catalog data related to the WISE/NEOWISE data (and other infrared platforms): https://irsa.ipac.caltech.edu/frontpage/
NASA also has an archive of data collected from their various missions, including a node related to small bodies: https://pds-smallbodies.astro.umd.edu/
Sub-node directly related to asteroids: https://sbn.psi.edu/pds/
Size, Mass, and Density of Asteroids (SiMDA) is a nice catalog of observed asteroid attributes (and an indication of how small our sample size is!): https://astro.kretlow.de/?SiMDA
Source survey data, several of which are useful for asteroids: Pan-STARRS (https://outerspace.stsci.edu/display/PANSTARRS)
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

May 27, 2022 • 53min
Machine Learning in Marketing - Juan Orduz
We talked about:
Juan’s background
Typical problems in marketing that are solved with ML
Attribution model
Media Mix Model – detecting uplift and channel saturation
Changes to privacy regulations and their effect on user tracking
User retention and churn prevention
A/B testing to detect uplift
Statistical approach vs machine learning (setting a benchmark)
Does retraining MMM models often improve efficiency?
Attribution model baselines
Choosing a decay rate for channels (Bayesian linear regression)
Learning resource suggestions
Bayesian approach vs Frequentist approach
Suggestions for creating a marketing department
Most challenging problems in marketing
The importance of marketing domain knowledge for data scientists
Juan’s blog and other learning resources
Finding Juan online
Links:
Juan's PyData talk on uplift modeling: https://youtube.com/watch?v=VWjsi-5yc3w
Juan's website: https://juanitorduz.github.io
Introduction to Algorithmic Marketing book: https://algorithmic-marketing.online
Preventing churn like a bandit: https://www.youtube.com/watch?v=n1uqeBNUlRM
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

May 20, 2022 • 49min
From Academia to Data Analytics and Engineering - Gloria Quiceno
We talked about:
Gloria’s background
Working with MATLAB, R, C, Python, and SQL
Working at ICE
Job hunting after the bootcamp
Data engineering vs Data science
Using Docker
Keeping track of job applications, employers and questions
Challenges during the job search and transition
Concerns over data privacy
Challenges with salary negotiation
The importance of career coaching and support
Skills learned at Spiced
Retrospective on Gloria’s transition to data and advice
Top skills that helped Gloria get the job
Thoughts on cloud platforms
Thoughts on bootcamps and courses
Spiced graduation project
Standing out in a sea of applicants
The cohorts at Spiced
Conclusion
Links:
LinkedIn: https://www.linkedin.com/in/gloria-quiceno/
GitHub: https://github.com/gdq12
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

May 13, 2022 • 53min
Teaching Data Engineers - Jeff Katz
We talked about:
Jeff’s background
Getting feedback to become a better teacher
Going from engineering to teaching
Jeff on becoming a curriculum writer
Creating a curriculum that reinforces learning
Jeff on starting his own data engineering bootcamp
Shifting from teaching ML and data science to teaching data engineering
Making sure that students get hired
Screening bootcamp applicants
Knowing when it’s time to apply for jobs
The curriculum of JigsawLabs.io
The market demand for Spark, Kafka, and Kubernetes (or lack thereof)
Advice for data analysts that want to move into data engineering
The market demand for ETL/ELT and dbt (or lack thereof)
The importance of Python, SQL, and data modeling for data engineering roles
Interview expectations
How to get started in teaching
The challenges of being a one-person company
Teaching fundamentals vs the “shiny new stuff”
JigsawLabs.io
Finding Jeff online
Links:
Jigsaw Labs: https://www.jigsawlabs.io/free
Teaching my mom to code: https://www.youtube.com/watch?v=OfWwfTXGjBM
Getting a Data Engineering Job Webinar with Jeff Katz: https://www.eventbrite.de/e/getting-a-data-engineering-job-tickets-310270877547
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

May 6, 2022 • 53min
From Roasting Coffee to Backend Development - Jessica Greene
We talked about:
Jessica’s background
Giving a talk at a tech conference about coffee
Jessica’s transition into tech (How to get started)
Going from learning to actually making money
Landing your first job in tech
Does your age matter when you’re trying to get a job?
Challenges that Jessica faced at the beginning of her career
Jessica’s role at PyLadies
Fighting imposter syndrome
Generational differences in digital literacy and how to improve it
Events organized by PyLadies
Jessica’s beginnings at PyLadies (organizing events)
Jessica’s experience with public speaking
The impact of public speaking on your career
Tips for public speaking
Jessica’s work at Ecosia
Discrimination in the tech industry (and in general)
Finding Jessica online
Links:
Ecosia's website: https://www.ecosia.org/
Ecosia's blog: https://blog.ecosia.org/ecosia-financial-reports-tree-planting-receipts/
PyLadies Berlin: https://berlin.pyladies.com/
PyLadies' Meetup: https://meetup.com/PyLadies-Berlin
Codecademy: https://www.codecademy.com/
freeCodeCamp: https://www.freecodecamp.org/
Coursera Machine Learning: https://www.coursera.org/learn/machine-learning
ML Bookcamp code: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Google Summer of Code: https://summerofcode.withgoogle.com/
Outreachy website: https://www.outreachy.org/
Alumni Interview: https://railsgirlssummerofcode.org/blog/2020-03-17-alumni-interview-jessica
Python pizza: https://python.pizza/
PyCon: https://pycon.it/en
PyCon 2022: https://2022.pycon.de/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html


