

DataTalks.Club
DataTalks.Club - the place to talk about data!

Jun 16, 2023 • 51min
Data Developer Relations - Hugo Bowne-Anderson
We talked about:
Hugo's background
Why tools and the companies that run them have wildly different names
Hugo's other projects besides Metaflow
Transitioning from educator to DevRel
What is DevRel?
DevRel vs Marketing
How DevRel coordinates with developers
How DevRel coordinates with marketers
What skills a DevRel needs
The challenges that come with being an educator
Becoming a good writer: nature vs nurture
Hugo's approach to writing and suggestions
Establishing a goal for your content
Choosing a form of media for your content
Is DevRel intercompany or intracompany?
The Vanishing Gradients podcast
Finding Hugo online
Links:
Hugo Bowne-Anderson's GitHub: http://hugobowne.github.io/
Vanishing Gradients: https://vanishinggradients.fireside.fm/
MLOps and DevOps: Why Data Makes It Different: https://www.oreilly.com/radar/mlops-and-devops-why-data-makes-it-different/
Evaluate Metaflow for free, right from your browser: https://outerbounds.com/sandbox/
Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Jun 9, 2023 • 51min
Lessons Learned from Freelancing and Working in a Start-up - Antonis Stellas
We talked about:
Antonis' background
The pros and cons of working for a startup
Useful skills for working at a startup and the Lean way to work
How Antonis joined the DataTalks.Club community
Suggestions for students joining the MLOps course
Antonis contributing to Evidently AI
How Antonis started freelancing
Getting your first clients on Upwork
Pricing your work as a freelancer
The process after getting approved by a client
Wearing many hats as a freelancer and while working at a startup
Other suggestions for getting clients as a freelancer
Antonis' thoughts on the Data Engineering course
Antonis' resource recommendations
Links:
Lean Startup by Eric Ries: https://theleanstartup.com/
Lean Analytics: https://leananalyticsbook.com/
Designing Machine Learning Systems by Chip Huyen: https://www.oreilly.com/library/view/designing-machine-learning/9781098107956/
Kafka streaming with Python, tutorial video by Kris Jenkins: https://youtu.be/jItIQ-UvFI4
Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Jun 2, 2023 • 50min
Data Access Management - Bart Vandekerckhove
We talked about:
Bart's background
What is data governance?
Data dictionaries and data lineage
Data access management
How to learn about data governance
What skills are needed to do data governance effectively
When an organization needs to start thinking about data governance
Good data access management processes
Data masking and the importance of automating data access
DPO and CISO roles
How data access management works with a data mesh approach
Avoiding the role explosion problem
The importance of data governance integration in DataOps
Terraform as a stepping stone to data governance
How Raito can help an organization with data governance
Open-source data governance tools
Links:
LinkedIn: https://www.linkedin.com/in/bartvandekerckhove/
Twitter: https://twitter.com/Bart_H_VDK
GitHub: https://github.com/raito-io
Website: https://www.raito.io/
Data Mesh Learning Slack: https://data-mesh-learning.slack.com/join/shared_invite/zt-1qs976pm9-ci7lU8CTmc4QD5y4uKYtAA#/shared-invite/email
DataQG Website: https://dataqg.com/
DataQG Slack: https://dataqgcommunitygroup.slack.com/join/shared_invite/zt-12n0333gg-iTZAjbOBeUyAwWr8I~2qfg#/shared-invite/email
DMBOK (Data Management Body of Knowledge): https://www.dama.org/cpages/body-of-knowledge
DMBOK Wheel describing the data governance activities: https://www.dama.org/cpages/dmbok-2-wheel-images
Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

May 26, 2023 • 56min
Data Strategy: Key Principles and Best Practices - Boyan Angelov
We talked about:
Boyan's background
What is data strategy?
Due diligence and establishing a common goal
Designing a data strategy
Impact assessment, portfolio management, and DataOps
Data products
DataOps, Lean, and Agile
Data Strategist vs Data Science Strategist
The skills one needs to be a data strategist
How does one become a data strategist?
Data strategist as a translator
Transitioning from a Data Strategist role to a CTO
Using ChatGPT as a writing co-pilot
Using ChatGPT as a starting point
How ChatGPT can help in data strategy
Pitching a data strategy to a stakeholder
Setting baselines in a data strategy
Boyan's book recommendations
Links:
LinkedIn: https://www.linkedin.com/in/angelovboyan/
Twitter: https://twitter.com/thinking_code
GitHub: https://github.com/boyanangelov
Website: https://boyanangelov.com/
Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

May 19, 2023 • 58min
Practical Data Privacy - Katharine Jarmul
We talked about:
Katharine's background
Katharine's ML privacy startup
GDPR, CCPA, and the “opt-in as the default” approach
What is data privacy?
Finding Katharine's book – Practical Data Privacy
The various definitions of data privacy and “user profiles”
Privacy engineering and privacy-enhancing technologies
Why data privacy is important
What is differential privacy?
The importance of keeping privacy in mind when designing systems
Data privacy, using ChatGPT as an example
Katharine's resource suggestions for learning about data privacy
Links:
LinkedIn: https://www.linkedin.com/in/katharinejarmul/
Twitter: https://twitter.com/kjam
Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

May 12, 2023 • 51min
Building Scalable and Reliable Machine Learning Systems - Arseny Kravchenko
We talked about:
Arseny's background
Working on machine learning in startups
What is Machine Learning System Design?
Constraints and requirements
Known unknowns vs unknown unknowns (Design stage)
Writing a design document
Technical problems vs product-oriented problems
The solution part of the Design Document
What motivated Arseny to write a book on ML System Design
Examples of a Design Document in the book
The types of readers for ML System Design
Working with the co-author
Reacting to constraints and feedback when writing a book
Arseny's favorite chapter of the book
Other resources where you can learn about ML System Design
Twitter Giveaway
Links:
Book: https://www.manning.com/books/machine-learning-system-design?utm_source=AGMLBookcamp&utm_medium=affiliate&utm_campaign=book_babushkin_machine_4_25_23&utm_content=twitter
Discount: poddatatalks21 (35% off)
Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Apr 21, 2023 • 56min
Building an Open-Source NLP Tool - Johannes Hötter
We talked about:
Johannes’s background
Johannes’s Open Source Spotlight demos – Refinery and Bricks
The difficulties of working with natural language processing (NLP)
Incorporating ChatGPT into a process as a heuristic
What is Bricks?
The process of starting a startup – Kern
Making the decision to go with open source
Pros and cons of launching as open source
Kern’s business model
Working with enterprises
Johannes as a salesperson
The team at Kern
Johannes’s role at Kern
How Johannes and Henrik separate responsibilities at Kern
Working with very niche use cases
The short story of how Kern got its funding
Johannes’s resource recommendation
Links:
Refinery's GitHub repo: https://github.com/code-kern-ai/refinery
Bricks' GitHub repo: https://github.com/code-kern-ai/bricks
Bricks Open Source Spotlight demo: https://www.youtube.com/watch?v=r3rXzoLQy2U
Refinery Open Source Spotlight demo: https://www.youtube.com/watch?v=LlMhN2f7YDg
Discord: https://discord.com/invite/qf4rGCEphW
Kern's website: https://www.kern.ai
Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Apr 14, 2023 • 53min
Navigating Industrial Data Challenges - Rosona Eldred
We talked about:
Rosona’s background
How mathematics knowledge helps in industry
What is industrial data?
Setting up an industrial process using blue paint
Internet companies’ data vs industrial data
Explaining industrial processes using packing peanuts
Why productive industry needs data
Measuring product qualities
How data specialists use industrial data
Defining and measuring sustainability
Using data to react to changing regulations
Types of industrial data
Solving problems and optimizing with industrial data
Industrial solvers
Tiny data vs Big data in productive industry
The advantages of coming from academia into productive industry
Materials and resources for industrial data
Women in industry
Why Rosona decided to shift to industrial data
Links:
Kaggle dataset: https://www.kaggle.com/datasets/paresh2047/uci-semcom

Apr 7, 2023 • 51min
Mastering Self-Learning in Machine Learning - Aaisha Muhammad
We talked about:
Aaisha’s background
How homeschooling affects self-study
Deciding on what to learn about
Establishing whether a resource is good
How Aaisha focuses on learning
Deciding on what kind of project to build
Finding research materials
Aaisha’s experience with the DataTalks.Club ML Zoomcamp
ML Zoomcamp projects
Aaisha’s interest in bioinformatics
Keeping motivated with deadlines
Notes and time-tracking tools
Drawbacks to self-studying
Aaisha’s interest in machine learning
Aaisha’s least favorite part of ML Zoomcamp
Helping people as a way to learn
Using ChatGPT as a “study group”
Is it possible to use self-studying to learn high-level topics?
Switching topics to avoid burnout
Aaisha’s resource recommendations
Links:
LinkedIn: https://www.linkedin.com/in/aaisha-muhammad/
Twitter: https://twitter.com/ZealousMushroom
GitHub: https://github.com/AaishaMuhammad
Website: http://www.aaishamuhammad.co.za/
Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Mar 31, 2023 • 49min
The Secret Sauce of Data Science Management - Shir Meir Lador
We talked about:
Shir’s background
Debrief culture
The responsibilities of a group manager
Defining the success of a DS manager
The three pillars of data science management
Managing up
Managing down
Managing across
Managing data science teams vs business teams
Scrum teams, brainstorming, and sprints
The most important skills and strategies for DS and ML managers
Making sure proofs of concept get into production
Links:
The secret sauce of data science management: https://www.youtube.com/watch?v=tbBfVHIh-38
Lessons learned leading AI teams: https://blogs.intuit.com/2020/06/23/lessons-learned-leading-ai-teams/
How to avoid conflicts and delays in the AI development process (Part I): https://blogs.intuit.com/2020/12/08/how-to-avoid-conflicts-and-delays-in-the-ai-development-process-part-i/
How to avoid conflicts and delays in the AI development process (Part II): https://blogs.intuit.com/2021/01/06/how-to-avoid-conflicts-and-delays-in-the-ai-development-process-part-ii/
Leading AI teams deck: https://drive.google.com/drive/folders/1_CnqjugtsEbkIyOUKFHe48BeRttX0uJG
Leading AI teams video: https://www.youtube.com/watch?app=desktop&v=tbBfVHIh-38
Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html