
DataTalks.Club
DataTalks.Club - the place to talk about data!
Latest episodes

Jul 28, 2023 • 55min
LLMs for Everyone - Meryem Arik
We talked about:
Meryem's background
The constant evolution of startups
How Meryem became interested in LLMs
What is an LLM (generative vs non-generative models)?
Why LLMs are important
Open source models vs API models
What TitanML does
How fine-tuning a model helps in LLM use cases
Fine-tuning generative models
How generative models change the landscape of human work
How to adjust models over time
Vector databases and LLMs
How to choose an open source LLM or an API
Measuring input data quality
Meryem's resource recommendations
Links:
Website: https://www.titanml.co/
Beta docs: https://titanml.gitbook.io/iris-documentation/overview/guide-to-titanml...
Using Llama 2.0 in TitanML (blog): https://medium.com/@TitanML/the-easiest-way-to-fine-tune-and-inference-llama-2-0-8d8900a57d57
Discord: https://discord.gg/83RmHTjZgf
Meryem LinkedIn: https://www.linkedin.com/in/meryemarik/
Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Jul 21, 2023 • 55min
Investing in Open-Source Data Tools - Bela Wiertz
Bela Wiertz, investor in open-source data tools, talks about the viability of open source as a go-to-market strategy, the differences between angel investors, VC funds, and family offices, and the use of GitHub stars as a metric for investment. They also discuss the future of open source, recent successes of open source companies, and Bela's resource recommendations.

Jul 14, 2023 • 51min
Why Machine Learning Design is Broken - Valerii Babushkin
Links:
Book: https://www.manning.com/books/machine-learning-system-design?utm_source=AGMLBookcamp&utm_medium=affiliate&utm_campaign=book_babushkin_machine_4_25_23&utm_content=twitter
Discount: poddatatalks21 (35% off)
Evidently: https://www.evidentlyai.com/
Article: https://medium.com/people-ai-engineering/design-documents-for-ml-models-bbcd30402ff7

Jul 7, 2023 • 53min
Interpretable AI and ML - Polina Mosolova
We talked about:
Polina's background
How common it is for PhD students to build ML pipelines end-to-end
Simultaneous PhD and industry experience
Support from both the academic and industry sides
How common the industrial PhD setup is and how to get into one
Organizational trust theory
How price relates to trust
How trust relates to explainability
The importance of actionability
Explainability vs interpretability vs actionability
Complex glass box models
Does the explainability of a model follow explainability?
What explainable AI brings to customers and end users
Can all trust be turned into a KPI?
Links:
LinkedIn: https://www.linkedin.com/in/polina-mosolova/
Neural Additive Models paper: https://proceedings.neurips.cc/paper/2021/file/251bd0442dfcc53b5a761e050f8022b8-Paper.pdf
Neural Basis Model paper: https://arxiv.org/pdf/2205.14120.pdf
Interpretable Feature Spaces paper: https://kdd.org/exploration_files/vol24issue1_1._Interpretable_Feature_Spaces_revised.pdf

Jun 30, 2023 • 54min
From Scratch to Success: Building an MLOps Team and ML Platform - Simon Stiebellehner
We talked about:
Simon's background
What MLOps is and what it isn't
Skills needed to build an ML platform that serves 100s of models
Ranking the importance of skills
The point where you should think about building an ML platform
The importance of processes in ML platforms
Weighing your options with SaaS platforms
The exploratory setup, experiment tracking, and model registry
What comes after deployment?
Stitching tools together to create an ML platform
Keeping data governance in mind when building a platform
What comes first – the model or the platform?
Do MLOps engineers need to have deep knowledge of how models work?
Is API design important for MLOps?
Simon's recommendations for furthering MLOps knowledge
Links:
LinkedIn: https://www.linkedin.com/in/simonstiebellehner/
GitHub: https://github.com/stiebels
Medium: https://medium.com/@sistel

Jun 23, 2023 • 53min
From MLOps to DataOps - Santona Tuli
We talked about:
Santona's background
Focusing on data workflows
Upsolver vs dbt
ML pipelines vs Data pipelines
MLOps vs DataOps
Tools used for data pipelines and ML pipelines
The “modern data stack” and today's data ecosystem
Staging the data and the concept of a “lakehouse”
Transforming the data after staging
What happens after the modeling phase
Human-centric vs machine-centric pipelines
Applying skills learned in academia to ML engineering
Crafting user personas based on real stories
A framework of curiosity
Santona's book and resource recommendations
Links:
LinkedIn: https://www.linkedin.com/in/santona-tuli/
Upsolver website: upsolver.com
Why we built a SQL-based solution to unify batch and stream workflows: https://www.upsolver.com/blog/why-we-built-a-sql-based-solution-to-unify-batch-and-stream-workflows

Jun 16, 2023 • 51min
Data Developer Relations - Hugo Bowne-Anderson
We talked about:
Hugo's background
Why do tools and the companies that run them have wildly different names?
Hugo's other projects besides Metaflow
Transitioning from educator to DevRel
What is DevRel?
DevRel vs Marketing
How DevRel coordinates with developers
How DevRel coordinates with marketers
What skills a DevRel needs
The challenges that come with being an educator
Becoming a good writer: nature vs nurture
Hugo's approach to writing and suggestions
Establishing a goal for your content
Choosing a form of media for your content
Is DevRel intercompany or intracompany?
The Vanishing Gradients podcast
Finding Hugo online
Links:
Hugo Bowne-Anderson's GitHub: http://hugobowne.github.io/
Vanishing Gradients: https://vanishinggradients.fireside.fm/
MLOps and DevOps: Why Data Makes It Different: https://www.oreilly.com/radar/mlops-and-devops-why-data-makes-it-different/
Evaluate Metaflow for free, right from your Browser: https://outerbounds.com/sandbox/

Jun 9, 2023 • 51min
Lessons Learned from Freelancing and Working in a Start-up - Antonis Stellas
We talked about:
Antonis' background
The pros and cons of working for a startup
Useful skills for working at a startup and the Lean way to work
How Antonis joined the DataTalks.Club community
Suggestions for students joining the MLOps course
Antonis contributing to Evidently AI
How Antonis started freelancing
Getting your first clients on Upwork
Pricing your work as a freelancer
The process after getting approved by a client
Wearing many hats as a freelancer and while working at a startup
Other suggestions for getting clients as a freelancer
Antonis' thoughts on the Data Engineering course
Antonis' resource recommendations
Links:
Lean Startup by Eric Ries: https://theleanstartup.com/
Lean Analytics: https://leananalyticsbook.com/
Designing Machine Learning Systems by Chip Huyen: https://www.oreilly.com/library/view/designing-machine-learning/9781098107956/
Kafka Streaming with Python, tutorial video by Kris Jenkins: https://youtu.be/jItIQ-UvFI4

Jun 2, 2023 • 50min
Data Access Management - Bart Vandekerckhove
We talked about:
Bart's background
What is data governance?
Data dictionaries and data lineage
Data access management
How to learn about data governance
What skills are needed to do data governance effectively
When an organization needs to start thinking about data governance
Good data access management processes
Data masking and the importance of automating data access
DPO and CISO roles
How data access management works with a data mesh approach
Avoiding the role explosion problem
The importance of data governance integration in DataOps
Terraform as a stepping stone to data governance
How Raito can help an organization with data governance
Open-source data governance tools
Links:
LinkedIn: https://www.linkedin.com/in/bartvandekerckhove/
Twitter: https://twitter.com/Bart_H_VDK
GitHub: https://github.com/raito-io
Website: https://www.raito.io/
Data Mesh Learning Slack: https://data-mesh-learning.slack.com/join/shared_invite/zt-1qs976pm9-ci7lU8CTmc4QD5y4uKYtAA#/shared-invite/email
DataQG Website: https://dataqg.com/
DataQG Slack: https://dataqgcommunitygroup.slack.com/join/shared_invite/zt-12n0333gg-iTZAjbOBeUyAwWr8I~2qfg#/shared-invite/email
DMBOK (Data Management Body of Knowledge): https://www.dama.org/cpages/body-of-knowledge
DMBOK Wheel describing the data governance activities: https://www.dama.org/cpages/dmbok-2-wheel-images

May 26, 2023 • 56min
Data Strategy: Key Principles and Best Practices - Boyan Angelov
We talked about:
Boyan's background
What is data strategy?
Due diligence and establishing a common goal
Designing a data strategy
Impact assessment, portfolio management, and DataOps
Data products
DataOps, Lean, and Agile
Data Strategist vs Data Science Strategist
The skills one needs to be a data strategist
How does one become a data strategist?
Data strategist as a translator
Transitioning from a Data Strategist role to a CTO
Using ChatGPT as a writing co-pilot
Using ChatGPT as a starting point
How ChatGPT can help in data strategy
Pitching a data strategy to a stakeholder
Setting baselines in a data strategy
Boyan's book recommendations
Links:
LinkedIn: https://www.linkedin.com/in/angelovboyan/
Twitter: https://twitter.com/thinking_code
GitHub: https://github.com/boyanangelov
Website: https://boyanangelov.com/