

DataTalks.Club
DataTalks.Club - the place to talk about data!

Sep 8, 2023 • 54min
Pragmatic and Standardized MLOps - Maria Vechtomova
We talked about:
Maria's background
Marvelous MLOps
Maria's definition of MLOps
Alternate team setups without a central MLOps team
Pragmatic vs non-pragmatic MLOps
Must-have ML tools (categories)
Maturity assessment
What to start with in MLOps
Standardized MLOps
Convincing DevOps to implement
Understanding what the tools are used for instead of knowing all the tools
Maria's next project plans
Is LLM Ops a thing?
What Ahold Delhaize does
Resource recommendations to learn more about MLOps
The importance of data engineering knowledge for ML engineers
Links:
LinkedIn: https://www.linkedin.com/company/marvelous-mlops/
Website: https://marvelousmlops.substack.com/
Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Aug 25, 2023 • 56min
Democratizing Causality - Aleksander Molak
We talked about:
Aleksander's background
Aleksander as a Causal Ambassador
Using causality to make decisions
Counterfactuals and Judea Pearl
Meta-learners vs classical ML models
Average treatment effect
Reducing causal bias, the super efficient estimator, and uplift modeling
Metrics for evaluating a causal model vs a traditional ML model
Is the added complexity of a causal model worth implementing?
Utilizing LLMs in causal models (text as outcome)
Text as treatment and style extraction
The viability of A/B tests in causal models
Graphical structures and nonparametric identification
Aleksander's resource recommendations
Links:
The Book of Why: https://amzn.to/3OZpvBk
Causal Inference and Discovery in Python: https://amzn.to/46Pperr
Book's GitHub repo: https://github.com/PacktPublishing/Causal-Inference-and-Discovery-in-Python
The Battle of Giants: Causality vs NLP (PyData Berlin 2023): https://www.youtube.com/watch?v=Bd1XtGZhnmw
New Frontiers in Causal NLP (papers repo): https://bit.ly/3N0TFTL

Aug 18, 2023 • 47min
Mastering Data Engineering as a Remote Worker - José María Sánchez Salas
Topics include moving from Spain to Norway, organizing the day as a remote worker, the company's expertise and data collection process, the challenges of finding a remote job in Norway, finding inspiration and writing about interesting topics, and the benefits and challenges of remote work as a data engineer.

Aug 4, 2023 • 51min
The Good, the Bad and the Ugly of GPT - Sandra Kublik
We talked about:
Sandra's background
Making a YouTube channel to break into the LLM space
The business cases for LLMs
LLMs as amplifiers
The benefits of keeping a human in the loop when using LLMs (AI limitations)
Using LLMs as assistants
Building an app that uses an LLM
Prompt whisperers and how to improve your prompts
Sandra's 7-day LLM experiment
Sandra's LLM content recommendations
Finding Sandra online
Links:
LinkedIn: https://www.linkedin.com/in/sandrakublik/
Twitter: https://twitter.com/sandra_kublik
YouTube: https://www.youtube.com/@sandra_kublik

Jul 28, 2023 • 55min
LLMs for Everyone - Meryem Arik
We talked about:
Meryem's background
The constant evolution of startups
How Meryem became interested in LLMs
What is an LLM (generative vs non-generative models)?
Why LLMs are important
Open source models vs API models
What TitanML does
How fine-tuning a model helps in LLM use cases
Fine-tuning generative models
How generative models change the landscape of human work
How to adjust models over time
Vector databases and LLMs
How to choose an open source LLM or an API
Measuring input data quality
Meryem's resource recommendations
Links:
Website: https://www.titanml.co/
Beta docs: https://titanml.gitbook.io/iris-documentation/overview/guide-to-titanml...
Using Llama 2.0 in TitanML (blog): https://medium.com/@TitanML/the-easiest-way-to-fine-tune-and-inference-llama-2-0-8d8900a57d57
Discord: https://discord.gg/83RmHTjZgf
Meryem LinkedIn: https://www.linkedin.com/in/meryemarik/

Jul 21, 2023 • 55min
Investing in Open-Source Data Tools - Bela Wiertz
Bela Wiertz, investor in open-source data tools, talks about the viability of open source as a go-to-market strategy, the differences between angel investors, VC funds, and family offices, and the use of GitHub stars as a metric for investment. They also discuss the future of open source, recent successes of open source companies, and Bela's resource recommendations.

Jul 14, 2023 • 51min
Why Machine Learning Design is Broken - Valerii Babushkin
Links:
Book: https://www.manning.com/books/machine-learning-system-design?utm_source=AGMLBookcamp&utm_medium=affiliate&utm_campaign=book_babushkin_machine_4_25_23&utm_content=twitter
Discount: poddatatalks21 (35% off)
Evidently: https://www.evidentlyai.com/
Article: https://medium.com/people-ai-engineering/design-documents-for-ml-models-bbcd30402ff7

Jul 7, 2023 • 53min
Interpretable AI and ML - Polina Mosolova
We talked about:
Polina's background
How common it is for PhD students to build ML pipelines end-to-end
Simultaneous PhD and industry experience
Support from both the academic and industry sides
How common the industrial PhD setup is and how to get into one
Organizational trust theory
How price relates to trust
How trust relates to explainability
The importance of actionability
Explainability vs interpretability vs actionability
Complex glass box models
Does the explainability of a model follow explainability?
What explainable AI brings to customers and end users
Can all trust be turned into a KPI?
Links:
LinkedIn: https://www.linkedin.com/in/polina-mosolova/
Neural Additive Models paper: https://proceedings.neurips.cc/paper/2021/file/251bd0442dfcc53b5a761e050f8022b8-Paper.pdf
Neural Basis Model paper: https://arxiv.org/pdf/2205.14120.pdf
Interpretable Feature Spaces paper: https://kdd.org/exploration_files/vol24issue1_1._Interpretable_Feature_Spaces_revised.pdf

Jun 30, 2023 • 54min
From Scratch to Success: Building an MLOps Team and ML Platform - Simon Stiebellehner
We talked about:
Simon's background
What MLOps is and what it isn't
Skills needed to build an ML platform that serves hundreds of models
Ranking the importance of skills
The point where you should think about building an ML platform
The importance of processes in ML platforms
Weighing your options with SaaS platforms
The exploratory setup, experiment tracking, and model registry
What comes after deployment?
Stitching tools together to create an ML platform
Keeping data governance in mind when building a platform
What comes first – the model or the platform?
Do MLOps engineers need to have deep knowledge of how models work?
Is API design important for MLOps?
Simon's recommendations for furthering MLOps knowledge
Links:
LinkedIn: https://www.linkedin.com/in/simonstiebellehner/
GitHub: https://github.com/stiebels
Medium: https://medium.com/@sistel

Jun 23, 2023 • 53min
From MLOps to DataOps - Santona Tuli
We talked about:
Santona's background
Focusing on data workflows
Upsolver vs dbt
ML pipelines vs Data pipelines
MLOps vs DataOps
Tools used for data pipelines and ML pipelines
The “modern data stack” and today's data ecosystem
Staging the data and the concept of a “lakehouse”
Transforming the data after staging
What happens after the modeling phase
Human-centric vs Machine-centric pipeline
Applying skills learned in academia to ML engineering
Crafting user personas based on real stories
A framework of curiosity
Santona's book and resource recommendations
Links:
LinkedIn: https://www.linkedin.com/in/santona-tuli/
Upsolver website: https://www.upsolver.com
Why we built a SQL-based solution to unify batch and stream workflows: https://www.upsolver.com/blog/why-we-built-a-sql-based-solution-to-unify-batch-and-stream-workflows