DataTalks.Club

Latest episodes

May 19, 2023 • 58min

Practical Data Privacy - Katharine Jarmul

We talked about: Katharine's background Katharine's ML privacy startup GDPR, CCPA, and the “opt-in as the default” approach What is data privacy? Finding Katharine's book – Practical Data Privacy The various definitions of data privacy and “user profiles” Privacy engineering and privacy-enhancing technologies Why data privacy is important What is differential privacy? The importance of keeping privacy in mind when designing systems Data privacy on the example of ChatGPT Katharine's resource suggestions for learning about data privacy Links: LinkedIn: https://www.linkedin.com/in/katharinejarmul/ Twitter: https://twitter.com/kjam Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
May 12, 2023 • 51min

Building Scalable and Reliable Machine Learning Systems - Arseny Kravchenko

We talked about: Arseny's background Working on machine learning in startups What is Machine Learning System Design? Constraints and requirements Known unknowns vs unknown unknowns (Design stage) Writing a design document Technical problems vs product-oriented problems The solution part of the Design Document What motivated Arseny to write a book on ML System Design Examples of a Design Document in the book The types of readers for ML System Design Working with the co-author Reacting to constraints and feedback when writing a book Arseny's favorite chapter of the book Other resources where you can learn about ML System Design Twitter Giveaway Links: Book: https://www.manning.com/books/machine-learning-system-design?utm_source=AGMLBookcamp&utm_medium=affiliate&utm_campaign=book_babushkin_machine_4_25_23&utm_content=twitter Discount: poddatatalks21 (35% off) Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
Apr 21, 2023 • 56min

Building an Open-Source NLP Tool - Johannes Hötter

We talked about: Johannes’s background Johannes’s Open Source Spotlight demos – Refinery and Bricks The difficulties of working with natural language processing (NLP) Incorporating ChatGPT into a process as a heuristic What is Bricks? The process of starting a startup – Kern Making the decision to go with open source Pros and cons of launching as open source Kern’s business model Working with enterprises Johannes as a salesperson The team at Kern Johannes’s role at Kern How Johannes and Henrik separate responsibilities at Kern Working with very niche use cases The short story of how Kern got its funding Johannes’s resource recommendation Links: Refinery's GitHub repo: https://github.com/code-kern-ai/refinery Bricks' GitHub repo: https://github.com/code-kern-ai/bricks Bricks Open Source Spotlight demo: https://www.youtube.com/watch?v=r3rXzoLQy2U Refinery Open Source Spotlight demo: https://www.youtube.com/watch?v=LlMhN2f7YDg Discord: https://discord.com/invite/qf4rGCEphW Kern's Website: https://www.kern.ai Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
Apr 14, 2023 • 53min

Navigating Industrial Data Challenges - Rosona Eldred

We talked about: Rosona’s background How mathematics knowledge helps in industry What is industrial data? Setting up an industrial process using blue paint Internet companies’ data vs industrial data Explaining industrial processes using packing peanuts Why productive industry needs data Measuring product qualities How data specialists use industrial data Defining and measuring sustainability Using data in reactionary measures to changing regulations Types of industrial data Solving problems and optimizing with industrial data Industrial solvers Tiny data vs Big data in productive industry The advantages of coming from academia into productive industry Materials and resources for industrial data Women in industry Why Rosona decided to shift to industrial data Links: Kaggle dataset: https://www.kaggle.com/datasets/paresh2047/uci-semcom
Apr 7, 2023 • 51min

Mastering Self-Learning in Machine Learning - Aaisha Muhammad

We talked about: Aaisha’s background How homeschooling affects self-study Deciding on what to learn about Establishing whether a resource is good How Aaisha focuses on learning Deciding on what kind of project to build Finding research materials Aaisha’s experience with the DataTalks.Club ML Zoomcamp ML Zoomcamp projects Aaisha’s interest in bioinformatics Keeping motivated with deadlines Notes and time-tracking tools Drawbacks to self-studying Aaisha’s interest in machine learning Aaisha’s least favorite part of ML Zoomcamp Helping people as a way to learn Using ChatGPT as a “study group” Is it possible to use self-studying to learn high-level topics? Switching topics to avoid burnout Aaisha’s resource recommendations Links: LinkedIn: https://www.linkedin.com/in/aaisha-muhammad/ Twitter: https://twitter.com/ZealousMushroom GitHub: https://github.com/AaishaMuhammad Website: http://www.aaishamuhammad.co.za/ Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
Mar 31, 2023 • 49min

The Secret Sauce of Data Science Management - Shir Meir Lador

We talked about: Shir’s background Debrief culture The responsibilities of a group manager Defining the success of a DS manager The three pillars of data science management Managing up Managing down Managing across Managing data science teams vs business teams Scrum teams, brainstorming, and sprints The most important skills and strategies for DS and ML managers Making sure proof of concepts get into production Links: The secret sauce of data science management: https://www.youtube.com/watch?v=tbBfVHIh-38 Lessons learned leading AI teams: https://blogs.intuit.com/2020/06/23/lessons-learned-leading-ai-teams/ How to avoid conflicts and delays in the AI development process (Part I): https://blogs.intuit.com/2020/12/08/how-to-avoid-conflicts-and-delays-in-the-ai-development-process-part-i/ How to avoid conflicts and delays in the AI development process (Part II): https://blogs.intuit.com/2021/01/06/how-to-avoid-conflicts-and-delays-in-the-ai-development-process-part-ii/ Leading AI teams deck: https://drive.google.com/drive/folders/1_CnqjugtsEbkIyOUKFHe48BeRttX0uJG Leading AI teams video: https://www.youtube.com/watch?app=desktop&v=tbBfVHIh-38 Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
Mar 24, 2023 • 54min

SE4ML - Software Engineering for Machine Learning - Nadia Nahar

We talked about: Nadia’s background Academic research in software engineering Design patterns Software engineering for ML systems Problems that people in industry have with software engineering and ML Communication issues and setting requirements Artifact research in open source products Product vs model Nadia’s open source product dataset Failure points in machine learning projects Finding solutions to issues using Nadia’s dataset and experience The problem of siloing data scientists and other structure issues The importance of documentation and checklists Responsible AI How data scientists and software engineers can work in an Agile way Links: Model Card: https://arxiv.org/abs/1810.03993 Datasheets: https://arxiv.org/abs/1803.09010 Factsheets: https://arxiv.org/abs/1808.07261 Research Paper: https://www.cs.cmu.edu/~ckaestne/pdf/icse22_seai.pdf Arxiv version: https://arxiv.org/pdf/2110. Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
Mar 17, 2023 • 52min

Starting a Consultancy in the Data Space - Aleksander Kruszelnicki

We talked about: Aleksander’s background The difficulty of selling data stack as a service How Aleksander got into consulting The Mom Test – extracting feedback from people User interviews Why Aleksander’s data stack as a service startup was not viable How Aleksander decided to switch to consulting Finding clients to consult Figuring out how to position your services Geographical limitations Figuring out your target audience The importance of networking and marketing Pricing your services The pitfalls of daily and hourly pricing and how to balance incentives Is Germany a good place to found a company? Aleksander’s book recommendations Links: LinkedIn: https://www.linkedin.com/in/alkrusz/ Twitter: https://twitter.com/alkrusz Website: www.leukos.io Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
Mar 10, 2023 • 53min

Biohacking for Data Scientists and ML Engineers - Ruslan Shchuchkin

We talked about: Ruslan’s background Fighting procrastination and perfectionism What is biohacking? The role of dopamine and other hormones in daily life How meditation can help The influence light has on our bodies Behavioral biohacking Daylight lamps and using light to wake up Sleep cycles How nutrition affects productivity Measuring productivity Examples of unsuccessful biohacking attempts Stoicism, voluntary discomfort, and self-challenges Biohacking risks and ways to prevent them Coffee and tea biohacking Using self-reflection and tracking to measure results Mindset shifting Stoicism book recommendation Work/life balance Ruslan’s biohacking resource recommendation Links: LinkedIn: https://www.linkedin.com/in/ruslanshchuchkin/ Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
Mar 3, 2023 • 55min

Analytics for a Better World - Parvathy Krishnan

We talked about: Parvathy’s background Brainstorming sessions with nonprofits to establish data maturity Example of an Analytics for a Better World project The overall data maturity situation of nonprofits vs private sector Solving the skill gap Publicly available content The Analytics for a Better World Academy The Academy’s target audience How researchers can work with Analytics for a Better World Improving data maturity in nonprofit organizations People, processes, and technology Typical tools that Analytics for a Better World recommends to nonprofits Profiles in nonprofits Does Analytics for a Better World have a need for data engineers? The Analytics for a Better World team Factors that help organizations become more data-driven Parvathy’s resource recommendations Links: LinkedIn: https://www.linkedin.com/in/parvathykrishnank/ Twitter: https://twitter.com/ABWInstitute GitHub: https://github.com/Analytics-for-a-Better-World Website: https://analyticsbetterworld.org/ Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html
