DataTalks.Club

Data Intensive AI - Bartosz Mikulski

Mar 21, 2025
Bartosz Mikulski, an AI and data engineer, delves into the fascinating realm of Data Intensive AI. He shares his journey from Java development to AI engineering, emphasizing the pivotal role of testing in data pipelines. The conversation explores practical strategies for testing and the integration of AI tools like ChatGPT and Cursor in development workflows. Bartosz provides insights into creating a Chrome extension for lead classification on LinkedIn, while discussing prompt engineering techniques and the impact of AI on coding practices. A must-listen for data enthusiasts!
ADVICE

Testing Data Pipelines

  • Build your data pipeline first, then create tests based on observed outputs.
  • This approach, while potentially missing edge cases initially, offers a practical starting point.
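One way to picture this advice is a snapshot (or "golden file") test: run the finished pipeline on a small sample, save the output once, and fail later runs if the output changes. The sketch below is only an illustration under stated assumptions. `run_pipeline`, `check_against_snapshot`, and `SAMPLE` are hypothetical names invented here, not anything from the episode, and the toy aggregation stands in for a real pipeline.

```python
import json
import tempfile
from pathlib import Path


def run_pipeline(records):
    """Hypothetical pipeline: drop rows without a user, then total amounts per user."""
    totals = {}
    for row in records:
        user = row.get("user")
        if user is None:
            continue  # rows with no user are filtered out
        totals[user] = totals.get(user, 0) + row["amount"]
    return [{"user": u, "total": t} for u, t in sorted(totals.items())]


def check_against_snapshot(records, snapshot_path):
    """Record the observed output on the first run; compare against it afterwards."""
    actual = run_pipeline(records)
    path = Path(snapshot_path)
    if not path.exists():
        # First run: store what the pipeline produced so a person can review it.
        path.write_text(json.dumps(actual, indent=2))
        return True
    expected = json.loads(path.read_text())
    return actual == expected


# Small hand-made sample (an assumption for illustration, not real data).
SAMPLE = [
    {"user": "a", "amount": 10},
    {"user": "b", "amount": 5},
    {"user": "a", "amount": 7},
    {"user": None, "amount": 99},
]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        snapshot = Path(tmp) / "pipeline_snapshot.json"
        check_against_snapshot(SAMPLE, snapshot)          # records the snapshot
        print(check_against_snapshot(SAMPLE, snapshot))   # compares against it
```

The recorded snapshot only captures the behavior that was observed, so edge cases missing from the sample stay untested until someone adds them, which matches the caveat in the snip.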
INSIGHT

Data Pipeline Testing

  • Unit tests are less useful than integration tests for data pipelines.
  • Unlike short functions, data pipelines chain many processing steps together, so testing each step in isolation says little about whether the whole pipeline produces correct output.
ANECDOTE

Spark in MLOps

  • Bartosz's team, primarily backend developers, successfully used Spark for a new MLOps platform project.
  • Despite having no prior Spark experience, the team found Spark to be the easiest way to process data from logs.