Data Intensive AI - Bartosz Mikulski

Mar 21, 2025

Bartosz Mikulski, an AI and data engineer, discusses data-intensive AI. He describes his path from Java development to AI engineering and stresses the role of testing in data pipelines. The conversation covers practical testing strategies and the use of AI tools such as ChatGPT and Cursor in development workflows. Bartosz also shares insights into creating a Chrome extension for classifying leads on LinkedIn, and discusses prompt engineering techniques and the impact of AI on coding practices.

ADVICE

Testing Data Pipelines

Build your data pipeline first, then create tests based on observed outputs.
This approach, while potentially missing edge cases initially, offers a practical starting point.
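
One way to picture this advice is a snapshot-style test: run the finished pipeline once, store what it produced, and compare later runs against that stored output. The sketch below is only an illustration and is not taken from the episode; `run_pipeline`, `check_against_snapshot`, the sample rows, and the snapshot file name are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def run_pipeline(rows):
    """Hypothetical pipeline: drop rows without an amount, sum amounts per user."""
    totals = {}
    for row in rows:
        if row.get("amount") is None:
            continue
        totals[row["user"]] = totals.get(row["user"], 0) + row["amount"]
    return [{"user": user, "total": total} for user, total in sorted(totals.items())]


def check_against_snapshot(output, snapshot_path):
    """Record the output on the first run; compare against it on later runs."""
    path = Path(snapshot_path)
    if not path.exists():
        # First run: the observed output becomes the expectation.
        path.write_text(json.dumps(output, indent=2))
        return True
    return json.loads(path.read_text()) == output


# Hypothetical sample input, small enough to review the stored snapshot by hand.
SAMPLE_ROWS = [
    {"user": "a", "amount": 10},
    {"user": "b", "amount": 5},
    {"user": "a", "amount": None},
    {"user": "a", "amount": 2},
]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        snapshot = Path(tmp) / "pipeline_snapshot.json"
        print(check_against_snapshot(run_pipeline(SAMPLE_ROWS), snapshot))  # records
        print(check_against_snapshot(run_pipeline(SAMPLE_ROWS), snapshot))  # compares
```

A snapshot only locks in what the pipeline did on the recorded input, so edge cases that were not in that input remain untested until matching rows are added to the sample.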

INSIGHT

Data Pipeline Testing

Unit tests are less useful for data pipelines than integration tests.
Unlike short functions, data pipelines involve complex, multi-step processes, so testing each step in isolation is less effective.
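
As a rough illustration of the difference, the sketch below chains three small steps and tests them together through a single entry point, asserting only on the final result. It is an assumption-laden example rather than anything described in the episode: the log-line format and the names `parse`, `only_errors`, `count_by_service`, and `pipeline` are hypothetical.

```python
from collections import Counter


def parse(lines):
    """Split 'LEVEL service message' lines into dicts; skip malformed lines."""
    for line in lines:
        parts = line.strip().split(" ", 2)
        if len(parts) == 3:
            yield {"level": parts[0], "service": parts[1], "message": parts[2]}


def only_errors(records):
    """Keep only records logged at ERROR level."""
    return (r for r in records if r["level"] == "ERROR")


def count_by_service(records):
    """Count records per service name."""
    return dict(Counter(r["service"] for r in records))


def pipeline(lines):
    """Run all three steps end to end."""
    return count_by_service(only_errors(parse(lines)))


def test_pipeline_end_to_end():
    # Integration-style test: feed raw input in, check only the final output.
    raw = [
        "ERROR auth token expired",
        "INFO auth login ok",
        "ERROR billing card declined",
        "garbage",
        "ERROR auth token expired",
    ]
    assert pipeline(raw) == {"auth": 2, "billing": 1}


if __name__ == "__main__":
    test_pipeline_end_to_end()
    print("integration test passed")
```

The single test exercises parsing, filtering, and aggregation together, including how a malformed line flows through all three steps, which per-function unit tests would not show.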

ANECDOTE

Spark in MLOps

Bartosz's team, made up mostly of backend developers, successfully used Spark in a new MLOps platform project.
Despite having no prior Spark experience, the team found it to be the easiest way to process data from logs.
