Data Engineering Podcast

Evolving Responsibilities in AI Data Management

Feb 16, 2025
Bartosz Mikulski, an MLOps engineer with a rich background in data engineering, dives deep into the realm of AI data management. He highlights the crucial role of data testing in AI applications, especially with the rise of generative AI. Bartosz discusses the need for specialized datasets and the skills required for data engineers to transition into AI. He also addresses challenges like frequent data reprocessing and unstructured data handling, showcasing the evolving responsibilities in this fast-paced field.
INSIGHT

Data Testing in AI

  • Data testing is even more crucial in AI than in conventional software development, and the evaluation (test) dataset matters most.
  • Multi-step AI applications need a separate test dataset for each step, plus one for the end-to-end workflow.
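
As a rough illustration of per-step and end-to-end test datasets, the sketch below scores a toy two-step workflow. Everything in it is hypothetical and not taken from the episode: the Case type, the normalize and answer steps, and the exact-match scoring are assumptions chosen to keep the example small. Real generative output usually needs a looser comparison than string equality.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    input: str
    expected: str


def accuracy(step: Callable[[str], str], cases: list[Case]) -> float:
    """Fraction of cases whose output matches the expected value exactly."""
    if not cases:
        return 0.0
    hits = sum(step(c.input) == c.expected for c in cases)
    return hits / len(cases)


# Hypothetical two-step workflow: normalize a question, then answer it.
def normalize(text: str) -> str:
    return text.strip().lower()


def answer(query: str) -> str:
    # Stand-in for a model call (assumption for illustration only).
    return {"capital of france?": "Paris"}.get(query, "unknown")


def workflow(text: str) -> str:
    return answer(normalize(text))


# One dataset per step, plus one for the whole workflow.
step_datasets = {
    "normalize": (normalize, [Case("  Capital of France? ", "capital of france?")]),
    "answer": (answer, [Case("capital of france?", "Paris"),
                        Case("capital of mars?", "unknown")]),
    "workflow": (workflow, [Case(" Capital of France?", "Paris")]),
}

report = {name: accuracy(fn, cases) for name, (fn, cases) in step_datasets.items()}
```

Keeping the datasets separate means a drop in the workflow score can be traced to the step whose own score also dropped.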
ADVICE

Test Data for RAG and Agents

  • Prepare test datasets for every step in a Retrieval Augmented Generation (RAG) application, including user input, queries, and responses.
  • For AI agents, datasets must include expected tools, parameters, and queries for comprehensive testing.
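
One possible shape for such test records is sketched below. The field names, the ToolCall type, and the checking functions are assumptions made for illustration, not a schema described in the episode, and exact matching stands in for whatever comparison a real evaluation would use.

```python
from dataclasses import dataclass, field


@dataclass
class RagCase:
    user_input: str
    expected_query: str      # query the app should send to retrieval
    expected_response: str   # reference answer for the final output


@dataclass
class AgentCase:
    user_input: str
    expected_tool: str
    expected_params: dict = field(default_factory=dict)


@dataclass
class ToolCall:
    tool: str
    params: dict


def check_rag_case(case: RagCase, query: str, response: str) -> dict:
    """Report each RAG step separately so failures can be localized."""
    return {
        "query": query == case.expected_query,
        "response": response == case.expected_response,
    }


def check_agent_case(case: AgentCase, actual: ToolCall) -> list:
    """Return a list of mismatches; an empty list means the case passed."""
    problems = []
    if actual.tool != case.expected_tool:
        problems.append(f"tool: expected {case.expected_tool!r}, got {actual.tool!r}")
    for key, want in case.expected_params.items():
        got = actual.params.get(key)
        if got != want:
            problems.append(f"param {key!r}: expected {want!r}, got {got!r}")
    return problems
```

A case such as AgentCase("weather in Oslo", "get_weather", {"city": "Oslo"}) would then be compared against the tool call the agent actually made for that input.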
INSIGHT

AI Team Responsibilities

  • AI engineers often handle all AI-related tasks, but existing data engineering, data science, and MLOps teams can adapt to generative AI.
  • Generative AI adds the challenge of working with text input and output.