Data Engineering Podcast

Streamlining Data Pipelines with MCP Servers and Vector Engines

Jul 15, 2025
Kacper Łukawski, a Senior Developer Advocate at Qdrant, specializes in vector databases for large language models. He dives into transforming unstructured data into valuable insights through semantic search and retrieval-augmented generation. Kacper explains the integration of MCP servers for optimizing data pipelines and discusses the challenges of managing embeddings. He also highlights innovative applications in coding practices and the complexities of vector search, offering practical advice on fine-tuning models and reducing costs for enhanced search quality.
ANECDOTE

Kacper's Big Data Beginnings

  • Kacper Łukawski started working with big data pipelines around 2014-15 in the automotive industry.
  • He used Apache tools like Spark, Kafka, and Oozie, expanding into data ingestion and visualization projects.
INSIGHT

LLMs Are Not Magic Fixes

  • Large language models (LLMs) do not automatically fix poor data quality or solve all data problems.
  • Teams often struggle with scalability and deployment, especially in industries avoiding proprietary SaaS tools.
ANECDOTE

Ontology via Vector and Graph

  • One user built a system combining vector embeddings and graphs to derive ontologies in law and medicine.
  • This dual modeling approach captures both semantics and relationships, and it is popular in graph-RAG scenarios.