Integrating Knowledge Graphs with Vector Databases
This chapter explores the architecture of integrating knowledge graphs with vector databases, particularly focusing on user query flow and data narrowing techniques. It highlights practical applications using natural language queries related to Kamala Harris's campaign and examines the complexities of constructing and querying knowledge graphs to uncover hidden relationships in various contexts. The discussion also touches on challenges in multilingual entity translation and the ongoing need for entity disambiguation in knowledge graphs.
Robert Caulk runs Emergent Methods, a research lab building news knowledge graphs. With a Ph.D. in computational mechanics, he spent 12 years creating open-source tools for machine learning and data analysis. His work on projects like Flowdapt (model serving) and FreqAI (adaptive modeling) has earned over 1,000 academic citations.
His team built AskNews, which he calls "the largest news knowledge graph in production." The system doesn't just collect news; it models how events, people, and places connect.
Current AI systems struggle to connect information across sources and domains. Simple vector search misses crucial relationships. But building knowledge graphs at scale brings major technical hurdles around entity extraction, relationship mapping, and query performance.
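To make that gap concrete, here is a minimal Python sketch of narrowing with vector similarity and then following shared entities to related documents. It is an illustration under stated assumptions, not AskNews's implementation: the documents, the three-dimensional "embeddings", the entity names, and the functions `vector_search`, `graph_expand`, and `hybrid_search` are all hypothetical.

```python
import math

# Toy corpus (hypothetical): each document has a hand-made embedding
# and the set of entities that were extracted from it.
DOCS = {
    "d1": {"vec": [0.9, 0.1, 0.0], "entities": {"Candidate A", "City X"}},
    "d2": {"vec": [0.1, 0.9, 0.1], "entities": {"City X", "Transit Agency"}},
    "d3": {"vec": [0.0, 0.1, 0.9], "entities": {"Company Z"}},
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def vector_search(query_vec, k=1):
    """Step 1: narrow the corpus to the k most similar documents."""
    ranked = sorted(
        DOCS, key=lambda d: cosine(query_vec, DOCS[d]["vec"]), reverse=True
    )
    return ranked[:k]


def graph_expand(doc_ids):
    """Step 2: follow shared entities to documents the vectors missed."""
    entities = set().union(*(DOCS[d]["entities"] for d in doc_ids))
    return sorted(
        d for d in DOCS
        if d not in doc_ids and DOCS[d]["entities"] & entities
    )


def hybrid_search(query_vec, k=1):
    """Return (vector hits, documents related to them through entities)."""
    seeds = vector_search(query_vec, k)
    return seeds, graph_expand(seeds)


seeds, related = hybrid_search([1.0, 0.0, 0.0])
print(seeds, related)
```

In this toy setup, `d2` is not similar to the query vector, but it is still surfaced because it shares the entity "City X" with the top vector hit. A production system would use a real embedding model and a graph database rather than in-memory dictionaries.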
Emergent Methods built a hybrid system combining vector search and knowledge graphs:
Implementation Details:
Data Pipeline:
Entity Management:
Knowledge Graph:
System Validation:
Engineering Insights:
Key Technical Decisions:
Dead Ends Hit:
Top Quotes:
Robert Caulk:
Nicolay Gerold:
00:00 Introduction to Context Engineering
00:24 Curating Input Signals
01:01 Structuring Raw Data
03:05 Refinement and Iteration
04:08 Balancing Breadth and Precision
06:10 Interview Start
08:02 Challenges in Context Engineering
20:25 Optimizing Context for LLMs
45:44 Advanced Cypher Queries and Graphs
46:43 Enrichment Pipeline Flexibility
47:16 Combining Graph and Semantic Search
49:23 Handling Multilingual Entities
52:57 Disambiguation and Deduplication Challenges
55:37 Training Models for Diverse Domains
01:04:43 Dealing with AI-Generated Content
01:17:32 Future Developments and Final Thoughts