
Weaviate Podcast CEO Han Xiao From Jina AI
Mar 15, 2022
Han Xiao, Founder and CEO of Jina AI, shares insights into the evolving world of neural search. He discusses how his early experiences at Zalando and Tencent led him to work on this technology. Han dives into building effective neural search pipelines, including hierarchical embeddings for images and the DocumentArray data structure. He outlines the foundations of Jina Hub and how developers can publish their workflows to it. Lastly, he touches on the challenges of running an open-source company and the future of multimodal search.
AI Snips
From Frankenstein Models To Microservices
- Han Xiao describes building early "Frankenstein" neural search models at Zalando and later refactoring Tencent's Elasticsearch into microservices.
- These experiences drove him to design Jina as a microservice-first neural search framework and, later, to found Jina AI.
Two Pillars And The Devil In The Details
- Neural search needs two pillars: strong representations from ML models and fast vector retrieval/storage systems.
- Preprocessing and postprocessing (segmentation, hierarchical embeddings, score aggregation) crucially determine search quality.
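The two pillars and the surrounding steps can be sketched as a tiny pipeline. This is a minimal illustration, not Jina's implementation: the `embed` function is a toy bag-of-words stand-in for an ML encoder, retrieval is a brute-force cosine scan rather than a vector index, and every name here (`segment`, `embed`, `search`, the sample corpus) is hypothetical.

```python
from collections import defaultdict
from math import sqrt

def embed(text):
    """Toy stand-in for an ML encoder: a sparse bag-of-words count vector."""
    vec = defaultdict(float)
    for word in text.lower().split():
        vec[word.strip(".,!?")] += 1.0
    return dict(vec)

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def segment(text):
    """Preprocessing: split a document into sentence-level chunks."""
    return [s.strip() for s in text.split(".") if s.strip()]

def search(query, corpus, top_k=3):
    # Indexing: one embedding per chunk, remembering the parent document id.
    index = [(doc_id, embed(chunk))
             for doc_id, text in corpus.items()
             for chunk in segment(text)]
    # Retrieval: brute-force scan standing in for a vector store.
    q = embed(query)
    hits = sorted(((cosine(q, vec), doc_id) for doc_id, vec in index), reverse=True)[:top_k]
    # Postprocessing: aggregate chunk scores per parent document (max here).
    scores = {}
    for score, doc_id in hits:
        scores[doc_id] = max(scores.get(doc_id, 0.0), score)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

corpus = {
    "shoes": "Red running shoes. Lightweight sole for long distances.",
    "jacket": "Blue rain jacket. Waterproof and windproof.",
}
results = search("red shoes", corpus)
```

Swapping the aggregation from `max` to a sum or mean changes which parent documents rank first, which is one concrete way the "details" affect search quality.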
Recursive Document Structure Matters
- Documents should be recursive and nested: documents contain sub-documents and nearest-neighbor relations at multiple levels.
- This structure is hierarchical in two directions, vertically through nested sub-documents and horizontally through nearest-neighbor matches, which simplifies matching and score aggregation for complex objects.
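A recursive document of this kind can be sketched with a plain dataclass. This is an illustrative assumption about the shape of the structure, not the DocumentArray API: the `Doc` class, its `chunks` and `matches` fields, and the `best_score` helper are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Doc:
    text: str
    score: float = 0.0
    # Vertical axis: nested sub-documents (for example, crops of an image).
    chunks: List["Doc"] = field(default_factory=list)
    # Horizontal axis: nearest neighbors retrieved for this document.
    matches: List["Doc"] = field(default_factory=list)

def best_score(doc):
    """Highest match score found on this document or any nested chunk."""
    own = max((m.score for m in doc.matches), default=0.0)
    nested = max((best_score(c) for c in doc.chunks), default=0.0)
    return max(own, nested)

# A query with two chunks, each carrying its own matches.
query = Doc(
    "red shoes on a beach",
    chunks=[
        Doc("red shoes", matches=[Doc("red sneakers", score=0.9)]),
        Doc("beach", matches=[Doc("sand dunes", score=0.4)]),
    ],
)
top = best_score(query)
```

Because chunks are themselves `Doc` objects, the same traversal works at any depth, so aggregation logic does not need a special case per level.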
