

#034 Rethinking Search Inside Postgres, From Lexemes to BM25
Dec 5, 2024
Philippe Noël, Founder and CEO of ParadeDB, discusses how his open-source PostgreSQL extension brings search directly into the database. He explains how ParadeDB eliminates the need for a separate search cluster, which simplifies the architecture and improves cost-efficiency. The conversation covers BM25 indexing, keeping data normalized, and the advantages of ACID-compliant search. Philippe also shares successful use cases, including Alibaba Cloud’s implementation, and practical advice for optimizing large-scale search applications.
AI Snips
ParadeDB's Sweet Spot
- ParadeDB excels with structured relational data in Postgres, offering strong data integrity.
- For large JSON document workloads, a NoSQL search engine might be more suitable.
Composable Data Systems and Their Challenges
- Integrating multiple query engines such as DuckDB or DataFusion within Postgres adds overhead and makes data integrity harder to guarantee.
- Building features natively in Postgres, while more work, ensures better performance and transactional safety.
Alibaba Cloud Case Study
- Alibaba Cloud, ParadeDB's largest customer, uses it within their Postgres data warehouse.
- They chose ParadeDB over Elastic to offer a unified product with full-text search capabilities.