Continuous Iteration Towards Improved Retrieval Performance

DataStax initially implemented HNSW for vector indexing, which assumes the entire index fits in memory. To support disk-based storage, the company switched to JVector, an implementation of DiskANN. JVector, an open-source project developed by Jonathan Ellis, delivered better performance and has since been adopted by projects like OpenSearch. The company is now enhancing retrieval with ColBERT on top of JVector, part of its continuous iteration toward better retrieval performance.

DataStax is a generative AI data company that provides tools and services to build AI and other data-intensive applications.

Ed Anuff is the Chief Product Officer at DataStax. He joins the show to talk about making Apache Cassandra accessible, adding vector support at DataStax, envisioning the future application stack for AI, and more.

Full Disclosure: This episode is sponsored by DataStax

Sean has been an academic, startup founder, and Googler. He has published works covering a wide range of topics, from information visualization to quantum computing. Currently, Sean is Head of Marketing and Developer Relations at Skyflow and host of Partially Redacted, a podcast about privacy and security engineering. You can connect with Sean on Twitter @seanfalconer.