#039 Local-First Search, How to Push Search To End-Devices

Jan 23, 2025

Alex Garcia, a developer passionate about making vector search practical, discusses his creation, SQLiteVec. He emphasizes its lightweight design and how it simplifies local AI applications. The conversation reveals the efficiency of SQLiteVec's brute force searches, with impressive performance metrics at scale. Garcia also dives into challenges like data synchronization and fine-tuning embedding models. His insights on binary quantization and future innovations in local search highlight the evolution of user-friendly machine learning tools.

INSIGHT

SQLite Storage Quirks

SQLite's row-oriented storage impacts SQLiteVec's performance, especially with large vector blobs.
4KB page sizes cause non-contiguous storage, affecting analytical tasks but benefiting transactional workloads.

ANECDOTE

Why SQLiteVec uses SQLite

Alex Garcia chose SQLite for SQLiteVec due to its simplicity and existing integration with his workflow.
He prioritized lightweight deployment and compatibility with his existing SQLite projects.

ADVICE

SQLiteVec Performance Limits

Consider SQLiteVec's practical limits: brute-force search handles hundreds of thousands of vectors (768 dimensions) efficiently.
Aim for sub-100ms search times; binary quantization extends scalability to ~1M vectors.

Alex Garcia is a developer focused on making vector search accessible and practical. As he puts it: "I'm a SQLite guy. I use SQLite for a lot of projects... I want an easier vector search thing that I don't have to install 10,000 dependencies to use."

Core Mantra: "Simple, Local, Scalable"

Why SQLiteVec?

"I didn't go along thinking, 'Oh, I want to build vector search, let me find a database for it.' It was much more like: I use SQLite for a lot of projects, I want something lightweight that works in my current workflow."

SQLiteVec uses row-oriented storage with some key design choices:

Vectors are stored in large chunks (megabytes) as blobs
Data is split across 4KB SQLite pages, which affects analytical performance
Currently uses brute force linear search without ANN indexing
Supports binary quantization for 32x size reduction
Handles tens to hundreds of thousands of vectors efficiently
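To make "brute force linear search without ANN indexing" concrete, here is a minimal pure-Python sketch of the idea: score every stored vector against the query and keep the k closest. It is not SQLiteVec's implementation; the function names, the 8-dimension toy vectors, and the choice of Euclidean distance are assumptions made for illustration.

```python
import heapq
import math
import random


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_knn(query, vectors, k=3):
    """Scan every stored vector and return the k nearest as (distance, id) pairs."""
    scored = ((l2_distance(query, vec), vec_id) for vec_id, vec in vectors.items())
    return heapq.nsmallest(k, scored)


random.seed(0)
DIMS = 8  # tiny for readability; the episode discusses 768 dimensions
vectors = {i: [random.random() for _ in range(DIMS)] for i in range(1000)}

# Query with a stored vector, so its own id should come back first at distance 0.
query = vectors[42]
results = brute_force_knn(query, vectors, k=3)
print(results[0])
```

The cost grows linearly with the number of vectors and with their dimensionality, which is why the size of each vector matters so much for the limits discussed in the episode.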

Practical limits:

500ms search time for 500K vectors (768 dimensions)
Best performance under 100ms for user experience
Binary quantization enables scaling to ~1M vectors
Metadata filtering and partitioning coming soon
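A back-of-the-envelope calculation helps explain these limits. The sketch below counts only the raw vector payload (it ignores SQLite page and row overhead, and the helper name is hypothetical), assuming 32-bit floats for the full-precision case and 1 bit per dimension for binary quantization.

```python
def index_size_bytes(num_vectors, dims, bits_per_dim):
    """Raw vector payload in bytes, ignoring SQLite page and row overhead."""
    return num_vectors * dims * bits_per_dim // 8


NUM_VECTORS = 500_000
DIMS = 768

float32_bytes = index_size_bytes(NUM_VECTORS, DIMS, 32)
binary_bytes = index_size_bytes(NUM_VECTORS, DIMS, 1)

print(f"float32: {float32_bytes / 1e6:.0f} MB")   # 1536 MB
print(f"binary:  {binary_bytes / 1e6:.0f} MB")    # 48 MB
print(f"ratio:   {float32_bytes // binary_bytes}x")  # 32x
```

A brute-force query has to read every one of those bytes, so shrinking the payload 32x is what makes a larger collection plausible within the same latency budget.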

Key advantages:

Fast writes for transactional workloads
Simple single-file database
Easy integration with existing SQLite applications
Leverages SQLite's mature storage engine

Garcia's preferred tools for local AI:

Sentence Transformers models converted to GGUF format
Llama.cpp for inference
Small models (30MB) for basic embeddings
Larger models like Arctic Embed (hundreds of MB) for recent topics
sqlite-lembed extension for text embeddings
Transformers.js for browser-based implementations

1. Choose Your Storage

"There's two ways of storing vectors within SQLiteVec. One way is a manual way where you just store a JSON array... [second is] using a virtual table."

Traditional row storage: Simple, flexible, good for small vectors
Virtual table storage: Optimized chunks, better for large datasets
Performance sweet spot: Up to 500K vectors with 500ms search time
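The "manual way" can be sketched with nothing but Python's built-in sqlite3 module: store each embedding as a JSON array in an ordinary column and rank rows with a distance function. The table, the column names, and the `l2_distance_json` function are hypothetical stand-ins registered from Python for this example; they are not SQLiteVec's own functions, and the virtual table route is not shown because it needs the extension loaded.

```python
import json
import math
import sqlite3


def l2_from_json(a_json, b_json):
    """Euclidean distance between two vectors stored as JSON arrays."""
    a, b = json.loads(a_json), json.loads(b_json)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


db = sqlite3.connect(":memory:")
# Hypothetical helper registered for this sketch only.
db.create_function("l2_distance_json", 2, l2_from_json)

db.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT, embedding TEXT)")
rows = [
    (1, "apples", json.dumps([1.0, 0.0, 0.0])),
    (2, "oranges", json.dumps([0.9, 0.1, 0.0])),
    (3, "bicycles", json.dumps([0.0, 0.0, 1.0])),
]
db.executemany("INSERT INTO documents VALUES (?, ?, ?)", rows)

# Rank every row by distance to the query vector and keep the two closest.
query = json.dumps([1.0, 0.05, 0.0])
nearest = db.execute(
    "SELECT id, body FROM documents ORDER BY l2_distance_json(embedding, ?) LIMIT 2",
    (query,),
).fetchall()
print(nearest)  # [(1, 'apples'), (2, 'oranges')]
```

Keeping vectors in a regular column means they sit next to the rest of the row, which is flexible but forces the scan to read everything else in the row too. That is the trade-off the chunked virtual table storage is meant to avoid.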

2. Optimize Performance

"With binary quantization it's 1/32 of the space... and holds up at 95 percent quality"

Binary quantization reduces storage 32x with 95% quality
Default page size is 4KB - plan your vector storage accordingly
Metadata filtering, once available, narrows the scan and speeds up search
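Where the 32x figure comes from can be shown in a few lines. This sketch assumes the common sign-threshold scheme (one bit per dimension, set when the component is positive) and compares vectors by Hamming distance. The function names are hypothetical and this is not SQLiteVec's code; it also does not measure the quality figure quoted above.

```python
import random


def quantize_binary(vector):
    """Pack one bit per dimension: 1 if the component is positive, else 0."""
    bits = 0
    for component in vector:
        bits = (bits << 1) | (1 if component > 0 else 0)
    return bits.to_bytes((len(vector) + 7) // 8, "big")


def hamming_distance(a, b):
    """Number of differing bits between two equal-length byte strings."""
    return bin(int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).count("1")


random.seed(0)
DIMS = 768
base = [random.uniform(-1, 1) for _ in range(DIMS)]
near = [x + random.uniform(-0.1, 0.1) for x in base]  # small perturbation of base
far = [random.uniform(-1, 1) for _ in range(DIMS)]     # unrelated vector

base_q, near_q, far_q = (quantize_binary(v) for v in (base, near, far))

float32_size = DIMS * 4   # 3072 bytes at 4 bytes per dimension
binary_size = len(base_q)  # 96 bytes at 1 bit per dimension
print(float32_size // binary_size)  # 32

# The perturbed vector should stay much closer in Hamming distance.
print(hamming_distance(base_q, near_q), hamming_distance(base_q, far_q))
```

A typical pattern is to search the quantized vectors first and then re-rank a short list of candidates with the full-precision vectors, trading a little recall for a much cheaper scan.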

3. Integration Patterns

"It's a single file, right? So you can like copy and paste it if you want to make a backup."

Two storage approaches: manual columns or virtual tables
Easy backups: single file database
Cross-platform: desktop, mobile, IoT, browser (via WASM)
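Because everything lives in one file, a backup can be as simple as copying it. When the database may be in use, Python's built-in `sqlite3.Connection.backup` is a safer way to get a consistent copy. The file names and the `notes` table below are made up for the example.

```python
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "app.db")
backup_path = os.path.join(workdir, "app-backup.db")

# Create a small database to back up.
source = sqlite3.connect(source_path)
source.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
source.execute("INSERT INTO notes (body) VALUES ('hello')")
source.commit()

# Connection.backup copies a consistent snapshot into the target database.
target = sqlite3.connect(backup_path)
source.backup(target)
target.close()
source.close()

# The backup is an ordinary SQLite file that can be opened on its own.
restored = sqlite3.connect(backup_path)
restored_rows = restored.execute("SELECT body FROM notes").fetchall()
restored.close()
print(restored_rows)  # [('hello',)]
```

A plain file copy works just as well when nothing is writing to the database at that moment.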

4. Real-World Tips

"I typically choose the really small model... it's 30 megabytes. It quantizes very easily... I like it because it's very small, quick and easy."

Start with smaller, efficient models (30MB range)
Use binary quantization before trying complex solutions
Plan for partitioning when scaling beyond 100K vectors
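Partitioning can be approximated today with an ordinary indexed column: filter to one partition first, then run the brute-force ranking only over the rows that remain. The sketch below uses only the built-in sqlite3 module. The schema, the `user_id` partition key, and the `l2_distance_json` function are hypothetical and do not reflect how SQLiteVec implements partitioning.

```python
import json
import math
import random
import sqlite3

distance_calls = {"count": 0}


def l2_from_json(a_json, b_json):
    """Euclidean distance between JSON-encoded vectors; counts how often it runs."""
    distance_calls["count"] += 1
    a, b = json.loads(a_json), json.loads(b_json)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


random.seed(0)
db = sqlite3.connect(":memory:")
db.create_function("l2_distance_json", 2, l2_from_json)
db.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, user_id INTEGER, embedding TEXT)")
db.execute("CREATE INDEX chunks_by_user ON chunks (user_id)")

# 1,000 toy vectors spread evenly over 10 partitions (user_id 0..9).
db.executemany(
    "INSERT INTO chunks (user_id, embedding) VALUES (?, ?)",
    [(i % 10, json.dumps([random.random() for _ in range(4)])) for i in range(1000)],
)

# The WHERE clause restricts the scan to one partition before any distance is computed.
query = json.dumps([0.5, 0.5, 0.5, 0.5])
hits = db.execute(
    "SELECT id, user_id FROM chunks WHERE user_id = ? "
    "ORDER BY l2_distance_json(embedding, ?) LIMIT 5",
    (3, query),
).fetchall()
print(hits, distance_calls["count"])
```

Only the 100 rows belonging to that user are candidates, so the linear scan covers a tenth of the table. The same reasoning applies to any key that queries naturally filter on, such as a tenant, a document collection, or a date range.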

Alex Garcia

Nicolay Gerold: