ThursdAI - The top AI news from the past week

📆 ThursdAI - Oct 23: The AI Browser Wars Begin, DeepSeek's OCR Mind-Trick & The Race to Real-Time Video

Oct 24, 2025
Paul Klein, founder of Browserbase, discusses agentic browser automation and an integration with 1Password that gives browsing agents secure access. Joining him is Quinn Kramer, an expert in real-time video technology, who explains the architecture behind low-latency multimodal interactions and real-time lip sync for avatars. The conversation also covers the DeepSeek-OCR model and its text compression method, which renders text as images and could change how long documents are processed.
INSIGHT

Visual Text Compression Breakthrough

  • DeepSeek-OCR compresses text as images and decodes with a tiny vision decoder, enabling dramatic context compression for long documents.
  • The authors report ~10x compression at about 97% decoding accuracy, suggesting a new, efficient way to represent large text contexts.
INSIGHT

Visual Tokens Multiply Context Efficiently

  • Visual token compression could squeeze ~1M text tokens into ~100k vision tokens, massively expanding usable context windows.
  • Because self-attention FLOPs scale quadratically with context length, a 10x shorter sequence cuts that cost by roughly 100x, a large saving for long-context tasks.
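The arithmetic behind this snip can be sketched in a few lines of Python. This is a back-of-the-envelope estimate, not a benchmark: the function name `attention_savings` is made up for illustration, and it assumes that only the self-attention term (quadratic in sequence length) is being compared, ignoring the linear-cost parts of the model and the cost of the vision encoder itself.

```python
def attention_savings(text_tokens: int, compression_ratio: float) -> dict:
    """Estimate the compressed sequence length and the attention-cost reduction.

    Assumption: self-attention cost is proportional to the square of the
    sequence length; every other cost is ignored in this sketch.
    """
    vision_tokens = round(text_tokens / compression_ratio)
    # Ratio of quadratic costs: (n_text / n_vision) ** 2
    attention_reduction = (text_tokens / vision_tokens) ** 2
    return {
        "vision_tokens": vision_tokens,
        "attention_reduction": attention_reduction,
    }


# Figures from the snip: ~1M text tokens at ~10x compression.
estimate = attention_savings(1_000_000, 10)
print(estimate)
```

With the snip's figures, the sequence shrinks to about 100k vision tokens and the quadratic attention term drops by a factor of about 100.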
ADVICE

Use Tiny OCR For Mass Ingestion

  • Run DeepSeek-style tiny OCR models for large-scale PDF/image ingestion to save compute and cost when generating training tokens.
  • Prefer small, fast models for massive dataset preprocessing to avoid huge cluster costs and speed up ingestion.
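A mass-ingestion loop along these lines might look like the sketch below. It is only an outline of the batching pattern: `ingest`, `batched`, and `fake_ocr` are hypothetical names, and `fake_ocr` is a stand-in for whatever small OCR model you actually run. No real DeepSeek-OCR API is assumed or shown here.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def batched(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group an iterable of page paths into lists of at most batch_size."""
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


def ingest(
    paths: Iterable[str],
    ocr_batch_fn: Callable[[List[str]], List[str]],
    batch_size: int = 32,
) -> Iterator[Tuple[str, str]]:
    """Stream (path, extracted_text) pairs, calling the OCR model once per batch."""
    for batch in batched(paths, batch_size):
        texts = ocr_batch_fn(batch)
        yield from zip(batch, texts)


def fake_ocr(batch: List[str]) -> List[str]:
    """Placeholder for a real small OCR model (hypothetical)."""
    return [f"text from {path}" for path in batch]


pages = [f"page_{i}.png" for i in range(5)]
results = list(ingest(pages, fake_ocr, batch_size=2))
print(results[0])
```

Batching keeps the model busy with full batches while streaming results, so the whole corpus never has to sit in memory at once.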