Nicolay here,
While everyone races toward cloud-scale LLMs, Pete Warden is solving AI problems by going completely offline. No network connectivity required.
Today I have the chance to talk to Pete Warden, CEO of Useful Sensors and author of the TinyML book.
His philosophy: if you can't explain to users exactly what happens to their data, your privacy model is broken.
Key Insight: The Real World Action Gap
LLMs excel at text-to-text transformations but fail catastrophically at connecting language to physical actions. There's nothing in the web corpus that teaches a model how "turn on the light" maps to sending a pin high on a microcontroller.
This explains why every AI agent demo focuses on booking flights and making API calls: those actions are documented in text. Step off the web into real-world device control, and even simple commands become impossible without custom training on action-to-outcome data.
Pete's company builds speech-to-intent systems that skip text entirely, going directly from audio to device actions using embeddings trained on limited action sets.
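The matching step can be sketched as nearest-neighbour classification over a small, fixed action set. This is an illustrative toy, not Useful Sensors' actual pipeline: the action names and hand-made three-dimensional vectors below are hypothetical stand-ins for embeddings a trained audio encoder would produce.

```python
import math

# Hypothetical canonical actions with toy embedding vectors.
# A real system would use vectors from a trained audio encoder.
ACTIONS = {
    "light_on":  [0.9, 0.1, 0.0],
    "light_off": [0.8, -0.5, 0.1],
    "fan_on":    [0.0, 0.9, 0.3],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def match_intent(utterance_embedding, actions=ACTIONS):
    """Return the canonical action whose embedding is closest to the
    utterance embedding (nearest-neighbour over a constrained domain)."""
    return max(actions, key=lambda name: cosine(utterance_embedding, actions[name]))
```

Because the domain is constrained to a handful of actions, ambiguity survives until this final classification step instead of being collapsed into a single text transcript early on.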
💡 Core Concepts
Speech-to-Intent: Direct audio-to-action mapping that bypasses text conversion, preserving ambiguity until final classification
ML Sensors: Self-contained circuit boards processing sensitive data locally, outputting only simple signals without exposing raw video/audio
Embedding-Based Action Matching: Vector representations mapping natural language variations to canonical device actions within constrained domains
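The ML-sensor idea above can be made concrete with a minimal sketch: raw frames are processed inside the sensor boundary and only a coarse signal crosses it. The class and method names here are hypothetical, invented for illustration.

```python
class PersonSensor:
    """Illustrative ML-sensor boundary: raw frames never leave this
    class; callers only ever see a single boolean signal."""

    def __init__(self, detector):
        # Local model mapping a frame to a confidence score in [0, 1].
        self._detector = detector

    def person_present(self, frame, threshold=0.5):
        # The raw frame is processed locally and discarded;
        # only one bit of information crosses the sensor interface.
        return self._detector(frame) >= threshold
```

The privacy guarantee is structural rather than policy-based: no code outside the sensor can access the raw video, because the interface simply does not expose it.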
⏱ Important Moments
Real World Action Problem: [06:27] LLMs discuss turning on lights but lack training data connecting text commands to device control
Apple Intelligence Challenges: [04:07] Design-led culture clashes with AI accuracy limitations
Speech-to-Intent vs Speech-to-Text: [12:01] Breaking audio into text loses critical ambiguity information
Limited Action Set Strategy: [15:30] Smart speakers succeed by constraining to ~3 functions rather than infinite commands
8-Bit Quantization: [33:12] Remains the deployment sweet spot; processor instruction support matters more than compression
On-Device Privacy: [47:00] Complete local processing provides explainable guarantees vs confusing hybrid systems
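For readers unfamiliar with the quantization point at [33:12], here is a minimal sketch of affine (scale/zero-point) 8-bit quantization, the scheme commonly used for on-device inference. This is a generic illustration, not code from any specific framework.

```python
def quantize_int8(values):
    """Affine 8-bit quantization: map floats to int8 via a scale and
    zero point, so int8 hardware instructions can do the arithmetic."""
    lo, hi = min(values), max(values)
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # representable range must include 0
    scale = (hi - lo) / 255.0 or 1.0     # guard against an all-zero input
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values from int8 codes."""
    return [(qi - zero_point) * scale for qi in q]
```

The payoff Pete describes is that int8 multiply-accumulate instructions exist on nearly every processor, so the speedup comes from instruction support, not merely from the 4x smaller weights.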
🛠 Tools & Tech
Whisper: github.com/openai/whisper
Moonshine: github.com/usefulsensors/moonshine
TinyML Book: oreilly.com/library/view/tinyml/9781492052036
Stanford Edge ML: github.com/petewarden/stanford-edge-ml
📚 Resources
Looking to Listen Paper: looking-to-listen.github.io
Lottery Ticket Hypothesis: arxiv.org/abs/1803.03635
Connect: pete@usefulsensors.com | petewarden.com | usefulsensors.com
Beta Opportunity: Moonshine browser implementation for client-side speech processing in JavaScript