Join Scott and Wes with special guest Xenova as they explore local AI models in JavaScript, from the Hugging Face ecosystem to Transformers.js. They delve into real-time speech recognition, object detection, and practical applications of machine learning, covering how to run AI models in JavaScript and the benefits of deploying AI applications in the browser.
Podcast summary created with Snipd AI
Quick takeaways
The Transformers.js library enables running AI models locally in JavaScript, making them accessible to users without deep data science backgrounds.
Running AI models in JavaScript reduces server costs, lets applications be deployed as static websites, and improves privacy by keeping data on the user's device.
The Pipeline API in Transformers.js simplifies the data flow, handling pre-processing, model inference, and post-processing behind the scenes.
Deep dives
Transformers.js Library Overview
The Transformers.js library, introduced by Xenova, runs AI models locally in JavaScript, in both browsers and Node.js. It provides a simplified way to run a variety of AI models without relying on external APIs, making them accessible to users without deep data science backgrounds. Users can run models in just a few lines of code, bypassing the complexity of installing large dependencies and juggling multiple frameworks.
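As a rough sketch of what "a few lines of code" looks like in practice (the specific model ID and input text here are illustrative assumptions; any compatible Hugging Face model works):

```js
// npm install @xenova/transformers
import { pipeline } from '@xenova/transformers';

// Download (and cache) a sentiment-analysis model, then run it locally —
// no external API calls, no Python toolchain.
const classifier = await pipeline(
  'sentiment-analysis',
  'Xenova/distilbert-base-uncased-finetuned-sst-2-english'
);

const result = await classifier('Transformers.js makes local AI easy!');
console.log(result); // e.g. [{ label: 'POSITIVE', score: 0.99... }]
```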
Benefits of Running AI Models in JavaScript
Running AI models in JavaScript offers several advantages: it reduces server costs by shifting processing to client-side resources, and it makes deploying applications as static websites or hybrid setups easier and cheaper. Because models run inside the browser's sandbox, they can integrate with device sensors through standard browser APIs while keeping data on-device, which improves privacy and gives users control over their data.
Pipeline API and Processing in Transformers.js
The Pipeline API in Transformers.js simplifies the data flow by handling pre-processing, model inference, and post-processing behind the scenes. A single pipeline function manages tasks like tokenizing text, running the model, and formatting outputs for user-friendly consumption, making AI models approachable for a wide range of applications.
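To make those three stages concrete, here is a hedged sketch of roughly what a text-classification pipeline does internally, written against the AutoTokenizer/AutoModel classes that Transformers.js exposes. The model ID is an assumption, and the softmax post-processing is written out by hand for clarity rather than using library helpers:

```js
import { AutoTokenizer, AutoModelForSequenceClassification } from '@xenova/transformers';

const modelId = 'Xenova/distilbert-base-uncased-finetuned-sst-2-english'; // illustrative

// 1. Pre-processing: tokenize raw text into input tensors.
const tokenizer = await AutoTokenizer.from_pretrained(modelId);
const inputs = await tokenizer('I love running models in the browser!');

// 2. Model inference: run the forward pass to get raw logits.
const model = await AutoModelForSequenceClassification.from_pretrained(modelId);
const { logits } = await model(inputs);

// 3. Post-processing: convert logits into human-readable probabilities (softmax).
const scores = Array.from(logits.data);
const max = Math.max(...scores);
const exps = scores.map((x) => Math.exp(x - max));
const sum = exps.reduce((a, b) => a + b, 0);
const probs = exps.map((e) => e / sum);
console.log(probs); // e.g. [0.001, 0.999] => NEGATIVE vs. POSITIVE
```

Calling `pipeline()` simply bundles these steps, which is why the one-liner version feels so approachable.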
Object Detection with ResNet for a Hot Dog Detection App
The episode discussed object detection in images using a model like ResNet, applied to a hot dog detection app. The model can swiftly identify objects in an image and even run on individual video frames, returning a probability for each detected object. This enables real-time analysis, where detections can be filtered and sorted by confidence frame by frame.
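A hedged sketch of that idea using the object-detection pipeline. The model ID (DETR with a ResNet-50 backbone), image URL, and 0.9 threshold are assumptions; DETR's COCO label set happens to include "hot dog", which makes the filtering trivial:

```js
import { pipeline } from '@xenova/transformers';

// DETR with a ResNet-50 backbone, trained on COCO (which includes 'hot dog').
const detector = await pipeline('object-detection', 'Xenova/detr-resnet-50');

const detections = await detector('https://example.com/lunch.jpg', {
  threshold: 0.9, // drop low-confidence boxes
});

// Each detection looks like { label, score, box: { xmin, ymin, xmax, ymax } }.
const hotDogs = detections
  .filter((d) => d.label === 'hot dog')
  .sort((a, b) => b.score - a.score); // highest confidence first

console.log(hotDogs.length > 0 ? 'Hot dog!' : 'Not hot dog.');
```

For video, the same detector could be called on frames captured from a canvas element, which is how per-frame real-time analysis works in the browser.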
Running Large AI Models in the Browser and the Role of Quantization
The podcast delved into the feasibility of running large AI models like Stable Diffusion in the browser and the challenges involved. Quantization was highlighted as a key technique for reducing memory and compute costs during in-browser inference: weights are compressed into lower-precision data types, which improves performance and cuts bandwidth costs, making it essential for efficient in-browser model deployment.
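The core idea can be sketched in a few lines: map 32-bit floats onto 8-bit integers with a scale and zero-point. This is an illustrative, hand-rolled sketch of affine quantization, not Transformers.js internals; in practice the library loads pre-quantized ONNX weights for you (toggled via its `quantized` pipeline option):

```js
// Affine 8-bit quantization: q = round(x / scale) + zeroPoint, stored as uint8.
// A 4-byte float becomes 1 byte, cutting weight size (and download) by ~4x.
function quantize(weights /* Float32Array */) {
  const min = Math.min(...weights);
  const max = Math.max(...weights);
  const scale = (max - min) / 255 || 1; // guard against a constant tensor
  const zeroPoint = Math.round(-min / scale);
  const q = Uint8Array.from(weights, (x) =>
    Math.min(255, Math.max(0, Math.round(x / scale) + zeroPoint))
  );
  return { q, scale, zeroPoint };
}

// Dequantize at inference time: x ≈ (q - zeroPoint) * scale.
function dequantize({ q, scale, zeroPoint }) {
  return Float32Array.from(q, (v) => (v - zeroPoint) * scale);
}
```

The trade-off is a small loss of precision per weight in exchange for roughly 4x less memory and bandwidth, which is what makes in-browser inference of large models practical.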