

Multimodal AI Models on Apple Silicon with MLX with Prince Canuma - #744
Aug 26, 2025
Prince Canuma, an ML engineer and open-source developer known for his contributions to Apple's MLX ecosystem, discusses his journey optimizing AI for Apple Silicon. He shares insights on adapting models to the platform, the trade-offs between the GPU and the Neural Engine, and techniques like pruning and quantization for better performance. Prince introduces 'Fusion,' an approach to changing model behavior without retraining, and presents Marvis, a real-time voice agent. His vision for future AI centers on multimodal models that adapt seamlessly across various media.
AI Snips
First PR Sparked Deep Involvement
- Prince got started contributing to MLX by porting StarCoder 2 and opening his first PR.
- A parallel PR ended up being merged, but the experience launched his deep involvement and rapid porting cadence.
Became The Lightning Model Porter
- Prince describes porting ~1,000 quantized variants (20–30 base models) to MLX in a year.
- He became fast enough that porting a newly released model often took him only ~30 minutes.
Specialization Unlocks Hardware Gains
- Because MLX targets Apple GPU execution specifically, it can exploit Apple Silicon-specific optimizations.
- This specialization unlocked performance beyond what more general-purpose local inference tools achieve.
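The quantized variants mentioned above rely on compressing model weights to low-bit codes. As an illustrative sketch only (not MLX's actual implementation), the idea behind group-wise affine quantization can be shown in plain NumPy: each small group of weights shares one scale and offset, so 32-bit floats are stored as 4-bit codes plus a little per-group metadata.

```python
import numpy as np

def quantize_groupwise(w, bits=4, group_size=64):
    """Affine per-group quantization: each group of `group_size` weights
    shares one scale and minimum (offset). Illustrative sketch of the kind
    of scheme behind quantized model variants; not MLX's implementation."""
    levels = 2 ** bits - 1                     # 15 representable steps at 4 bits
    g = w.reshape(-1, group_size)
    lo = g.min(axis=1, keepdims=True)
    hi = g.max(axis=1, keepdims=True)
    scale = (hi - lo) / levels
    scale = np.where(scale == 0, 1.0, scale)   # avoid divide-by-zero for flat groups
    q = np.round((g - lo) / scale).astype(np.uint8)  # 4-bit codes stored in uint8
    return q, scale, lo

def dequantize_groupwise(q, scale, lo, out_shape):
    """Reconstruct approximate float weights from codes + per-group metadata."""
    return (q.astype(np.float32) * scale + lo).reshape(out_shape)

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, scale, lo = quantize_groupwise(w)
w_hat = dequantize_groupwise(q, scale, lo, w.shape)
max_err = float(np.abs(w - w_hat).max())
print(max_err)  # small per-weight error, bounded by half a quantization step
```

The memory win is the point: 1024 float32 weights take 4 KB, while the 4-bit codes plus 16 groups of (scale, offset) take roughly an eighth of that, at the cost of a bounded rounding error per weight.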