

Multimodal AI Models on Apple Silicon with MLX with Prince Canuma - #744
Aug 26, 2025
Prince Canuma, an ML engineer and open-source developer known for his contributions to Apple's MLX ecosystem, discusses his journey optimizing AI for Apple Silicon. He shares insights on adapting models to the platform, the trade-offs between the GPU and the Neural Engine, and techniques like pruning and quantization for better performance. Prince introduces 'Fusion,' an approach to changing model behavior without retraining, and presents Marvis, a real-time voice agent. His vision for future AI centers on multimodal models that adapt seamlessly across various media.
AI Snips
First PR Sparked Deep Involvement
- Prince got started contributing to MLX by porting StarCoder 2 and opening his first PR.
- A parallel PR ended up being merged, but the experience launched his deep involvement and rapid porting cadence.
Became The Lightning Model Porter
- Prince describes porting ~1,000 quantized variants (20–30 base models) to MLX in a year.
- He became fast enough that porting a newly released model often took him only ~30 minutes.
Specialization Unlocks Hardware Gains
- Because MLX targets Apple GPU execution specifically, it can exploit Apple Silicon-specific optimizations.
- This specialization unlocked performance beyond what more general-purpose local inference tools achieve.
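The quantized variants mentioned above rely on compressing model weights to low-bit codes. As an illustrative sketch only (not MLX's actual implementation), the idea behind group-wise affine quantization can be shown in plain NumPy: each small group of weights shares one scale and offset, so 32-bit floats are stored as 4-bit codes plus a little per-group metadata.

```python
import numpy as np

def quantize_groupwise(w, bits=4, group_size=64):
    """Affine per-group quantization: each group of `group_size` weights
    shares one scale and minimum (offset). Illustrative sketch of the kind
    of scheme behind quantized model variants; not MLX's implementation."""
    levels = 2 ** bits - 1                     # 15 representable steps at 4 bits
    g = w.reshape(-1, group_size)
    lo = g.min(axis=1, keepdims=True)
    hi = g.max(axis=1, keepdims=True)
    scale = (hi - lo) / levels
    scale = np.where(scale == 0, 1.0, scale)   # avoid divide-by-zero for flat groups
    q = np.round((g - lo) / scale).astype(np.uint8)  # 4-bit codes stored in uint8
    return q, scale, lo

def dequantize_groupwise(q, scale, lo, out_shape):
    """Reconstruct approximate float weights from codes + per-group metadata."""
    return (q.astype(np.float32) * scale + lo).reshape(out_shape)

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, scale, lo = quantize_groupwise(w)
w_hat = dequantize_groupwise(q, scale, lo, w.shape)
max_err = float(np.abs(w - w_hat).max())
print(max_err)  # small per-weight error, bounded by half a quantization step
```

The memory win is the point: 1024 float32 weights take 4 KB, while the 4-bit codes plus 16 groups of (scale, offset) take roughly an eighth of that, at the cost of a bounded rounding error per weight.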