The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Multimodal AI Models on Apple Silicon with MLX with Prince Canuma - #744

Aug 26, 2025
Prince Canuma, an ML engineer and open-source developer known for his contributions to Apple's MLX ecosystem, discusses his journey optimizing AI for Apple Silicon. He shares insights on adapting models to the platform, the trade-offs between the GPU and the Neural Engine, and techniques like pruning and quantization for better performance. Prince introduces 'Fusion,' an approach to changing model behavior without retraining, and presents Marvis, a real-time voice agent. His vision for the future of AI centers on multimodal models that adapt seamlessly across different media types.
ANECDOTE

First PR Sparked Deep Involvement

  • Prince got started contributing to MLX by porting StarCoder 2 and opening his first PR.
  • A parallel PR ended up being merged, but the experience launched his deep involvement and rapid porting cadence.
ANECDOTE

Became The Lightning Model Porter

  • Prince describes porting ~1,000 quantized variants (20–30 base models) to MLX in a year.
  • He became so fast that porting a newly released model often took him only ~30 minutes.
INSIGHT

Specialization Unlocks Hardware Gains

  • MLX focuses on Apple GPU execution and thus can exploit Apple Silicon-specific optimizations.
  • This specialization unlocked performance potential beyond more general local inference tools.
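The quantization Prince mentions can be illustrated with a minimal group-wise affine scheme in plain NumPy. This is a sketch of the general technique, not MLX's actual implementation; the function names and the 4-bit / 64-element-group defaults are assumptions chosen for illustration:

```python
import numpy as np

def quantize(w: np.ndarray, bits: int = 4, group_size: int = 64):
    """Group-wise affine quantization (illustrative sketch):
    each group of `group_size` weights shares one scale and offset,
    so weights are stored as small unsigned integers plus per-group
    float parameters."""
    w = w.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    levels = 2**bits - 1  # e.g. 15 representable steps at 4 bits
    scale = (w_max - w_min) / levels
    q = np.round((w - w_min) / scale).astype(np.uint8)
    return q, scale, w_min

def dequantize(q: np.ndarray, scale: np.ndarray, w_min: np.ndarray) -> np.ndarray:
    """Reconstruct approximate float weights from integer codes."""
    return q * scale + w_min

# Round-trip a random weight vector and check the reconstruction error.
rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, scale, offset = quantize(w, bits=4)
w_hat = dequantize(q, scale, offset).reshape(-1)
print(np.abs(w - w_hat).max())  # bounded by half a quantization step
```

At 4 bits with per-group parameters, storage drops to roughly a quarter of fp16 while keeping reconstruction error within half a quantization step per weight, which is why quantized variants dominate on memory-constrained local hardware.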