The podcast discusses the future of the AI industry through the lens of outlier LMs, focusing on the Phi 3 and Arctic models. It explores trends in open mixture-of-experts models and the impact of synthetic data on small models. The episode also touches on the training techniques behind the new models, including Arctic's sparse mixture-of-experts architecture and Phi 3's evolution in the LLM space.
The Arctic model emphasizes resource-efficient training with a Dense-MoE hybrid transformer architecture.
Phi 3 focuses on synthetic data and continuous refinement in model training strategies.
Deep dives
Innovative Model Releases: Phi 3 and Arctic Outlier LMs
The podcast highlights the recent releases of Phi 3 from Microsoft and Arctic from Snowflake, two models notable for their distinctive training choices. Phi 3 leans on high-quality synthetic data, while Arctic employs a sparse mixture-of-experts architecture aimed at coding-focused audiences with VRAM-rich inference setups. These models offer a glimpse of where the industry may head over the next 6 to 18 months.
Arctic Architecture and Design Strategy
Arctic stands out for its Dense-MoE hybrid transformer architecture, with 480 billion total parameters of which 17 billion are active per token, selected via top-2 gating. Its emphasis on numerous-but-condensed experts and on resource-efficient training and inference signals a strategy of reaching top-tier intelligence without excessive computational cost. The architecture combines a dense transformer with a residual MoE component, allowing training to overlap expert communication with computation for better efficiency.
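To make the routing concrete, here is a minimal PyTorch sketch of a Dense-MoE hybrid block with top-2 gating. This is an illustration under simplifying assumptions, not Snowflake's implementation: the class names, dimensions, and exact placement of the residual MoE branch are ours, and the per-expert loop ignores the expert parallelism and capacity tricks a real training system needs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    """Sparse MoE feed-forward layer: top-2 gating over many small experts."""

    def __init__(self, d_model: int, d_expert: int, n_experts: int = 128, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_expert),
                nn.SiLU(),
                nn.Linear(d_expert, d_model),
            )
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch, seq, d = x.shape
        flat = x.reshape(-1, d)                          # (tokens, d_model)
        gates = F.softmax(self.router(flat), dim=-1)     # router score per expert
        weights, idx = gates.topk(self.top_k, dim=-1)    # top-2 gating
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(flat)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                    # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(flat[mask])
        return out.reshape(batch, seq, d)

class DenseMoEHybridBlock(nn.Module):
    """One transformer block: a residual MoE branch alongside the dense FFN."""

    def __init__(self, d_model: int, n_heads: int, d_ff: int, d_expert: int, n_experts: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.dense_ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model)
        )
        self.moe = Top2MoE(d_model, d_expert, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        h = self.norm2(x)
        # The dense FFN and the sparse MoE branch are independent, so in a
        # distributed setting the all-to-all expert communication can be
        # overlapped with the dense computation.
        return x + self.dense_ffn(h) + self.moe(h)
```

The key property the sketch shows: with many small experts and top-2 routing, each token only touches two experts' weights, which is how a model with 480 billion total parameters activates only 17 billion per token.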
Phi 3 Small Models and Synthetic Data Evolution
The Phi series introduces Phi 3 as a suite of small models ranging from 3.8 to 14 billion parameters, showcasing advances in training on synthetic data. Phi 3, an evolution of Phi 2, maintains the series' focus on GPT-3.5 for synthetic data generation, highlighting the importance of continuous refinement in model training strategy. Despite critiques that the models may effectively be trained on test-set-like data, the Phi models score well on MMLU benchmarks, reflecting the team's approach to LM distillation.
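As a rough illustration of the synthetic-data recipe the Phi reports describe (generate textbook-style passages with a strong teacher model, filter aggressively for quality, train the small model on the survivors), here is a hedged Python sketch. `teacher_generate` and `quality_score` are hypothetical placeholders standing in for the teacher LLM call and the quality filter; nothing here is Microsoft's actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class SyntheticDoc:
    topic: str
    text: str
    score: float

def teacher_generate(topic: str) -> str:
    """Placeholder for a call to the teacher model (e.g. GPT-3.5)."""
    # In a real pipeline this would prompt the teacher LLM for a
    # textbook-style passage on the topic; stubbed here.
    return f"[teacher-written textbook passage about {topic}]"

def quality_score(text: str) -> float:
    """Placeholder quality filter; the Phi reports describe heavy
    filtering for educational value, e.g. via a learned classifier."""
    return 1.0  # stub: accept everything

def build_corpus(topics: list[str], threshold: float = 0.8) -> list[SyntheticDoc]:
    """Generate, filter, and collect synthetic training documents."""
    corpus = []
    for topic in topics:
        text = teacher_generate(topic)
        score = quality_score(text)
        if score >= threshold:          # keep only high-quality samples
            corpus.append(SyntheticDoc(topic, text, score))
    return corpus

if __name__ == "__main__":
    docs = build_corpus(["binary search", "photosynthesis"])
    print(f"kept {len(docs)} documents for small-model training")
```

The filtered corpus would then feed an ordinary pretraining or fine-tuning run for the small model, which is the distillation-by-data idea behind the Phi series.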