ThursdAI - The top AI news from the past week

📅 ThursdAI - Live @ NeurIPS, Mixtral, GeminiPro, Phi2.0, StripedHyena, Upstage 10B SoTA & more AI news from last (insane) week

Dec 14, 2023
This episode covers Open Source LLMs, Mixtral MoE, Mistral 0.2 instruct, Upstage Solar 10B, the StripedHyena architecture, the EAGLE decoding method, Deci.ai's new SoTA 7B model, Microsoft's Phi 2.0 weights, QuIP LLM quantization & compression, and API access to Gemini Pro.
INSIGHT

Mistral's Mixtral MoE Model

  • Mistral released Mixtral, a mixture-of-experts model that combines eight 7B experts and outperforms GPT-3.5.
  • The model is fully open source under an Apache 2.0 license, allowing commercial use and further fine-tuning (a local loading sketch follows below).
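A minimal sketch of loading the open weights locally with Hugging Face transformers, assuming the mistralai/Mixtral-8x7B-Instruct-v0.1 checkpoint and enough GPU/CPU memory for roughly 47B parameters in half precision:

```python
# Sketch: load the open Mixtral weights and generate a short completion.
# Assumes the "mistralai/Mixtral-8x7B-Instruct-v0.1" checkpoint and ample memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to roughly halve memory use
    device_map="auto",          # spread layers across available GPUs/CPU
)

prompt = "[INST] Summarize this week's open-source AI news. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```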
INSIGHT

Mixtral Sparse Architecture Explained

  • Mixtral uses a sparse architecture in which a router network activates two experts per token during inference.
  • Each expert is a dense 7B-scale network, but only two of the eight are active for any given token, making inference much cheaper than a dense model of the same total size (see the routing sketch below).
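A toy sketch of the top-2 routing idea described above (illustrative only, not Mixtral's actual implementation): a linear router scores the eight experts for each token, only the two highest-scoring expert feed-forward blocks run, and their outputs are mixed by the softmaxed router weights.

```python
# Toy sparse top-2 mixture-of-experts layer (illustrative, not Mixtral's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseTop2MoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):  # x: (num_tokens, d_model)
        logits = self.router(x)                          # score every expert per token
        weights, idx = logits.topk(self.top_k, dim=-1)   # keep the top-2 experts
        weights = F.softmax(weights, dim=-1)             # normalize the two scores
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                    # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

moe = SparseTop2MoE()
tokens = torch.randn(4, 512)
print(moe(tokens).shape)  # torch.Size([4, 512])
```

Only two expert feed-forward passes run per token, which is why the active compute is far below the model's total parameter count.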
ADVICE

Tips on Using Mixtral Models

  • Use Mixtral through a hosted API for better inference speed and efficiency instead of loading the full model locally (a minimal API-call sketch follows below).
  • Consider fine-tuning tools like Axolotl for properly adapting mixture-of-experts models.
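A minimal sketch of calling Mixtral through a hosted, OpenAI-compatible endpoint rather than loading weights locally; the base URL, model name, and API key below are placeholders for whichever inference provider you use:

```python
# Sketch: query a hosted Mixtral endpoint via an OpenAI-compatible client.
# The base_url and model identifier are hypothetical placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-provider.com/v1",  # your provider's endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="mixtral-8x7b-instruct",  # provider-specific model name
    messages=[{"role": "user", "content": "What is a mixture-of-experts model?"}],
    max_tokens=200,
)
print(response.choices[0].message.content)
```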