DeepSeek-V3
A Mixture-of-Experts Large Language Model
Book • 2025
DeepSeek-V3 is an open-source large language model that leverages a Mixture-of-Experts (MoE) architecture.
It features 671 billion parameters, with 37 billion activated per token, and incorporates innovative techniques such as Multi-Head Latent Attention (MLA), auxiliary-loss-free load balancing, and a novel Multi-Token Prediction (MTP) objective.
The model was pre-trained on 14.8 trillion tokens and outperforms other open-source models on various benchmarks, including coding and mathematics tasks.
It also employs fine-grained quantization using FP8 and improved parallelism and cross-node communication for efficient training and inference.
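For readers unfamiliar with the Mixture-of-Experts idea behind the "37 billion activated per token" figure, below is a minimal sketch of top-k expert routing in PyTorch. The expert count, hidden size, and top_k values are placeholders chosen for illustration only and do not reflect DeepSeek-V3's actual architecture or routing scheme.

```python
# Illustrative top-k Mixture-of-Experts routing sketch (not DeepSeek-V3's
# actual implementation). All sizes below are placeholder values.
import torch
import torch.nn as nn

class SimpleMoELayer(nn.Module):
    def __init__(self, hidden_size=64, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router scores each token against every expert.
        self.router = nn.Linear(hidden_size, num_experts)
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden_size, 4 * hidden_size),
                          nn.GELU(),
                          nn.Linear(4 * hidden_size, hidden_size))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (num_tokens, hidden_size)
        scores = self.router(x)                           # (tokens, experts)
        weights, indices = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        # Only the top-k experts run for each token, so most parameters
        # stay inactive per token -- the core MoE efficiency idea.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = SimpleMoELayer()
tokens = torch.randn(5, 64)
print(layer(tokens).shape)  # torch.Size([5, 64])
```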
Mentioned by
Mentioned in 1 episode
Mentioned in the context of DeepSeek R1's impact on the market and tech stocks.

403 snips
#198 - DeepSeek R1 & Janus, Qwen2.5, OpenAI Agents