DeepSeek-V3
A Mixture-of-Experts Large Language Model
Book • 2025
DeepSeek-V3 is an open-source large language model that leverages a Mixture-of-Experts (MoE) architecture.
It features 671 billion parameters, with 37 billion activated per token, and incorporates innovative techniques such as Multi-Head Latent Attention (MLA), auxiliary-loss-free load balancing, and a novel Multi-Token Prediction (MTP) objective.
The model was pre-trained on 14.8 trillion tokens and outperforms other open-source models on various benchmarks, including coding and mathematics tasks.
It also employs fine-grained quantization using FP8 and improved parallelism and cross-node communication for efficient training and inference.
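For readers unfamiliar with the Mixture-of-Experts idea behind the "37 billion activated per token" figure, below is a minimal sketch of top-k expert routing in PyTorch. The expert count, hidden size, and top_k values are placeholders chosen for illustration only and do not reflect DeepSeek-V3's actual architecture or routing scheme.

```python
# Illustrative top-k Mixture-of-Experts routing sketch (not DeepSeek-V3's
# actual implementation). All sizes below are placeholder values.
import torch
import torch.nn as nn

class SimpleMoELayer(nn.Module):
    def __init__(self, hidden_size=64, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router scores each token against every expert.
        self.router = nn.Linear(hidden_size, num_experts)
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden_size, 4 * hidden_size),
                          nn.GELU(),
                          nn.Linear(4 * hidden_size, hidden_size))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (num_tokens, hidden_size)
        scores = self.router(x)                           # (tokens, experts)
        weights, indices = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        # Only the top-k experts run for each token, so most parameters
        # stay inactive per token -- the core MoE efficiency idea.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = SimpleMoELayer()
tokens = torch.randn(5, 64)
print(layer(tokens).shape)  # torch.Size([5, 64])
```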
Mentioned by
Mentioned in 1 episode
Mentioned in the context of DeepSeek R1's impact on the market and tech stocks.

403 snips
#198 - DeepSeek R1 & Janus, Qwen2.5, OpenAI Agents