

GPT-OSS vs. Qwen vs. Deepseek: Comparing Open Source LLM Architectures
Aug 29, 2025
Ankit Gupta dives into the latest advancements in open-source language models, focusing on OpenAI's new GPT-OSS model and its competitors, Qwen and DeepSeek. Discover the unique architectural features that differentiate these models, including their approaches to long-context training and reasoning alignment. Gupta highlights how distinct design choices nonetheless yield similar performance outcomes, showcasing the evolving landscape of AI development. This comparison sheds light on the future of large language models and their practical implications.
OpenAI's GPT-OSS Design
- GPT-OSS is a mixture-of-experts, decoder-only transformer built on grouped-query attention, SwiGLU, RoPE, and RMSNorm (see the sketch after this list).
- It natively supports a 131K-token context and ships with quantized weights that run on consumer hardware.
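
To make these building blocks concrete, here is a minimal PyTorch sketch of grouped-query attention and a SwiGLU feed-forward. Head counts, dimensions, and random weights are illustrative assumptions, not GPT-OSS's actual configuration.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(x, wq, wk, wv, n_q_heads=8, n_kv_heads=2):
    # GQA: several query heads share one key/value head, shrinking the KV cache.
    b, t, d = x.shape
    hd = d // n_q_heads
    q = (x @ wq).view(b, t, n_q_heads, hd).transpose(1, 2)   # (b, hq, t, hd)
    k = (x @ wk).view(b, t, n_kv_heads, hd).transpose(1, 2)  # (b, hkv, t, hd)
    v = (x @ wv).view(b, t, n_kv_heads, hd).transpose(1, 2)
    group = n_q_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)  # broadcast each KV head to its group
    v = v.repeat_interleave(group, dim=1)
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return out.transpose(1, 2).reshape(b, t, d)

def swiglu(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward: a SiLU-gated linear unit in place of a plain MLP.
    return (F.silu(x @ w_gate) * (x @ w_up)) @ w_down

d = 64
x = torch.randn(2, 16, d)
wq = torch.randn(d, d)
wk = torch.randn(d, d // 4)  # 2 KV heads x head_dim 8 = 16 = d // 4
wv = torch.randn(d, d // 4)
h = grouped_query_attention(x, wq, wk, wv)
y = swiglu(h, torch.randn(d, 4 * d), torch.randn(d, 4 * d), torch.randn(4 * d, d))
```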
Alibaba's Qwen 3 Approach
- Qwen 3 offers both dense and MoE variants and uses QK-Norm to stabilize attention at large scale (see the sketch after this list).
- Its three-stage pretraining includes a long-context stage that combines ABF (adjusted base frequency for RoPE), YaRN, and dual chunk attention to reach 32K+ contexts.
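
Below is a minimal sketch, in the same spirit, of QK-Norm and of ABF applied to RoPE frequencies. The base values and shapes are illustrative assumptions, not Qwen 3's published settings.

```python
import torch

def rms_norm(x, eps=1e-6):
    # QK-Norm: RMS-normalize queries and keys per head before the dot product,
    # which bounds attention logits and stabilizes training at scale.
    return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps)

def rope_freqs(head_dim, base=1_000_000.0):
    # ABF: raising the RoPE base (e.g. 10_000 -> 1_000_000) slows the rotation
    # of every frequency, stretching positional resolution over longer contexts.
    return 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))

q = torch.randn(2, 8, 128, 64)   # (batch, heads, seq, head_dim)
k = torch.randn(2, 8, 128, 64)
q, k = rms_norm(q), rms_norm(k)  # apply QK-Norm before scoring
scores = (q @ k.transpose(-2, -1)) / 64 ** 0.5

short_ctx = rope_freqs(64, base=10_000.0)    # conventional base
long_ctx = rope_freqs(64, base=1_000_000.0)  # ABF-adjusted base for long context
```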
DeepSeek V3's Scale And Optimizations
- DeepSeek V3 is a huge MoE model trained natively in FP8 (8-bit floating point) to cut training costs while enabling massive scale (see the sketch after this list).
- V3.1 adds staged long-context training and a hybrid thinking mode to improve reasoning and tool use.
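
As a rough illustration of what native 8-bit training implies, here is a per-tensor FP8 (E4M3) scale-and-cast sketch, assuming PyTorch 2.1+ with the torch.float8_e4m3fn dtype. It shows only the quantization idea; real FP8 training pipelines use fused scaled-matmul kernels rather than this round trip.

```python
import torch

def to_fp8(x: torch.Tensor):
    # Per-tensor scaling: fit the max magnitude into E4M3's range (~448),
    # cast to FP8, and keep the scale for later dequantization.
    scale = x.abs().max().clamp(min=1e-12) / 448.0
    return (x / scale).to(torch.float8_e4m3fn), scale

w = torch.randn(256, 256)
w8, s = to_fp8(w)                           # FP8 storage: half the bytes of bf16
w_approx = w8.to(torch.bfloat16) * s        # dequantize for a standard matmul
print((w - w_approx.float()).abs().max())   # quantization error stays small
```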