

GPT-OSS vs. Qwen vs. Deepseek: Comparing Open Source LLM Architectures
Aug 29, 2025
Ankit Gupta dives into the latest advancements in open-source language models, focusing on OpenAI's new GPT-OSS model and its competitors, Qwen and DeepSeek. Discover the unique architectural features that differentiate these models, including their approaches to long-context training and reasoning alignment. Gupta highlights how distinct design choices nonetheless yield similar performance outcomes, showcasing the evolving landscape of AI development. This comparison sheds light on the future of large language models and their practical implications.
OpenAI's GPT-OSS Design
- GPT-OSS is a mixture-of-experts, decoder-only transformer built on grouped-query attention, SwiGLU, RoPE, and RMSNorm (see the sketch after this list).
- It natively supports a 131K-token context and ships with quantized weights that run on consumer hardware.
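
To make these building blocks concrete, here is a minimal PyTorch sketch of grouped-query attention and a SwiGLU feed-forward. Head counts, dimensions, and random weights are illustrative assumptions, not GPT-OSS's actual configuration.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(x, wq, wk, wv, n_q_heads=8, n_kv_heads=2):
    # GQA: several query heads share one key/value head, shrinking the KV cache.
    b, t, d = x.shape
    hd = d // n_q_heads
    q = (x @ wq).view(b, t, n_q_heads, hd).transpose(1, 2)   # (b, hq, t, hd)
    k = (x @ wk).view(b, t, n_kv_heads, hd).transpose(1, 2)  # (b, hkv, t, hd)
    v = (x @ wv).view(b, t, n_kv_heads, hd).transpose(1, 2)
    group = n_q_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)  # broadcast each KV head to its group
    v = v.repeat_interleave(group, dim=1)
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return out.transpose(1, 2).reshape(b, t, d)

def swiglu(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward: a SiLU-gated linear unit in place of a plain MLP.
    return (F.silu(x @ w_gate) * (x @ w_up)) @ w_down

d = 64
x = torch.randn(2, 16, d)
wq = torch.randn(d, d)
wk = torch.randn(d, d // 4)  # 2 KV heads x head_dim 8 = 16 = d // 4
wv = torch.randn(d, d // 4)
h = grouped_query_attention(x, wq, wk, wv)
y = swiglu(h, torch.randn(d, 4 * d), torch.randn(d, 4 * d), torch.randn(4 * d, d))
```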
Alibaba's Qwen 3 Approach
- Qwen 3 offers both dense and MoE variants and uses QK-Norm to stabilize attention at large scale (see the sketch after this list).
- Its three-stage pretraining includes a long-context stage that combines ABF (adjusted base frequency for RoPE), YaRN, and dual chunk attention to reach 32K+ contexts.
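
Below is a minimal sketch, in the same spirit, of QK-Norm and of ABF applied to RoPE frequencies. The base values and shapes are illustrative assumptions, not Qwen 3's published settings.

```python
import torch

def rms_norm(x, eps=1e-6):
    # QK-Norm: RMS-normalize queries and keys per head before the dot product,
    # which bounds attention logits and stabilizes training at scale.
    return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps)

def rope_freqs(head_dim, base=1_000_000.0):
    # ABF: raising the RoPE base (e.g. 10_000 -> 1_000_000) slows the rotation
    # of every frequency, stretching positional resolution over longer contexts.
    return 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))

q = torch.randn(2, 8, 128, 64)   # (batch, heads, seq, head_dim)
k = torch.randn(2, 8, 128, 64)
q, k = rms_norm(q), rms_norm(k)  # apply QK-Norm before scoring
scores = (q @ k.transpose(-2, -1)) / 64 ** 0.5

short_ctx = rope_freqs(64, base=10_000.0)    # conventional base
long_ctx = rope_freqs(64, base=1_000_000.0)  # ABF-adjusted base for long context
```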
DeepSeek V3's Scale And Optimizations
- DeepSeek V3 is a huge MoE model trained natively in FP8 (8-bit floating point) to cut training costs while enabling massive scale (see the sketch after this list).
- V3.1 adds staged long-context training and a hybrid thinking mode to improve reasoning and tool use.
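
As a rough illustration of what native 8-bit training implies, here is a per-tensor FP8 (E4M3) scale-and-cast sketch, assuming PyTorch 2.1+ with the torch.float8_e4m3fn dtype. It shows only the quantization idea; real FP8 training pipelines use fused scaled-matmul kernels rather than this round trip.

```python
import torch

def to_fp8(x: torch.Tensor):
    # Per-tensor scaling: fit the max magnitude into E4M3's range (~448),
    # cast to FP8, and keep the scale for later dequantization.
    scale = x.abs().max().clamp(min=1e-12) / 448.0
    return (x / scale).to(torch.float8_e4m3fn), scale

w = torch.randn(256, 256)
w8, s = to_fp8(w)                           # FP8 storage: half the bytes of bf16
w_approx = w8.to(torch.bfloat16) * s        # dequantize for a standard matmul
print((w - w_approx.float()).abs().max())   # quantization error stays small
```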