Y Combinator Startup Podcast

GPT-OSS vs. Qwen vs. DeepSeek: Comparing Open Source LLM Architectures

Aug 29, 2025
Ankit Gupta dives into the latest advancements in open-source language models, focusing on OpenAI's new GPT-OSS model and its competition, Qwen and DeepSeek. Discover the architectural features that differentiate these models, including their approaches to long-context training and reasoning alignment. Gupta highlights how distinct design choices converge on similar performance, showcasing the evolving landscape of AI development and the practical implications for anyone building on these models.
AI Snips
INSIGHT

OpenAI's GPT-OSS Design

  • GPT-OSS is a mixture-of-experts, decoder-only transformer built from standard modern components: grouped-query attention, SwiGLU feed-forward layers, RoPE positional encoding, and RMSNorm (sketched below).
  • It natively supports a 131,072-token context and ships with quantized (MXFP4) MoE weights so it can run on consumer hardware.
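To make those component names concrete, here is a minimal PyTorch sketch of one decoder block in that style. It is illustrative only, not OpenAI's implementation: the layer sizes, the omitted RoPE rotation, and the dense SwiGLU MLP standing in for the routed experts are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square layer norm: no mean subtraction, no bias."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = x.pow(2).mean(-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

class SwiGLU(nn.Module):
    """Gated MLP: silu(x @ W_gate) * (x @ W_up), projected back down."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

class GroupedQueryAttention(nn.Module):
    """n_q query heads share n_kv key/value heads (n_q % n_kv == 0)."""
    def __init__(self, dim, n_q, n_kv, head_dim):
        super().__init__()
        self.n_q, self.n_kv, self.head_dim = n_q, n_kv, head_dim
        self.wq = nn.Linear(dim, n_q * head_dim, bias=False)
        self.wk = nn.Linear(dim, n_kv * head_dim, bias=False)
        self.wv = nn.Linear(dim, n_kv * head_dim, bias=False)
        self.wo = nn.Linear(n_q * head_dim, dim, bias=False)

    def forward(self, x):
        B, T, _ = x.shape
        q = self.wq(x).view(B, T, self.n_q, self.head_dim).transpose(1, 2)
        k = self.wk(x).view(B, T, self.n_kv, self.head_dim).transpose(1, 2)
        v = self.wv(x).view(B, T, self.n_kv, self.head_dim).transpose(1, 2)
        # RoPE would rotate q and k here; omitted for brevity.
        # Repeat each KV head so it covers its group of query heads.
        rep = self.n_q // self.n_kv
        k = k.repeat_interleave(rep, dim=1)
        v = v.repeat_interleave(rep, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.wo(out.transpose(1, 2).reshape(B, T, -1))

class DecoderBlock(nn.Module):
    """Pre-norm residual block; an MoE model routes the MLP across experts."""
    def __init__(self, dim, n_q, n_kv, head_dim, hidden):
        super().__init__()
        self.attn_norm = RMSNorm(dim)
        self.attn = GroupedQueryAttention(dim, n_q, n_kv, head_dim)
        self.mlp_norm = RMSNorm(dim)
        self.mlp = SwiGLU(dim, hidden)

    def forward(self, x):
        x = x + self.attn(self.attn_norm(x))
        return x + self.mlp(self.mlp_norm(x))
```

In the actual MoE model, `self.mlp` would be replaced by a router that sends each token to a small subset of expert MLPs, which is what makes the parameter count large while keeping per-token compute modest.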
INSIGHT

Alibaba's Qwen 3 Approach

  • Qwen 3 offers dense and MoE variants and uses QK-Norm to stabilize attention logits at large scale (sketched below).
  • Its three-stage pretraining includes a long-context stage with ABF (adjusted base frequency), YaRN, and Dual Chunk Attention to reach 32K+ contexts; the ABF idea is also sketched after this list.
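A rough sketch of what QK-Norm does, assuming the common formulation in which queries and keys are RMS-normalized per head before attention. The shapes and the shared learned gain here are illustrative assumptions, not Qwen's exact code.

```python
import torch
import torch.nn.functional as F

def qk_norm(x, weight, eps=1e-6):
    """RMS-normalize each head's query/key vectors along head_dim.

    Bounding q and k magnitudes keeps the q @ k^T logits in a stable
    range, which helps avoid attention instability at large scale.
    """
    rms = x.pow(2).mean(-1, keepdim=True).add(eps).rsqrt()
    return x * rms * weight

# Shapes are illustrative: (batch, heads, seq, head_dim).
B, H, T, D = 2, 8, 16, 64
q, k, v = (torch.randn(B, H, T, D) for _ in range(3))
q_w = torch.ones(D)  # learned per-dimension gains in a real model
k_w = torch.ones(D)

q, k = qk_norm(q, q_w), qk_norm(k, k_w)  # normalize before RoPE/attention
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```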
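And a minimal illustration of the ABF idea: RoPE derives its rotation frequencies from a base constant, and raising that base slows every rotation so that positions much further apart still land at distinguishable angles. The specific base values below are assumptions for illustration only.

```python
import torch

def rope_frequencies(head_dim, base=10_000.0):
    """Per-pair RoPE rotation frequencies: theta_i = base^(-2i/d)."""
    i = torch.arange(0, head_dim, 2, dtype=torch.float32)
    return base ** (-i / head_dim)

# Adjusted base frequency: a larger base stretches the rotations,
# extending the usable context window before angles start to alias.
short_ctx = rope_frequencies(128, base=10_000.0)
long_ctx = rope_frequencies(128, base=1_000_000.0)
print(short_ctx[:4])  # faster rotations, tuned for short contexts
print(long_ctx[:4])   # slower rotations, usable at long range
```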
INSIGHT

DeepSeek V3's Scale And Optimizations

  • DeepSeek V3 is a 671B-parameter MoE model (roughly 37B parameters active per token) trained natively in FP8 to cut training costs while enabling massive scale (see the sketch after this list).
  • V3.1 adds staged long-context training and a hybrid thinking mode to improve reasoning and tool use.
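A toy sketch of the FP8 idea: quantize matmul inputs into an 8-bit floating-point format with a per-tensor scale, then rescale the product. This assumes a recent PyTorch build that exposes `float8_e4m3fn`; DeepSeek's actual recipe (fine-grained block scaling and high-precision accumulation in the GPU kernels) is considerably more sophisticated.

```python
import torch

F8 = torch.float8_e4m3fn      # 8-bit float: 4 exponent bits, 3 mantissa bits
F8_MAX = torch.finfo(F8).max  # ~448 for e4m3

def quantize_fp8(x):
    """Scale a tensor into the FP8 range; return the fp8 tensor and scale."""
    scale = x.abs().max().clamp(min=1e-12) / F8_MAX
    return (x / scale).to(F8), scale

def fp8_matmul(a, b):
    """Simulated FP8 GEMM: quantize inputs, multiply, rescale the output.

    Real FP8 training kernels accumulate in higher precision on the GPU;
    here we upcast back to float32 for the matmul itself, so this models
    only the quantization error, not the hardware speedup.
    """
    qa, sa = quantize_fp8(a)
    qb, sb = quantize_fp8(b)
    return (qa.float() @ qb.float()) * (sa * sb)

a = torch.randn(64, 128)
b = torch.randn(128, 32)
err = (fp8_matmul(a, b) - a @ b).abs().mean()
print(f"mean abs error vs fp32 matmul: {err:.4f}")
```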