Vladimir Nesov, a writer known for his analysis of AI scaling, examines what OpenAI's o3 implies for future training systems. He discusses how upcoming investments could support models trained at unprecedented FLOP levels, and weighs data quality against quantity, including the need for roughly 50 trillion training tokens. Nesov also assesses the current standing of GPT-4 and its competitors, and considers what might emerge by 2028 in this rapidly evolving landscape.
By 2028, training systems whose funding o3's results have made less speculative could support pretraining runs of roughly 5e28 FLOPs.
The race to scale AI is intensifying as companies invest in massive data centers to support ever larger training runs.
Deep dives
Future Training Systems and Performance
By 2028, significant advances in training systems are anticipated: o3's results make funding for much larger systems less speculative, and the systems built in 2026-2027 could support pretraining runs of roughly 5e28 FLOPs. The limited supply of natural text data is not expected to seriously interfere until around 6e27 FLOPs, and might be pushed as far as 5e28 FLOPs, so the scaling of pretraining is expected to continue for now. Meanwhile, the original GPT-4 still marks the qualitative frontier, with 2024 largely spent on competitors catching up, and the race to build out these systems is intensifying as hardware continues to evolve and expand.
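To put those FLOP figures in perspective, here is a minimal arithmetic sketch of the implied scale-up factors, using the estimates quoted in the excerpt further below; the comparison itself is illustrative rather than something stated in the post.

```python
# Rough scale-up factors implied by the FLOP estimates quoted in the excerpt
# below: ~2e25 FLOPs for the original GPT-4, 8e25-4e26 FLOPs for the LLM
# plausibly underlying o3, and ~5e28 FLOPs for models pretrained on the
# 2026-2027 training systems. The arithmetic just spells out the multipliers.

gpt4_flops = 2e25
o3_base_flops_low, o3_base_flops_high = 8e25, 4e26
future_flops = 5e28

print(f"vs. original GPT-4 (2e25): {future_flops / gpt4_flops:,.0f}x")          # 2,500x
print(f"vs. o3 base model (8e25):  {future_flops / o3_base_flops_low:,.0f}x")   # 625x
print(f"vs. o3 base model (4e26):  {future_flops / o3_base_flops_high:,.0f}x")  # 125x
```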
Infrastructure Challenges and Scaling Potential
The scaling of AI training systems could slow down due to constraints on funding, power, and data, yet very large investments remain viable in the near term. Companies like Microsoft and Google are constructing massive data centers designed for gigawatt-scale power, which could support training runs using roughly 250 times the compute behind current frontier models. As network infrastructure develops to connect multiple data centers into a single training system, efficient scaling becomes critical for the ambitious goals set for 2025 and beyond. Ongoing work on data selection also suggests that further gains may come less from sheer data volume and more from data quality and relevance.
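As a rough illustration of how gigawatt-scale power translates into training compute, the sketch below multiplies out accelerator count, throughput, utilization, and run length; every parameter value is an illustrative assumption, not a figure from the post or a vendor datasheet.

```python
# Order-of-magnitude sketch of how a gigawatt-scale campus translates into
# pretraining FLOPs. All parameter values below are illustrative assumptions.

it_power_w = 1e9                     # ~1 GW of power for accelerators (assumption)
power_per_gpu_w = 1500               # per-accelerator power incl. overhead share (assumption)
flops_per_gpu = 2.5e15               # dense BF16 throughput per accelerator, FLOP/s (assumption)
utilization = 0.4                    # model FLOP utilization during training (assumption)
training_seconds = 120 * 24 * 3600   # ~4-month training run (assumption)

n_gpus = it_power_w / power_per_gpu_w
total_flops = n_gpus * flops_per_gpu * utilization * training_seconds

print(f"accelerators: {n_gpus:,.0f}")                 # ~667,000
print(f"training compute: {total_flops:.1e} FLOPs")   # ~7e27 under these assumptions
```

Under these assumptions a single gigawatt campus lands short of 5e28 FLOPs, which is one reason inter-data-center training networks matter in this picture.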
Funding for $150bn training systems just turned less speculative, with OpenAI o3 reaching 25% on FrontierMath, 70% on SWE-Verified, 2700 on Codeforces, and 80% on ARC-AGI. These systems will be built in 2026-2027 and enable pretraining models for 5e28 FLOPs, while o3 itself is plausibly based on an LLM pretrained only for 8e25-4e26 FLOPs. The natural text data wall won't seriously interfere until 6e27 FLOPs, and might be possible to push until 5e28 FLOPs. Scaling of pretraining won't end just yet.
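One way to see why the data wall bites somewhere in the 6e27-5e28 FLOP range is a Chinchilla-style token estimate. The sketch below assumes the standard C ≈ 6ND heuristic with compute-optimal D ≈ 20N, and a stock of roughly 50 trillion tokens of natural text; these assumptions come from the scaling literature and the opening summary, not from the excerpt itself.

```python
import math

# Chinchilla-style back-of-the-envelope for the "data wall" claim above.
# C ~ 6*N*D with compute-optimal D ~ 20*N are standard scaling heuristics,
# not figures from the post; the 50T-token stock of quality natural text is
# the estimate mentioned in the opening summary.

def optimal_tokens(compute_flops):
    # C = 6*N*D and D = 20*N  =>  C = 0.3*D^2  =>  D = sqrt(C / 0.3)
    return math.sqrt(compute_flops / 0.3)

unique_tokens = 50e12  # ~50T tokens of natural text

for c in (6e27, 5e28):
    d = optimal_tokens(c)
    print(f"C={c:.0e}: ~{d / 1e12:.0f}T tokens, "
          f"~{d / unique_tokens:.0f} epochs over a 50T-token corpus")

# C=6e27: ~141T tokens, ~3 epochs -- still in the range where repeating data
#         works reasonably well (Muennighoff et al. 2023 find ~4 epochs is cheap)
# C=5e28: ~408T tokens, ~8 epochs -- repetition is clearly strained by this point
```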
Reign of GPT-4
Since the release of GPT-4 in March 2023, subjectively there was no qualitative change in frontier capabilities. In 2024, everyone in the running merely caught up. To the extent this is true, the reason might be that the original GPT-4 was probably a 2e25 FLOPs MoE model trained on 20K A100. And if you don't already have a cluster this big, and experience [...]
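As a quick sanity check on the 2e25 FLOPs / 20K A100 estimate in the excerpt, the sketch below works out how long such a run would take; the A100 throughput is the published dense BF16 figure, while the utilization is an assumed value rather than a claim from the post.

```python
# Sanity check on the "2e25 FLOPs MoE model trained on 20K A100" estimate.

n_gpus = 20_000
peak_flops_per_gpu = 3.12e14   # A100 dense BF16, ~312 TFLOP/s
utilization = 0.4              # assumed model FLOP utilization

effective_throughput = n_gpus * peak_flops_per_gpu * utilization  # ~2.5e18 FLOP/s
seconds_needed = 2e25 / effective_throughput
print(f"{seconds_needed / 86400:.0f} days")  # ~93 days, i.e. roughly a three-month run
```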
---
Outline:
(00:52) Reign of GPT-4
(02:08) Engines of Scaling
(04:06) Two More Turns of the Crank
(06:41) Peak Data
The original text contained 3 footnotes which were omitted from this narration.