We’ll keep this brief because we’re on a tight turnaround: GPT 4.1, previously known in stealth as the Quasar and Optimus models, is now live as the natural update for 4o/4o-mini (and for the GPT 4.5 research preview). Though it is a general-purpose model family, the headline features are:
Coding abilities (o1-level SWE-bench and SWE-Lancer scores, though only OK on Aider)
Instruction Following (with a very notable prompting guide)
Long Context up to 1M tokens (with the new MRCR and Graphwalks benchmarks)
Vision (simply o1-level)
Cheaper Pricing (cheaper than 4o, with greatly improved prompt caching savings; see the API sketch below)
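Since the models are live in the API today, here is a minimal sketch of trying them out (assuming the official `openai` Python SDK and an `OPENAI_API_KEY` in your environment; keeping the static system prompt at the front of the request is the standard way to pick up prompt caching discounts, not anything GPT 4.1-specific):

```python
# Minimal sketch: calling GPT 4.1 via the official OpenAI Python SDK.
# Assumes `pip install openai` and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

# Keep the long, static part of the prompt (instructions, docs, few-shot
# examples) identical across calls: prompt caching discounts repeated prefixes.
SYSTEM_PROMPT = "You are a coding assistant. Follow the user's instructions exactly."

response = client.chat.completions.create(
    model="gpt-4.1",  # also available: gpt-4.1-mini, gpt-4.1-nano
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Write a one-line Python function to reverse a string."},
    ],
)
print(response.choices[0].message.content)
```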
We caught up with returning guest Michelle Pokrass and Josh McGrath to get more detail on each!
Full Video Episode
Timestamps
Part 1
00:00:00 Introduction and Guest Welcome
00:00:57 GPT 4.1 Launch Overview
00:01:54 Developer Feedback and Model Names
00:02:53 Model Naming and Starry Themes
00:03:49 Confusion Over GPT 4.1 vs 4.5
00:04:47 Distillation and Model Improvements
00:05:45 Omnimodel Architecture and Future Plans
00:06:43 Core Capabilities of GPT 4.1
00:07:40 Training Techniques and Long Context
00:08:37 Challenges in Long Context Reasoning
00:09:34 Context Utilization in Models
Part 2
00:10:31 Graph Walks and Model Evaluation
00:11:31 Real Life Applications of Graph Tasks
00:12:30 Multi-Hop Reasoning Benchmarks
00:13:30 Agentic Workflows and Backtracking
00:14:28 Graph Traversals for Agent Planning
00:15:24 Context Usage in API and Memory Systems
00:16:21 Model Performance in Long Context Tasks
00:17:17 Instruction Following and Real World Data
00:18:12 Challenges in Grading Instructions
00:19:09 Instruction Following Techniques
00:20:09 Prompting Techniques and Model Responses
00:21:05 Agentic Workflows and Model Persistence
Part 3
00:22:01 Balancing Persistence and User Control
00:22:56 Evaluations on Model Edits and Persistence
00:23:55 XML vs JSON in Prompting
00:24:50 Instruction Placement in Context
00:25:49 Optimizing for Prompt Caching
00:26:49 Chain of Thought and Reasoning Models
00:27:46 Choosing the Right Model for Your Task
00:28:46 Coding Capabilities of GPT 4.1
00:29:41 Model Performance in Coding Tasks
00:30:39 Understanding Coding Model Differences
00:31:36 Using Smaller Models for Coding
00:32:33 Future of Coding in OpenAI
Part 4
00:33:28 Internal Use and Success Stories
00:34:26 Vision and Multi-Modal Capabilities
00:35:25 Screen vs Embodied Vision
00:36:22 Vision Benchmarks and Model Improvements
00:37:19 Model Deprecation and GPU Usage
00:38:13 Fine-Tuning and Preference Steering
00:39:12 Upcoming Reasoning Models
00:40:10 Creative Writing and Model Humor
00:41:07 Feedback and Developer Community
00:42:03 Pricing and Blended Model Costs
00:44:02 Conclusion and Wrap-Up