Structured output is a crucial modality for AI engineers, enabling the generation of outputs in predefined formats like JSON. OpenAI's new structured output mode ensures 100% adherence to JSON schema, which addresses past issues with function calling that often resulted in mismatched outputs. Developers highlighted the necessity of precise output formats for integrating AI seamlessly into their systems. Understanding the evolution of these features offers insights into the ongoing improvements in API functionalities that cater specifically to developer needs.
The podcast traces the journey of OpenAI's function calling capability and JSON mode, revealing that these developments emerged from the need for more reliable data outputs. The introduction of JSON mode in November 2023 became popular due to the inconsistencies faced with initial function calling attempts. Numerous open-source solutions, such as Instructor and LangChain, showcased the creative efforts of developers to overcome these challenges. The timeline of advancements culminated in the implementation of a structured output mode that greatly enhances the reliability of data parsing.
The introduction of constrained sampling in April 2024 significantly improved adherence to complex JSON schemas, allowing for more exact API responses. A new `tool_choice: required` parameter for API calls gave developers better control over structured output quality. The Structured Output mode further constrains generation to ensure reliable results, especially in high-stakes applications. As other AI labs introduce similar functionalities, the competitive landscape drives continuous improvement across the industry.
Michelle Pokras emphasized the importance of integrating community feedback in the development of OpenAI's API features. The API models team actively collaborates with developers to refine existing functionalities, particularly for structured outputs and function calling. This iterative approach allows for tailoring the developments to suit real-world applications and addresses pain points faced by users. Frequent discussions with users result in better products, bridging the gap between engineering and practical implementation.
A range of features was discussed beyond structured output, including advancements in the Assistants API and Whisper integration. Mentioned features like prompt caching and the Advanced Voice Mode API point to an expansive development strategy at OpenAI. Each newly introduced feature reflects a commitment to increasing the utility and accessibility of AI technologies for developers. The insights provided create a clearer picture of the overall roadmap guiding OpenAI's API functionality enhancements.
Engaging with the AI community is a fundamental aspect of OpenAI's growth, as exemplified by the announcement of upcoming meetups and summits. Opportunities for AI engineers to collaborate and share insights are increasingly being established, including specific events taking place in Germany and New York City. These gatherings are positioned to bring together professionals to discuss innovations and applications related to AI engineering. By fostering a vibrant community, OpenAI encourages continuous learning and knowledge sharing.
Michelle Pokras shared her impressive career trajectory, detailing her experiences at notable companies, including Google and Coinbase. Her time at these organizations provided critical insights that shaped her approach to engineering challenges, particularly in scaling and database management. This rich background informed her viewpoints on the importance of reliability and performance in system design. Her narrative highlights the value of diverse experiences in driving innovation within the AI space.
The discussion alluded to several forthcoming features and potential upgrades, including parallel function calling and custom grammars, which will further refine the user experience. OpenAI's commitment to enhancing model capabilities, such as improving the performance of schemas and integrating new functionalities, showcases its proactive approach to research and development. The roadmap details an expansive view of how AI models can be optimized for various applications, encouraging developers to adapt them for their unique needs. These innovations will continue to push the boundaries of what is achievable in AI technology.
Congrats to Damien on successfully running AI Engineer London! See our community page and the Latent Space Discord for all upcoming events.
This podcast came together in a far more convoluted way than usual, but happens to result in a tight 2 hours covering the ENTIRE OpenAI product suite across ChatGPT-latest, GPT-4o and the new o1 models, and how they are delivered to AI Engineers in the API via the new Structured Output mode, Assistants API, client SDKs, upcoming Voice Mode API, Finetuning/Vision/Whisper/Batch/Admin/Audit APIs, and everything else you need to know to be up to speed in September 2024.
This podcast has two parts: the first hour is a regular, well edited, podcast on 4o, Structured Outputs, and the rest of the OpenAI API platform. The second was a rushed, noisy, hastily cobbled together recap of the top takeaways from the o1 model release from yesterday and today.
Building AGI with Structured Outputs — Michelle Pokrass of OpenAI API team
Michelle Pokrass built massively scalable platforms at Google, Stripe, Coinbase and Clubhouse, and now leads the API Platform at OpenAI. She joins us today to talk about why structured output is such an important modality for AI Engineers that OpenAI has now trained and engineered a Structured Output mode with 100% reliable JSON schema adherence.
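To make the new mode concrete: a Structured Outputs request attaches a JSON Schema with `strict: true` to the `response_format` field, and the API guarantees the completion parses against that schema. The sketch below just assembles such a request body (the ticket schema, model name, and message are illustrative, not from the episode):

```python
import json

# "strict": True is what opts into the 100%-reliable schema adherence;
# the "ticket" schema itself is a made-up example for illustration.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "ticket",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "priority": {"type": "string", "enum": ["low", "medium", "high"]},
            },
            "required": ["title", "priority"],
            "additionalProperties": False,
        },
    },
}

# The request body you would send to the Chat Completions endpoint:
payload = {
    "model": "gpt-4o-2024-08-06",
    "messages": [{"role": "user", "content": "File a ticket: login page is down."}],
    "response_format": response_format,
}
print(json.dumps(payload, indent=2))
```

Note `additionalProperties: False` and the exhaustive `required` list: strict mode needs both, so that the schema pins down exactly one output shape.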
To understand why this matters, a bit of history helps:
* June 2023: OpenAI first added a "function calling" capability to GPT-4-0613 and GPT-3.5 Turbo 0613 (our podcast/writeup here)
* November 2023: OpenAI Dev Day (our podcast/writeup here), where the team shipped JSON Mode, a simpler schema-less JSON output mode that nevertheless became more popular because function calling often failed to match the JSON schema given by developers.
* Meanwhile, in open source, many solutions arose, including
* Instructor (our pod with Jason here)
* LangChain (our pod with Harrison here, and he is returning next as a guest co-host)
* Outlines (Remi Louf’s talk at AI Engineer here)
* Llama.cpp’s constrained grammar sampling using GBNF
* April 2024: OpenAI started implementing constrained sampling with a new `tool_choice: required` parameter in the API
* August 2024: the new Structured Output mode, co-led by Michelle
* Sept 2024: Gemini shipped Structured Outputs as well
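The constrained-sampling idea underlying several of the projects above can be sketched in toy form: at each decoding step, mask out any token that would take the output outside the target grammar, then choose only among what survives. Here the "grammar" is a trivial stand-in (one fixed JSON object), and the vocabulary and scores are invented for illustration:

```python
# Toy vocabulary with unnormalized scores, standing in for model logits.
vocab = ['{', '}', '"key"', ':', '"value"', 'hello']
logits = [2.0, 1.0, 1.5, 1.2, 1.4, 3.0]  # note: 'hello' scores highest

def allowed(prefix: str, token: str) -> bool:
    """Stand-in for a real grammar: only permit tokens that keep the
    output a prefix of the JSON object {"key":"value"}."""
    target = '{"key":"value"}'
    return target.startswith(prefix + token)

def constrained_greedy_decode() -> str:
    out = ""
    while out != '{"key":"value"}':
        # Mask: drop every token the grammar forbids at this position.
        scores = {t: s for t, s in zip(vocab, logits) if allowed(out, t)}
        # Greedy pick among the surviving tokens only.
        out += max(scores, key=scores.get)
    return out

print(constrained_greedy_decode())  # → {"key":"value"}
```

Note that 'hello' carries the highest score yet is never emitted: the mask removes it before selection happens, which is how real implementations guarantee grammar-valid output regardless of what the model would otherwise prefer to say.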
We sat down with Michelle to talk through every part of the process, as well as quizzing her for updates on everything else the API team has shipped in the past year, from the Assistants API, to Prompt Caching, GPT4 Vision, Whisper, the upcoming Advanced Voice Mode API, OpenAI Enterprise features, and why every Waterloo grad seems to be a cracked engineer.
Part 1 Timestamps and Transcript
* [00:00:42] Episode Intro from Suno
* [00:03:34] Michelle's Path to OpenAI
* [00:12:20] Scaling ChatGPT
* [00:13:20] Releasing Structured Output
* [00:16:17] Structured Outputs vs Function Calling
* [00:19:42] JSON Schema and Constrained Grammar
* [00:20:45] OpenAI API team
* [00:21:32] Structured Output Refusal Field
* [00:24:23] ChatML issues
* [00:26:20] Function Calling Evals
* [00:28:34] Parallel Function Calling
* [00:29:30] Increased Latency
* [00:30:28] Prompt/Schema Caching
* [00:30:50] Building Agents with Structured Outputs: from API to AGI
* [00:31:52] Assistants API
* [00:34:00] Use cases for Structured Output
* [00:37:45] Prompting Structured Output
* [00:39:44] Benchmarking Prompting for Structured Outputs
* [00:41:50] Structured Outputs Roadmap
* [00:43:37] Model Selection vs GPT4 Finetuning
* [00:46:56] Is Prompt Engineering Dead?
* [00:47:29] 2 models: ChatGPT Latest vs GPT 4o August
* [00:50:24] Why API => AGI
* [00:52:40] Dev Day
* [00:54:20] Assistants API Roadmap
* [00:56:14] Model Reproducibility/Determinism issues
* [00:57:53] Tiering and Rate Limiting
* [00:59:26] OpenAI vs Ops Startups
* [01:01:06] Batch API
* [01:02:54] Vision
* [01:04:42] Whisper
* [01:07:21] Voice Mode API
* [01:08:10] Enterprise: Admin/Audit Log APIs
* [01:09:02] Waterloo grads
* [01:10:49] Books
* [01:11:57] Cognitive Biases
* [01:13:25] Are LLMs Econs?
* [01:13:49] Hiring at OpenAI
Emergency O1 Meetup — OpenAI DevRel + Strawberry team
The following is our writeup from AINews, which so far stands the test of time.
o1, aka Strawberry, aka Q*, is finally out! There are two models we can use today: o1-preview (the bigger one priced at $15 in / $60 out) and o1-mini (the STEM-reasoning focused distillation priced at $3 in/$12 out) - and the main o1 model is still in training. This caused a little bit of confusion.
There are a raft of relevant links, so don’t miss:
* the o1 Hub
* the o1-preview blogpost
* the o1-mini blogpost
* the technical research blogpost
* the o1 system card
* the platform docs
* the o1 team video and contributors list (twitter)
In line with the many, many leaks leading up to today, the core story is longer "test-time inference", aka longer step-by-step responses - in the ChatGPT app this shows up as a new "thinking" step that you can click to expand for reasoning traces, even though, controversially, they are hidden from you (interesting conflict of interest…):
Under the hood, o1 is trained to emit new reasoning tokens - which you pay for, and OpenAI has accordingly extended the output token limit to >30k tokens (incidentally, this is also why a number of API parameters from the other models, like temperature, role, tool calling, and streaming, but especially max_tokens, are no longer supported).
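As a sketch of what that implies for callers (the helper below is hypothetical; the parameter list follows the restrictions described above, and `max_completion_tokens` is the replacement OpenAI documents for `max_tokens` on o1, since the budget must also cover the billed reasoning tokens):

```python
# Parameters o1 rejected at launch, per the restrictions described above.
UNSUPPORTED_FOR_O1 = {"temperature", "top_p", "tools", "tool_choice",
                      "stream", "logprobs", "presence_penalty"}

def adapt_for_o1(payload: dict) -> dict:
    """Hypothetical helper: adapt a gpt-4o-style request payload for o1."""
    adapted = {k: v for k, v in payload.items() if k not in UNSUPPORTED_FOR_O1}
    if "max_tokens" in adapted:
        # o1 uses max_completion_tokens, which counts reasoning
        # tokens plus the visible output.
        adapted["max_completion_tokens"] = adapted.pop("max_tokens")
    return adapted

request = {
    "model": "o1-preview",
    "messages": [{"role": "user", "content": "Prove there are infinitely many primes."}],
    "temperature": 0.2,   # dropped: not supported by o1
    "max_tokens": 4096,   # renamed: becomes max_completion_tokens
}
print(adapt_for_o1(request))
```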
The evals are exceptional. OpenAI o1:
* ranks in the 89th percentile on competitive programming questions (Codeforces),
* places among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME),
* and exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA).
You are used to new models showing flattering charts, but there is one chart of note that you don’t see in many model announcements, and it is probably the most important chart of all. Dr Jim Fan gets it right: we now have scaling laws for test-time compute, and it looks like they scale log-linearly.
We unfortunately may never know the drivers of the reasoning improvements, but Jason Wei shared some hints:
Usually the big model gets all the accolades, but notably many are calling out the performance of o1-mini for its size (smaller than GPT-4o), so do not miss that.
Part 2 Timestamps
* [01:15:01] O1 transition
* [01:16:07] O1 Meetup Recording
* [01:38:38] OpenAI Friday AMA recap
* [01:44:47] Q&A Part 2
* [01:50:28] O1 Demos
Demo Videos to be posted shortly