Reinforcement Learning in Verifiable Domains

Jonathan explains RL with verifiers and self-play for coding and math, improving models via experiential learning.

Episode notes

AI has eaten the internet, data labeling is so over, and $30 trillion of human work is on the verge of automation. Jonathan Siddharth, Founder & CEO of Turing, joins Sourcery to break down the power shift in AI training — from commodity data labeling to expert research — positioning Turing apart from AI data providers like Scale AI, Mercor, & Surge.

Turing has become a hidden force in the AI race, hitting $300M in ARR in 2024 (~3x YoY), achieving profitability, and raising $111M at a $2.2B valuation in March. That growth cements its position as one of the fastest-growing AGI infrastructure companies.

Today, frontier labs like OpenAI, Anthropic, Meta, Google, Microsoft, Nvidia, & Amazon rely on Turing for the frontier data that pushes AI forward across the four pillars of superintelligence:

• Multimodality

• Reasoning

• Tool use

• Coding

We explore Turing’s expansion into the enterprise, closing the “gap” – where Fortune 500s in finance, insurance, and pharma are racing to build proprietary intelligence on their own data, creating durable moats in the $30T knowledge work economy.

PS Jonathan also explains how labs like OpenAI train models:

• Pre-training on filtered internet corpora (Common Crawl, GitHub, books, video)

• Post-training with supervised fine-tuning (human Q&A datasets)

• Reinforcement learning: RLHF to align models with human preferences, plus RL with verifiers in checkable domains like coding and math

• Model-breaking data from Turing’s 4M+ engineers to close gaps and advance systems like GPT-5
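To make the "verifiable domains" idea concrete, here is a minimal sketch of what a verifier-based reward could look like for math and coding. This is an illustration only, not Turing's or any lab's actual pipeline: the function names `math_reward` and `code_reward`, the exact-match rule, and the pass-rate scoring are all assumptions made for this example.

```python
from fractions import Fraction


def math_reward(model_answer: str, reference: str) -> float:
    """Hypothetical verifier: 1.0 if the answer equals the reference exactly, else 0.0."""
    try:
        same = Fraction(model_answer.strip()) == Fraction(reference.strip())
    except (ValueError, ZeroDivisionError):
        return 0.0  # unparseable answers earn no reward
    return 1.0 if same else 0.0


def code_reward(candidate_src: str, func_name: str, tests: list) -> float:
    """Hypothetical verifier: fraction of (args, expected) test cases the candidate passes."""
    namespace = {}
    try:
        # Assumption for brevity: a real system would run untrusted code in a sandbox.
        exec(candidate_src, namespace)
        func = namespace[func_name]
    except Exception:
        return 0.0
    passed = 0
    for args, expected in tests:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing test case counts as a failure
    return passed / len(tests) if tests else 0.0
```

The point of the sketch is that the reward comes from an automatic check (an exact answer or a test suite) rather than from a human rating, which is what separates RL in verifiable domains from RLHF.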

1. Jonathan Siddharth: https://www.linkedin.com/in/jonsid/

2. Molly O’Shea: https://x.com/MollySOShea

3. Sourcery: https://x.com/sourceryvc

Brought to you by:

• Brex—The modern finance platform, combining the world’s smartest corporate card with integrated expense management, banking, bill pay, & travel.

As a Sourcery listener, you get 75,000 points after spending $3,000 on Brex card(s), white-glove onboarding, $5,000 in AWS credits, $2,500 in OpenAI credits, and access to $180k+ in SaaS discounts, on top of $500 toward Brex travel, $300 in cashback, and exclusive perks (like billboards). Visit: https://brex.com/sourcery

• Turing—Turing delivers top-tier talent, data, and tools to help AI labs improve model performance—and enables enterprises to turn those models into powerful, production-ready systems. Visit: https://turing.com/sourcery

• Carta—Carta connects founders, investors, and limited partners through software purpose-built for private capital. Trusted by 65,000+ companies in 160+ countries, Carta’s platform of software & services lays the groundwork so you can build, invest, and scale with confidence. Visit: https://carta.com/sourcery

• Kalshi—The largest prediction market and the only legal platform in the US where people can trade directly on the outcomes of future events: https://kalshi.com/sourcery

Follow Sourcery for the latest updates!

https://www.sourcery.vc/

(00:00) AI Ate The Internet

(00:49) Training superintelligence: the race to AGI

(02:31) Viral tweet

(03:24) What Turing actually does

(04:43) The internet data is “used up” — where will new data come from?

(05:34) Four pillars of superintelligence: multimodality, reasoning, tool use, coding

(06:07) Automating $30T of global knowledge work

(09:18) The $1B revenue opportunity

(10:59) Why Turing is a research-first accelerator, not a data labeler

(13:45) Jonathan’s Stanford AI Lab roots and founding DNA