

Inference by Turing Post
Turing Post
Inference is Turing Post’s way of asking the big questions about AI – and refusing easy answers. Each episode starts with a simple prompt: “When will we…?” – and follows it wherever it leads. Host Ksenia Se sits down with the people shaping the future firsthand: researchers, founders, engineers, and entrepreneurs. The conversations are candid, sharp, and sometimes surprising – less about polished visions, more about the real work happening behind the scenes. It’s called Inference for a reason: opinions are great, but we want to connect the dots – between research breakthroughs, business moves, technical hurdles, and shifting ambitions. If you’re tired of vague futurism and ready for real conversations about what’s coming (and what’s not), this is your feed. Join us – and draw your own inference.
Episodes

Aug 23, 2025 • 26min
When Will Inference Feel Like Electricity? Lin Qiao, co-founder & CEO of Fireworks AI
In this engaging conversation, Lin Qiao, co-founder and CEO of Fireworks AI and former head of PyTorch at Meta, shares her insights on the current AI landscape. She discusses the unexpected perils of achieving product-market fit in generative AI and the hidden costs of GPU usage. Lin highlights that 2025 may see the rise of AI agents across many sectors. She also delves into the pros and cons of open versus closed AI models, especially regarding innovations from Chinese labs. Finally, she shares her personal journey of overcoming fears.

Aug 23, 2025 • 25min
How to Make AI Actually Do Things | Alex Hancock, Block, Goose, MCP Steering Committee
In this discussion, Alex Hancock, a Senior Software Engineer at Block and key player in the development of Goose, shares insights into the emerging Model Context Protocol (MCP). Exploring how MCP transforms AI from mere models into functional agents, he emphasizes the importance of open governance and context management. They dive into challenges in API development and the necessity for intuitive AI interfaces. Alex also reveals his expectations for AGI's incremental arrival and how a long-term mindset shapes his contributions to AI infrastructure.

Aug 23, 2025 • 24min
Beyond the Hype: What Silicon Valley Gets Wrong About RAG. Amr Awadallah, founder & CEO of Vectara
In this episode of Inference, I sit down with Amr Awadallah – founder & CEO of Vectara, founder of Cloudera, ex-Google Cloud, and the original builder of Yahoo’s data platform – to unpack what’s actually happening with retrieval-augmented generation (RAG) in 2025.
We get into why RAG is far from dead, how context windows mislead more than they help, and what it really takes to separate reasoning from memory. Amr breaks down the case for retrieval with access control, the rise of hallucination detection models, and why DIY RAG stacks fall apart in production.
We also talk about the roots of RAG, Amr’s take on AGI timelines and what science fiction taught him about the future.
If you care about truth in AI, or you're building with (or around) LLMs, this one will reshape how you think about trustworthy systems.
Did you like the episode? You know the drill:
📌 Subscribe for more conversations with the builders shaping real-world AI.
💬 Leave a comment if this resonated.
👍 Like it if you liked it.
🫶 Thank you for watching and sharing!
Guest:
Amr Awadallah, Founder and CEO at Vectara
https://www.linkedin.com/in/awadallah/
https://x.com/awadallah
https://www.vectara.com/
📰 Want the transcript and edited version?
Subscribe to Turing Post: https://www.turingpost.com/subscribe
Chapters
00:00 – Intro
00:44 – Why RAG isn’t dead (despite big context windows)
01:59 – Memory vs reasoning: the case for retrieval
02:45 – Retrieval + access control = trusted AI
06:51 – Why DIY RAG stacks fail in production
09:46 – Hallucination detection and guardian agents
13:14 – Open-source strategy behind Vectara
16:08 – Who really invented RAG?
17:30 – Can hallucinations ever go away?
20:27 – What AGI means to Amr
22:09 – Books that shaped his thinking
Turing Post is a newsletter about AI's past, present, and future. Publisher Ksenia Se explores how intelligent systems are built – and how they’re changing how we think, work, and live.
Sign up (Jensen Huang is already in): https://www.turingpost.com
Things mentioned during the interview:
Hughes Hallucination Evaluation Model (HHEM) Leaderboard https://huggingface.co/spaces/vectara/leaderboard
HHEM 2.1: A Better Hallucination Detection Model and a New Leaderboard
https://www.vectara.com/blog/hhem-2-1-a-better-hallucination-detection-model
HCMBench: an evaluation toolkit for hallucination correction models
https://www.vectara.com/blog/hcmbench-an-evaluation-toolkit-for-hallucination-correction-models
Books:
Foundation series by Isaac Asimov https://en.wikipedia.org/wiki/Foundation_(novel_series)
Sapiens: A Brief History of Humankind by Yuval Noah Harari https://www.amazon.com/Sapiens-Humankind-Yuval-Noah-Harari/dp/0062316095
Setting the Record Straight on who invented RAG
https://www.linkedin.com/pulse/setting-record-straight-who-invented-rag-amr-awadallah-8cwvc/
Follow us:
https://x.com/TheTuringPost
https://www.linkedin.com/in/ksenia-se
https://huggingface.co/Kseniase

Aug 23, 2025 • 27min
AI CHANGED THE WEB. Here’s How to Build for It | A conversation with Linda Tong, CEO of Webflow
Linda Tong, CEO of Webflow, is reshaping the web to accommodate the growing influence of bots. She discusses the rise of non-human traffic and the need for 'agent-first' design, emphasizing how websites can cater to both AI agents and human visitors. Linda introduces the concept of agentic engine optimization (AEO) as a new SEO strategy. She also reflects on the importance of dynamic, personalized experiences and shares leadership insights inspired by 'Ender’s Game.' Get ready for a fast-paced, thought-provoking conversation about the future of web design!

Jun 29, 2025 • 19min
When Will We Fully Trust AI to Lead? A conversation with Eric Boyd, CVP of AI Platform
At Microsoft Build, I sat down with Eric Boyd, Corporate Vice President leading engineering for Microsoft’s AI platform, to talk about what it really means to build AI infrastructure that companies can trust – not just to assist, but to act. We get into the messy reality of enterprise adoption, why trust is still the bottleneck, and what it will take to move from copilots to fully autonomous agents.
We cover:
- When we'll trust AI to run businesses
- What Microsoft learned from early agent deployments
- How AI makes life easier
- The architecture behind GitHub agents (and why guardrails matter)
- Why developer interviews should include AI tools
- Agentic Web, NLweb, and the new AI-native internet
- Teaching kids (and enterprises) how to use powerful AI safely
- Eric’s take on AGI vs “just really useful tools”
If you’re serious about deploying agents in production, this conversation is a blueprint. Eric blends product realism, philosophical clarity, and just enough dad humor. I loved this one.
Guest:
Eric Boyd, CVP of AI platform at Microsoft
https://www.linkedin.com/in/emboyd/
📰 Want the transcript and edited version?
Subscribe to Turing Post https://www.turingpost.com/subscribe
Chapters
0:00 The big question: When will we trust AI to run our businesses?
1:28 From code-completions to autonomous agents – the developer lens
2:15 Agent acts like a real dev and succeeds
3:25 AI taking over tedious work
3:32 Building trustworthy AI vs. convincing stakeholders to trust it
4:46 Copilot in the enterprise: early lessons and the guard-rail mindset
6:17 What is Agentic Web?
7:55 Parenting in the AI age
9:41 What counts as AGI?
11:32 How developer roles are already shifting with AI
12:33 Timeline forecast for 2-5 years re
13:33 Opportunities and concerns
15:57 Enterprise hurdles: identity, governance, and data-leak safeguards
16:48 Books that shaped the guest
Turing Post is a newsletter about AI's past, present, and future. We explore how intelligent systems are built – and how they’re changing how we think, work, and live.
Sign up (Jensen Huang is already in): https://www.turingpost.com
Follow us
Ksenia and Turing Post:
https://x.com/TheTuringPost
https://www.linkedin.com/in/ksenia-se
https://huggingface.co/Kseniase

Jun 19, 2025 • 29min
Why AI Still Needs Us? A conversation with Olga Megorskaya, CEO of Toloka
In this episode, I sit down with Olga Megorskaya, CEO of Toloka, to explore what true human-AI co-agency looks like in practice. We talk about how the role of humans in AI systems has evolved from simple labeling tasks to expert judgment and co-execution with agents – and why this shift changes everything.
We get into:
- Why "humans as callable functions" is the wrong metaphor – and what to use instead
- What co-agency really means
- Why some data tasks now take days, not seconds – and what that says about modern AI
- The biggest bottleneck in human-AI teamwork (and it’s not tech)
- The future of benchmarks, the limits of synthetic data, and why it is important to teach humans to distrust AI
- Why AI agents need humans to teach them when not to trust the plan
If you're building agentic systems or care about scalable human-AI workflows, this conversation is packed with hard-won perspective from someone who’s quietly powering some of the most advanced models in production. Olga brings a systems-level view that few others can – and we even nerd out about Foucault’s Pendulum, the power of text, and the underrated role of human judgment in the age of agents.
Guest:
Olga Megorskaya, CEO of Toloka
📰 Want the transcript and edited version?
Subscribe to Turing Post https://www.turingpost.com/subscribe
Chapters
0:00 – Intro: Humans as Callable Functions?
0:33 – Evolving with ML: From Crowd Labeling to Experts
3:10 – The Rise of Deep Domain Tasks and Foundational Models
5:46 – The Next Phase: Agentic Systems and Complex Human Tasks
7:16 – What Is True Co-Agency?
9:00 – Task Planning: When AI Guides the Human
10:39 – The Critical Skill: Knowing When Not to Trust the Model
13:25 – Engineering Limitations vs. Judgment Gaps
15:19 – What Changed Post-ChatGPT?
18:04 – Role of Synthetic vs. Human Data
21:01 – Is Co-Agency a Path to AGI?
25:08 – How To Ensure Safe AI Deployment
27:04 – Benchmarks: Internal, Leaky, and Community-Led
28:59 – The Power of Text: Umberto Eco and AI
Turing Post is a newsletter about AI's past, present, and future. Publisher Ksenia Semenova explores how intelligent systems are built – and how they’re changing how we think, work, and live.
Sign up: Turing Post: https://www.turingpost.com
If you’d like to keep following Olga and Toloka:
https://www.linkedin.com/in/omegorskaya/
https://x.com/TolokaAI
Ksenia and Turing Post:
https://x.com/TheTuringPost
https://www.linkedin.com/in/ksenia-se
https://huggingface.co/Kseniase

May 30, 2025 • 28min
When Will We Train Once and Learn Forever? Insights from Dev Rishi, CEO and co-founder @Predibase
In this engaging discussion, Devvret Rishi, CEO and co-founder of Predibase, dives into the future of AI modeling. He explains the revolutionary concept of continuous learning and reinforcement fine-tuning (RFT), which could surpass traditional methods. Dev shares insights on the challenges of inference in production and the significance of specialized models over generalist ones. He addresses the gaps in open-source model evaluation and offers a glimpse into the smarter, more agentic AI workflows on the horizon.

May 19, 2025 • 31min
When Will We Give AI True Memory? A conversation with Edo Liberty, CEO and founder @ Pinecone
What happens when one of the architects of modern vector search asks whether AI can remember like a seasoned engineer, not a gold‑fish savant? In this episode, Edo Liberty – founder & CEO of Pinecone and one‑time Amazon scientist – joins me to discuss true memory in LLMs. We unpack the gap between raw cognitive skill and workable knowledge, why RAG still feels pre‑ChatGPT, and the breakthroughs needed to move from demo‑ware to dependable memory stacks.
Edo explains why a vector database needs to be built from the ground up (and then rebuilt many times) and why storage – not compute – has become the next hardware frontier, and he predicts a near‑term future where ingesting a million documents is table stakes for any serious agent. We also touch on the thorny issues of truth, contested data, and whether knowledgeable AI is an inevitable waypoint on the road to AGI.
Whether you wrangle embeddings for a living, scout the next infrastructure wave, or simply wonder how machines will keep their facts straight, this conversation will sharpen your view of “memory” in the age of autonomous agents.
Let’s find out when tomorrow’s AI will finally remember what matters.
(CORRECTION: the opening slide introduces Edo Liberty as a co-founder. We apologize for this error: Edo Liberty is the Founder and CEO of Pinecone.)
Did you like the video? You know what to do:
Subscribe to the channel.
Leave a comment if you have something to say.
Like it if you liked it.
That’s all.
Thanks.
Guest:
Edo Liberty, CEO and founder at Pinecone
Website: https://www.pinecone.io/
Additional Reading:
https://www.turingpost.com/
Chapters
00:00 Intro & The Big Question – When will we give AI true memory?
01:20 Defining AI Memory and Knowledge
02:50 The Current State of Memory Systems in AI
04:35 What’s Missing for “True Memory”?
06:00 Hardware and Software Scaling Challenges
07:45 Contextual Models and Memory-Aware Retrieval
08:55 Query Understanding as a Task, Not a String
10:00 Pinecone’s Full Stack Approach
11:00 Commoditization of Vector Databases?
13:00 When Scale Breaks Your Architecture
15:00 The Rise of Multi-Tenant & Micro-Indexing
17:25 Dynamically Choosing the Right Indexing Method
19:05 Infrastructure for Agentic Workflows
20:15 The Hard Questions: What is Knowledge?
21:55 Truth vs Frequency in AI
22:45 What is “Knowledgeable AI”?
23:35 Is Memory a Path to AGI?
24:40 A Book That Shaped a CEO – *Endurance* by Shackleton
26:45 What Excites or Worries You About AI’s Future?
29:10 Final Thoughts: Sea Change is Here
At Turing Post, we love machine learning and AI so deeply that we cover them from every perspective: their past, their present, and our joint future. We explain what is happening in a way you will understand.
Sign up: Turing Post: https://www.turingpost.com
FOLLOW US
Edo Liberty: https://www.linkedin.com/in/edo-liberty-4380164/
Pinecone: https://x.com/pinecone
Ksenia and Turing Post:
Hugging Face: https://huggingface.co/Kseniase
Turing Post: https://x.com/TheTuringPost
Ksenia: https://x.com/Kseniase_
Linkedin:
TuringPost: https://www.linkedin.com/company/theturingpost
Ksenia: https://www.linkedin.com/in/ksenia-se

May 2, 2025 • 20min
When Will We Stop Coding? A conversation with Amjad Masad, CEO and co-founder @ Replit
What happens when the biggest advocate for coding literacy starts telling people not to learn to code? In this episode, Amjad Masad, CEO and co-founder at Replit, joins me to talk about his controversial shift in thinking – from teaching millions how to code to building agents that do it for you. Are we entering a post-coding world? What even is programming when you're just texting with a machine? We talk about Replit's evolving vision, how software agents are already powering real businesses, and why the next billion-dollar startups might be solo founders augmented by AI. Amjad also shares what still stands in the way of fully autonomous agents, how AGI fits into his long-term view, and why open source still matters in the age of AI.
Whether you're a developer, founder, or just AI-curious, this conversation will make you rethink what it means to “build software” in 2025.
Guest:
Amjad Masad, CEO and co-founder at Replit
Website: https://replit.com/~
Additional Reading:
https://www.turingpost.com/p/amjad
Chapters
00:00 Why Amjad changed his mind about coding
00:55 From code to agents: the next abstraction layer
02:05 Cognitive dissonance and the birth of Replit agents
03:38 Agent V3: toward fully autonomous software developers
04:51 Engineering platforms for long-running agents
05:30 Do agents actually work in 2025?
05:48 Real-world examples: Replit agents in action
06:36 Is Replit still a coding platform?
07:43 Why code generation beats no-code platforms
08:22 Can AI agents really create billionaires?
10:59 Every startup is now an AI startup
12:31 Solo founders and the rise of one-person AI companies
14:00 What Amjad thinks AGI really is
17:46 Replit as a habitat for AI
19:50 Open source tools vs internal no-code systems
21:02 Replit's evolving community vision
22:19 MCP vs A2A: who’s winning the protocol game
23:48 The books that shaped Amjad’s thinking about AI
25:47 What excites Amjad most about an AI-powered future
Sign up: Turing Post: https://www.turingpost.com
FOLLOW US
Amjad: https://x.com/amasad
Replit: https://x.com/replit
Ksenia and Turing Post:
Hugging Face: https://huggingface.co/Kseniase
Turing Post: https://x.com/TheTuringPost
Ksenia: https://x.com/Kseniase_
Linkedin:
TuringPost: https://www.linkedin.com/company/theturingpost
Ksenia: https://www.linkedin.com/in/ksenia-se

Apr 30, 2025 • 34min
When Will We Solve AI Hallucinations? A conversation with Sharon Zhou, CEO @ Lamini
Episode 001 features the incredible Sharon Zhou, co-founder and CEO of Lamini. She’s a generative AI trailblazer and a Stanford-trained protégé of Andrew Ng, who, along with Andrej Karpathy and others, is also an investor in Lamini. From co-creating one of Coursera’s top AI courses to making MIT’s prestigious “35 under 35” list, Sharon turns complex tech into everyday magic. She is also super fun to talk to!
We discussed:
– How to empower developers to understand and work with AI
– Lamini's technical approach to AI hallucinations (it's solvable!)
– Why benchmarks ≠ reality
– A notable industry use case and the importance of focusing on objective outputs: Subjective goals confuse it!
– And one of my favourite moments: Sharon crushes two of the hottest topics – agents and RAG. Turns out researchers don’t understand why there’s all this hype around these two.
– We also talked about open-source and its importance.
– And last but not least, Sharon (who teaches millions on Coursera) shared how to fight the lack of knowledge about AI. Her recipe: lower the barrier to entry, help people level up – plus memes!
Please give this video a watch and tell us what you think!
Likes and subscriptions to the channel are hugely appreciated.
00:00 Intro & Sharon Zhou’s Early Days in GenAI
01:25 Maternal Instincts for AI Models
02:42 From Classics to Code: Language, Product, and AI
04:30 The Spark Behind Lamini
07:45 Solving Hallucinations at a Technical Level
09:20 Benchmarks That Matter to Enterprises
11:58 Staying Technical as a Founder
13:27 The Agent & RAG Hype: Industry Misconceptions
18:44 Use Cases: From Colgate to Cancer Research
20:07 The Power of Objective Use Cases
22:28 What Comes After Hallucinations?
23:21 Following AI Research (and When It’s Useful)
26:23 Open Source & Model Ownership Philosophy
28:06 Bringing AI Education to Everyone
32:36 AI Natives & Edutainment for the Next Gen
34:18 Outro
Lamini
Website - https://www.lamini.ai
Twitter - https://x.com/laminiai
Sharon Zhou
LinkedIn - https://www.linkedin.com/in/zhousharon/
Twitter - https://x.com/realSharonZhou/
Turing Post
Website - https://www.turingpost.com/
Twitter - https://x.com/TheTuringPost
Ksenia Se (publisher)
LinkedIn - https://www.linkedin.com/in/ksenia-se
Twitter - https://x.com/kseniase_