Generative AI in the Real World

O'Reilly
Dec 11, 2025 • 31min

The Year in AI with Ksenia Se

As the founder, editor, and lead writer of Turing Post, Ksenia Se spends her days peering into the emerging future of artificial intelligence. She joined Ben to discuss the current state of adoption: what people are actually doing right now, the big topics that got the most traction this year, and the trends to look for in 2026. Find out why Ksenia thinks the real action next year will be in areas like robotics and embodied AI, spatial intelligence, AI for science, and education.
Dec 1, 2025 • 32min

The LLMOps Shift with Abi Aryan

MLOps is dead. Well, not really, but for many the job is evolving into LLMOps. In this episode, Abide AI founder and LLMOps author Abi Aryan joins Ben to discuss what LLMOps is and why it’s needed, particularly for agentic AI systems. Listen in to hear why LLMOps requires a new way of thinking about observability, why we should spend more time understanding human workflows before mimicking them with agents, how to do FinOps in the age of generative AI, and more.
Nov 26, 2025 • 37min

Laurence Moroney on AI at the Edge

In this episode, Laurence Moroney, director of AI at Arm, joins Ben Lorica to chat about the state of deep learning frameworks—and why you may be better off thinking a step higher, on the solution level. Listen in for Laurence’s thoughts about posttraining; the evolution of on-device AI (and how tools like ExecuTorch and LiteRT are helping make it possible); why culturally specific models will only grow in importance; what Hollywood can teach us about LLM privacy; and more.
Nov 11, 2025 • 38min

Chris Butler on GenAI in Product Management

In this episode, Ben Lorica and Chris Butler, director of product operations for GitHub's Synapse team, chat about the experimentation Chris is doing to incorporate generative AI into the product development process—particularly with the goal of reducing toil for cross-functional teams. It isn’t just automating busywork (although there’s some of that). He and his team have created agents that expose the right information at the right time, use feedback in meetings to develop “straw man” prototypes for the team to react to, and even offer critiques from specific perspectives (a CPO agent?). Very interesting stuff.
Oct 20, 2025 • 41min

Context Engineering with Drew Breunig

In this episode, Ben Lorica and Drew Breunig, a strategist at the Overture Maps Foundation, talk all things context engineering: what’s working, where things are breaking down, and what comes next. Listen in to hear why huge context windows aren’t solving the problems we hoped they might, why companies shouldn’t discount evals and testing, and why we’re doing the field a disservice by leaning into marketing and buzzwords rather than trying to leverage what the current crop of LLMs is actually capable of.
Oct 2, 2025 • 29min

Emmanuel Ameisen on LLM Interpretability

Emmanuel Ameisen, an interpretability researcher who previously worked at Anthropic, shares fascinating insights into large language models. He dives into how these models resemble biological systems, revealing surprising patterns like multi-token planning and shared neurons across languages. Emmanuel discusses the mechanisms behind hallucinations and the importance of model calibration. He also explores practical applications in medicine and offers invaluable advice for developers on understanding and evaluating model behavior.
Sep 23, 2025 • 33min

Understanding A2A with Heiko Hotz and Sokratis Kartakis

Everyone is talking about agents: single agents and, increasingly, multi-agent systems. What kind of applications will we build with agents, and how will we build with them? How will agents communicate with each other effectively? Why do we need a protocol like A2A to specify how they communicate? Join Ben Lorica as he talks with Heiko Hotz and Sokratis Kartakis about A2A and our agentic future.
Sep 23, 2025 • 22min

Faye Zhang on Using AI to Improve Discovery

In this episode, Ben Lorica and AI Engineer Faye Zhang talk about discoverability: how to use AI to build search and recommendation engines that actually find what you want. Listen in to learn how AI goes way beyond simple collaborative filtering—pulling in many different kinds of data and metadata, including images and voice, to get a much better picture of what any object is and whether or not it’s something the user would want.
Sep 22, 2025 • 29min

Luke Wroblewski on When Databases Talk Agent-Speak

Join Luke Wroblewski and Ben Lorica as they talk about the future of software development. What happens when we have databases that are designed to interact with agents and language models rather than humans? We’re starting to see what that world will look like. It’s an exciting time to be a software developer.
Sep 19, 2025 • 43min

Jay Alammar on Building AI for the Enterprise

Jay Alammar, director and Engineering Fellow at Cohere, joins Ben Lorica to talk about building AI applications for the enterprise, using RAG effectively, and the evolution of RAG into agents. Listen in to find out what kinds of metadata you need when you’re onboarding a new model or agent; discover how an emphasis on evaluation helps an organization improve its processes; and learn how to take advantage of the latest code-generation tools.

Timestamps

0:00: Introduction to Jay Alammar, director at Cohere. He’s also the author of Hands-On Large Language Models.

0:30: What has changed in how you think about teaching and building with LLMs?

0:45: This is my fourth year with Cohere. I really love the opportunity because it was a chance to join the team early (around the time of GPT-3). Aidan Gomez, one of the cofounders, was one of the coauthors of the transformer paper. I’m a student of how this technology went out of the lab and into practice. Being able to work at a company that’s doing that has been very educational for me. That’s a little of what I use to teach. I use my writing to learn in public.

2:20: I assume there’s a big difference between learning in public and teaching teams within companies. What’s the big difference?

2:36: If you’re learning on your own, you have to run through so much content and news, and you have to mute a lot of it as well. This industry moves extremely fast. Everyone is overwhelmed by the pace. For adoption, the important thing is to filter a lot of that and see what actually works, what patterns work across use cases and industries, and write about those.

3:25: That’s why something like RAG proved itself as one application paradigm for how people should be able to use language models. A lot of it is helping people cut through the hype and get to what’s actually useful, and raise AI awareness. There’s a level of AI literacy that people need to come to grips with.

4:10: People in companies want to learn things that are contextually relevant. For example, if you’re in finance, you want material that will help you deal with Bloomberg and those types of data sources, and material aware of the regulatory environment.

4:38: When people started to understand what this kind of technology was capable of doing, there were multiple lessons the industry needed to learn. Don’t think of chat as the first thing you should deploy. Think of simpler use cases, like summarization or extraction. Think about these as building blocks for an application.

5:28: It’s unfortunate that the name “generative AI” came to be used, because the most important things AI can do aren’t generative: they’re representational, with embeddings that enable better categorization and clustering and help companies make sense of large amounts of data. The next lesson was to not rely on a model’s internal knowledge. At the beginning of 2023, there were so many news stories about models being used as search engines. People expected the model to be truthful, and they were surprised when it wasn’t. One of the first solutions was RAG, which tries to retrieve the context that will hopefully contain the answer. The next question was data security and data privacy: companies didn’t want data to leave their networks. That’s where private deployment of models becomes a priority, where the model comes to the data. With that, they started to deploy their initial use cases.

8:04: At that point the system can answer questions up to a certain level of difficulty; for harder questions, the system needs to be more advanced. Maybe it needs to issue multiple search queries or work over multiple steps.

8:31: One thing we learned about RAG was that just because something is in the context window doesn’t mean the machine won’t hallucinate. And people have developed more appreciation for applying even richer context: GraphRAG, context engineering.
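The RAG pattern described in this episode can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Cohere’s implementation: the word-overlap retriever stands in for a real embedding-based search, the document texts and function names are invented for illustration, and the final call to a language model is omitted.

```python
# Minimal sketch of the RAG pattern: retrieve context that hopefully
# contains the answer, then ground the model's reply in that context
# rather than in its (possibly stale or wrong) parametric memory.

def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents sharing the most words with the query.

    A naive word-overlap scorer; a production system would use
    embedding similarity instead.
    """
    query_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Place the retrieved context in the prompt so the model is asked
    to answer from it, which reduces (but does not eliminate)
    hallucination."""
    joined = "\n".join(f"- {doc}" for doc in context)
    return (
        f"Answer using only the context below.\n"
        f"Context:\n{joined}\n"
        f"Question: {query}\n"
    )

# Toy corpus, invented for this example.
docs = [
    "Cohere was cofounded by Aidan Gomez, a coauthor of the transformer paper.",
    "RAG retrieves documents and adds them to the model's context window.",
]
query = "Who cofounded Cohere?"
prompt = build_prompt(query, retrieve(query, docs))
# In a real system, `prompt` would now be sent to an LLM.
```

As the 8:31 discussion notes, even with retrieved context in the window the model can still hallucinate, which is why richer approaches like GraphRAG and context engineering keep layering on more structure.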
