Generative AI in the Real World

O'Reilly
Sep 12, 2025 • 32min

Danielle Belgrave on Generative AI in Pharma and Medicine

Join Danielle Belgrave and Ben Lorica for a discussion of AI in healthcare. Danielle is VP of AI and machine learning at GSK (formerly GlaxoSmithKline). She and Ben discuss using AI and machine learning to get better diagnoses that reflect the differences between patients. Listen in to learn about the challenges of working with health data—a field where there’s both too much data and too little, and where hallucinations have serious consequences. And if you’re excited about healthcare, you’ll also find out how AI developers can get into the field.

Points of Interest
0:00: Introduction to Danielle Belgrave, VP of AI and machine learning at GSK. Danielle is our first guest representing Big Pharma. It will be interesting to see how people in pharma are using AI technologies.
0:49: My interest in machine learning for healthcare began 15 years ago. My PhD was on understanding patient heterogeneity in asthma-related disease. This was before electronic healthcare records. By leveraging different kinds of data, genomics data and biomarkers from children, and seeing how they developed asthma and allergic diseases, I developed causal modeling frameworks and graphical models to see if we could identify who would respond to what treatments. This was quite novel at the time. We identified five different types of asthma. If we can understand heterogeneity in asthma, a bigger challenge is understanding heterogeneity in mental health. The idea was trying to understand heterogeneity over time in patients with anxiety.
4:12: When I went to DeepMind, I worked on the healthcare portfolio. I became very curious about how to understand things like MIMIC, which had electronic healthcare records, and image data. The idea was to leverage tools like active learning to minimize the amount of data you take from patients. We also published work on improving the diversity of datasets.
5:19: When I came to GSK, it was an exciting opportunity to do both tech and health. Health is one of the most challenging landscapes we can work on. Human biology is very complicated. There is so much random variation. To understand biology, genomics, disease progression, and have an impact on how drugs are given to patients is amazing.
6:15: My role is leading AI/ML for clinical development. How can we understand heterogeneity in patients to optimize clinical trial recruitment and make sure the right patients have the right treatment?
6:56: Where does AI create the most value across GSK today? That can be both traditional AI and generative AI.
7:23: I use everything interchangeably, though there are distinctions. The really important thing is focusing on the problem we are trying to solve, and focusing on the data. How do we generate data that’s meaningful? How do we think about deployment?
8:07: And all the Q&A and red teaming.
8:20: It’s hard to put my finger on what’s the most impactful use case. When I think of the problems I care about, I think about oncology, pulmonary disease, hepatitis—these are all very impactful problems, and they’re problems that we actively work on. If I were to highlight one thing, it’s the interplay between whole genome sequencing data and molecular data, and trying to translate that into computational pathology. By looking at those data types and understanding heterogeneity at that level, we get a deeper biological representation of different subgroups and understand mechanisms of action for response to drugs.
Sep 11, 2025 • 31min

The Startup Opportunity with Gabriela de Queiroz

Ben Lorica and Gabriela de Queiroz, director of AI at Microsoft, talk about startups: specifically, AI startups. How do you get noticed? How do you generate real traction? What are startups doing with agents and with protocols like MCP and A2A? And which security issues should startups watch for, especially if they’re using open weights models?

Points of Interest
0:30: You work with a lot of startups and founders. How have the opportunities for startups in generative AI changed? Are the opportunities expanding?
0:56: Absolutely. The entry barrier for founders and developers is much lower. Startups are exploding—not just in number but also in the interesting things they are doing.
1:19: You catch startups when they’re still exploring, trying to build their MVP. So startups need to be more persistent in trying to find differentiation. If anyone can build an MVP, how do you distinguish yourself?
1:46: At Microsoft, I drive several strategic initiatives to help growth-stage startups. I also guide them in solving real pain points using our stacks. I’ve designed programs to spotlight founders.
3:08: I do a lot of engagement where I help startups go from the prototype or MVP to impact. An MVP is not enough. I need to see a real use case and I need to see some traction. When they have real customers, we see whether their MVP is working.
3:49: Are you starting to see patterns for gaining traction? Are they focusing on a specific domain? Or do they have a good dataset?
4:02: If they are solving a real use case in a specific domain or niche, this is where we see them succeed. They are solving a real pain, not building something generic.
4:27: We’re both in San Francisco, and solving a specific pain or finding a specific domain means something different here. Techie founders can build something that’s used by their friends, but there’s no revenue.
5:03: This happens everywhere, but there’s a bigger culture around that here. I tell founders, “You need to show me traction.” We have several companies that started as open source, then built a paid layer on top of the open source project.
5:34: You work with the folks at Azure, so presumably you know what actual enterprises are doing with generative AI. Can you give us an idea of what enterprises are starting to deploy? What is the level of comfort of enterprises with these technologies?
6:06: Enterprises are a little bit behind startups. Startups are building agents. Enterprises are not there yet. There’s a lot of heavy lifting on the data infrastructure that they need to have in place. And their use cases are complex. It’s similar to Big Data, where enterprises took longer to optimize their stacks.
7:19: Can you describe why enterprises need to modernize their data stack?
7:42: Reality isn’t magic. There’s a lot of complexity in data and how data is handled. There are a lot of data security and privacy concerns that startups aren’t aware of but that are important to enterprises. Even the kinds of data—the data isn’t well organized, and there are different teams using different data sources.
8:28: Is RAG now a well-established pattern in the enterprise?
8:44: It is. RAG is part of everybody’s workflow.
8:51: The common use cases that seem to be further along are customer support, coding—what other buckets can you add?
9:07: Customer support and tickets are among the main pains and use cases. And they are very expensive. So it’s an easy win for enterprises when they move to GenAI or AI agents.
9:48: Are you saying that the tool builders are ahead of the tool buyers?
10:05: You’re right. I talk a lot with startups building agents. We discuss where the industry is heading and what the challenges are. If you think we are close to AGI, try to build an agent and you’ll see how far we are from AGI. When you want to scale, there’s another level of difficulty. When I ask for real examples and customers, the majority are not there yet.
Sep 10, 2025 • 43min

Securing AI with Steve Wilson

Join Steve Wilson and Ben Lorica for a discussion of AI security. We all know that AI brings new vulnerabilities into the software landscape. Steve and Ben talk about what makes AI different, what the big risks are, and how you can use AI safely. Find out how agents introduce their own vulnerabilities, and learn about resources such as OWASP that can help you understand them. Is there a light at the end of the tunnel? Can AI help us build secure systems even as it introduces its own vulnerabilities? Listen to find out.

Points of Interest
0:49: Now that AI tools are more accessible, what makes LLM and agentic AI security fundamentally different from traditional software security?
1:20: There are two parts. When you start to build software using AI technologies, there is a new set of things to worry about. When your software is getting near to human-level smartness, the software is subject to the same issues as humans: It can be tricked and deceived. The other part is what the bad guys are doing when they have access to frontier-class AIs.
2:16: In your work at OWASP, you listed the top 10 vulnerabilities for LLMs. What are the top one or two risks that are causing the most serious problems?
2:42: I’ll give you the top three. The first one is prompt injection. By feeding data to the LLM, you can trick the LLM into doing something the developers didn’t intend.
3:03: Next is the AI supply chain. The AI supply chain is much more complicated than the traditional supply chain. It’s not just open source libraries from GitHub. You’re also dealing with gigabytes of model weights and terabytes of training data, and you don’t know where they’re coming from. And sites like Hugging Face have had malicious models uploaded to them.
3:49: The last one is sensitive information disclosure. Bots are not good at knowing what they should not talk about. When you put them into production and give them access to important information, you run the risk that they will disclose information to the wrong people.
4:25: For supply chain security, when you install something in Python, you’re also installing a lot of dependencies. And everything is democratized, so people can do a little on their own. What can people do about supply chain security?
5:18: There are two flavors. One is: I’m building software that includes the use of a large language model. If I want to get Llama from Meta as a component, that includes gigabytes of floating point numbers. You need to put some skepticism around what you’re getting.
6:01: Another hot topic is vibe coding. People who have never programmed or haven’t programmed in 20 years are coming back. There are problems like hallucinations. With generated code, the model will make up the existence of a software package and write code that imports it. And attackers will create malicious versions of those packages and put them on GitHub so that people will install them.
7:28: Our ability to generate code has gone up 10x to 100x. But our ability to security check and quality check hasn’t. For people starting out, get some basic awareness of the concepts around application security and what it means to manage the supply chain.
7:57: We need a different generation of software composition analysis tools that are designed to work with vibe coding and integrate into environments like Cursor.
8:44: We have good basic guidelines for users: Does a library have a lot of users? A lot of downloads? A lot of stars on GitHub? There are basic indications. But professional developers augment that with tooling. We need to bring those tools into vibe coding.
9:20: What’s your sense of the maturity of guardrails?
9:50: The good news is that the ecosystem around guardrails started really soon after ChatGPT came out. Things at the top of the OWASP Top 10, prompt injection and information disclosure, indicated that you needed to police the trust boundaries around your LLM.
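To make the hallucinated-package risk concrete, here is a minimal sketch (not a tool mentioned in the episode) of the kind of basic check a developer or a vibe-coding workflow could run before installing a dependency: it asks PyPI’s public JSON API whether the name exists at all and prints a few signals worth eyeballing. The heuristics are illustrative assumptions, not a replacement for real software composition analysis tooling.

# Minimal sketch: sanity-check dependency names before installing them.
# Uses PyPI's public JSON endpoint; the signals printed here are illustrative
# assumptions, not a substitute for real software-composition-analysis tools.
import sys
import requests

def check_package(name: str) -> None:
    resp = requests.get(f"https://pypi.org/pypi/{name}/json", timeout=10)
    if resp.status_code == 404:
        print(f"{name}: not on PyPI -- possibly a hallucinated import")
        return
    resp.raise_for_status()
    data = resp.json()
    info = data["info"]
    print(
        f"{name}: {len(data['releases'])} releases, "
        f"home page: {info.get('home_page') or 'n/a'}, "
        f"summary: {info.get('summary') or 'n/a'}"
    )

if __name__ == "__main__":
    for pkg in sys.argv[1:]:   # e.g. python check_deps.py requests some-made-up-lib
        check_package(pkg)

A check like this only tells you that a name exists; it says nothing about whether the package is trustworthy, which is why the episode points to professional tooling layered on top of these basic signals.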
Sep 9, 2025 • 30min

Shreya Shankar on AI for Corporate Data Processing

Businesses have a lot of data—but most of that data is unstructured textual data: reports, catalogs, emails, notes, and much more. Without structure, business analysts can’t make sense of the data; there is value in the data, but it can’t be put to use. AI can be a tool for finding and extracting the structure that’s hidden in textual data. In this episode, Ben and Shreya talk about a new generation of tooling that brings AI to enterprise data processing.

Points of Interest
0:18: One of the themes of your work is a specific kind of data processing. Before we go into tools, what is the problem you’re trying to address?
0:52: For decades, organizations have been struggling to make sense of unstructured data. There’s a massive amount of text that people make sense of. We didn’t have the technology to do that until LLMs came around.
1:38: I’ve spent the last couple of years building a processing framework for people to manipulate unstructured data with LLMs. How can we extract semantic data?
1:55: The prior art would be using NLP libraries and doing bespoke tasks?
2:12: We’ve seen two flavors of approach: bespoke code and crowdsourcing. People still do both. But now LLMs can simplify the process.
2:45: The typical task is “I have a large collection of unstructured text and I want to extract as much structure as possible.” An extreme would be a knowledge graph; in the middle would be the things that NLP people do. Your data pipelines are designed to do this using LLMs.
3:22: Broadly, the tasks are thematic extraction: I want to extract themes from documents. You can program LLMs to find themes. You want some user steering and guidance for what a theme is, then use the LLM for grouping.
4:04: One of the tools you built is DocETL. What’s the typical workflow?
4:19: The idea is to write MapReduce pipelines, where map extracts insights and group does aggregation. Doing this with LLMs means that the map is described by an LLM prompt. Maybe the prompt is “Extract all the pain points and any associated quotes.” Then you can imagine flattening this across all the documents, grouping them by the pain points, and another LLM can do the summary to produce a report. DocETL exposes these data processing primitives and orchestrates them to scale up and across task complexity.
5:52: What if you want to extract 50 things from a map operation? You shouldn’t ask an LLM to do 50 things at once. You should group them and decompose them into subtasks. DocETL does some optimizations to do this.
6:18: The user could be a noncoder and might not be working on the entire pipeline.
7:00: People do that a lot; they might just write a single map operation.
7:16: But the end user you have in mind doesn’t even know the words “map” and “filter.”
7:22: That’s the goal. Right now, people still need to learn data processing primitives.
7:49: These LLMs are probabilistic; do you also set the expectation with users that they might get different results every time they run the pipeline?
8:16: There are two different types of tasks. One is where you want the LLM to be accurate and there is an exact ground truth—for example, entity extraction. The other type is where you want to offload a creative process to the LLM—for example, “Tell me what’s interesting in this data.” People will run that until there are no new insights to be gleaned. When is nondeterminism a problem? How do you engineer systems around it?
9:56: You might also have a data engineering team that uses this and turns PDF files into something like a data warehouse that people can query. In this setting, are you familiar with lakehouse architecture and the notion of the medallion architecture?
10:49: People actually use DocETL to create a table out of PDFs and put it in a relational database. That’s the best way to think about how to move forward in the enterprise setting. I’ve also seen people using these tables in RAG or downstream LLM applications.
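For readers who want to picture the map, group, and reduce pattern Shreya describes, here is a minimal Python sketch under stated assumptions. It is not DocETL’s actual API; the prompts, the gpt-4o-mini model name, and the JSON record shape are placeholders, and it assumes the OpenAI Python client with an API key in the environment.

# A minimal sketch of the map -> group -> reduce pattern described above.
# This is NOT DocETL's API; prompts, model name, and record shape are
# illustrative assumptions.
import json
from collections import defaultdict
from openai import OpenAI

client = OpenAI()           # assumes OPENAI_API_KEY is set
MODEL = "gpt-4o-mini"       # placeholder model name

def llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def map_document(doc: str) -> list[dict]:
    # "Map": one LLM call per document, extracting structured records.
    prompt = (
        "Extract all the pain points and any associated quotes from the text "
        'below. Reply with a JSON list like [{"pain_point": "...", "quote": "..."}].\n\n'
        + doc
    )
    return json.loads(llm(prompt))

def reduce_group(pain_point: str, quotes: list[str]) -> str:
    # "Reduce": summarize each group of quotes into a short report section.
    prompt = (
        f"Summarize the following quotes about '{pain_point}' in two sentences:\n"
        + "\n".join(quotes)
    )
    return llm(prompt)

def run_pipeline(documents: list[str]) -> dict[str, str]:
    groups: dict[str, list[str]] = defaultdict(list)
    for doc in documents:                         # map + flatten
        for record in map_document(doc):
            groups[record["pain_point"]].append(record["quote"])
    return {p: reduce_group(p, qs) for p, qs in groups.items()}   # group + reduce

A real framework would also handle batching, retries, malformed JSON, and decomposing overly broad map prompts into subtasks, which this sketch deliberately omits.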
Sep 8, 2025 • 40min

Vibe Coding with Steve Yegge

Ever since Andrej Karpathy first tweeted it, “vibe coding” has been on every software developer’s mind. Join Ben Lorica and Steve Yegge to find out what vibe coding means, especially in a professional context. Going beyond the current memes, what will the future of software development look like when we have multiple agents? And how do you prepare for it? Don’t push back against AI now; lean into it.

Points of Interest
0:36: Let’s start with CHOP. What do you mean by “chat-oriented programming,” and how does it change the role of a software developer?
1:02: Andrej Karpathy has come up with a more accessible packaging: “vibe coding.” Gene Kim and I are going with the flow in our book, which is also about agentic programming.
2:02: The industry has the widest distribution of understanding that I’ve ever seen. We’ve got people saying, “You ought to stop using AI”; we’ve got people refusing to use AI; we’ve got people spread out in what they’re using.
3:03: Vibe coding started off as “it’s easy.” But people misinterpreted Karpathy’s tweet to mean that the LLM is ready to write all the code. That’s led to production incidents, “no vibe coding,” and a debate over whether you can turn your brain off.
3:35: Google decided to adopt vibe coding because you can do it as a grownup, as an engineer. You don’t have to accept whatever AI gives you. If you’re doing a weekend project or a prototype, you don’t have to look carefully at the output. But if you’re doing production coding, you have to demand excellence of your LLM. You have to demand that it produces code to a professional standard. That’s what Google does now.
4:38: Vibe coding means using AI. Agents like Claude Code are pretty much the same.
4:58: There’s traditional AI-assisted coding (completions); with vibe coding, the trust in AI is higher. The developer becomes a high-level orchestrator instead of writing code line by line.
5:37: Trust is a huge dimension. It’s the number one thing that is keeping the industry from rocketing forward on adoption. With chat programming, even though it’s been eclipsed by agent programming, you get the LLM to do the work—but you have to validate it yourself. You’re nudging it over and over again. Many senior engineers don’t try hard enough. You wouldn’t boot an intern to the curb for failing the first time.
7:18: AI doesn’t work right the first time. You can’t trust anything. You have to validate and verify. This is what people have to get over.
7:53: You’re still accountable for the code. You own the code. But people are struggling with the new role, which is being a team lead. This is even more true with coding agents like Claude Code. You’re more productive, but you’re not a programmer any more.
8:51: For people to make the transition to vibe coding, what are some of the core skill sets they’ll have to embrace?
9:07: Prompt engineering is a separate discipline from CHOP or vibe coding. Prompt engineering is static prompting. It’s for embedding AI in an application. Chat programming is dynamic; lots of throwaway prompts that are only used once.
10:13: Engineers should know all the skills of AI. With the AI Engineering book by Chip Huyen, that’s what engineers need to know. Those are the skills you need to put AI in applications, even if you’re not doing product development.
11:15: Or put the book into a RAG system.
12:00: Vibe coding is another skill to learn. Learn it; don’t push back on it. Learn how it works, learn how to push it. Claude Code isn’t even an IDE. The form factor is terrible right now. But if you try it and see how powerful agentic coding is, you’ll be shocked. The agent does all the stuff you used to have to tell it to do.
13:57: You’ll say, “Here’s a Jira ticket; fix it for me.” First it will find the ticket; it will evaluate your codebase using the same tools you do; then it will come up with an execution plan. It’s nuts what they are doing. We all knew this was coming, but nobody knew it would be here now.
Sep 5, 2025 • 33min

Interactions Between Humans and AI with Rajeshwari Ganesan

In this edition of Generative AI in the Real World, Ben Lorica and Rajeshwari Ganesan talk about how to put generative AI in closer touch with human needs and requirements. AI isn’t all about building bigger models and benchmarks. To use it effectively, we need better interfaces; we need contexts that support groups rather than individuals; we need applications that allow people to explore the space they’re working in. Ever since ChatGPT, we’ve assumed that chat is the best interface for AI. We can do better.

Points of Interest
0:17: We’re both builders and consumers of AI. How does this dual relationship affect how we design interfaces?
0:41: A lot of advances happen in the large language models. But when we step back, are these models consumable by users? We lack the kind of user interface we need. With ChatGPT, conversations can go round and round, turn by turn. If you don’t give the right context, you don’t get the right answer. This isn’t good enough.
1:47: Model providers go out of their way to coach users, telling them how to prompt new models. All the providers have coaching tips. What alternatives should we be exploring?
2:50: We’ve made certain initial starts. GitHub Copilot and mail applications with typeahead don’t require heavy-duty prompting. The AI coinhabits the same workspace as the user. The context is derived from the workspace. The second part is that generative interfaces are emerging. It’s not the content but the experience that’s generated by the machine.
5:22: Interfaces are experience. Generate the interface based on what the user needs at any given point. At Infosys, we do a lot of legacy modernization—that’s where you really need good interfaces. We have been able to create interfaces where the user is able to walk into a latent space—an area that gives them an understanding of what they want to explore.
7:11: A latent space is an area that is meaningful for the user’s interaction. A space that’s relatable and semantically understandable. The user might say, “Tell me all the modules dealing with fraud detection.” Exploring the space that the user wants is possible. Let’s say I describe various aspects of a project I’m launching. The machine looks at my thought process. It looks at my answers, breaks them up part by part, judges the quality of the response, and gets into the pieces that need to be better.
9:44: One of the things people struggle with is evaluation. Not of a single agent—most tasks require multiple agents because there are different skills and tasks involved. How do we address evaluation and transparency?
10:42: When it comes to evaluation, I think in terms of trustworthy systems. A lot of focus on evaluation comes from model engineering. But one critical piece of building trustworthy systems is the interface itself. A human has an intent and is requesting a response. There is a shared context—and if the context isn’t shared properly, you won’t get the right response. Prompt engineering is difficult; if you don’t give the right context, you go in a loop.
12:26: Trustworthiness breaks because you’re dependent on the prompt. The coinhabited workspace that takes the context from the environment plays a big role.
12:46: Once you give the questions to the machine, the machine gives a response. But if the response isn’t consumable by the user, that’s a problem.
13:18: Trustworthiness of systems in the context of agent frameworks is much more complex. Humans don’t just have factual knowledge. We have beliefs. Humans have a belief state, and if an agent doesn’t have access to that belief state, it will get into something called reasoning derailment. If the interface can’t bring belief states to life, you will have a problem.
Sep 4, 2025 • 32min

Getting Beyond the Demo with Hamel Husain

In this episode, Ben Lorica and Hamel Husain talk about how to take the next steps with artificial intelligence. Developers don’t need to build their own models—but they do need basic data skills. It’s important to look at your data, to discover your model’s weaknesses, and to use that information to develop test suites and evals that show whether your model is behaving well.

Links to Resources
Hamel’s upcoming course on evaluating LLMs
Hamel’s O’Reilly publications: “AI Essentials for Tech Executives” and “What We Learned from a Year of Building with LLMs”
Hamel’s website

Points of Interest
0:39: What inspired you and your coauthors to create a series on practical uses of foundation models? What gaps in existing resources did you aim to address?
0:56: We’re publishing “AI Essentials for Tech Executives” now; last year, we published “What We Learned from a Year of Building with LLMs.” Coming from the perspective of a machine learning engineer or data scientist—you don’t need to build or train models. You can use an API. But there are skills and practices from data science that are crucial.
2:16: There are core skills around data analysis and error analysis and basic data literacy that you need to get beyond a demo.
2:43: What are some crucial shifts in mindset that you’ve written about on your blog?
3:24: The phrase we keep repeating is “look at your data.” What does “look at your data” mean?
3:51: There’s a process that you should use. Machine learning systems have a lot in common with modern AI. How do you test those? Debug them? Improve them? Look at your data; people fail on this. They do vibe checks, but they don’t really know what to do next.
4:56: Looking at your data helps ground everything. Look at actual logs of user interactions. If you don’t have users, generate interactions synthetically. See how your AI is behaving and write detailed notes about failure modes. Do some analysis on those notes: Categorize them. You’ll start to see patterns and your biggest failure modes. This will give you a sense of what to prioritize.
6:08: A lot of people are missing that. People aren’t familiar with the rich ecosystem of data tools, so they get stuck. We know that it’s crucial to sample some data and look at it.
7:08: It’s also important that you have the domain expert do it with the engineers. On a lot of teams, the domain expert isn’t an engineer.
7:44: Another thing is focusing on processes, not tools. Tools aren’t the problem—the problem is that your AI isn’t working. The tools won’t take care of it for you. There’s a process: how to debug, look at, and measure AI. Those are the main mindset shifts.
9:32: Most people aren’t building models (pretraining); they might be doing posttraining on a base model. But there are a lot of experiments that you still have to run. There are knobs you have to turn, and without the ability to do it systematically and measure, you’re just mindlessly turning knobs without learning much.
10:29: I’ve held open office hours for people to ask questions about evals. What people ask most is what to eval. There are many components. You can’t and shouldn’t test everything. You should be grounded in your actual failure modes. Prioritize your tests on that.
11:30: Another topic is what I call prototype purgatory. A lot of people have great demos. The demos work, and might even be deployable. But people struggle with pulling the trigger.
12:15: A lot of people don’t know how to evaluate their AI systems if they don’t have any users. One way to help yourself is to generate synthetic data. Have an LLM generate realistic user inputs and brainstorm different personas and scenarios. That bootstraps you significantly towards production.
13:57: There’s a new open source tool that does something like this for agents. It’s called IntelAgent. It generates synthetic data that you might not come up with yourself.
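As an illustration of the "generate interactions synthetically" advice, here is a minimal sketch that crosses a few personas with a few scenarios and asks an LLM for realistic user messages. The personas, scenarios, prompts, and model name are all illustrative assumptions (this is not IntelAgent), and it assumes the OpenAI Python client with an API key set in the environment.

# Minimal sketch: bootstrap eval data with synthetic user inputs.
# Personas, scenarios, prompts, and the model name are illustrative assumptions.
import itertools
from openai import OpenAI

client = OpenAI()           # assumes OPENAI_API_KEY is set
MODEL = "gpt-4o-mini"       # placeholder model name

personas = ["first-time user", "power user in a hurry", "skeptical admin"]
scenarios = ["resetting a password", "disputing a charge", "exporting data"]

def synthetic_inputs(persona: str, scenario: str, n: int = 3) -> list[str]:
    prompt = (
        f"Write {n} realistic, distinct messages a '{persona}' might send to a "
        f"support chatbot while {scenario}. One message per line, no numbering."
    )
    resp = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return [
        line.strip()
        for line in resp.choices[0].message.content.splitlines()
        if line.strip()
    ]

# Cross personas with scenarios to surface inputs you might not think of yourself.
dataset = [
    {"persona": p, "scenario": s, "input": msg}
    for p, s in itertools.product(personas, scenarios)
    for msg in synthetic_inputs(p, s)
]

The point of the cross product is coverage: you then run these inputs through your system, take notes on the failures, and categorize them, exactly as described at 4:56.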
Sep 3, 2025 • 27min

Agents—The Next Step in AI with Shelby Heinecke

Join Shelby Heinecke, senior research manager at Salesforce, and Ben Lorica as they talk about agents, AI models that can take action on behalf of their users. Are they the future—or at least the hot topic for the coming year? Where are we with smaller models? And what do we need to improve the agent stack? How do you evaluate the performance of models and agents?

About the Generative AI in the Real World podcast: In 2023, ChatGPT put AI on everyone’s agenda. In 2025, the challenge will be turning those agendas into reality. In Generative AI in the Real World, Ben Lorica interviews leaders who are building with AI. Learn from their experience to help put AI to work in your enterprise.

Points of Interest
0:29: Introduction—our guest is Shelby Heinecke, senior research manager at Salesforce.
0:43: The hot topic of the year is agents. Agents are increasingly capable of GUI-based interactions. Is this my imagination?
1:20: The research community has made tremendous progress to make this happen. We’ve made progress on function calling. We’ve trained LLMs to call the correct functions to perform tasks like sending emails. My team has built large action models that, given a task, write a plan and the API calls to execute it. This is one piece. A second piece, for when you don’t know the functions a priori, is giving the agent the ability to reason about images and video.
3:07: We released multimodal action models. They take an image and text and produce API calls. That makes navigating GUIs a reality.
3:34: A lot of knowledge work relies on GUI interactions. Is this just robotic process automation rebranded?
4:05: We’ve been automating forever. What’s special is that the automation is driven by LLMs, and that combination is particularly powerful.
4:32: The earlier generation of RPA was very tightly scripted. With multimodal models that can see the screen, they can really understand what’s happening. Now we’re beginning to see reasoning-enhanced models. Inference scaling will be important.
5:52: Multimodality and reasoning-enhanced models will make agents even more powerful.
6:00: I’m very interested in how much reasoning we can pack into a smaller model. Just this week DeepSeek also released smaller distilled versions.
7:08: Every month the capability of smaller models has been pushed. Smaller models right now may not compare to large models. But this year, we can push the boundaries.
7:38: What’s missing from the agent stack? You have the model—some notion of memory. You have tools that the agent can call. There are agent frameworks. You need monitoring, observability. Everything depends on the model’s capabilities: There’s a lot of fragmentation, and the vocabulary is still unclear. Where do agents usually fall short?
9:00: There’s a lot of room for improvement with function calling and multistep function calling. Earlier in the year, it was just single step. Now there’s multistep. That expands our horizons.
9:59: We need to think about deploying agents that solve complex tasks that take multiple steps. We will need to think more about efficiency and latency. With increased reasoning abilities, latency increases.
10:45: This year, we’ll see small language models and agents come together.
10:58: At the end of the day, this is an empirical discipline and you need to come up with your own benchmarks and eval tools. What are you doing in terms of benchmarks and eval?
11:36: This is the most critical piece of applied research. You’re deploying models for a purpose. You still need an evaluation set for that use case. As we work with a variety of products, we cocreate evaluation sets with our partners.
12:38: We’ve released the CRM benchmark. It’s open. We’ve created CRM-style datasets with CRM-type tasks. You can see the open source models and small models on these leaderboards and how they perform.
13:16: How big do these datasets have to be?
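To show what "training LLMs to call the correct functions" looks like from the application side, here is a minimal function-calling sketch. It uses the OpenAI chat-completions tools format as a stand-in; the send_email tool and the model name are illustrative assumptions, not Salesforce's large action models.

# Minimal sketch of function calling: the model maps a natural-language task
# onto a structured API call. Tool schema, model name, and the send_email
# function are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()           # assumes OPENAI_API_KEY is set
MODEL = "gpt-4o-mini"       # placeholder model name

tools = [{
    "type": "function",
    "function": {
        "name": "send_email",
        "description": "Send an email on the user's behalf.",
        "parameters": {
            "type": "object",
            "properties": {
                "to": {"type": "string"},
                "subject": {"type": "string"},
                "body": {"type": "string"},
            },
            "required": ["to", "subject", "body"],
        },
    },
}]

resp = client.chat.completions.create(
    model=MODEL,
    messages=[{
        "role": "user",
        "content": "Email maria@example.com that the demo moved to 3pm.",
    }],
    tools=tools,
)

message = resp.choices[0].message
if message.tool_calls:                               # the model chose a function
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)       # structured arguments to execute
    print(call.function.name, args)

Multistep function calling, the capability discussed at 9:00, is this same loop run repeatedly: execute the call, feed the result back as a tool message, and let the model decide the next action.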
Sep 2, 2025 • 31min

Measuring Skills with Kian Katanforoosh

How do we measure skills in an age of AI? That question has an effect on everything from hiring to productive teamwork. Join Kian Katanforoosh, founder and CEO of Workera, and Ben Lorica for a discussion of how we can use AI to assess skills more effectively. How do we get beyond pass/fail exams to true measures of a person’s ability?

Points of Interest
0:28: Can you give a sense of how big the market for skills verification is?
0:42: It’s extremely large. Anything that touches skills data is on the rise. When you extrapolate from university admissions to someone’s career, you realize that there are many times when they need to validate their skills.
1:59: Roughly what’s the breakdown between B2B and B2C?
2:04: Workera is exclusively B2B and federal. However, there are also assessments focused on B2C. Workera has free assessments for consumers.
3:00: Five years ago, there were tech companies working on skill assessment. What were prior solutions before the rise of generative AI?
3:27: Historically, assessments have been used for summative purposes. Pass/fail, high stakes, the goal is to admit or reject you. We provided assessments for people to know where they stand, compare themselves to the market, and decide what to study next. That takes different technology.
4:50: Generative AI became much more prominent with the rise of ChatGPT. What changed?
5:09: Skills change faster than ever. You need to update skills much more frequently. The half-life of skills used to be over 10 years. Today, it’s estimated to be around 2.5 years in the digital area. Writing a quiz is easy. Writing a good assessment is extremely hard. Validity is a concept showing that what you intend to measure is what you are measuring. AI can help.
6:39: AI can help with modeling the competencies you want to measure.
6:57: AI can help streamline the creation of an assessment.
7:22: AI can help test the assessment with synthetic users.
7:42: AI can help with monitoring postassessment. There are a lot of things that can go wrong.
8:25: Five years ago in programming, people used tests to filter people out. That has changed; people will use coding assistants on the job. Why shouldn’t I be able to use a coding assistant when I’m doing an assessment?
9:16: You should be able to use it. The assessment has to change. The previous generation of assessments focused on syntax. Do you care if you forgot a semicolon? Assessments should focus on other cognitive levels, such as analyzing and synthesizing information.
10:06: Because of generative models, it’s become easier to build an impressive prototype. Evaluation is the hard part. Assessment is all about evaluation, so the bar is much higher for you.
10:48: Absolutely. We have a study that calculates the number of skills needed to prototype versus deploy AI. You need about 1,000 skills to prototype AI. You need about 10,000 skills for production AI.
12:39: If I want to do skills assessment on an unfamiliar workflow, say full stack web development, what’s your process for onboarding?
13:17: We have one agent that’s responsible for competency modeling. You can have a subject-matter expert (SME) share a job description or task analysis or job architecture. We take that information and granularize the tasks worth measuring. At that point, there’s a human in the loop.
14:27: Where does AI help? What does the AI need? What would you like to see from people using your tool?
15:04: Language models have been trained on pretty much everything online. You can get a pretty good answer from AI. The SME takes that from 80% to 100%. Now, there are issues with that process. We separate the core catalog of skills from the custom catalog, where customers create custom assessments. A standardized assessment lets you benchmark against other people or companies.
16:32: If you take a custom assessment, it’s highly relevant to your needs, even though comparisons aren’t possible.
16:41: It’s obviously anonymized, right?
Sep 1, 2025 • 30min

Chloé Messdaghi on AI Security, Policy, and Regulation

Chloé Messdaghi and Ben Lorica discuss AI security—a subject of increasing importance as AI-driven applications roll out into the real world. There’s a knowledge gap: Security workers don’t understand AI, and AI developers don’t understand security. Make sure to bring everyone together to develop AI security policies and playbooks, including AI developers and experts. And be aware of the resources that are available; we expect to see AI security certifications and training become available in the coming year.

Points of Interest
0:24: How does AI security differ from traditional cybersecurity?
0:44: AI is a black box: We don’t have transparency to show how AI works or explainability to show how it makes decisions. Black boxes are hard to secure.
2:12: There’s a huge knowledge gap. Companies aren’t doing what is needed.
2:24: When you talk to executives, do you distinguish between traditional AI and ML and the new generative AI models?
2:43: We talk about older models as well. But much of security is about “What am I supposed to do?” We’ve had AI for a while, but for some time, security has not been part of that conversation.
3:26: Where do security folks go to learn how to secure AI? There are no certifications. We’re playing a massive catchup game.
3:53: What’s the state of awareness about incident response strategies for AI?
4:15: Even in traditional cybersecurity, we’ve always had an issue of making sure incident response plans aren’t ad hoc or expired. A lot of it is being aware of all the technologies and products that the company has been using. It’s hard to protect if you don’t know everything in your environment.
5:19: The AI Threat Landscape report found that 77% of the companies surveyed reported breaches in their AI systems.
5:40: Last year, a statistic came out about the adoption of AI-related cybersecurity measures. For North America, 70% of the organizations said they had adopted one or two of five security measures; 24% had adopted two to four measures.
6:35: What are some of the first things I should be thinking about to update my incident response playbook?
6:51: Make sure you have all the right people in the room. We still have issues with department silos. CISOs can be dismissed or not even in the room when it comes to decisions. There are concerns about restricting innovation or product launch dates. You have to have CTOs, data scientists, ML developers, and all the right people to ensure that there is safety and that everyone has taken precautions.
7:48: For companies with a mature cybersecurity incident playbook that they want to update for AI, what AI brings is that you have to include more people.
8:17: You have to realize that there’s an AI knowledge gap, and that there’s insufficient security training for data scientists. Security folks don’t know where to turn for education. There aren’t a lot of courses or programs out there. We’ll see a lot of that develop this year.
10:13: You’d think we’d have addressed communication silos by now, but AI has ripped the bandaids off. There are resources out there. I recommend Databricks’ AI Security Framework (DASF); it’s mapped to MITRE ATLAS. Also be familiar with the NIST Risk Framework and the OWASP AI Exchange.
11:40: This knowledge gap is on both sides. What are some of the best practices for addressing this two-sided knowledge gap?
12:20: Be honest about where your company stands. Where are we right now? Are we doing a good job of governance? Am I doing a good enough job as a leader? Is there something I don’t know about the environment? Be the leader who’s a bridge, breaks down silos, knows who owns what, and knows who’s responsible for what.
13:24: One issue is the notion of shadow AI. Knowledge workers go home and use tools that aren’t sanctioned by their companies. Are there specific things that companies should be doing about shadow AI?
