One LLM to rule them all?

Aug 12, 2025

Episode notes

In this special episode of the Justified Posteriors Podcast, hosts Seth Benzell and Andrey Fradkin dive into the competitive dynamics of large language models (LLMs). Using Andrey’s working paper, Demand for LLMs: Descriptive Evidence on Substitution, Market Expansion, and Multihoming, they explore how quickly new models gain market share, why some cannibalize predecessors while others expand the user base, and how apps often integrate multiple models simultaneously.

Host’s note: this episode was recorded in May 2025, and things have been rapidly evolving. Look for an update sometime soon.

Transcript

Seth: Welcome to the Justified Posteriors podcast, the podcast that updates beliefs about the economics of AI and technology. I'm Seth Benzell, possessing a highly horizontally differentiated intelligence (not saying that's a good thing), coming to you from Chapman University in sunny Southern California.

Andrey: And I'm Andrey Fradkin, multi-homing across many different papers I'm working on, coming to you from sunny—in this case—Cambridge, Massachusetts.

Seth: Wow…. Rare, sunny day in Cambridge, Mass. But I guess the sunlight is kind of a theme for our talk today because we're going to try to shed some light on some surprising features of AI, some important features, and yet, not discussed at all. Why don't people write papers about the important part of AI? Andrey, what's this paper about?

Andrey: I agree that not enough work has been done on this very important topic. Look, we can think about the big macroeconomic implications of AI—that's really fun to talk about—but it's also fun to talk about the business of AI. Specifically, who's going to win out? Which models are better than others? And how can we measure these things as they're happening at the moment? And so that's really what this paper is about. It's trying to study how different model providers compete with each other.

Seth: Before we get deep into that—I do want to push back on the idea that this isn't macroeconomically important. I think understanding the kind of way that the industry structure for AI will work will have incredible macroeconomic implications, right? If only for diversity—for equality across countries, right? We might end up in a world where there's just one country or a pair of countries that dominate AI versus a world where the entire world is involved in the AI supply chain and plugging in valuable pieces, and those are two very different worlds.

Andrey: Yeah. So, you're speaking my book, Seth. Being an industrial organization economist, you know, we constantly have this belief that macroeconomists, by thinking so big-picture, are missing the important details about specific industries that are actually important for the macroeconomy.

Seth: I mean—not every specific industry; there's one or two specific industries that I would pay attention to.

Andrey: Have you heard of the cereal industry, Seth?

Seth: The cereal industry?

Andrey: It's important how mushy the cereal is.

Seth: Well, actually, believe it or not, I do have a breakfast cereal industry take that we will get to before the end of this podcast. So, viewers [and] listeners at home, you gotta stay to the end for the breakfast cereal AI economics take.

Andrey: Yeah. And listeners at home, the reason that I'm mentioning cereal is that it's, of course, the favorite. It's the fruit fly of industrial organization, for estimating demand specifically. So a lot of papers have been written about estimating cereal demand and other such things.

Seth: Ah—I thought it was cars. I guess cars and cereal are the two things.

Andrey: Cars and cereal are the classic go-tos.

Introducing the paper

Seth: Amazing. So, what [REDACTED] wrote the paper we're reading today, Andrey?

Andrey: Well, you know—it was me, dear reader—I wrote the paper.

Seth: So we know who's responsible.

Andrey: All mistakes are my fault, but I should also mention that I wrote it in a week and it's all very much in progress. And so I hope to learn from this conversation. Let's say my priors are diffuse enough that I can still update.

Seth: Oh dude, I want you to have a solid prior so we can get at it. But I will say I was very, very inspired by this project, Andrey. I also want to follow in your footsteps. Well, maybe we'll talk about that at the end of the podcast as well. But maybe you can just tell us the title of your paper, Andrey.

Andrey: The title of the paper is Demand for LLMs, and now you're forcing me to remember the title of the—

Seth: If you were an AI, you would remember the title of the paper, maybe.

Andrey: The title of the paper is Demand for LLMs: Descriptive Evidence on Substitution, Market Expansion, and Multihoming. So, I will state three claims, which I do make in the paper.

Seth: Ooh, ooh.

Andrey: And you can tell me your priors.

Seth: Prior on each one. Okay, so give me the abstract; claim number one.

Andrey: So the point number one is that when a new good model gets released, it gets adopted very quickly. Within a few weeks, it achieves kind of a baseline level of adoption. So I think that's fact number one. And that's very interesting because not all industries have such quick adoption cycles.

Seth: Right? It looks more like the movie or the media industry, where you have a release and then boom, everybody flocks to it. That's the sense that I got before reading this paper. So I would put my probability on this: a hot new model comes out, everybody starts trying it. I mean, a lot of these websites just push you towards the new model anyway.

I know we're going to be looking at a very specific context, but if we're just thinking overall? Man, 99%: when a hot new model comes out, people try it.

Andrey: So I'll push back on that. The claim is that it's not about trying it; these models achieve an equilibrium level of market penetration. It's not...

Seth: How long? How long is—how long is just trying it? Weeks, months.

Andrey: How long are—sorry, can you repeat that question?

Seth: So you're pushing back on the idea that this is, quote unquote, “just trying the new release.” Right. But what is the timeline you're looking over?

Andrey: It's certainly a few months, but it doesn't take a long time to just try it. So, if it was just trying, we'd see a blip over a week, and then it would go back down. And I don't...

Seth: If they were highly horizontally differentiated, but if they were just very slightly horizontally differentiated, you might need a long time to figure it out.

Andrey: You might—that's fair. Okay, so the second claim is: the different models have very different patterns of either substituting away from existing models or expanding the market. And I think two models that really highlight that are Claude 3.7 Sonnet, which primarily cannibalizes from Claude 3.5 Sonnet.

Seth: New Coke.

Andrey: Yes, and it's—well, New Coke failed in this regard.

Seth: Diet Coke.

Andrey: Yeah. And then another model is Google's Gemini 2.0 Flash, which really expanded the market on this platform. A lot of people started using it a lot and didn't seem to have noticeable effects on other model usage.

Seth: Right?

Andrey: So this is kind of showing that kind of models are competing in this interesting space.

Seth: My gosh. Andrey, do you want me to evaluate the claim that you made, or are you now just vaguely appealing to competition? Which of the two do you want me to put a prior on?

Andrey: No no no. Go for it. Yeah.

Seth: All right, so the first one is: do I think that if I look at, you know, a website with a hundred different models, some of them will steal from the same company and some of them will lead to new customers?

Right? I mean with a—I, I'm a little bit… Suppose we asked this question about products and you said, “Professor Benzel, will my product steal from other demands, or will it lead to new customers?” I guess at a certain level, it doesn't even make sense, right? There's a general equilibrium problem here where you always have to draw from something else.

I know we're drawing from other AIs, which would mean that there would have to be some kind of substitution. So I mean, yes, I believe sometimes there's going to be substitution, and yes, I believe sometimes, for reasons that are not necessarily directly connected to the AI model, the rollout of a new model might bring new people into the market.

Right. So I guess I agree. Like at the empirical level, I would say 95% certain that models differ in whether they steal from other models or bring in new people. If you're telling me now there's like a subtler claim here, which is that the fact that some models bring in new people is suggestive of horizontal differentiation and is further evidence for strong horizontal differentiation.

And I'm a little bit, I don't know, I'll put a probability on that, but that's, that seems to be going a little bit beyond the scope of the description.

Andrey: Well, we can discuss that in the discussion section. And I think the final claim that I make is that apps, and the users of apps as well, tend to multi-home across models. So it's not that people are using just one model. It's not like app developers are using just one model for each application. And that's once again pointing to the fact that there isn't just one superior model, even within a given model class.

And, Seth, go for it.

Seth: Andrey, you did the thing again. You did the thing again where you said, "Here, Seth, do you want to evaluate this empirical finding?" Or do you want me to now say, "This tells us something about the future of competition in AI"?

Andrey: Yes, yes, yes. All right, go for it.

Seth: The empirical claim, right? Is—give me the narrow claim? One more time? Give it to me.

Andrey: The apps are multihoming.

Seth: The people multi-home. Okay. The narrow claim is we've got these apps; maybe we'll give the user, the listeners, a little bit of context of what a sample app would be.

Andrey: Yeah, so I think about two types of apps here. One is a coding app, so Cline and Roo Code are two quite popular coding apps. And we see that users of those apps are multi-homing. Well, those apps are multi-homing; we don't know as much about the users. And then we have various chat-persona apps. And then we have some utility apps.

Seth: Yeah. We'll call them, like—let's call that second group role-play apps.

Andrey: Yeah, yeah. We have kind of like PDF extractor and apps like that, that are also on the—

Seth: Very tool-ly. Okay, cool. Alright, so we've got all these apps out, and now you're going to tell me, "Professor Benzell, I think you would be surprised to find out that Roo Code, for example, has both a Claude model powering it and an OpenAI model powering it." And that one is probably the thing I'm most surprised by.

Right? I definitely would not be surprised at all to know that Roo Code can send its Claude tokens to one data center versus another data center; that makes perfect sense. But the fact that you would sustainably have many different contemporaneous models on the same platform feels like a stage in a process rather than where we're going to end up.

What do I mean by that? So why would you want to keep an old legacy model inside of your Roo Code? Or Silly Tavern is one that I like. So Silly Tavern is just, you can do role play and talk to characters and pretend you're going on adventures. Right?

It seems like Claude 3.7 should just be better than 3.5 at that, right? My intuition is that they're not strongly horizontally differentiated. Why would you keep both? It would be for legacy reasons, for backward compatibility. Maybe there's a specific interaction or scenario that you had working in the old version of the app, and you want to make sure that that's still around for new users.

So, how would I think about this? If you want to say that this multi-homing evidence is evidence of competition because the same app wants to use multiple versions, I kind of disagree, right? The way I think about it is maybe more like, you know, you're a car, and you can either use the old muffler or the new muffler, and some people have upgraded to the new muffler, but some people are still using the old muffler, and so that car has two different kinds of mufflers.

Andrey: Yeah, we can discuss that, you know, that claim as well. I guess, do you want me to address what I think?

Seth: Well, give me a taste, and then let's go to the evidence. Give me a taste.

Andrey: The multi-homing is not happening on an old and a new version of a model.

It's happening on, let's say, 3.7 and Gemini 2.5, which are both relatively new models. The other thing I'd say is that if you read Reddit, there are some users that still like 3.5 better than 3.7.

Seth: On the internet, they will prefer one plain white cotton T-shirt to another plain white cotton T-shirt, Andrey.

Andrey: Who are you to question the preferences of the consumer?

Seth: Right? But I guess, all right, so this is my last comment on the priors, and then we'll get into the evidence, which is: this sort of speculation about what people will actually want in the long run is the bridge that gets us from this cross-sectional evidence about 20 April, 2025, to what the world's going to look like in 2027 and 2028. So that's why I'm pushing back a little bit.

Andrey: Yeah, I don't want to make claims that are too great about 2027 based on this cross section. Yes.

Seth: You know, GDP growth's gonna be at 30%.

Andrey: That's true.

Seth: And all human labor will be automated.

Andrey: There is going to be a lot of market expansion, I hear.

Seth: Oh, babe, listen to our Epoch AI episode. We'll post that before this one so you can see what we're laughing at.

Andrey: All right.

Seth: So tell me, Andrey, I can think of no one better suited to walk us through the evidence of this paper than Professor Fradkin of Boston University.

Andrey: Look, it's a very simple paper. It's essentially a few graphs, and the graphs are event studies, where we see what happens to a selected set of models around the time of the release of one of the new models. So for the release of Claude 3.7, we see a very obvious drop in the usage of 3.5. You know, if you ballpark it, it's about 80% cannibalization. And the adoption happens within a few weeks, so it's fairly fast. We also look at Flash 2.0. We see very fast adoption, and in terms of tokens used, Flash 2.0 is the biggest model very quickly. And then, Gemini Pro is another model that gets released in this time period. And it also sees a very fast adoption curve that doesn't seem to cannibalize other models in this time period. So that's the evidence on cannibalization and market expansion. And then there's the evidence on multi-homing. There are some intricacies with the scraping of the data here. So, actually, let's take a step back. Where does this data come from?
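As a rough illustration of the "about 80% cannibalization" ballpark, one way to read it is as the share of the new model's usage that is offset by the old model's decline. The sketch below is only that reading, not the paper's estimator; the function name and the token counts are hypothetical.

```python
def cannibalization_rate(old_before, old_after, new_after):
    """Share of the new model's usage offset by the old model's decline.

    Illustrative definition only (an assumption, not the paper's method):
    usage lost by the old model divided by usage gained by the new model,
    clipped to the range [0, 1].
    """
    if new_after <= 0:
        raise ValueError("new model must have positive usage after release")
    lost = max(old_before - old_after, 0.0)
    return min(lost / new_after, 1.0)


# Hypothetical weekly token counts (in billions), not figures from the paper.
rate = cannibalization_rate(old_before=100.0, old_after=20.0, new_after=100.0)
expansion = 1.0 - rate  # the remainder is read as market expansion

print(f"cannibalization: {rate:.0%}, market expansion: {expansion:.0%}")
```

Under this reading, a model like Flash 2.0 that gains usage while nothing else declines would score close to zero cannibalization.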

Seth: What is Open Router?

Andrey: We haven't discussed what OpenRouter is. All right. Look, one of the challenges with studying these issues is that a lot of the data sits in these fortresses of data from which you cannot extract anything.

Seth: And we're trying for you listeners; we're banging at that gate. We're banging at that gate every day trying to get in for you.

Andrey: Yes. Yes. So for people who are using OpenAI, you know, through the chat app or through direct OpenAI API calls, we're not going to get a lot of visibility into that data. We might get some auxiliary data from credit card providers, payment processors, and the like, but it's hard to know how usage is changing and how specific model usage is changing in particular. One thing that exists is this service called OpenRouter, and there are other companies that are similar to it. And it's built for, I'd say, a sophisticated user who might be like a software developer who knows that, hey, you know, I want to use a mix of models, or I might want to change my code to use a different model as...

Seth: Andrey, what's the S word that I'm thinking of here?

Andrey: Substitution? What?

Seth: Selection. We're looking under the streetlight: the people who want to multi-home.

Andrey: Yes. 100%. But I will say—we're looking—let me just explain what Open Router is, and then we'll talk about selection and whether we care about that or not.

Seth: Oops.

Andrey: Okay. So it's a very handy service that allows you to call many different types of models. It also allows you to set rules for which model to use as a function of things that you might not be thinking about if you're just a chat user, like latency, throughput, uptime, specific pricing, and how it affects prompt tokens versus reasoning tokens versus completion tokens. So it's just a really useful service for the app developer.
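To make the idea of a routing rule concrete, here is a minimal sketch of picking the cheapest endpoint that satisfies latency and throughput constraints. Everything in it is hypothetical: the `Endpoint` fields, the endpoint names, and the numbers are illustrative assumptions, and this is not the OpenRouter API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Endpoint:
    """A hypothetical model endpoint with the attributes a router might weigh."""
    name: str
    latency_s: float        # typical time to first token, in seconds
    throughput_tps: float   # tokens generated per second
    price_per_mtok: float   # output price per million tokens


def pick_endpoint(
    endpoints: Sequence[Endpoint],
    max_latency_s: float,
    min_throughput_tps: float,
) -> Optional[Endpoint]:
    """Return the cheapest endpoint meeting both constraints, or None."""
    eligible = [
        e for e in endpoints
        if e.latency_s <= max_latency_s and e.throughput_tps >= min_throughput_tps
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda e: e.price_per_mtok)


# Made-up endpoints for illustration only.
endpoints = [
    Endpoint("provider-a/model-x", latency_s=0.8, throughput_tps=120.0, price_per_mtok=3.0),
    Endpoint("provider-b/model-x", latency_s=0.4, throughput_tps=60.0, price_per_mtok=5.0),
    Endpoint("provider-c/model-y", latency_s=2.5, throughput_tps=200.0, price_per_mtok=0.5),
]

relaxed = pick_endpoint(endpoints, max_latency_s=1.0, min_throughput_tps=50.0)
strict = pick_endpoint(endpoints, max_latency_s=0.5, min_throughput_tps=50.0)
```

With the looser latency cap the cheaper of the two fast endpoints wins; tightening the cap flips the choice, which is one way an app could end up using several endpoints without any change in its preferences over models.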

Seth: I mean, can I—just to interrupt for a split second here, right? Honestly, I feel like you gave me more evidence for horizontal differentiation in this market just by listing those four different features than you did with almost anything else, right? Because all right, I could see why you would need to balance between latency, price, throughput, quality, et cetera, et cetera.

Andrey: Yeah. So, and there is actually an interesting feature of this market that many do not know: there are multiple companies that serve specific models. So this is obviously true with open-source models, where anyone can serve them. So we have a lot of servers of your Llamas and your Deepseeks. But it's also true of the closed-source models.

For example, Microsoft might serve an OpenAI model, and OpenAI might serve the OpenAI model, and there might be differences in how well they're serving these models.

Seth: Does that mean that Microsoft has to know the model weights, or are they hidden in some way from them?

Andrey: That's above my pay grade. I—

Seth: We will find out for you.

Andrey: I mean, Microsoft owns a lot of OpenAI, so they have some access.

Seth: Okay.

Andrey: Yeah. So, that's kind of an interesting feature of—

Seth: Mm-hmm.

Andrey: Anyway. One thing that this company does is they publish a lot of data about model usage and how the model usage is changing over time, and also about how specific apps use different models.

In particular for each model, they put the top 20 apps using that model and their usage numbers. So you piece these together, and you can get some pretty good information about popular apps and what models they're using and how much they're using.
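A minimal sketch of how per-model top-app lists could be pieced together into a per-app view of multi-homing. The data layout, app names, model names, and token counts below are all hypothetical, and the convention that the provider is the part of the model name before the slash is an assumption for illustration.

```python
from collections import defaultdict

# Hypothetical scraped data: model -> list of (app, tokens) for its top apps.
top_apps = {
    "anthropic/model-a": [("coding-app", 500), ("chat-app", 50)],
    "google/model-b": [("coding-app", 300), ("pdf-tool", 80)],
    "openai/model-c": [("coding-app", 200)],
}


def usage_by_app(top_apps):
    """Invert model -> apps lists into app -> {model: tokens}."""
    by_app = defaultdict(dict)
    for model, apps in top_apps.items():
        for app, tokens in apps:
            by_app[app][model] = by_app[app].get(model, 0) + tokens
    return dict(by_app)


def multihoming_apps(by_app):
    """Return apps observed using models from more than one provider."""
    result = {}
    for app, models in by_app.items():
        providers = {model.split("/")[0] for model in models}
        if len(providers) > 1:
            result[app] = sorted(providers)
    return result


by_app = usage_by_app(top_apps)
multi = multihoming_apps(by_app)
print(multi)
```

One caveat with this kind of reconstruction: because only the top apps per model are listed, an app missing from a model's list may still be using that model, so counts built this way are a lower bound on multi-homing.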

Seth: Mm-hmm.

Andrey: And even over time, if you're scraping it continuously—

Seth: Do we know if this is for the apps that list themselves on Open Router? Is this the universe of tokens going through those apps? Do we know that?

Andrey: I think it's a universe of tokens going through those apps, but not all apps are—

Seth: Obviously? Yeah.

Andrey: publicly disclosing it. Even if they are using Open Router.

Seth: Well, it's a fascinating data set, so it's going to show us the price of tokens. It's going to show us which apps are using which tokens, and we're going to get dynamics on that over time. So it seems like a perfect data set. Andrey, your next big contribution is just noticing the data set.

Andrey: It's, you know, to be clear, the ML community knows about this data set as well. You know, in this question of how do we evaluate which models are good and which are not, you know, what we all love is revealed preference.

Seth: Oh, ooh.

Andrey: Usage. And OpenRouter has one such ranking that's publicly available. It seems pretty hard to game, although we can talk about ways one could try to game it. And that should tell us something about which model is better, or at the very least, which model is on the Pareto frontier. And so the machine learning community, the AI community, has been noticing this. So yeah.

Seth: And then they told you, so then your contribution was the translation to economics.

Andrey: I don't know who told me. The other thing I should say is that now certain companies are releasing stealth models on OpenRouter as a way to test them.

Seth: Oh,

Andrey: That's also an interesting dynamic to explore. In particular, OpenAI has stealth released some models through there.

Seth: And so if I was running Silly Tavern, it would become apparent to me that there's a GPT-4o version too, and I could play around with it as an option.

Andrey: And there's a new model called Optimus Alpha.

Seth: Oh God, did they let Elon Musk name this one? Oh my God. Somebody took too much testosterone this morning.

Andrey: Yeah. So, all right. That model gets released for a few weeks. People play around with it, and then it's the new OpenAI model.

Seth: Got it, got it. But theoretically, normal app users of Silly Tavern might therefore be interacting with this model for a little bit before the official release.

Andrey: Yeah.

Seth: Got it. Okay. Cool.

Andrey: Yeah, so what? What questions do you have, Seth?

Seth: What questions do I have? Andrey, it occurs to me this population of LLM users might not be representative of the market as a whole. How do you respond to that limitation?

Andrey: So, I acknowledge it. But let me push a little bit. So there are different populations of, what shall we say, heavy LLM users that we can think about. One type of user is your basic consumer, and that type might have a ChatGPT subscription or might even use, you know, the free version, or Claude, even though really most of the action is in ChatGPT. We're not talking about that; I think that's very clear. That's a consumer product. We know consumers suffer from very large default effects.

Seth: Right?

Andrey: They're not going to be switching very actively in aggregate. So I don't think this paper is about that at all. The second type of use case that we know a lot about, or we're aware that there's a big use case for, is in programming. Right?

Seth: Mm-hmm.

Andrey: And here I think this is a bit of a more representative sample in a lot of ways. Cline and Roo Code are serious programming apps.

Seth: Even though they have silly names.

Andrey: Yes, 100%, and they have features that are essentially at parity with the features of Copilot in VS Code and of Cursor, even though, as far as I'm aware, Cursor and Copilot use their own software to route the model calls.

You can also do the same things in those apps. So I'd say the coverage and the user bases of these apps are quite similar. You might say Cline and Roo Code users are a little more sophisticated, but I actually don't think it's that big of a...

Seth: They're just a little weirder.

Andrey: They're a little weirder.

Seth: So you think this is very representative of the market for AI tokens? For coding?

Andrey: Yes, with one exception.

Seth: Mm-hmm.

Andrey: The exception is that some companies place severe limitations on the types of models their employees can use. So imagine you're working at Google.

Seth: Gotta use it; you gotta eat your own dog food.

Andrey: You cannot use o3 for programming, I assume.

Seth: You cannot generate images of German Nazis. They have to be all-right. That's a callback joke, guys. All right?

Andrey: So then there are these other apps, and there, you know, it's hard to say. Look, if you're an app developer and you have a single-use app, like a PDF text extractor or something like that, I imagine that you are actively considering different models, especially trying to optimize your costs.

Seth: Mm-hmm.

Andrey: And you may or may not use OpenRouter. I'm not sure. Certainly, there might be some selection: developers who are less sensitive to these issues might not feel the need to use OpenRouter.

Seth: But for freelance coding, we think this is representative. All right. Now talk about these other settings, like the tools and the role-playing.

Andrey: Talking about this example, let's say you have a service where you send it a PDF, and it gives you back the structured text.

Seth: Mm-hmm. Mm-hmm.

Andrey: Which is a type of app that you can find on OpenRouter. I doubt that whoever's writing these types of apps is very different whether they use OpenRouter or not. I imagine they're considering many models.

Seth: Right. Well, I mean, I guess we're kind of in the talk-about-it section, but you could see a lot of this stuff getting built back into the platform, right? There's this story, you know, about iPhones. When you started off with an iPhone, there was like a light bulb app that you had to install to get the light to go, but then they built it in as a feature, right? So, I mean, in the long run is there even a place for something like OpenRouter, or are these all features that are going to be built right into OpenAI or built right into Anthropic?

Andrey: I guess the feature of being able to use the other models is a feature I doubt that they'll build in, but you know, who knows, right?

Seth: Right, but they might give you different versions. There would be the within OpenAI version and then the within Claude version, and they could give you a selection of models.

Andrey: Sure, sure. So if you're like, and I think a lot of big companies do this, if they sign an enterprise contract with OpenAI or Google or Anthropic, they're going to use their models. They might even have forward-deployed engineers that kind of show them how to use the model in the best possible way, how to fine-tune it, and so on.

So I think if an application requires really close cooperation between the foundation model provider and the application layer, we'll see that essentially the different competitors are splitting off into cooperating with different model providers.

Seth: Right. So you think that is one possible future, which is that we end up with much more fragmentation than on OpenRouter. So there would be, in that universe, multi-homing across models, but not multi-homing across companies.

Andrey: Yeah. Multi-homing across models versus multi-homing across providers: yeah, we should be clearer about that. And I think the evidence that I have is not just multi-homing within, you know, OpenAI or within Llama or...

Seth: Ooh. Ooh. We'll have to see about that. All right. Okay. Alright. Other questions I have about this are, you know, not all tokens are created equal, either. I mean, how large a range in prices are people paying for these tokens? I know you have a little table of a maximum and minimum, but give the audience a sense of how expensive intelligence can get and how cheap it can get.

Andrey: How expensive and how cheap can it get? So it can be close to free, especially for pretty small models. And it can get pretty expensive. So, there's an output price of $18 per million tokens that exists on this platform, at the time I was looking at it, for example.

Seth: It's still cheaper than my ghostwriter.

Andrey: Yeah, I mean, a million tokens is not nothing for sure. And then, there are differences in input prices and output prices. And there's also something that I haven't captured very well in this data, which is there might be discounts for something called NGS. Things get more complicated the more I look at it in detail.

Seth: Right. And the question is, do these kinds of details suggest concentration, or do the details suggest disillusionment and horizontal differentiation?

Andrey: Yeah.

Seth: Hmm.

Andrey: Let's talk a little bit about just some very basic economics of...

Seth: What the f**k is competition? Why do we want it?

Andrey: Yeah. So first let's think about the utility part of this, the app developer's utility, right? Let's imagine that they have some utility for the different models, but they also have to, you know, pay a price for it. So, the way we think about it is, how much are people willing to pay for the better model? And if we think that things are pretty vertically differentiated, everyone will want to pay more for the same types of models. If we think that things are horizontally differentiated, then different developers will want to pay more for different types of models. And then there's also this question about scaling. Like, yeah, maybe there's a model that's a little bit better than the other model, but it's a lot more expensive, and people are not willing to pay for that. So that might be something going on.

Seth: Hmm.

Andrey: Prices, obviously, are a very important variable to think about, and especially when you can think about them in the following way. Say you have a hard problem. One way to approach it is you throw it to the best model. Another way to approach it is to call a slightly worse model 10 times and then pick the best answer, right? So there's some implicit kind of substitutability that might be present in this.
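The trade-off between one call to the best model and several calls to a weaker one can be sketched with simple arithmetic. The sketch assumes the attempts are independent and that the best answer can always be identified when one exists; both are strong assumptions. The success probabilities and per-call costs are made-up numbers.

```python
def best_of_n_success(p_single: float, n: int) -> float:
    """Probability that at least one of n independent attempts succeeds."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 - (1.0 - p_single) ** n


# Hypothetical numbers: a strong model called once vs. a weaker model called 10 times.
strong_success, strong_cost = 0.90, 15.0   # success probability, cost per call
weak_success, weak_cost, n_calls = 0.30, 1.0, 10

weak_best_of_n = best_of_n_success(weak_success, n_calls)
weak_total_cost = weak_cost * n_calls

print(f"strong model, 1 call:  success={strong_success:.3f}, cost={strong_cost:.1f}")
print(f"weak model, {n_calls} calls: success={weak_best_of_n:.3f}, cost={weak_total_cost:.1f}")
```

Under these made-up numbers the weaker model, sampled ten times, is both cheaper and more likely to produce a correct answer, which is the implicit substitutability being described. In practice, picking the best answer is itself costly and imperfect, so the real gap would be smaller.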

Seth: But that. Oh man. So now that's so interesting because the story you just told is not a story about horizontal differentiation. Right.

Andrey: Yes.

Seth: But it is a reason why you might want lots of different vertically differentiated models.

Andrey: Yes. Yeah.

Seth: Ah huh. So maybe we don't have direct evidence on horizontal differentiation here.

Andrey: For what it's worth, I'm not sure how often this pattern is being used, but it's...

Seth: Okay.

Andrey: It's certainly possible. Yeah. And then there's another kind of thing to mention, which is this famous Jevons paradox, which is a paradox.

Seth: I mean, no paradox is really a paradox, according to my book, Slight of Mind, about why paradoxes are dumb and you should just know all the right answers.

Andrey: Yes. Alright. So, let's say we have an efficiency improvement in our model serving, and we kind of lower our prices by a bit. The response to that might be so large that the total number of tokens used might go up.

Seth: Right?

Andrey: That's essentially the dynamic at hand, or the total revenue can go up.
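A small worked example of the price-cut logic, assuming a constant-elasticity demand curve (quantity proportional to price raised to the power of minus the elasticity). The functional form and the elasticity values are illustrative assumptions, not estimates from the paper.

```python
def response_to_price_cut(elasticity: float, price_ratio: float):
    """Return (quantity_ratio, revenue_ratio) after price changes by price_ratio.

    Assumes constant-elasticity demand, Q = A * P ** (-elasticity), so the
    scale factor A cancels out of both ratios.
    """
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    quantity_ratio = price_ratio ** (-elasticity)
    revenue_ratio = price_ratio * quantity_ratio
    return quantity_ratio, revenue_ratio


# A 10% price cut under two hypothetical demand elasticities.
for elasticity in (0.5, 1.5):
    q_ratio, r_ratio = response_to_price_cut(elasticity, price_ratio=0.9)
    print(f"elasticity={elasticity}: tokens x{q_ratio:.3f}, revenue x{r_ratio:.3f}")
```

Under this form, tokens used rise after a price cut for any positive elasticity, but total spending rises only when the elasticity is above one, which is the case where the demand response is large enough to outweigh the lower price.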

Seth: And so, I mean, it seems like that's happening constantly in this data, which is that better and better models are being released and demand just goes up.

Andrey: Yeah. Yeah,

Seth: Which provides another challenge for thinking about substitutability, because we don't have individual-level data. This is not a static market.

People are entering this market all the time. I mean, the figures you make are quite compelling, like stuff is happening the instant these models are released. But it's also the case that, you know, compositionally, who's in this data is changing and pretty fluid.

Andrey: Yeah. Yeah. It's something I do hope to have more to say about, as I've been scraping over time, because at least within an app, you might say that the...

Seth: It's homogeneous within an app. Yeah. Or maybe you lump together all the coding apps and all the, you know, Silly Taverns. Okay, cool. Alright. I mean, how much do you feel like you have to make a claim about horizontal differentiation here?

Andrey: Look, it's hard for me to see multihoming and think that there is no horizontal differentiation here.

Seth: Other than price-quantity differentiation, or price-quality.

Andrey: No, no, sure. But I guess a point that you can see in these figures is that these are pretty similarly priced models, in many ways, that are being multi-homed.

Seth: The latency is a little bit different. Maybe I'm going to switch back and forth based on latency. There are a lot of different little things here, right?

Andrey: Sure, sure. That's fair. Without having the individual usage data, it's really hard for me to make these fine-grained claims. I certainly have begged for this data from the CEO of OpenRouter, but so far no cigar.

Seth: Okay, let me push. Let's talk about that a little bit more, right? Which is, if the multi-homing is driven by fluctuations in latency, let's say, right? Like, I don't have strong preferences between Claude and ChatGPT; I just want to call the one that's lower latency. You can definitely get multi-homing there without it being driven by any difference amongst the models.

Andrey: Sure. I guess I think this is very empirically testable. I haven't yet, but the latency data is at a five-second level, so you can just see how much it changes over time.

Seth: There we go.

Andrey: Yes.

Seth: Ooh, ooh. I've given you some more homework, it sounds like.

Andrey: So, I guess if we think that the latency is highly variable across time or the throughput is highly variable over time, then we might see that sort of pattern. If we don't see it being very highly variable over time, then maybe that's less—maybe that's some evidence that it's not quite what's driving it, but yeah.
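One way this check could be sketched, assuming latency readings for two models sampled on the same five-second grid. The data, the function names, and the two statistics chosen here are illustrative assumptions, not the paper's method. The idea: if latency barely varies, or the faster model almost never changes, latency-driven switching is a weak explanation for multihoming.

```python
from statistics import mean, pstdev


def coefficient_of_variation(samples: list[float]) -> float:
    """Standard deviation divided by the mean: a unit-free measure of variability."""
    return pstdev(samples) / mean(samples)


def fastest_model_switch_rate(latency_a: list[float], latency_b: list[float]) -> float:
    """Share of consecutive samples in which the lower-latency model changes."""
    a_is_faster = [a < b for a, b in zip(latency_a, latency_b)]
    switches = sum(prev != cur for prev, cur in zip(a_is_faster, a_is_faster[1:]))
    return switches / (len(a_is_faster) - 1)


# Hypothetical latency readings in seconds, one per five-second interval.
model_a = [0.9, 1.1, 0.8, 1.3, 0.9, 1.2]
model_b = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

print(coefficient_of_variation(model_a))
print(fastest_model_switch_rate(model_a, model_b))
```

A high switch rate would be consistent with the latency story; a rate near zero would count as some evidence against it.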

Seth: Let me tell you what my prior is, so maybe this is like the key part here, right? I have this really strong prior. I was not born with it, but I have been trained into it by talking to AI experts

Andrey: Mm-hmm.

Seth: There’s no such thing as the AI that's good at military stuff versus the AI that's good at writing humanities papers.

That it's all intelligence: you get more of it or less of it. Sure, at the margin there's fine-tuning, there's vibes, but with the right sort of prompt and, you know, with a sufficiently unlocked model, it should be just pure vertical differentiation. When I've been in rooms with technologists, that's kind of the claim they make.

Now, maybe that's because they're at OpenAI and they're at Anthropic, and it's their incentive for this to be a universe where there's only two big boys. But serious people I've talked to have suggested there isn't such a thing as significant LLM horizontal differentiation.

Andrey: Yeah. I don't believe that. Let's see what they—let's see what they actually do.

Seth: Mm-hmm.

Andrey: OpenAI is constantly updating its default model in ChatGPT. And sometimes they're optimized for one metric, and then they realize that they face a trade-off. So, for example, if your ChatGPT is a little too nice to you, that might lead you to use ChatGPT more, but it might feel ethically dubious for ChatGPT to be encouraging your addiction, given that you totally deserve to be addicted to your phone. So, there's clearly a Pareto frontier of different things that these models can be made to do. Right? And so a lot of the experimentation by the companies takes the form of: how do we play on this Pareto frontier? The existence of a Pareto frontier suggests that there isn't just one dimension on which things differ.
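A small sketch of what "being on the Pareto frontier" means computationally, assuming each model is scored on two metrics where higher is better. The model names and scores are hypothetical placeholders, not measurements of any real system. A model is on the frontier if no other model is at least as good on both metrics and strictly better on one.

```python
def pareto_frontier(models: dict[str, tuple[float, float]]) -> list[str]:
    """Return the models that are not dominated on (metric_1, metric_2)."""
    frontier = []
    for name, (m1, m2) in models.items():
        dominated = any(
            (o1 >= m1 and o2 >= m2) and (o1 > m1 or o2 > m2)
            for other, (o1, o2) in models.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Hypothetical scores on two metrics (higher is better on both).
scores = {
    "model-x": (0.90, 0.40),
    "model-y": (0.70, 0.80),
    "model-z": (0.60, 0.50),
}
print(pareto_frontier(scores))
```

With more than one model on the frontier, which one a user prefers depends on how they weigh the two metrics, which is the multi-dimensional point being made here.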

Seth: Right. But I guess where I come at this from is, okay, imagine there's like a continuum of steps of delivering the token to the consumer, right? The first step is a $500 billion pre-training run. We, you know, make the giant pre-trained model. The second step is we're going to fine-tune it. We do the RLHF and give my model its particular personality, and it knows it's not allowed to work for terrorists or whatever.

And then there's the third step, which is we're now going to plug that fine-tuned model into an app, and it's going to be deployed in something functional that a consumer can interact with. I guess the way I see it is like as we move down that continuum, this becomes more and more horizontally differentiated, and at the beginning it seems really not horizontally differentiated, and by the end it really is very, you know, you don't want the SillyTavern AI, you know, helping you convert PDFs.

Right. So I guess when I hear LLMs are horizontally differentiated, I'm thinking about that pre-training step.

Andrey: Mm-hmm.

Seth: Maybe you want to make a claim about how the usage of AI in apps is horizontally differentiated, which is at the far other end.

Andrey: Sure. Yeah, I think that's true. You know, we've talked about unhobbling on the show before, and I certainly believe that lots of these models have capabilities that we haven't figured out how to get out of them. Right. They know so

Seth: Right. I've tried really hard to make OpenAI do some of those things, and it's not—it's not as nice as Grok when you ask him to, or

Andrey: Yeah. So, so I think that's right, right? How the application and how these models are used in the application layer can be differentiated even if we think that at the foundational level it's just a ball of clay and some of these balls are bigger clay balls than other balls.

Seth: Oh, right. And when you have smaller clay balls, you can't build the Mona Lisa of clay balls. Right. So it's like a capacity thing. Yeah, I mean, it just brings us back to there being a vertical aspect and a horizontal aspect, and the question is like, in the market competition for AIs, where do those two come in? Right? Because in terms of app deployment, you wouldn't expect vertical. I mean, everyone's just going to use the best; they're going to use models that are on the Pareto frontier. So you'd expect the vertical differentiation to be less apparent in that last stage. Right?

Andrey: Yeah. I mean, it seems to me that models like Gemini 2.5 Pro and 3.7 Sonnet are both on the frontier, but some people just like one, and some people like the other. And that is horizontal differentiation to me.

Seth: Right. And, and now, now you're referring to, like—

Andrey: Maybe there's a cost difference, and there might be latency differences, and that's really what's driving, you know, the usage patterns.

Seth: Or maybe the prices are identical, and they're epsilon horizontally differentiated, and that's enough.

Andrey: Yeah.

Seth: I guess the last thing is that I think my instinct is that horizontal differentiation will become less important over time. Right. So if you think about these balls of clay getting bigger and bigger and bigger, right?

Sculpting them exactly the way you want is going to get easier and easier as you have more and more clay to discard. Do you buy that argument?

Andrey: I think we'll get better at sculpting things over time. I think that it's certainly true. Yeah, and I think that comes back to your question about whether we are going to have horizontal differentiation in the sculpting step. And then the question is, who's going to be sculpting it? Is it going to be app developers sculpting it? Is it still going to be the big labs that sculpt it in various specific ways? Yeah, that.

Seth: Right. I mean, if we're doing the sculpting at the app stage, right, there's just a lot more room for horizontal differentiation, right? Because there's a lot more players who are going to be involved, and, you know, that's the domain where, yeah, it does mean, you know, a dollar to a consumer whether the interface is blue versus pink, and like even stupid s**t like that can support an industry, no offense to, you know, app developers out there.

Okay. So one question that is kind of like the implicit background question in this paper, in my opinion,

Andrey: Okay.

Seth: It's a prior which we did not put a probability on, but I just kind of want to ask you, having done this research, and you don't have to do it in a prior way: do you think the market for AI will be, you know, relatively competitive or relatively concentrated in four or five years?

Because I mean, my reading of this paper was that it's a shot for: it's going to be less concentrated and more competitive than you think.

Andrey: I think it depends a lot on the complementarity of other things.

Seth: There you go. There you go. Speaking of Catherine Tucker, we were asking her about AI competition. She's like, "Well, you know, I'm Catherine Tucker." Catherine Tucker thing.

Andrey: That is not how she talks.

Seth: She does not talk like that. So I'm not going to try to do my Catherine Tucker voice. But like, her point was like, we know how to do antitrust. It has to do with networks of complementarities and substitutabilities. There's nothing special about AIs. Is that kind of your take?

Andrey: I don't think I'm going to make the claim that we know how to do antitrust of AI. That seems premature, to say the least. I will say that the concentration of the industry is very likely to be determined by complementary integration assets. So how important is it to have that Anthropic engineer sitting at, you know, SAP, with the specific molded version of Claude for a particular application, or not? Or is it something where SAP will just call OpenRouter, and it's just going to be good enough that way, and they don't have to do specific SaaS contracts with Anthropic or anything like that? That's hard for me to answer right now. But, you know, if I were a betting man, I would say that there'd be a handful of models that are pretty competitive with each other.

But I don't think there'll be like a thousand models that are competitive with each other.

Seth: Right. There's just not enough room at the top, at the frontier, just because these training runs will be so, so expensive. I guess, as I was reading this paper, in the back of my head I was thinking, you know, like, how many people are going to come up with $500 billion to pre-train their own models?

Right. It—it just seems like there's a maximum of how competitive this industry can get.

Andrey: But I guess I would say five is often enough to get a very competitive dynamic. Why do we want competition? It's not just because we want a bunch of competitors, for competitors' sake. We actually want there to be the correct incentives to innovate and then to price fairly, right?

So those are kind of the two things we're trading off. And in industrial organization, there are some results that in certain cases you want even fewer than five competitors for the incentives. So that still seems quite competitive, even if there is a lot of concentration.

Seth: Right. Maybe another way of thinking about this is, suppose we could wave a magic wand and either make AI more horizontally differentiated or make it less horizontally differentiated. Right. We could choose which world we're in.

Andrey: Mm-hmm.

Seth: A world where they're less horizontally differentiated is probably one with faster growth and, you know, fewer implementation costs and less friction. Right.

Andrey: Yeah, I'm not sure. It depends on how we think about, like, the specific innovation production function. It's not obvious to me that there's, like, one answer, right? Because you can imagine that in a horizontally differentiated world, more players are going to be able to try to innovate, and because there are more, there are going to be more rents. But if you think that it's all about just that huge run, that one big run,

Seth: Right,

Andrey: Maybe it's that you want it to be vertically differentiated and kind of a winner-take-all dynamic, but one where the winner can change from time to time.

Seth: Right. So then we're in a universe where it's competition for the market rather than competition in the market. And that brings its own set of antitrust concerns. Andrey, you know, believe it or not, I took a minute to look at the same data and ask questions right along these lines of, like, how concentrated is this market exactly?

Because reading your paper, it's a paper that's supposed to give me some hints about the competitiveness of the industry. The first thing people ask about an industry is, well, how concentrated is it? Right? So Andrey, what's your sense? Are these models more or less concentrated than a typical industry?

Andrey: Um.

Seth: Industry? And actually I want you to tell me, all right? So I've got three. I'll leave my test on the table here. I've got four HHI indices I'm looking at right now. I've got OpenRouter. This is for the first week of May. We've got the number of tokens called at the AI company level, so it aggregates up to companies.

We got the number of tokens called at the AI app level, so that's like SillyTavern, et cetera, et cetera. Then we've got the number of tokens called at the model level, and then I would like you to compare these to inequality in motor vehicles and breakfast cereals. So I want you to rank those five from most equal to least equal.

Andrey: Yeah, so I will push back on one thing. You count, like, the Meta Llamas as being Meta's, right? Because Meta is not the one who's serving them. Right. But.

Seth: Ooh. Ooh. Well, I could do providers too. That would be a fourth way to split it.

Andrey: Yes. But generally, yeah. Look, it's more concentrated than these other industries.

Seth: It's pretty concentrated.

Andrey: I'd say more so for all of them. With the model-specific one, even with that, I'd say it's probably more concentrated than the...

Seth: That one is actually pretty low. So the model, so just, I'll put some numbers out there. Just, ballpark, motor vehicles have an HHI of about 2,500; breakfast cereals are just below that.

Andrey: Mm-hmm.

Seth: The number of tokens at the company level has an HHI of 2,960, so it's a little bit higher than those guys. But if we go to the app level, we're at 2,160, so that's kind of more competitive than motor vehicles and breakfast cereals, which we think have a decent amount of competition.

And then the model level, so we're going to treat 3.5 and 3.7 differently. We're pretty equal. We're at the 1,500 level, which is considered pretty, pretty competitive.
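For reference, the HHI is the sum of squared percentage market shares, ranging from near 0 up to 10,000 for a monopoly. A minimal sketch with hypothetical token counts (the company and model names and the numbers are made up, not the OpenRouter data discussed here) shows why the same market looks more concentrated at the company level than at the model level: merging a company's models into one share can only raise the index.

```python
from collections import defaultdict


def hhi(quantities: dict[str, float]) -> float:
    """Herfindahl-Hirschman Index: sum of squared percentage shares (max 10,000)."""
    total = sum(quantities.values())
    return sum((100 * q / total) ** 2 for q in quantities.values())


# Hypothetical weekly token counts, keyed by (company, model).
tokens = {
    ("CompanyA", "model-a1"): 30.0,
    ("CompanyA", "model-a2"): 20.0,
    ("CompanyB", "model-b1"): 25.0,
    ("CompanyC", "model-c1"): 25.0,
}

# Model-level shares: each model counts separately.
by_model = {model: t for (_, model), t in tokens.items()}

# Company-level shares: sum each company's models together.
by_company: dict[str, float] = defaultdict(float)
for (company, _), t in tokens.items():
    by_company[company] += t

print(round(hhi(by_model)))
print(round(hhi(by_company)))
```

The same function could be applied at the app or provider level by changing how the token counts are grouped.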

Andrey: Competitive. Yeah.

Seth: All right. Does that change your priors, Andrey?

Andrey: Well, I guess I wouldn't have used those industries as a comparison set.

Right? Like, I think a lot of digital infrastructure types of industries have a lot more concentration. So you think about cloud computing or search or phones, right?

Seth: Mm-hmm.

Andrey: I think relative to those kinds of industries, it is less concentrated. But certainly compared to physical goods products, it seems more concentrated, I guess. I assume that you didn't calculate that HHI per car, right? So it's kind of...

Seth: No, it was not. That was at the company level.

Andrey: Yeah. I mean—you know, disclosure, you know, this, this definitely has been on my to-do list. I just have not gotten around to it. But I don't.

Seth: All right,

Andrey: I don't think that this changes my priors very much, if

Seth: Okay, well, I've got a second fact for you. Second stylized fact. All right, so now I want you to imagine... oh man, I don't know if we have time to start talking about it. We'll save the power law and probability distributions for the next episode. But let me give you four different things that might be more or less concentrated.

Right? Here's another four things to think about the concentration of. One is 2023 US Compustat companies. One is OpenRouter AI at the company level. Another is Hugging Face. You know, Hugging Face is another website where people will post AI models. This is for free downloads, so these are like public models.

So I have downloads of Hugging Face AI models. And then finally I have all-time movie box office. So you tell me which of these is going to be the most concentrated: Hugging Face AI downloads, OpenRouter AI tokens, 2023 US publicly traded companies, or all-time movie box office.

Andrey: The OpenRouter one, that's by the model creator?

Seth: I believe that, yeah, at the company level.

Andrey: Okay. Um. I think OpenRouter is the most concentrated of these.

Seth: Correct. Second most?

Andrey: Hugging Face?

Seth: Hugging Face, second most. Third most?

Andrey: I don't know how to think about Compustat HHI. That seems like... what's the product market? Sorry.

Seth: The product? Oh, Compustat. It's publicly traded corporations. So it's everything together.

Andrey: Oh, you're just combining all the...?

Seth: Yeah, yeah, yeah.

Andrey: Just by revenue?

Seth: No, it's market value. So, you know, implied market,

Andrey: Yeah, I think that'll be three. And then the movies are four.

Seth: Dude, you don't even need data. You got this down.

Andrey: How about those priors?

Seth: Who needs evidence? But okay. You see what I'm trying to get at here, Andrey? Right? Which is, you can give me evidence that people are willing to move back and forth, but if it's the most concentrated industry I can find, it seems pretty concentrated.

Andrey: There are, like, a bunch of industries that are more concentrated.

Seth: Alright. Okay, so now we go. All right, so listen, this is going to be a special two-part episode of Justified Posteriors. In the next episode, Professor Benzell will bring his own evidence and analysis to bear on the data from OpenRouter, and you'll be the judge. Is AI competitive? Is it not competitive?

It's the future you're going to have to live with one way or the other. Andrey, are we ready to talk about our priors a little bit?

Seth: All right. What's yours? So tell us, you had three claims here. I guess you're a hundred percent convinced of all the claims. Again, you wrote them down.

Andrey: Look, my claims are empirical, right?

Seth: Right.

Andrey: No, I'm not saying that they're right, but I, you know, I think

Seth: They're descriptive.

Andrey: They're quite descriptive. Unless I made a scraping error or something like that, I think they're, you know, they are what they are, but the interpretation is obviously up for debate.

Seth: Mm-hmm. Do you want to take a shot at it? Do you want to give me a percentage chance that in two years—I don't know how to say this—let's say AI, the AI industry, will be more or less competitive than the average tech sub-industry? Is that a fair comparison?

Andrey: I don't know what an average tech sub-industry is.

Seth: I know. Or choose one: search. Let's just do search. How about search? That's really unequal. Alright. Alright. So yeah, that's the question.

Andrey: It's going to be more competitive than search. I have no doubt.

Seth: Okay. All right. Let's check that in a couple of years.

Andrey: And also more competitive than phone operating systems.

Seth: Yeah, we got two big boys there. That's fair. Okay.

Andrey: Is it going to be more concentrated two years from now than today? I think that's an interesting question.

Seth: You want to take a... is that 50/50 for you? I put 90, no, ninety's too strong, 85% that it's more concentrated in the future than now.

Andrey: So it depends on whether we're measuring by revenue or by token.

Seth: Let's do tokens at the company level. Oh, I guess we should do revenue, right? Revenue's the more economic thing. You can do it with either one.

Andrey: The reason I was asking is, like, I imagine there's still going to be a ton of use cases for small, cheap models and,

Seth: Yeah. So the most down. Yeah.

Andrey: That's a very competitive market, right? In the sense that people are going to be able, in principle, to roll their own very good small model.

It's the big models that we're really worried about, right?

Seth: Right, right. So yeah, so it's like the value-weighted is the one where you'd be really worried about concentration, given that there might be a lot of small toy ones that people f**k around with. But I think—

Andrey: I'm not even talking about f*****g around. There are so many...

Seth: Yeah.

Andrey: Like, you could have the model called, right?

Seth: Mm-hmm.

Andrey: You know, for every email you're writing in Gmail,

Seth: Mm-hmm.

Andrey: For every line of code that you're going through, why not call a cheap model just as a first pass? That might even be the model used to determine whether you want a, you know, more fancy model or something like that.
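The "cheap model as a first pass" idea can be sketched as a simple cascade, assuming the cheap model returns some confidence score alongside its answer. Everything here is hypothetical: the `route` function, the threshold, and the toy stand-in models are illustrations only, not any provider's actual API.

```python
from typing import Callable


def route(
    prompt: str,
    cheap_model: Callable[[str], tuple[str, float]],
    fancy_model: Callable[[str], str],
    confidence_threshold: float = 0.8,
) -> tuple[str, str]:
    """Try the cheap model first; escalate only if its confidence is low."""
    answer, confidence = cheap_model(prompt)
    if confidence >= confidence_threshold:
        return "cheap", answer
    return "fancy", fancy_model(prompt)


# Stand-in models for illustration; real ones would be API or local calls.
def toy_cheap_model(prompt: str) -> tuple[str, float]:
    # Pretend the cheap model is only confident on short prompts.
    confidence = 0.95 if len(prompt.split()) <= 5 else 0.30
    return f"cheap answer to: {prompt}", confidence


def toy_fancy_model(prompt: str) -> str:
    return f"fancy answer to: {prompt}"


print(route("Fix this typo", toy_cheap_model, toy_fancy_model))
print(route("Refactor this module to remove the circular import safely",
            toy_cheap_model, toy_fancy_model))
```

In a setup like this, most calls would go to the cheap model, so token counts and revenue could give quite different pictures of who dominates the market.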

Seth: Right, right. And you can imagine a universe in which those super low-level AI intelligence calls aren't even captured in data, because I might be running that locally on my own laptop, right? So yeah, maybe there's some sort of size cutoff above which this, like, becomes interesting and tractable.

Andrey: I mean, I can, yeah. I don't have strong priors on this, I have to say. I could see arguments either way. Maybe 60/40 towards becoming more concentrated in terms of revenue.

Seth: All right. Well, I'm going to try to get Andrey's answer up in the next half of this two-part episode on Concentration and Competition in the AI Industry: Evidence from OpenRouter. This time it's personal.

Andrey: All right.

Seth: All right. Like, share, and subscribe.

Andrey: Yeah. If you have better data, we're very—

Seth: Give it to us, please. Yo, we'll be your friend. We'll co-author with you.

Andrey: Yeah. Just, you'll get such great exposure for your company on this podcast.

Seth: Mm-hmm. Right? We will. And we'll also use your AI to write copy if you have an AI model yourself.

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit empiricrafting.substack.com