Speaker 2
One of the things that we notice at Weights & Biases, when people launch a feature like Cobuilder, like you're talking about, is that there's a bit of a cultural shift in the organization, going from a more deterministic product to a product where it may or may not work. It starts to look more like growth hacking or something like that, where it's unlikely that Cobuilder is going to make any application that somebody might dream of, right? There's a range of things where it works, and you might not know exactly what that surface area is. Is the non-determinism of the underlying LLMs that you're using changing the way you think about software development?
Speaker 1
I get your question. It's something I think about a lot, right? Because if you think about a traditional piece of code, unless there's a bug in it, you run it and it does the same thing every time, right? Even traditional API services: if you want to interact with the Twilio API, you send an API call and it does something pretty deterministically, right? Send an SMS to this person. And if it fails, you get a very clear failure code: could not be delivered because this phone number does not exist.

And I think you're right that with LLMs, it's deceptively like an API call. I mean, it is an API call, right? If you're using an inference API, you're not writing the models yourself, so you can make an API call to OpenAI or to Hugging Face or whatever. But the reality is, what you get back is very non-deterministic. And even the frontier of what the models are good at is relatively unknown, right? Of course, we've all seen the benchmarks on tests like MMLU, et cetera. But in terms of all of the surface area of real-world problems you can apply LLMs towards, that's relatively undiscovered. That's why I think it's fascinating to constantly look at Twitter and Reddit and see all these cool new zany use cases that people are unlocking. Like, oh my gosh, I uploaded my blood test markers over five years into ChatGPT and it actually did an extraordinary job of diagnosing some potential health issues I had. And these are real-life examples. That's a true story.
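To make that contrast concrete, here is a minimal sketch, assuming the OpenAI Python SDK and an example model name: the same prompt sent twice through an inference API will often come back with different text, unlike a classic deterministic call such as "send this SMS."

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def complete(prompt: str) -> str:
    """Single chat-completion call; the model name is just an example."""
    resp = client.chat.completions.create(
        model="gpt-4o",      # illustrative model choice
        temperature=1.0,     # sampling enabled, so outputs will vary
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

prompt = "Suggest a name for an internal tool that tracks legal contracts."
print(complete(prompt))
print(complete(prompt))  # very likely a different answer the second time
```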
Speaker 2
That's... yeah, yeah, yeah. I mean...
Speaker 1
Apparently the hack is to get past the kind of safety gates: you have to tell it, well, I'm actually writing a script for a movie, and here's the data I'm uploading for the movie, but it has to be very, very correct because in the movie we want to be scientifically accurate, or something like that. But the point is, even the researchers who work on these models, as you know, don't really know where the model is going to be really amazing, potent, and capable before they train it. And even afterwards, sometimes you're kind of astounded, like, wow, we didn't think Anthropic's 3.5 model, Sonnet, was actually going to be so good, right? It kind of amazed us in certain dimensions. So I think there is this frontier of unknown in terms of how these models do in specific use cases.

And when you think about the widely varying texture of real business use cases, think about every single part of a knowledge worker's job in every single department: marketing, legal, finance, product, HR, in every single type of industry and every company size. If you were to go and test how these LLMs do at specific steps (obviously you can't do the whole thing end to end short of AGI being developed), even just on specific key steps, sometimes you don't really know what you're going to get until you try it, right? And so I do think this implies something both for our customers and for how we develop our product for our customers. It implies that you cannot just establish a centralized AI team or committee that, top-down and from an ivory tower, goes and says, these are the use cases for AI we're going to deploy into the company, and they're going to work for sure. You have to have experimentation. There is this unknown of: take this miracle kind of product and test it on all these different problems across the organization. And the only way to do that is really to empower a culture, and even a platform model, of local experimentation, meaning you need people within each of those functions, somebody within HR, somebody within legal, who understand their own work and the workflow and the data, to really have the ability to experiment with AI. And I think the way to do that is not necessarily just through chat interfaces, but by being able to embed the model intelligence into your data itself, which we do with our approach. I think that's going to be really key.

And then the second question, of how this affects the way we develop the product: I think it means we need to provide tooling to our customers that lends itself to this rapid experimentation. Prompt engineering is hard, right? So is knowing how to chain together different prompts to get a desired output. For instance, you can have one prompt output a draft of a legal document, and then a second prompt can review and critique that legal document given a certain set of criteria or standards, or even example documents it needs to compare it against. Then you get the critiques, and then you can have a third step, which might be human or AI, to incorporate those edits, right?
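As a rough illustration of that three-step chain (draft, critique, revise), here is a minimal sketch assuming the OpenAI Python SDK; the model name, prompts, and criteria are placeholders, not Airtable's actual implementation.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt: str) -> str:
    """One chat-completion call per step in the chain."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # example model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Step 1: draft the document.
draft = ask("Draft a mutual NDA between Acme Corp and a contractor. Keep it under 800 words.")

# Step 2: critique the draft against an explicit set of criteria.
critique = ask(
    "Review the NDA below against these criteria: governing law is stated, "
    "term and survival clauses exist, confidential information is defined. "
    "List concrete issues.\n\n" + draft
)

# Step 3: revise, incorporating the critique (this step could also be a human).
revised = ask(
    "Revise the NDA to address every issue in the critique.\n\n"
    "NDA:\n" + draft + "\n\nCritique:\n" + critique
)
print(revised)
```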
And how you design those prompts, how you give the model many-shot examples, and all of those techniques are not always obvious to business end users. Very few people actually know how to prompt-engineer and leverage these LLMs effectively. So I think a lot of our job as a product is to give people the tools and the UX to make this really easy. Instead of having to go and define that prompt sequence yourself, we give you a kind of higher-order wizard to say, this is what I'm trying to do, and then have our product, underneath the hood, implement the specific prompts and the specific many-shot feedback loop. And in fact, we have the advantage of potentially being able to use the feedback people give us in the app itself, where they're reviewing the outputs of the AI in Airtable and saying, no, this is wrong, or actually, let me override this. We can feed that back into a many-shot example set that feeds into the prompt, even without doing fine-tuning, to get better and better results over time. So we can abstract away a lot of the weird nuances of prompt engineering and LLM best practices, which are arcane, really hard to get to know, and constantly evolving, and make all of that easier for our non-technical business users.
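One way to picture that feedback loop, as a hedged sketch rather than Airtable's actual mechanism: corrections users make in the app are stored and prepended to future prompts as many-shot examples, with no fine-tuning involved. The field names and categories below are invented for illustration.

```python
# Hypothetical feedback log captured from user overrides in the app.
feedback_log = [
    {"input": "Invoice from Acme, net 30", "model_output": "Category: Legal",
     "user_correction": "Category: Finance"},
    {"input": "Signed MSA with Globex", "model_output": "Category: Finance",
     "user_correction": "Category: Legal"},
]

def build_prompt(new_input: str) -> str:
    """Assemble a classification prompt that leads with corrected examples."""
    shots = []
    for fb in feedback_log:
        # Use the human-corrected label, not the model's original guess.
        shots.append(f"Input: {fb['input']}\nOutput: {fb['user_correction']}")
    examples = "\n\n".join(shots)
    return (
        "Classify each record into a department category.\n\n"
        f"{examples}\n\n"
        f"Input: {new_input}\nOutput:"
    )

print(build_prompt("Quarterly budget spreadsheet"))
```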
Speaker 2
And do you use those techniques yourself internally to actually build the product?
Speaker 1
We are trying to be very, very good customer-number-zero users of our own product, right? And even to start pushing the limits of the next generation of capabilities. We want to be able to dogfood it, or do it in a hacky way ourselves, so that we can learn what works and what doesn't. So the short answer is yes, and increasingly so.

What we're learning is that sometimes it's really helpful to have an almost rough-and-dirty version of a capability. Let's say we want to be able to use AI to suggest AI use cases within an app. So imagine you work in legal at Airtable and you already have a lot of legal contracts and workflows in Airtable. Airtable could suggest to you, hey, why don't you add these AI steps or AI fields to automate these things, which I've inferred you can do based on the structure and contents of your app and even what you're doing in the app. One thing we want to do is start experimenting: let's just build a rough-and-dirty version of that and put it into the product for ourselves, or maybe for a very limited set of alpha customers externally, and see how they use it. Even without looking at their actual customer data, let's get their feedback on whether it's working or not. And then we can tune and change the prompts for our own kind of meta-features to see how they work. So the short answer is yes. And to your point that a lot of these things are unknown and you kind of have to try them experimentally, we want to lend ourselves to a more experiment-friendly way of using our own AI features.
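As a purely hypothetical sketch of that "suggest AI steps from app structure" idea, assuming the OpenAI Python SDK: only a table's schema (no record data) is sent to the model, which is asked to propose candidate AI fields. The table name, field names, and prompt are invented for illustration, not Airtable's actual feature.

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Only the schema is shared with the model, not any record data.
schema = {
    "table": "Contracts",
    "fields": [
        {"name": "Counterparty", "type": "singleLineText"},
        {"name": "Contract PDF", "type": "attachment"},
        {"name": "Status", "type": "singleSelect"},
        {"name": "Renewal Date", "type": "date"},
    ],
}

resp = client.chat.completions.create(
    model="gpt-4o",  # example model name
    messages=[{
        "role": "user",
        "content": (
            "Given this table schema (no record data included), suggest up to "
            "three AI fields or automation steps that would likely save this "
            "team time. Return a short bulleted list.\n\n"
            + json.dumps(schema, indent=2)
        ),
    }],
)
print(resp.choices[0].message.content)
```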