Speaker 3
Yeah, it makes me think of a one-way hash function. It takes a tremendous amount of compute to brute-force it, but if you have the answer in hand, you can determine, with almost no compute, whether or not it's correct.
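As a minimal sketch of the asymmetry being described here: inverting a hash by brute force is expensive, but checking a proposed answer is one cheap call. The 4-digit PIN search space below is just a toy example, not anything from the conversation.

```python
# Sketch: brute-forcing a hash preimage is expensive; verifying a candidate is cheap.
import hashlib

def digest(s: str) -> str:
    return hashlib.sha256(s.encode()).hexdigest()

target = digest("7341")  # pretend we only ever saw the hash

# Expensive direction: search the whole (toy) space for a matching preimage.
found = next(p for p in (f"{i:04d}" for i in range(10_000)) if digest(p) == target)

# Cheap direction: given a candidate answer, verification is a single hash call.
assert digest(found) == target
print("recovered:", found)
```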
Speaker 1
I think you should think of P and NP. NP is the class of all problems that are easily checkable; P is the class of problems that are both easy to generate a solution for and easy to check. We basically want NP problems. Or, you know, not precisely NP problems, but metaphorically, for a human checker, we want NP-style problems. That's where LLMs really shine right now.
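In the same spirit, here is a hedged sketch of the "expensive generator, cheap checker" pattern being described: the generator below is a stand-in stub (not a real model API), and the verifier is just arithmetic on a subset-sum instance.

```python
# Sketch of the "NP-style" framing: producing a solution may be expensive
# (a model call, a search), but verifying one is cheap.
# llm_propose_solution is a placeholder stub, not a real model API.

def verify_subset_sum(numbers: list[int], subset: list[int], target: int) -> bool:
    # Verification is linear time: check multiset membership and the sum.
    pool = list(numbers)
    for x in subset:
        if x in pool:
            pool.remove(x)
        else:
            return False
    return sum(subset) == target

def llm_propose_solution(numbers: list[int], target: int) -> list[int]:
    # Placeholder for the expensive generator (an LLM, a solver, a human).
    return [3, 7, 12]

numbers, target = [3, 5, 7, 9, 12, 20], 22
candidate = llm_propose_solution(numbers, target)
print("accepted" if verify_subset_sum(numbers, candidate, target) else "rejected")
```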
Speaker 2
So I want to go into an area you alluded to, which is the need for precise models, and the idea of having maybe a fleet of well-tuned, precise models that can do certain tasks really well and are smaller. In comparison to OpenAI's GPT-3.5 or GPT-4, or Anthropic's Claude, which are really versatile and know a lot of things. When I'm talking to customer support, I don't need the model to tell me about chapter four of The Great Gatsby; that knowledge is not relevant in that particular context. Help us understand a little more what the future might look like. Is it going to be that you have really tiny models that do small tasks really, really well and know nothing outside of their tasks, or what is the happy medium? And I'm already hearing your response, that we know nothing about anything, but tell us what you think are some strategies that might work well.
Speaker 1
I think what I want to avoid is conflating the model size and the model's specificity in terms of what it works on. It's entirely possible that you have a highly specialized model that's actually pretty big, because maybe you need a big model, or big models perform better in that situation, or you have a lot of data and big models require it, but it's still highly specialized. So I do want to distinguish between those two.
Speaker 2
Can you share a little bit more about what makes a model large? Is it the number of parameters? What does that mean?
Speaker 1
It's just the raw number of parameters. A model is big because it takes more compute to create and to run. And so domain-specific models are often smaller, and you can often get away with something smaller because you're not asking as much of the model, maybe. But even asking a model to write natural language is asking a lot of a model, and having a good conversation is asking a lot of a model, even if it only needs to be conversant on one topic rather than every topic. So again, this is one of those we-know-nothing situations. The advice I always give is start small and work your way up. As you see the scaling behavior, and as you see that things keep getting better with each additional level of scale, cool, you invest more. If you stop getting ROI, you stop. But the other piece is that, while I don't want to make any statements about the underlying nature of how these things work, I think I can make a statement about the field. We're seeing a lot of fine-tuning products start to pop up, everything from fine-tuning Llama to the GPT-3.5 fine-tuning that just got announced. That seems to be the direction the field is moving. People want customization. Otherwise nobody would be building these products, or we'll see these products fail in another few months and OpenAI will cancel theirs or something. But everybody seems to be moving toward a customization-centric world, and I think that's telling.
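As a rough illustration of "more parameters means more compute," here is a minimal sketch. The layer/width configurations and the ~6 FLOPs per parameter per token training estimate are common rules of thumb, not figures from the conversation.

```python
# Rough sketch: parameter count and training compute for a decoder-only
# transformer. The configs and the ~6 * params * tokens FLOPs rule are
# standard approximations, not numbers stated in the conversation.

def approx_params(n_layers: int, d_model: int, vocab_size: int = 32_000) -> int:
    # Each block: ~4*d^2 for attention + ~8*d^2 for the MLP,
    # plus an embedding matrix of vocab_size * d_model.
    per_layer = 12 * d_model ** 2
    return n_layers * per_layer + vocab_size * d_model

def approx_train_flops(params: int, tokens: float) -> float:
    # Widely used estimate: ~6 FLOPs per parameter per training token.
    return 6.0 * params * tokens

for name, (layers, width) in {
    "small": (12, 768),    # roughly GPT-2-small scale
    "medium": (24, 2048),
    "large": (32, 4096),   # roughly 7B scale
}.items():
    p = approx_params(layers, width)
    f = approx_train_flops(p, tokens=300e9)  # hypothetical 300B-token run
    print(f"{name:>6}: ~{p / 1e9:.2f}B params, ~{f:.2e} training FLOPs")
```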
Speaker 2
Can you help demystify what the process looks like to fine-tune? You have an audience of people who have lots of data and are very familiar with data prep in the traditional sense, for analytics workloads, but I imagine the data prep here looks really different. Could you walk through a workflow of how you partner with people to get the right kinds of company data, and what the training process looks like in practice?
Speaker 1
Yeah, the answer is really that you need a lot of text, a lot of relevant text, and you need to eliminate the low-quality stuff. I think the jury is still out on whether you want only high-quality text or whether you just want lots and lots of text, you know, what the quantity-quality trade-off is. Obviously, if you have one very high-quality sentence, that's not going to make for a great training set, so there is some trade-off intrinsically there. We're still figuring that out. We definitely think that eliminating the lowest-quality data is probably important. Beyond that, I think there are a lot of questions, but it is: you need text, and you need it to be in natural language sentence form. It needs to be prose. Beyond that, there's a lot of experimentation that goes into this. There's a lot of work in figuring out which data seems to be more important for improving the metrics, how to mix different data sources together, because everybody's got dozens of data sources they can pull from, what kind of data is helpful, and what kind of data is not helpful. Again, it comes down to how you measure. But even just wrangling the data: we don't do data wrangling ourselves at Mosaic, that's something we can't help with, but there's this amazing company called Databricks, and they're really good at that.

So you can see why joining forces was kind of an obvious win for everybody. We can now offer an end-to-end solution where you can come and say, "I would like to train a large language model, and here's how I'm measuring success." We can work with you from "Where's your data, and how do we get it in, cleaned, tokenized, and ready to go?", where there's a really nice, easy process for that now, to "Cool, we'll take that data and train the model," which is basically an API call and then sitting and waiting for the model to train for a while, to deploying that model, evaluating it in production, and rinsing and repeating over and over until you're happy. That's now an end-to-end process we can offer within one company. I think that's pretty exciting; that's what got me excited about the potential acquisition.
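To make the "gather text, filter low quality, mix sources, then kick off a training job" flow concrete, here is a minimal sketch. The quality heuristics, mixing weights, and the `submit_finetune_job` call are hypothetical placeholders, not Mosaic's or Databricks' actual API.

```python
# Sketch of the fine-tuning data-prep loop described above.
# The heuristics, weights, and submit_finetune_job() are hypothetical,
# not an actual MosaicML/Databricks API.

from collections import Counter

def looks_low_quality(text: str) -> bool:
    # Crude filters: too short, mostly non-alphabetic, or highly repetitive.
    words = text.split()
    if len(words) < 20:
        return True
    alpha_ratio = sum(c.isalpha() for c in text) / max(len(text), 1)
    if alpha_ratio < 0.6:
        return True
    if Counter(words).most_common(1)[0][1] > 0.2 * len(words):
        return True
    return False

def build_mixture(sources: dict[str, list[str]], weights: dict[str, float]) -> list[str]:
    # Keep only filtered documents, then take each source in proportion
    # to a hand-tuned mixing weight (which is itself an experiment).
    mixture = []
    for name, docs in sources.items():
        kept = [d for d in docs if not looks_low_quality(d)]
        take = int(len(kept) * weights.get(name, 1.0))
        mixture.extend(kept[:take])
    return mixture

# Hypothetical usage:
# corpus = build_mixture(
#     {"support_tickets": tickets, "product_docs": docs, "wiki": wiki_pages},
#     weights={"support_tickets": 1.0, "product_docs": 1.0, "wiki": 0.3},
# )
# job = submit_finetune_job(base_model="llama-7b", data=corpus)  # placeholder call
```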