Speaker 1
add something to what you said after opening the document for those of you who don't understand how this works all these models behind the scenes are actually using api's two different large language models once you have the data in the database in order to generate the actual responses and each and every one of these tools with an open source or closed source has a different cost associated with using the api and when join i sang inference inferences when these models are generating results like the actual generation of content and tokens is actually called inference so that's basically the generation of anything in these models and you pay per token which again is about 0.7 words doesn't matter why it's this way just take that as this is what it is so the price range varies dramatically between these models in some cases for a million tokens of 700 000 words you will pay 20 cents like a mixtrel 7b which is an open source model from a french company llama is another open source model from meta similar pricing if you go to a clod 3 opus which is right now the most advanced model from clod from anthropic that's 70 dollars instead of 20 cents for the same amount of generation quality you're probably going to get a higher quality at least now so you got to pick and choose in which cases you want to use which models and i assume that's where you were about to go
Speaker 2
yeah exactly and i would have to say that the price will be commoditized at some point the cost of intelligence will go to zero probably we all we see that basically with gpt4o that it halved the cost
Speaker 1
with the same quality so yeah that's after gpt4o half that from
Speaker 2
gpt4o yeah yeah it's crazy yeah basically i said i did that i did not update my presentations on that but yeah this is how it did so basically just give you an example so for instance it used case like the julia example that we were talking about about having 300 users to use the system 10 times per month it will cost with gpt4o per on 14 almost 15 dollars per user so imagine a business that sells this you need also to be viable so yeah it costs a lot but you can also very key see how much more less it's with smaller models so one technique
Speaker 1
basically i want to go back to this just for a second yeah to give the numbers for 300 users wore over four thousand dollars a month with gpt4o 300 dollars a monthish with gpt3.5 and 159 with mixtrale 7b so it's 20 times more to do it with gpt4o but sometimes you have to like there's use cases with these better models just give you better results but i agree with china that this price is going to go down and down all the time
Speaker 2
yeah for instance this mikanic rack use case we needed to ugpt4 to actually yeah it and obviously their quality matters right and and of course when you have little users it does not even matter yeah but in case you scale with users one solution is just to have in a router that depends on the complexity of the query it basically routes it to a cheaper bottle so this is one of the many yeah then we look into security very important when you have especially a front facing a rack up we had this use case not the use case sorry examples that there was this chatbot that sold a car for one dollar and it was actually legally binding and also another example of air canada chatbot that gave basically a bad advice on regarding a price yeah exactly there was this refund that the user wanted and it gave it to them and it was declining so who is liable there basically what happens here is we need to imply some guard trails to eliminate basically any data leakage or prompt injections and yeah for the guardrails part there's tools like nimo guardrails or guardrails.ai a lot of them basically so it's a way how before the it could have it both on input and output query but it's basically a step in the middle where before it gives the answer to the user it actually filters it to or from or any harmful or basically an appropriate content and also protect personal information very important step as i said for front facing customer facing solutions and of course also for the pocket to monitor basically the token usage and we also need to solution there very important to understand where costs are coming from and there are tools like lance minion for finis from our eyes and that you can directly see how many tokens are queries from your users and you can even track costs per user and per query so very important and when you think to basically put the app a finite introduction but also before especially the the token usage monitor yeah i think that's it
Speaker 1
no it's great i i want before before i dive into there's a bunch of questions from the audience before we dive into those questions i want to summarize that aspect of it so we said there's three different levels that you can test this out one is very simple the other is a little more advanced but still does not require any third party custom development and then third requires custom development in the custom development universe you have to pay attention and actually on the security side probably on the other two as well like you gotta understand what data you will be sharing with who and that company that you're sharing with how are they keeping your data secure if they're keeping your data secure and different companies will have different comfort levels with different solutions right so if you're joe schmoe and you're selling shoes in the market there maybe you don't have any information that is problematic for to give to cha-chippity or to claud or one of those but if you're a doctor a lawyer a a financial advisor all these highly regulated companies you just cannot it's not even an option which means your company that you're working for and if you own it then it's on you to figure out how to run this in a secure way that does not expose your data which means putting the right measurements in place so that's problem number one problem number two is that that jonah mentioned is how do you protect yourself from just mistakes that these models make either because they're just made a mistake or because people know how to manipulate these models in order to give them information that they shouldn't as jonah said people can use it against you because if the model is going to commit to a price it committed on your behalf and so you're legally bind to whatever the model says if it's in a chat with a client and so you got to take these things into consideration with the solution that you're providing and make sure that the solution that you're putting in place is aligned with the needs of your business and this could be anything right it could be i don't care that's a very enough solution it's fine like i said there's cases where if it's an internal tool that people are used and you give them the ideas and they like okay fine so it's going to help you 90% of the time and the other 10% people still have to do the old manual process okay so there's different justifications and different environments will require different kind of level of security both in means of data security as well as it means of protecting you from getting the wrong answers from a model i want to jump into a few questions i have one and then there's people from the audience who have a few so i'll start with the first the first question is can you please re mention how AI systems are priced so do you want to take that one
Speaker 2
yeah sure i can actually show you here do you still see my yeah yeah so here we have basically all the costs for an LLM app and specifically for a rack up so we have here the banning costs which are basically the banning models which it's very minimal as you can see for the top load 10 million documents or say who uploads take a million but let's say 10,000 documents it just costs you 65 cents and 10 cents with with a smaller model and banning model very minimal we don't even yeah
Speaker 1
the company yeah
Speaker 2
exactly but then here the inference cost it's basically each of this companies like jib a pop and AI they charge pair pocket like so for instance it tokens is it's basically you have a query and depends on how long is this query it charges you basically for the talking that of the query so for instance here you could see that the cost per one million input tokens it's ten dollars compared to GPT through 35 tour boys just 50 cents and 27 for mixto then they have a different price for the output tokens so basically for the tokens that they will generate to give you to generate the answer to you so it's 30 for GPT for a tour and so on for the other APIs models and yeah we also have basically the costs from the document context and the prompt that the system of our app has so for instance back to the example with with the mechanic car instructions the prompt template will be basically the instructions that are in the back end for the lend to tell hey now you are an expert in car repair here it's the question of the user please check your documents and find the appropriate basically answer so this is the prompt template so this is also basically the cost yeah
Speaker 1
again to explain this in all these solutions even if you're using basic GPT you're basically adding an additional prompt what the user is writing in order to explain the system exactly what to retrieve and how to retrieve and what data it's looking for and all of these counts in your token count but the question from the audience the follow up is one token equals one byte and no the answer is no it's just a way these systems work and just take it into that a token is about 0.7 words and that's what you're going to pay for most of the pricing models that you're going to look at are going to give you the price for a million tokens so basically if you're looking at a million tokens it's going to cost you x a million tokens is going to be 700 000 words ish just depends on how long the words are
Speaker 2
yeah we can actually also see it here i just make a quick example of what tokens are so we have this query how to fix a broken air back of a car you can see here it's basically the tokens how it splits so each of those are
Speaker 1
tokens yeah tokens so there's multiple tools online what juan is showing now is called tokenizer you can literally just paste your text in the air and it's going to tell you how many tokens but it's also going to show you how it's broken up totally unnecessary if you just want to know roughly just assume that every 0.7 words is one token and you're going to pay for the tokens you use and there's different price for tokens coming in then there is for tokens coming out on some of these models so what i mean by coming in is your input so your prompt the document you're loading and all of that stuff is tokens coming in and then tokens coming out is inference or what the the model is generating and in most cases they're not equal and in all cases it's not equal the inference to generation actually cost you more money and sometimes a lot more money it's still very small amounts like as juan as i mentioned before the most expensive model right now is claud three claud three opus and it's 70 dollars for every million tokens of output so 700 000 words i don't know how many books that is but it's probably two to three books of 300 pages each or something like that's going to cost you 70 bucks so if compare that to any other way of generation of that amount of content before it's still free right but if you add that times 300 employees times 10 times a day these costs start adding up and so optimizing for cost is important so that was one topic that we've covered i have another question still on this what's the cost of actually hosting this right so i need that victor database to reside somewhere what is roughly the cost of hosting that i
Speaker 2
have actually to host 50 million tokens in vector database it costs i think just 70 something
Speaker 2
also yeah very negligible yeah yeah yeah yeah awesome
Speaker 1
the next question is what size of companies you feel are best suited for this solution so i think and i'll let you answer but i think we mentioned three different solutions so i think we can reply on the different solutions and the different sizes of companies
Speaker 2
yes any basically company even if you're individual you can and you have some documents that basically you want to chat with your documents or gain more information from the documents or you might have i don't know you might have five youtube videos about a specific topic that you want to create a new content about generating the original content out of it you can basically use gpt to do that create a gpt and store the data there um but sorry back to your question it's
Speaker 1
so any any size of company yeah i so no go ahead yeah so but if you
Speaker 2
for instance if you use a company like gain if you we go now to their website they're basically they don't even have the price in online so it shows us that it's for enterprise so this might be for a company that are 50 hundred 300 000 employees yes i'm sure there's and
Speaker 1
the fact they raise whatever 200 million dollars also shows you yes yes exactly
Speaker 2
yeah so they're for sure other tools like gillion that are more specialized essentially for smaller companies i'm i don't i'm not really actually i think i have even spreadsheets with those companies
Speaker 1
yeah there there are many chatbot tools are basically the same thing chat base or dante are basically chatbot yes yeah that's good they're really cheap and you can connect whatever data you want to them and you will be able to talk to that data so there's cheaper solution than gleen to give you an entry level to the level two okay i'm not using chat gpt or gemini or cloth art i'm using an actual tool that does it so dante chat base there's a bunch of those that do the same thing and you can use them to upload your company information to connect to urls to to upload videos connect to youtube videos like all these things multiple sources and you can do that i will say something uh beyond that and it's half a question as well both google and microsoft are clearly working in that direction right where you'll be able to use their chatbots that is going to be integrated to everything within their universe and beyond like even today on the microsoft environment or microsoft copilot you can connect microsoft copilot to external data sources and and it's not everything but it's the big one so you can connect it to slack you can connect it to sales for crm and stuff like that so do you think this whole concept of rag will become basically a given sometime in the next 12 to 18 months at least to some level
Speaker 2
yeah i actually have this discussion yesterday with someone about the fact that yes google also has i think given the ability even now you pay a bit more around 20 something dollars and you can search through your bull drive and everything yes however now you cannot connect it with your slack or basically with your other sales force or whatever this kind of apps that it's that might be not sufficient if you actually want to have access to all your basically apps that you're using you might use to like gain the hundreds of connectors but definitely i also without penai where they're going with their assistant api and also now the increase also context to their gpts you will be able actually to upload more so definitely there yes i would say there is some competition and it will we will use that yeah but there is room for other companies probably as well at least in the long short term yeah awesome
Speaker 1
so quick summary first of all this was fantastic like we touched on a lot of things i think we give a lot of people and both the comments in zoom as well as the comments in on linkedin are all very positive and people really appreciate all the information that you shared the quick summary you can use multiple levels of tools to communicate and chat with your data not doing it is by definition costing you more money than searching the old way where you gonna miss timelines you're gonna get the wrong information you're gonna potentially miss clients use lose clients not win proposals etc etc there's really very few excuses why not to do it and you can start very small with tools that are that requires zero technical knowledge other than literally connect your data sources and you can start chatting with them jonah if people want to follow you learn from you work with you one of the best ways to do that
Speaker 2
yeah sure so you just add me on john stock pregame i basically can also see it here and ask me whatever you want i can share more tools more than my spreadsheet cost calculations and everything so feel free to yeah just shoot me a message on linkedin yeah i
Speaker 1
think we'll do with a spreadsheet because a lot of people asked for it most of the people linkedin said yeah i wanted i wanted i wanted i wanted so what we're going to do is i think i will ask you to create a shared google sheets with it and we'll just connect the link in the show notes once the podcast goes live and then anybody who listens to the podcast can have it i can't thank you enough i want to thank also the people who join us i'm gonna just go not with everybody because it's gonna be a while but the people ask questions and participate so elsa paul and katie cobe and kyle king and the james lindsey my man and uh kursad parathas i hope i'm not butchering names here and also china and danielli and egor and that have joined us on and asked questions on the zoom as well so thank you everyone for participating thank you so much jonah for sharing your interviews with you this was awesome we'll do it again sometime
Speaker 2
in the future definitely thank you for having me and thank you everyone who joined and yeah talk to you soon katie we take it from
Speaker 1
there ciao bye everyone