Welcome, friends, to the first episode of the ThursdAI recap. If you can't come to the spaces, subscribing is the next best thing: the distilled, most important updates, every week, including insights and tips and tricks from a panel of experts. Join our community 👇

Every week since the day GPT-4 was released, we've been meeting in Twitter spaces to talk about AI developments, and it slowly but surely created a community that's thirsty to learn, connect and discuss information. Overwhelmed by daily newsletters about tools, folks wanted someone else to do the legwork, prioritize, and condense the most important information about what is shaping the future of AI, today!

Hosted by AI consultant Alex Volkov (available for hire), CEO of Targum.video, this information-packed edition covered groundbreaking new releases like GPT-4 with Code Interpreter (dubbed "GPT 4.5"), Claude 2, and SDXL 1.0. We learned how Code Interpreter is pushing boundaries in computer vision, creative writing, and software development. Expert guests dove into the implications of Elon Musk's new xAI startup, the debate around Twitter's data, and pioneering techniques in prompt engineering. If you want to stay on top of the innovations shaping our AI-powered tomorrow, join Alex and the ThursdAI community.

Since the audio was recorded from a Twitter space, it has quite a lot of overlaps; I think it's due to the export. Sometimes it sounds like folks are talking on top of each other, most of all me (Alex), when that was not the case. I will have to figure out a fix.

Topics we covered in the July 13 ThursdAI

GPT 4.5 / Code Interpreter:
02:37 - 05:55 - General availability of ChatGPT with Code Interpreter announced. 8K context window, faster than GPT-4.
05:56 - 08:36 - Code Interpreter use cases: uploading files, executing code, skills and techniques.
08:36 - 10:11 - Uploading large files, executing code, downloading files.

Claude V2:
20:11 - 21:25 - Anthropic releases Claude V2, considered #2 after OpenAI.
21:25 - 23:31 - Claude V2 UI allows uploading files, refreshed UI.
23:31 - 24:30 - Claude V2 product experience beats GPT-3.5.
24:31 - 27:25 - Claude V2 fine-tuned on code, 100K context window, trained on longer outputs.
27:26 - 30:16 - Claude V2 good at comparing essays, creative writing.
30:17 - 32:57 - Claude V2 trained for longer outputs and complete JSON responses.
32:57 - 39:10 - Claude V2 better at languages than GPT-4.
39:10 - 40:30 - Claude V2 allows multiple file uploads to the context window.

X.AI:
46:22 - 49:29 - Elon Musk announces xAI to compete with OpenAI. Has access to Twitter data.
49:30 - 51:26 - Discussion on whether Twitter data is useful for training.
51:27 - 52:45 - Twitter data can be transformed into other forms.
52:45 - 58:32 - Twitter spaces could provide useful training data.
58:33 - 59:26 - Speculation on whether xAI will open source their models.
59:26 - 61:54 - Twitter data has some advantages over other social media data.

GPT Prompt Engineering:
61:54 - 64:18 - Intro to Other Side AI and prompt engineering.
64:18 - 71:50 - GPT Prompt Engineer project explained.
71:50 - 72:54 - GPT Prompt Engineer results, potential to improve prompts.
72:54 - 73:41 - Prompts may work better on the same model they were generated for.
73:41 - 77:07 - GPT Prompt Engineer is open source, looking for contributions.

Stable Diffusion:
89:41 - 91:17 - Stability AI releases SDXL 1.0 in Discord, plans to open source it.
91:17 - 92:08 - Stability AI releases Stable Doodle.

Related tweets shared:
https://twitter.com/altryne/status/1677951313156636672
https://twitter.com/altryne/status/1677951330462371840
@Surya - Running GPT-2 inside Code Interpreter
tomviner - scraped all the internal knowledge about the env
Peter got all PyPI packages and their descriptions
swyx added Claude to the smol menubar (which we also discussed)
SkalskiP - awesome code interpreter experiments repo

See the rest of the tweets shared and listen to the original space here:
https://spacesdashboard.com/space/1YpKkggrRgPKj/thursdai-space-code-interpreter-claude-v2-xai-sdxl-more

Full Transcript:

00:02 (Speaker A) You. First of all, welcome to ThursdAI. We stay up to date so you don't have to. There's a panel of experts on top here that discuss everything.
00:11 (Speaker A) If we've tried something, we'll talk about this. If we haven't, and somebody in the audience tried that specific new AI stuff, feel free to raise your hand and give us your comment. This is not the space for long debates.
00:25 (Speaker A) We actually had a great place for that yesterday, NISten and Roy from Pine and some other folks; we'll probably do a different one. This should be information dense for folks, and this will be recorded and likely posted at some point.
00:38 (Speaker A) So no debates, just let's drop an opinion, discuss the new stuff, and kind of continue. And the goal is to stay up to date so you don't have to, in the audience. And I think with that, I will say hi to Al and Janae and we will get started.
00:58 (Speaker B) Hi everyone, I'm NISten Tahiraj. I worked on, well, released one of the first doctor chatbots on the market for Dr. Gupta and scaled it, and now we're working on getting the therapist bot out, once we can pass more testing and get voice to work in a profitable manner, because we don't really have VC. So at the scale of a few hundred thousand users, the API bills matter quite a bit.
01:31 (Speaker B) So, yeah, these spaces have been pretty helpful, because I had some trouble with running a voice transformer, trying to run it on the browser on WebGPU, and then the person that wrote Transformers.js comes in here and just says, oh yeah, that backend is messed up, just try BLAS and SIMD and stuff. So these have been very interesting and technical spaces.
01:54 (Speaker A) Yeah, we need to get Xenova in here. Xenova is the guy who NISten was referring to. Al, Janae, do you want to give a few words of intro and say hi, and then we'll start? Just briefly, please, because I think we need to get going.
02:09 (Speaker C) Sure. Hi, I'm Janae.
02:11 (Speaker D) I'm the resident noob. I started messing around with AI at the beginning of the year, and I also host the Denver AI Tinkerers, coming up next week.
02:20 (Speaker A) And if you're in the Colorado area, greater Denver, please join us. It's going to be a blast.
02:27 (Speaker F) Hi, I'm Al Chang. I'm kind of an old school technologist. Just getting started with the AI again and just here to help.
02:36 (Speaker A) Yeah. All right, folks, so I think we've had a whole space on this. Simon Willison and me and many, many other folks chimed in the second this was released.
02:50 (Speaker A) Was that six? Was that Sunday? It's hard to keep track of actual days. Saturday, Saturday, last week. Exactly during those spaces, by the way, as we were talking, Logan and everybody else from OpenAI announced general availability of ChatGPT with Code Interpreter. So GPT-4 with Code Interpreter.
03:12 (Speaker A) And I think we just heard from Matt that even some folks who got access to it slept on it a little bit, maybe potentially because of its very horrible name that's really hard to type, "interpreter", and get lost in the R's. But it's an extremely powerful new superpower that we've got. And we had the whole space talking about use cases that people already had.
03:37 (Speaker A) It was like three days into it, and since then I bet that many more people have tried it. I think, Swyx, we had 20,000 listens to that space, plus the pod. At least people definitely want to hear new use cases, right?
03:53 (Speaker G) Yeah, not much else to add about it. I think it's the feature, for sure.
03:59 (Speaker A) Swyx posted a whole deep dive essay and coined it GPT 4.5 between us friends. And one of the interesting things about it is that we think, at least that's where we are currently after playing around with this, is that it's a fine-tuned model. So they kept training this on actually running code and executing code.
04:21 (Speaker A) That's what we believe. We don't know, nobody confirmed this, and then that it's fine-tuned from an earlier checkpoint of GPT-4. And so we actually had some folks on spaces talking about it being less restricted and better, like previous times.
04:36 (Speaker A) So it's interesting, I think, NISten, right? We have some folks who tell us they're using Code Interpreter without the code part. They just dropped regular GPT-4 just because it's that model.
04:48 (Speaker A) And I think also they took down the 25 messages per hour restriction on Code Interpreter. I've had like four-hour sessions and it didn't stop; I didn't see complaints.
05:03 (Speaker G) So it's just better.
05:06 (Speaker A) It's also fast. I think it's fast because not many people maybe use this by default, and this could be the reason for the speed, but it's definitely faster for sure. I think also the context window, was it Yam? Somebody summarized the context window, and they told us the context window for Code Interpreter is 8K versus the regular GPT-4; actually, that could also be it.
05:29 (Speaker G) You mean Yam copied and pasted.
05:34 (Speaker A) I would encourage you and Yam to kiss and make up, because Yam is doing a lot of legwork to take down the stuff that he posted, and Yam is working on that and it's very visible, and you guys need to... there you go, Yam, you need to clear the air. However, Pharrell and Gabriel, bringing you up as well. And we're going to keep talking about Code Interpreter, because that's what we're here to do.
NISten and a few other folks and we started cooking with Code Interpreter.
05:59 (Speaker A) And by cooking I mean we started stretching the complete boundaries of what's possible there. And I think Simon Willison kick-started this with the Latent Space pod. So for folks who are not following the Latent Space pod, feel free to follow swyx, his main account, not this hidden one.
05:59 (Speaker A) And swyx reposted the spaces we had. Simon Willison was able to run Node.js and Deno within Code Interpreter, even though OpenAI didn't allow for that, by uploading a binary and asking Code Interpreter to run it. Simon then promptly said they fine-tuned the model away from that, and we found ways anyway to ask it to do some stuff. I have a thread on how I was able to run a vector DB, Chroma, inside Code Interpreter.
06:10 (Speaker A) I ran whisper.cpp. We saw some folks running GPT-2 inside Code Interpreter, right? So imagine an LLM, GPT-4, running another one and talking to it. It's like a little brother inside.
06:10 (Speaker A) I personally love that inception. I don't know if the person who ran GPT-2 is in the audience. Was Dan the nickname, NISten? I don't know.
07:22 (Speaker A) Surya.
07:23 (Speaker B) Surya. He also wrote the search-the-PDF plugin for GPT-4 plugins, and he wrote that in like two days, and it's more used than any other enterprise thing, which is pretty hilarious.
07:36 (Speaker A) We need to get Surya up here.
07:38 (Speaker B) Yeah, he just did that as, I'm just going to do a search plugin for PDFs, and it's like the most used.
07:45 (Speaker A) So dope, pretty amazing. Again, in that space we've talked about having like a living manual, so to speak, for Code Interpreter use cases, because it's coding, so it covers pretty much everything that we can think of as coders, maybe just in Python, maybe restricted to an environment. And I've been trying to do that with the #CodeInterpreterCan hashtag, and I encourage all of you, let me pin this to the top of the space, to the jumbotron, if you have an interesting Code Interpreter thing. And I'll bring up SkalskiP to the stage as well.
08:03 (Speaker A) And Lantos, so many good friends. If you have a very interesting Code Interpreter technique or skill or new thing that people can do without coding skills, please tag it with this hashtag so folks can find it. Otherwise I will cover the main three things that Code Interpreter gave us besides the new model.
08:42 (Speaker A) One of them is uploading files. And since we've talked, we've noticed that you can upload files up to 250 megabytes, and those can be zips of other files. So we've uploaded full model weights.
08:55 (Speaker A) We've uploaded bin files. It's incredible that you can now drag and drop a whole directory and have GPT just know about this and read about this. We've uploaded weights and embeddings.
09:08 (Speaker A) You can then obviously execute code in a secure environment, which is again incredible, and you can download files; you can ask it to actually generate a download for you, which is also super, super cool. Maybe one last thing I'll say before I give it to the audience for a few more cool use cases. And folks on the stage, please feel free to raise your hand.
09:21 (Speaker A) I'll get to you in the order that you raise your hand if you have a use case. Some folks built like a built-in memory, a built-in brain, within Code Interpreter, just by saving to a file.
That's what I try to do with my vector DB. They download that memory at the end of every session and then upload it to the next one, with some prompt that reminds ChatGPT to start from that point.
09:50 (Speaker A) So in addition to the context window, they also have a separate, offloaded, file-persisted memory. So Code Interpreter, incredible. Again.
10:00 (Speaker A) Potentially GPT 4.5. And if you haven't played with this, feel free to. If you don't know what to play with, follow the #CodeInterpreterCan hashtag. And let's get to SkalskiP.
10:11 (Speaker A) What's up, man?
10:14 (Speaker H) Hi, hello. Do you hear me?
10:15 (Speaker A) Yeah, we can hear you fine.
10:19 (Speaker H) Yeah, I've been playing a lot with Code Interpreter over the past five days, mostly with computer vision use cases, because that's what I do. I haven't introduced myself: I've been doing computer vision full time for the past five years, and when I saw that you can input image and video, that was immediately what I was thinking: we need to make it do computer vision. So I went through some low-effort tasks.
10:46 (Speaker H) So I managed to run old school computer vision algorithms, face detection, tracking of objects, stuff like that. But I also managed to exploit it a little bit, so you can add YOLO object detection models to the list of models that run in Code Interpreter.
11:15 (Speaker H) There are some problems with memory management, so I'm not yet fully happy with the result. But yeah, I managed to run it on images and on videos. And the thing that is super cool and kind of underrated right now: false positives. So when the model detects something that shouldn't be detected, you can really use text to ask Code Interpreter to filter out false detections.
11:48 (Speaker H) You can just give it your feeling for why that stuff is happening, or when or where, and it's very good at cleaning the detections, which was kind of mind-blowing for me. And one thing that I noticed that it sucks at: I managed to create an application that counts objects moving in a video when they cross a line.
11:55 (Speaker H) And I didn't use any off-the-shelf libraries; I just had the detector and said, okay, now draw a line and count objects when they cross the line. It's terrible at that, writing math logic to figure out that something crossed something. We had like a ten- or twelve-prompt exchange and I basically bailed out on that, forget it. So there are some things that blow my mind, but there are some things that probably don't.
12:49 (Speaker A) So folks, feel free to follow SkalskiP. And also, I just pinned to the top of the space his brand new awesome code interpreter experiments repo, and there's a list, a bunch of use cases there. This could also serve as a de facto manual. So feel free to go there, add PRs, and follow it for updates.
12:52 (Speaker A) And I want to get to Lantos, because he seems to be unmuting. What's up, Lantos?
13:12 (Speaker H) I was just going to say I can't follow him because he's blocked me.
13:15 (Speaker C) Sad face.
13:16 (Speaker H) Oh, no, I noticed that, but I'm not sure why. I will undo that.
13:20 (Speaker A) All right, I'm the peacemaker in this space. Please kiss and make up, you two as well. Everybody should get along.
13:26 (Speaker A) Yay. I want to get to some other folks who came up on stage recently. And Gabriel, welcome, to talk about Code Interpreter and your use cases.
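For readers who want to try the persisted-memory trick Alex just described, here is a minimal sketch of the kind of snippet you would paste into Code Interpreter. The filename and structure are my own assumptions, not an official feature: you ask it to run the save step, download memory.json before the session ends, and re-upload it at the start of the next one.

```python
import json
from pathlib import Path

# /mnt/data is where Code Interpreter places uploaded files
MEMORY_FILE = Path("/mnt/data/memory.json")

def load_memory() -> dict:
    """Load the memory file re-uploaded from a previous session, if any."""
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text())
    return {"facts": [], "session_count": 0}

def save_memory(memory: dict) -> None:
    """Write memory to disk so you can ask for a download link before the session ends."""
    memory["session_count"] += 1
    MEMORY_FILE.write_text(json.dumps(memory, indent=2))

memory = load_memory()
memory["facts"].append("We are building a Chrome extension that flattens zips for Claude")
save_memory(memory)
print(f"Session #{memory['session_count']}, {len(memory['facts'])} facts remembered")
```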
13:35 (Speaker A) Jeanette, if you've played with this, I would like to hear two more opinions before we move on to the next incredible thing. Yeah. Oh, you guys are talking about, let's get Gabriel and then June. Sorry, I should have been explicit about the order.
13:54 (Speaker E) No worries. So I just posted a comment on this space about the message cap on a conversation. So even though in the UI it still says 25 messages per 3 hours, if you look at the network request, you can see, and I posted this, that it's actually 100 messages per 3 hours now.
14:12 (Speaker E) And I don't know if they're scaling that up and down as demand increases and decreases, or they're just trying to trick people into conserving their messages, but it's definitely been at 100 for a little while now. You can see the same thing in the network.
14:32 (Speaker A) Can you confirm the same for the regular mode, or do you think the regular mode is still restricted?
14:41 (Speaker E) Well, based on just the fact that there's only one message cap, they don't have a message cap per model. So I think it's just consistent across all the GPT-4 models. And that's also my experience; it's probably been at least a couple of weeks now that it's been higher.
14:51 (Speaker E) And same thing we discussed, I think, on Saturday about the context window. You can also see in the API that the context window is 8K for plugins and Code Interpreter, and it's 4K for the base GPT-4 model.
15:16 (Speaker A) That's awesome. Better in every single way.
15:22 (Speaker D) Yeah.
15:23 (Speaker A) Awesome. Thanks.
15:24 (Speaker E) Yeah. In terms of use cases I can share, I've been digging around a lot in the Code Interpreter, and I was really trying to hone in on: why are the packages that are installed there, the Python packages in the environment, why are they there? Some of them seem really random, and some of them make a lot of sense. And they released it saying it's for, basically, data analysis. And a lot of them make sense for that, but some of them are just really wild, like the ML packages.
15:54 (Speaker A) And Gabriel, folks in the audience: if you look up at the jumbotron where we pin tweets, two tweets before, there's a tweet by Peter (Zero Zero G), who actually printed all the packages and asked GPT-4 to kind of summarize what they do. So if you have no idea about the potential capabilities of what it can do, feel free to pin that tweet for yourself. It has a bunch of descriptions of what's possible.
16:11 (Speaker A) So go ahead, Gabriel. Yeah, cool.
16:28 (Speaker E) Yeah, I've done the same kind of thing, with just a short... yeah, I got it to do a four-word description for each one. So if you're looking for a really short description of each package, I'll post that tweet, and if you're looking for a long one, I think Peter's is great. And what you can see there is that there are packages for web development, right? There's FastAPI, there's Flask, there's a bunch of other packages for web development.
16:40 (Speaker E) And besides the fact that there's no network access, which obviously, for the people using it internally, might be turned on, it was just interesting to me. My perspective is that OpenAI has been using this internally throughout all their teams for development, testing it internally, but probably also using it pretty consistently. They probably have access to the Internet.
17:14 (Speaker A) Yeah, I'm sure they have access to.
17:15 (Speaker E) The Internet, and they can install new packages. But I think they also have the ability, instead of uploading files and downloading files, to just mount a persistent directory. I think they just mount their local working directory on their computer, right, wherever they're working. So they have their active directory where they have their project, and they just mount that and give the Code Interpreter access to the whole directory with the whole repo of their project.
17:48 (Speaker C) Yeah.
17:48 (Speaker E) And then ChatGPT is just writing code to the working directory and reading from there, and it can explore their whole project. We can do that now by uploading: you can zip your whole project, upload the whole thing zipped, and have it unzipped, and then it can kind of explore your whole project. But once it makes some changes and you want to commit them, you have to ask it to zip the whole thing back, download it and upload it again.
17:48 (Speaker E) And then I think what they're able to do is more of a kind of pair programming thing, where the developer makes some changes and then ChatGPT makes some changes, and they're kind of working together. This is taking it one step further. I don't know if they have this or not, but it would be super cool.
18:29 (Speaker A) In the realm of updates, let's leave the speculation there. But I would love to explore this more with you in the next space, because this applies to open source; people already saw this, somebody tagged us after the last space and said, hey, I'll build this open source. I would love to pin this to the top of the space. However, I want to move on to the next topic and then move on to other updates.
18:51 (Speaker A) Sorry to interrupt, but thanks. I think that the collaborative, persistent code superpower will probably, maybe at some point, come to us as well. Plus the Internet access is like another ten X. I want to get to SkalskiP and Lantos, and I think we'll move on to Claude.
19:08 (Speaker A) Thanks, Gabriel.
19:11 (Speaker H) Yeah, I have a question. I'm not really sure, guys, if you noticed. I was obviously experimenting with PyTorch, because I needed it for computer vision. I noticed that the PyTorch version that is installed in the environment is actually precompiled to work with CUDA. So it's a GPU version of PyTorch.
19:31 (Speaker H) Even though in the environment you don't have access to a GPU, you only have CPU. So I'm curious, guys, what you think about that. Why is that? Any ideas?
19:42 (Speaker A) An idea that just comes from what Gabriel just said: likely we're getting the same Kubernetes container, but the OpenAI folks have like unlimited stuff. They probably also have CUDA, that would make sense, right? Theirs is probably connected to a GPU as well. But that's just an idea. Lantos, I want to get to you, and then we'll move on to Claude.
20:02 (Speaker A) Folks, and folks in the audience, feel free to hit the little button on the bottom left that looks like a little message and leave comments; we're reading the comments as well. Moving on to Claude V2. Folks in the audience and folks on stage, feel free to hit up the emojis, plus one, minus one, if you have tried Claude V2 and liked it or haven't liked it.
20:19 (Speaker A) I'm going to cover this anyway, because I think somebody called me, I think Roy from Pine called me a Claude V2 fanboy yesterday, and I first got offended, and then I told him that I'm just a fanboy for 24 hours.
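If you want to reproduce what Peter and Gabriel did, and check SkalskiP's CUDA observation for yourself, here is a minimal sketch you could paste into Code Interpreter. What it prints will depend on whatever image OpenAI happens to be running that day.

```python
import platform
import importlib.metadata as md

# List every package installed in the sandbox, like Peter's PyPI dump
packages = sorted(
    (dist.metadata["Name"] or "unknown", dist.version) for dist in md.distributions()
)
print(f"{len(packages)} packages installed, e.g.:")
for name, version in packages[:10]:
    print(f"  {name}=={version}")

print("Python:", platform.python_version())

# Check SkalskiP's observation: a CUDA-compiled torch with no GPU attached
try:
    import torch
    print("torch:", torch.__version__)                  # a "+cuXXX" suffix means a CUDA build
    print("compiled for CUDA:", torch.version.cuda)     # non-None even without a GPU
    print("GPU actually available:", torch.cuda.is_available())
except ImportError:
    print("torch not installed in this image")
```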
Before that I was a Code Interpreter fanboy, and then I figured out with myself whether or not I am a fanboy of Claude V2.
20:43 (Speaker A) And yeah, I am, and swyx told me to relax, and in fact I invited him here to be the red blanket on the other side of the list. Anthropic, the company that we can definitely consider number two after OpenAI, I think that's fair in terms of quality, have long released Claude versions, and they made some waves when they released Claude, aka "Clong", with the 100K context window. They have now released Claude V2, and let me paste some Claude, sorry, pin some Claude thingies to the jumbotron. However, Claude V2 released with multiple things, and I want to focus on two: I think we'll cover the UI first, and then we'll talk about the model itself, UI-wise and product-wise. My hot take, and I'll pin this to the top, and unfortunately we won't debate this, but I love you, all of you, is that as a product, Claude V2 right now beats ChatGPT. My mom can go into the two websites, and she'll prefer one versus the other one.
21:51 (Speaker A) Or my friends that don't know AI, not as plugged in as we are: theirs is free. And I think Claude V2 beats GPT-3.5, which is also free, and the 100K context window, with the model being trained on 200K, unleashes a bunch of use cases that were not possible before.
22:12 (Speaker A) It just frees you up. You heard SkalskiP just talk about the limitations of Code Interpreter; a bunch of those limitations stem from the 8K context window.
22:13 (Speaker A) If you print a bunch within the code that you're doing, Code Interpreter sometimes forgets what you guys talked about 20 minutes ago. And the 100K context window also means a long, long conversation history with the model. And I think it's really great.
22:37 (Speaker A) Not to mention that you can drag and drop full books in there. Those books need to be in like one or two files, and they still don't accept zip files. And I'm planning to release an extension soon that does this for us and unifies them into single files.
22:51 (Speaker A) So hopefully by next week we'll have some updates. However, once you upload that much, or you upload like a transcript of a podcast, you can do a bunch of stuff, because Claude V2 is also better trained on code, and we saw a significant jump in... wait, I'm switching to the model, so let me get back to the UI. The UI allows you to upload files.
23:09 (Speaker A) The UI has a Command-K interface, which I personally love. I hit Command-K on every website and see if they support it. You can just start a new chat real quick.
23:21 (Speaker A) It doesn't have Share, but it's definitely a refreshed and free UI. It's called claude.ai, and that's the URL, and if you haven't tried it, definitely try it. Comments about just the product side and the UI side before we move to the model? Anybody play with this? Anybody like it? Anybody love the upload files feature? I would love to hear hands and comments.
23:42 (Speaker A) Go ahead, Matt.
23:44 (Speaker D) A bit of a weird thing, but what I've noticed is it's actually quite frustrating if you want to paste text in: if it's over a certain length, it will actually paste in as a file. A little small thing; hopefully they'll change it, but it is really annoying, because then you can't edit it. ChatGPT does do that much better, but I generally agree with you that overall the product experience on Claude is...
24:03 (Speaker A) Significantly better. The fresh coat of paint they released for us.
I will say that Claude so far was kind of a hidden gem: only folks who got access to the API actually got access to their UI, and that UI was very restricted; folks who have access to the Claude API know what I'm talking about. I think that UI is still around.
24:22 (Speaker A) It still shows your history. It's very restrictive. It's not as cool as this, it's not as sleek as this.
24:27 (Speaker A) So we like claude.ai, definitely a plus. Check it out. Now, let's talk about the model behind this UI, because that model also changed, and several incredible things changed with it.
24:38 (Speaker A) First of all, they released a new model, same price as the previous one. We love to see this. Please everybody, including OpenAI, continue giving the same price, and cheaper and cheaper down the line.
24:41 (Speaker A) We love to see this. Second of all, they claim it's been fine-tuned on several things. One of them is code.
24:54 (Speaker A) And we actually saw a bump in the evaluation called HumanEval, which is a set of questions that OpenAI released, and I think the bump was from like 55% to 78%, which I think beats 3.5 and is not there compared to GPT-4. Correct?
25:14 (Speaker C) Yeah, and it beats GPT-4 on pass@1, on the first try; not GPT-4 that is allowed to refine and fix its answers, but on the first trial. Yeah, by a little bit.
25:33 (Speaker A) So, news to me, and thank you for chiming in. The pass numbers are how many times it's able to reflect upon its answers and improve them.
25:43 (Speaker C) The pass@k is kind of what I meant; with reflection, GPT-4 is even stronger. If GPT-4 sees the exception, it can come up with a solution. So this is not in the HumanEval test, but if you use GPT-4 this way, you get to 90-something percent, which I think is more realistic if you think about it. No programmer writes the whole code in one go.
26:10 (Speaker C) You write it iteratively, fix bugs and so on. And also in Code Interpreter, you see it. But it is remarkable to see state of the art on the first try.
26:19 (Speaker A) And it's significantly better at code. And I suggest folks who previously tried Claude and weren't impressed to try it as well. An additional crazy thing that they've trained on is the 100K context window, and they've actually trained, they claim, on a 200K context window, so twice as much as the previous one. And we follow this one guy, Ofir Press, the guy behind Self-Ask with Search and the guy behind ALiBi, the ability to extend context windows.
26:55 (Speaker A) He just defended his PhD, and he talked about context windows, and he was impressed with the way they presented and the way they showed their loss curve. And so, we saw the paper maybe this week, the folks saw the paper, where the attention dips in the middle: there's less attention in the middle than at the beginning and at the end.
27:03 (Speaker A) And it looks like that's not the case for Claude. So I suggest you try the huge context window. And Al, you have your hand raised, and then we'll talk about some other model changes.
27:26 (Speaker F) Yeah, I'll talk a little bit about that. I used Claude about a month and a half ago to win Best Solo Hacker at the Craft Ventures hackathon, David Sacks' one. Yeah, it had like 200 entries, but it's exceptionally good at creative writing and also at comparing and contrasting. I don't think people have really taken advantage of what the context window is capable of doing. It's more than just loading single files in.
27:53 (Speaker F) So what I did for the project was I loaded these large legislative bills, these like 50-page unreadable bills, and turned them into relatable narratives. So one of the things that Claude can do is you can adopt a persona. A lot of times with summaries, summaries just compress the text that you see, but you can tell it to say, write 1000 words from a social conservative point of view, or a bus driver's point of view, or a social liberal point of view.
28:21 (Speaker F) And what that does is it takes all of its knowledge about the outside world and gives you not a summary, but essentially an essay about the practical effects of something like a bill. I've actually been working with the idea of reading a book and having it tell you what I would have learned from it, because that's actually probably what you're more interested in. What it can do in terms of comparing and contrasting large essays is exceptional.
28:51 (Speaker F) So you could have it say, write 2000 words from a social conservative point of view, 2000 words from a social liberal point of view, and then have it contrast the essays, which is something that would be very difficult for a human to do. So you get to give it multiple files and have it give you a more balanced approach, so you get rid of some of the bias that comes in.
29:18 (Speaker A) My go-to dream project that I never get to is to create this for Twitter as a Chrome extension, so I can select a bunch of tweets and then say, remove the bias from this and just give me the debiased version of all of this. Yeah, completely. The cross-referencing ability of Claude, because of this context window, is incredible for many, many use cases.
29:41 (Speaker F) Yeah, I would say that it's not as good as GPT-4 for certain things, but that context window is fantastic. And I would say, for a lot of people that are using embeddings and retrieval: you can actually just put the whole thing in the context window and ask questions of that, and then you have a baseline to compare your results against. Most people, if they're chatting with a website or something like that, you actually can just put the whole thing in there, as opposed to trying to chunk it up and do questions, and you'll see that your results are much better that way.
29:51 (Speaker F) And for most people, that would be good enough.
30:17 (Speaker A) An additional thing that Claude was trained on: they've talked about the output tokens, just the number of output tokens, how much Claude is able to generate. And they've said that previous Claude models were focused on shorter outputs, just as they were trained, and this latest model was trained to output up to 4000 tokens.
30:47 (Speaker A) This is added to the fact that they also fine-tuned it and trained it to output JSON files, complete JSON files, as responses, which we as engineers waited for; OpenAI gave us functions via, kind of, here you go, there's the function interface. And we love the function interface, but the function interface kind of locks us down to the OpenAI ecosystem.
31:04 (Speaker A) And it's great to see another model that's very close to state of the art on HumanEval that is also now fine-tuned to respond in full, intact JSONs. And those JSONs can be 4000 tokens in length. Any thoughts on these?
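To make the JSON point concrete, here is a rough sketch against the mid-2023 anthropic Python SDK. The prompt shape and the schema are my own assumptions, and as discussed later in the space, the API itself is still waitlisted.

```python
import json
from anthropic import Anthropic, HUMAN_PROMPT, AI_PROMPT

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Ask Claude 2 for a complete, parseable JSON document as its entire reply
prompt = (
    f"{HUMAN_PROMPT} Extract the topics and sentiment from the transcript below. "
    "Respond with only a JSON object shaped like "
    '{"topics": [{"name": "...", "sentiment": "positive|negative|neutral"}]}.'
    f"\n\n<transcript>...</transcript>{AI_PROMPT}"
)

response = client.completions.create(
    model="claude-2",            # the Claude V2 model discussed above
    max_tokens_to_sample=4000,   # the new, longer output budget
    prompt=prompt,
)
topics = json.loads(response.completion)  # the point: this should now parse intact
print(topics)
```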
31:28 (Speaker F) Yeah, I can confirm it being able to write large amounts of output. I mean, I was having it write like 2000, 3000 word sort of essays and outputs, and it was fine with that.
31:40 (Speaker A) Yes. And I think it's... I'm going to...
31:45 (Speaker B) Stick with GPT-4 myself. But this might be pretty useful for just dumping in an entire code base, given the 100K context window, and then getting some reviews and stuff, and then maybe moving some of the stuff over.
32:02 (Speaker A) Once I stop posting statuses and build that Chrome extension, where you upload the zip and it flattens it to one file and then uploads it, then we'd be able to do a proper comparison, because Code Interpreter can take zip files and then extract them. Oh, one difference that I want to flag for folks in the audience: GPT-4 with Code Interpreter allows you to upload zip files, et cetera. We talked about this. It does not load them into the context window, right? There's like an 8K context window.
32:30 (Speaker A) The files that you upload are not automatically in the context window. The model has to write Python code that actually prints the files, and it usually does just the first few lines, hint, hint.
32:30 (Speaker A) The folks in the audience who get my drift. But it doesn't usually read all of it unless you specifically ask it to, and Claude does. So everything you upload to Claude goes directly into the immediate working memory of the context window.
32:38 (Speaker A) And that's a major difference to watch out for and also take care of. Go ahead.
33:00 (Speaker C) I would like to ask everyone, before I say my opinion: what do you think about it in comparison to GPT-4, about the performance? What do you think?
33:10 (Speaker A) I would like comments from folks who actually used both and did the comparison. And before I get to folks, please raise your hand to answer. I want to call out swyx's smol menubar, which allows you to actually... swyx, can you give us a brief two minutes on the menubar thing?
33:28 (Speaker G) Yeah, well, you don't have to choose. Just run it all the time on every single chat. So it's a little Electron app that runs in the menu bar. And I've been maintaining it, and I just added Claude 2 this week.
33:42 (Speaker G) Claude 2 is not super stable yet. Sometimes it will fail to submit, so you just have to retry manually by hitting the submit button.
33:50 (Speaker G) But yeah, it's a great way to A/B test models, but then also just amplify every question across four to five different chat models with their answers. So I've been trying it. It's up to you if you want.
34:07 (Speaker A) To.
34:10 (Speaker C) Find it.
34:14 (Speaker A) With the announcements, if you can. Yeah, awesome. Just basically, you don't have to stop using, you don't have to choose. So I think the last thing that we need to acknowledge about Claude is the multilinguality.
34:28 (Speaker A) So they actually focused on showing us how much better the new one is compared to the previous ones, and they posted BLEU scores. Claude 2 is significantly better at languages than the previous versions. I think, to answer your question, it's close to GPT-4, if not better at some things. Hebrew goes fluently, and usually Hebrew is not that great.
34:57 (Speaker A) Russian and Ukrainian, which I use, also go fluently.
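A quick aside on the context-window difference Alex flagged just above: when you upload a file to Code Interpreter, nothing enters the 8K context until the model prints it. Its typical first move looks something like this sketch (a representative pattern, not OpenAI's actual code; the filename is made up).

```python
import pandas as pd

# Typical first move Code Interpreter makes on an uploaded file:
# peek at the head, so only a few rows ever enter the chat context.
df = pd.read_csv("/mnt/data/uploaded.csv")
print(df.shape)   # the model "sees" only what gets printed here
print(df.head())  # first five rows, hence "the first few lines, hint, hint"
```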
And that part is really good with a lot of context, because you sometimes need to do a lot of translation, or at least I need to do a lot of translation.
35:11 (Speaker C) Yeah, multilinguality works great. I was surprised. Absolutely. What I think, if you just compare the two on the same prompt, the same question, I have a feeling that GPT-4 is slightly better, but I just don't have an example to show you.
35:31 (Speaker C) Okay, I don't know, it's a strange situation. But I really wanted to ask you: what did you try, and what worked better here and there?
35:38 (Speaker A) So here's my use case that GPT-4 currently cannot do. Yesterday, Lex Fridman interviewed Israel's Prime Minister Benjamin Netanyahu, in one of the weirdest turns of history this podcast was. And given that I kind of know who Benjamin Netanyahu is from before, I decided not to listen to it; I decided to use the tools that we have at our disposal. So I ran it through Whisper with diarization, so I have, like, a very nice transcript of who's talking when.
36:10 (Speaker A) I took that and just dumped it as a text file. And I agree with Matt, it's a little bit annoying that Claude turns whatever you paste into a little text file and uploads that, because you can't edit it.
36:21 (Speaker A) However, I uploaded that transcript directly to Claude, and then I asked it to do sentiment analysis and entity extraction. Something that, if I'd asked GPT-4 with Code Interpreter, it would probably write some Python code to do, and Claude just kind of did it. And I haven't seen GPT-4 being able to do this for bigger files.
36:38 (Speaker A) And once I did, let me just finish this point, I continued by saying, hey, because of the new coding abilities of Claude, I asked it, hey, print me a Python file that takes whatever table of topics he mentioned and sentiment, negative, positive, and dumps it into a word cloud. That's something the Code Interpreter can actually do and show you.
37:03 (Speaker A) But I asked it from Claude, because previously Claude was s**t at coding, and it gave me Python files that ran from the first time. I didn't have to change anything, there were no bugs. And it then showed me a word cloud of everything that was mentioned by Bibi in that podcast, and it all took maybe seven minutes.
37:11 (Speaker A) And I don't know if, for bigger context windows, GPT-4 can currently do this. Go ahead, Al.
37:28 (Speaker F) Yeah, I've actually been putting a lot of transcripts from podcasts in there, and because it has seen so much about the speakers and it knows about the speakers, you can actually have them continue a discussion about things that they didn't actually discuss. Yeah, so you can have it say, okay, well, what are some topics they disagreed on, and some things that they didn't cover? Tangentially, you can just have it give you another two minutes of interview, and it does a pretty reasonable job, especially with public figures where it actually has a lot of their background. So it's pretty interesting.
38:01 (Speaker A) And not to mention free. GPT-4 needs a $20 a month payment, and Claude is free.
38:08 (Speaker F) That's a good point, too. For those of you that have eval keys, you'll notice that they're actually not charging you for them, so you can actually go on as long as you want. The limitation is that you can only do one request per organization.
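For the curious, the word-cloud step of the workflow Alex just described comes down to a few lines. This is my own reconstruction under assumptions about the topics file, not the actual code Claude generated.

```python
import json
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Assumed shape: {"settlements": "negative", "technology": "positive", ...}
with open("topics.json") as f:
    topic_sentiment = json.load(f)

# Weight every mentioned topic equally; a fancier version could
# color words by their sentiment label.
cloud = WordCloud(width=800, height=400, background_color="white")
cloud.generate(" ".join(topic_sentiment))

plt.figure(figsize=(10, 5))
plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.savefig("wordcloud.png")
```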
So if it's just a single person, they only charge you basically when you start deploying for commercial purposes.
38:21 (Speaker F) So that's something that people may not have realized.
38:32 (Speaker A) So I think we've covered everything, right? Trained on 200K context, which they can enable for us tomorrow, and we'll get like two X. It's going to be insane. There is some stuff that they have in Claude at Anthropic called Constitutional AI, so they have a mix of RLHF and Constitutional AI. So they're working on their model to actually be more helpful, but also more safe and less jailbreakable.
38:57 (Speaker A) They talked at length about this. We talked about HumanEval being better, and same price, and free playground. I think we've covered most of it.
39:03 (Speaker A) So anything else about Claude that we haven't covered, feel free to raise your hand and tell us, and if not, I think we can move on. What do you guys think?
39:17 (Speaker G) I'll mention briefly: did you talk about the multiple file uploads?
39:21 (Speaker A) No, go ahead.
39:24 (Speaker G) So I think it's just an interesting difference between Code Interpreter and Claude. Code Interpreter, you can only upload one file, right? But it can be a zip file with multiple files inside. So it's de facto multiple files, but then you can only run code on that. Whereas what Claude is doing here is something slightly different, which to me is interesting, which is: you can upload multiple files, it just reads the files straight into the context, and it's using that 100K context to synthesize answers.
39:24 (Speaker G) So you can do, for example, PDF A and PDF B, and give me a comparison between the two of them, or synthesize knowledge across them. And I think that is something that Code Interpreter cannot do, because Code Interpreter will only run code across files. So I think that's noteworthy.
40:15 (Speaker G) It's Claude genuinely coming up with one new thing that is not copying ChatGPT, and good for them.
40:23 (Speaker A) Yeah. And unfortunately, no zip allowed. But we're going to fix this with an extension, and hopefully talk about this next week. I want to say hi to Weather Report.
40:33 (Speaker A) Feel free to chime in. Sorry, you raised your hand open to come up before. So if you have a comment about Code Interpreter, we've moved past it, but if you have a comment about Claude, feel free to tell us. What's up, Weather Report?
40:46 (Speaker A) Actually, I had only one thing about Code Interpreter: in the previous space I talked about a hypothesis I had about Code Interpreter, which...
40:56 (Speaker B) Is to use it as a huddle, because it's recorded.
40:59 (Speaker A) We'll move on and talk about Code Interpreter next time. I think that some folks are saying that their audio is glitching, and so they're not able to come up. And I want to see... I think Joseph has a comment about Code Interpreter. Joseph Polak, we'll give him a second to log in, and then I think we'll move on to other updates, because we have many other things to talk about.
41:29 (Speaker A) What's up, Joseph? Welcome to the stage.
41:31 (Speaker G) Hi there, folks.
41:33 (Speaker A) Thanks for taking my question. I didn't even know all about that Code Interpreter stuff with the files.
41:40 (Speaker G) So I'm really happy to have heard it. My question is about Claude, though.
41:46 (Speaker A) For Claude? Well, I'm still on the waitlist. First of all, it's free now. You can access it right now.
41:53 (Speaker A) claude.ai.
There's no waitlist anymore, unless you live outside the States, and then you'll have to get a VPN. Okay, I'll definitely check that out.
42:03 (Speaker A) My question was about using Claude, and actually Code Interpreter, through the API. Do you think that's ever going to exist, or is it coming? So, the Claude API, I think that's waitlisted. I have talked with the Claude folks, and they said the waitlist is now going faster.
42:24 (Speaker A) So they are ready to get more people in. I think, because of the new safety updates, they're less afraid. So definitely apply for the waitlist on Claude's site.
42:35 (Speaker A) Code Interpreter is not available via API, and we've seen some folks who hacked it together with, I think, a browser plugin that proxies something. swyx, I don't know if you remember the unofficial, quote unquote, Code Interpreter API and how to access it, but it's not available in the official OpenAI APIs as of yet. We haven't seen them.
42:56 (Speaker G) No. For the record, there's no unofficial Code Interpreter API. There's the browser-side thing that we are trying, but nobody's made an adapter for it yet.
43:08 (Speaker G) I think you can, if you want, using Puppeteer.
43:12 (Speaker A) I would definitely not recommend it. If anything, there were some folks that tagged us, and I need to go and find this, that are working on an open source version of Code Interpreter that uses llamas and stuff. And that one will likely be the way forward if you do want something programmatic that has Code Interpreter capabilities. Go ahead, NISten.
43:35 (Speaker B) There's also Chatbot UI on GitHub. So yeah, for the other people that are hacking something together, I'll wait until there is something public, because then...
43:45 (Speaker D) We don't know everything.
43:47 (Speaker G) Open source is going to be worse, because you are missing the model.
43:51 (Speaker A) Yeah, because we think that it's fine-tuned on actually knowing how to run code. Right. That's kind of the highlight that we got from the last space. We think it's smarter because of that.
44:01 (Speaker A) And one of the main things, sorry, going back to Code Interpreter just real quick: it is able to then fix itself and ask itself, oh, oops, I made a mistake, let me try again. Matt, I saw you unmute yourself.
44:13 (Speaker A) Feel free to go ahead.
44:16 (Speaker D) Well, yeah, just a quick thing. So from what I know, OpenAI will be offering fine-tuning relatively soon. So at that point, you theoretically could go and fine-tune your own Code Interpreter-like model, even if they don't offer it.
44:31 (Speaker A) You can also theoretically, not that we would recommend it, but theoretically, right now you could start distilling some stuff from Code Interpreter by asking it questions: generate code and store it to a file, ask it to download, and then, quote unquote, generate the data set. Not that you should, but you theoretically can as well, so that when it's time to fine-tune, you have some data set.
44:52 (Speaker D) Yeah, theoretically. I don't know if ShareGPT currently supports those types of conversations, but if it does, I'm sure that's going to happen really soon.
45:00 (Speaker G) I don't think it's maintained, because ChatGPT itself... well, I don't want to speak for ShareGPT. I know Steven, but I can help you move the conversation back to Claude.
45:11 (Speaker A) Yes, please. Let's move back to Claude. Thank you.
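To sketch the "not that you should" distillation idea in code: you would log each task, and the verified code Code Interpreter wrote for it, into a JSONL file in the prompt/completion shape OpenAI's fine-tuning endpoint has historically expected. The schema and filename here are illustrative assumptions, and the usual terms-of-service caveats apply.

```python
import json

# Each record pairs a task you asked Code Interpreter to do
# with the (verified!) code it wrote, ready for a future fine-tune.
examples = [
    {
        "prompt": "Count objects crossing a line in a video using OpenCV.",
        "completion": "import cv2\n# ...the code the model produced, after you checked it runs...",
    },
]

with open("distilled_dataset.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")

print(f"Wrote {len(examples)} examples to distilled_dataset.jsonl")
```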
45:14 (Speaker G) So, just, how many people are listening to this chat anyway? I think it's like 60 people. Email support@anthropic.com for the Claude API.
45:26 (Speaker A) Yes, email them, state your use case, and they'll likely get you in. And you can use swyx's smol menubar to actually kind of run them in parallel with the megaprompt feature. Megaprompt, superprompt, what is it called? I think swyx dropped it. There's like one prompt that you type, and then it all goes to all the models. I want to recognize some folks in the audience.
45:50 (Speaker A) Hey, feel free to raise a hand if you want to come up. Obviously, I saw some others in the audience: Max AI. Welcome, Dexter. There's a bunch of folks who are usually here, and it's great to see. And I think we're moving on to a very spicy one.
46:06 (Speaker A) What do you guys think about xAI? So I'm pasting the summary of the people. Elon Musk and a bunch of other folks have announced xAI, essentially their answer to OpenAI.
46:22 (Speaker A) We've all seen Elon kind of talk about safety, and talk about helping OpenAI and how it could not be open since then. He talked about TruthGPT at some point. And finally they announced xAI as we were talking.
46:37 (Speaker A) By the way, I have a notification from xAI: they're going to have a space tomorrow to go deeper into xAI. But so far there's not a lot of detail. There are some details about the folks who work there.
46:50 (Speaker A) So they have folks who wrote the Adam optimizer. There are other folks. Thoughts about xAI before we get to hear what they do? Obviously, there's no product yet.
46:59 (Speaker A) I don't think they've started training. The one thing that I will say is that they will have premium access to Twitter, obviously, because Twitter is now rebranded under X. After closing down the APIs and closing down the scraping for Twitter, xAI will now have a data set that's insane to train on: Twitter.
47:21 (Speaker A) And we wish them, quote unquote, good luck. I would love to hear from folks on stage: what do you think about the announcement, the direction, the people? And we're going to wait for tomorrow to actually hear them talk.
47:24 (Speaker A) I know NISten, you have some ideas, if you want to share, to get started.
47:40 (Speaker B) Well, this is more of an old lady babushka opinion that's just talking about stuff. I found it interesting that they went from, what was it, TruthGPT, taking on GPT-4 and this entire competition, to doing something more noble, like dedicating it to be better at math and discovering new things in physics. The way I see it, that's pretty noble. But at the same time, I feel like that's a result of having problems hiring in order to be competitive with the other ones.
48:26 (Speaker B) So, yeah, this will be interesting. But the way I see the whole setup right now is, as the kids say, pretty mid, in my opinion.
48:39 (Speaker A) As the kids say. With that, I will say that we will see tomorrow from their space. They're probably going to use Elon's clout to maybe try to hire, and it's probably harder now to hire, because everybody knows how quickly they're getting fired, and it's not like super fun to work for X. But we're in for a nice ride, because they do have access to the cross-pollination from Tesla as well, right?
So if they have big questions, Tesla does have a few good folks still, even after Andrej Karpathy left, and so they'd be able to ask them for assistance.
49:20 (Speaker A) There's obviously the whole Dojo thing in play, which, I don't think we have time to talk about Dojo, and it's not new, but there could be something there. Gabriel, you wanted to come up? Maybe you have... yeah, go ahead, Gabriel.
49:34 (Speaker E) Yeah, I was just going to say, about xAI: you mentioned Twitter's data, and I'd be interested in hearing other people on the stage's opinions on this, because recently there's been a lot of work done on quality of data over quantity of data. And of course, Elon also has a ton of GPUs; reportedly, he's bought tens of thousands of GPUs. So that's definitely important in building these big models.
49:58 (Speaker E) But I'd be interested in hearing from people on the stage if they think Twitter's data, and the kind of data that Twitter has, is actually going to be really powerful for training good models.
50:11 (Speaker A) Anybody want to take this?
50:13 (Speaker F) Yeah, I'll take a little of it. One of the things that Twitter has that other people don't is that people are actually debating issues. So I think that's one of the reasons why he's really focused on the idea of Twitter being a source of truth and being sort of unrestricted, so that you're not just following one thread; you watch the narratives being debated, and he has access to all that data.
50:35 (Speaker A) And community notes. And it's really hard to scrape. I don't think it's API-able at all; it's not super simple to scrape at all.
50:42 (Speaker A) I want to get Yam. Wait, I think Matt wanted to unmute and go first, and then Yam. Matt, if you still want to chime in, and then Yam.
50:53 (Speaker D) Yeah, I mean, nothing too much to add here. I think the community notes are very interesting as a way to sort of reduce hallucinations. I think one of the things that they're going to want to do heavily is invest in filtering that data set, because there's a lot of great stuff on Twitter, and there's a lot of crap on Twitter.
51:07 (Speaker A) A lot of it, yeah.
51:09 (Speaker D) And the more of that that seeps in, the worse the model is going to perform. Obviously, scale is important, but data quality is incredibly, incredibly important, and the scale kind of doesn't negate bad data quality. So I think if they do one thing right, it's going to have to be getting the filtering of the data set down. But they do have a ton of incredibly high quality data.
51:27 (Speaker A) Yes. I think Yam was next, and then we have a few folks who wanted to come in. I think Pharrell wanted to come up. So Yam, and then Pharrell.
51:34 (Speaker A) And then Gabriel.
You don't need ChatGPT. You can do it with a small model.
52:30 (Speaker C) I'm currently doing it, off the record, I'm currently doing it myself for a large model I'm training. It doesn't matter anyway. It's a gold mine.
52:43 (Speaker C) What I'm saying is, it's a gold mine.
52:45 (Speaker D) About Twitter.
52:46 (Speaker A) An additional thing, before I get to Pharrell and then Gabriel. An additional thing NISten and I talked about yesterday at length, in our late night line cook space that's not going to be scheduled; if you guys are on, feel free to join that one.
53:00 (Speaker A) Twitter Spaces is also a gold mine. Transcribing Twitter spaces and seeing all the reaction emojis that they have in real time. Like the space that Elon ran with RFK Jr., for example. If you know who in the audience are actual people instead of bots, and you're able to get emoji reactions in real time, that's a definite, definite, very high signal kind of training set that they have and almost nobody else has.
53:25 (Speaker A) Pharrell, you are next, I think. And then Gabriel.
53:30 (Speaker D) Yeah, I wonder what the relation is, and how useful the Twitter data will be, for their goal of building a sort of math reasoning machine. Right. Also, do we know if they're open source, as in truly open source, or not?
53:49 (Speaker A) No, we don't know yet. Hopefully tomorrow we'll be able to answer questions. However, we've seen Elon take Twitter's algorithm to open source, and now he's boasting about this as a competitive advantage versus something like Threads. He's saying, hey, open source.
54:07 (Speaker A) If you go to Threads, you're under Zuck's influence algorithm. So there is definitely an attempt at open source from their side, but we don't know anything about that beyond that. Gabriel, and then Johnny.
54:20 (Speaker C) Yeah.
54:22 (Speaker E) First of all, I think it's funny that Elon's s**t posting is polluting his data set. I would say that...
54:34 (Speaker A) By the way, if there's anybody with the ability to detect s**t posting, it's them, right? They're going to be able to build a model: understand, this is a s**t post, this is somebody who made an effort to give us clean information. But sorry, go ahead.
54:49 (Speaker E) Yeah, that's exactly the point I was going to make: that Elon was on this crusade before he bought Twitter. And this is kind of why he got forced into buying Twitter, because he was going after the bots, and he made a big deal about the bots. And I think they spent a lot of resources on figuring out what's good content and what's bot content. And another thing is that we each are kind of experiencing a different Twitter, right? Because we're within, whether it's ML Twitter or Israel-based Twitter, there are many different communities, and Twitter is very good at segmenting those communities and figuring out which content belongs to what community.
54:55 (Speaker E) And they'll have the ability, I think, to segment this data and train many different models that are good at different things, because they're in a literature community, or an ML community, or an MMA community, or whatever.
55:37 (Speaker A) I actually saw a map of like 5 million, 7 million tweets, all embedded in Nomic AI's Atlas. I don't know if you guys follow Nomic; they just recently announced like a $17 million round A, by the way. So kudos to Nomic, good friends.
Andriy and the GPT4All team, and they have an embedded map from before the API was shut down that they were able to siphon, et cetera. 56:00 (Speaker A) And Gabriel, what you're saying is actually visible in the embedding map. You can actually see those tweets: the different areas of political Twitter; there was a journalist Twitter until all the journalists started leaving; there's a bunch of different pockets of Twitter that we don't get exposed to, not to mention the different languages. 56:20 (Speaker A) There's a whole Japanese Twitter that's, like, insane, and people go super, super hard. And translating is easy. 56:26 (Speaker A) We talked about Claude being able to translate. So they have a bunch of very interesting data. And I think Zuck is also going after that data with Threads. 56:31 (Speaker A) And I think this is the reason why we'll see Threads getting continued work, and a lot of investment from their side. But compared to Threads, and we talked about this yesterday: Twitter has back history, a lot of historical data they can train on, and Threads is fairly new. 56:54 (Speaker A) So definitely a bunch of interesting data sets. Johnny and then Lentil. Hey. 57:00 (Speaker H) So one thing I think about, when I think about the data from Twitter, that is potentially lacking in some of the other data sets, is colloquial language. Because what Twitter has that Facebook doesn't have, and a lot of other things don't have, especially given the history you're talking about, is the way people actually interact with each other. You know what I mean? 57:26 (Speaker A) Not only that: how it evolved as well, right? Exactly. 57:35 (Speaker H) To be honest, I think the data sets from earlier are probably better and stronger, because it's just gotten out of hand since. But I agree with, I'm not sure if it was Yam who said it, the filtering point. Because, all right, this is a black box, it's not open source, and Elon has not been shy about his response to what he perceives as wokeism and all of that stuff. I'll be super curious. 57:36 (Speaker H) I mean, there's a big team on this, but I will be super curious to see what that bears out in the actual model. Because, God, there's equal parts or more disinformation on Twitter than there is information. So if we're talking about a source of truth, that rings some alarm bells for me personally. 58:21 (Speaker H) So those are just my thoughts. 58:29 (Speaker A) Yeah. Thanks, Johnny. Lentil, go ahead. And then Gabriel. 58:33 (Speaker A) Let's finish with Gabriel and then we'll move on to the next topic. 58:36 (Speaker H) Cool. 58:37 (Speaker A) Yes. 58:37 (Speaker H) So I think it's going to be hugely bullish for this data, from the perspective of relating idea space and people and the relations between those. I think that's probably going to be more valuable as information than the conversations themselves, because you can build so much from that. Dating is just one example; or finding people, finding brainpower, compute. That's going to be huge. 58:40 (Speaker H) And to touch on the open-sourceness of the data: I think not open sourcing it at some point is going to be hugely politically bad for Elon. 59:23 (Speaker A) That's... 59:23 (Speaker H) my thoughts on that. 59:24 (Speaker A) Awesome. Thanks, Lentil. Gabriel, let's wrap up, and then, Matt, we're going to talk about some interesting stuff. 59:31 (Speaker E) Yeah, just on the kind of data:
I think for those of us who ran the early versions of Llama, before they got fine-tuned in all kinds of ways: you run it, especially the smaller models, you put in a prompt, and it spits out some generic Facebook type of content. It sounds like a Facebook post of a 15-year-old or something like that. That shows what you get when you use all this kind of unfiltered data. 59:59 (Speaker E) But I think the interesting thing is that Llama was then fine-tuned in many different ways, and some really powerful models were built on top of it. So I think in some sense almost any data is valuable in the pretraining stages, and maybe you need really high quality for the fine-tuning. But big volume might be really useful, maybe just not the most economical. 60:21 (Speaker A) So I want to wrap up with why they potentially have a leg up, or not. We definitely know that Twitter was used to train other models that we currently use. We know this for a fact. This was the reason why Elon and Sam Altman, who used to be friends, are no longer friends, s**t posting about each other. 60:40 (Speaker A) And the current models we use do use this data set, but it's old for them. It's no longer recent and relevant. 60:40 (Speaker A) And we know for a fact that Twitter is significantly biased, and probably the best place in the world for uncovering news as it happens: before the bias sets in, before the narrative sets in, before folks get their marching orders from MSNBC or from the other side on how to think about things. Twitter is really good at talking about issues as they arise, the second they arise. And I think that on its own is going to teach the models a great deal. 61:16 (Speaker A) Naval Ravikant, if you guys follow Naval, he always said Twitter makes him a better writer. So we also know that tweets, being short form, condense information better. And if their model trains on that, obviously taking all the precautions we talked about before (bots, s**t posting, et cetera), if they're able to actually get this into the model, likely their model will be more up to date and more fine-tuned to reactions. 61:20 (Speaker A) So with that, I want to close. We'll see about X.AI. It's definitely exciting, right? We're potentially getting another big one, potentially an open source one. 61:20 (Speaker A) So we'll see. I'm going to wrap up this update, and I think for the next one I want to move on. Matt, let me know if you're still around and want to cover it. 61:20 (Speaker A) So we have Matt, who introduced himself in the beginning. I'll let you do it quickly again, and then we're going to talk about the project whose GitHub stars are rising, which I think is super cool. And I invite you to give us a little bit of an interview about it. 62:16 (Speaker A) Go ahead, Matt. 62:17 (Speaker D) Yeah, sure. So I'll try to summarize it a bit better than last time; a lot of practice. Very long story short: co-founder and CEO of Other Side AI, creator of Hyperwrite, and a number of other things. Basically, we've been around for a number of years now. 62:30 (Speaker D) We're one of the first companies in the space working with LLMs. The goal has always been to build a personal assistant that scales to everybody, just like a real human personal assistant, but at scale, way cheaper, digital. The tech wasn't there at the beginning.
So we built other products to learn and gather resources, whether that's users, revenue, or a bunch of other things, which let us do what we do today. 62:50 (Speaker D) Today we are actually building that personal assistant: an AI that can operate a computer, any software, to do what a human can do on pretty much anything. 62:53 (Speaker D) So it'll help you with your tasks. It's very simple. Today it's a Chrome extension that lets you control Chrome just by talking to it. 62:53 (Speaker D) So you could say: go order me a pizza, or go send this person an email, or go filter my email, or anything else. It works okay today. The idea is that over time it's going to get a lot better, a lot cheaper, a lot faster, to the point where six months from now, a year from now, it might actually be as good as, if not better than, a human on many tasks. But that being said, while I work on this, I also like to learn about getting the most out of these technologies, because they're so fast moving and you really have to stay on top of it to be effective, or you... 63:34 (Speaker A) You can, every week, and stay up to date with us together. But yeah, go ahead. 63:40 (Speaker D) Exactly. I mean, a lot of what I do to learn, really, is just build things that I find interesting, and I find that often, even if I'm not expecting it, a lot of those learnings do translate to stuff we're doing at Other Side. So this sort of just came out of that. Happy to dive into the project, or, if you want, sort of... 63:56 (Speaker A) Let's pause here for a second, and I'll just tell folks that I pinned Matt's tweet from a couple of days ago with the introduction. Since then you got a few thousand stars on GitHub, I think, and we're going to talk about the GPT Prompt Engineer project, the different reasons why Matt and folks wrote this, and what it's here to serve. So maybe give us an introduction to GPT Prompt Engineer, what made you come up with it, and how it works. Yeah, go deep, man. 64:29 (Speaker A) Sure. Yeah. 64:30 (Speaker D) So, forgive the rambling in advance. Essentially, I find prompt engineering so fun. I've been doing it pretty much every day, for everything, honestly, to the point of excess, from what I would do for work to having it decide what I'm making for dinner, for years now. And as I've gone through this process of learning how to use these models, it's become very clear that, especially as these models evolve, there's no best practice for anything. 64:54 (Speaker D) Prompts change, ways to prompt change. Something that works for one task might not work for a very similar task. And the only way out of that is to get an intuition for the model and try a lot of things, but that doesn't always work perfectly. 65:01 (Speaker D) And also, you don't really know what works and what doesn't, even when you're trying things, right? You have to do it in a very scientific way, but there's no real right answer to anything. It's kind of like alchemy. 65:18 (Speaker D) So, I think this was right when GPT-4 came out: I was using GPT-4 pretty often to just ideate prompts. I would say: here's what I'm trying to do, write a prompt for me. And I would use the ideas from that to help improve my own prompts. That actually got a lot of interest, and we ended up building something similar into the Hyperwrite platform.
At the time it was really cool, but it really wasn't something that would replace what I do every day, which is really hardcore prompting. 65:43 (Speaker D) Eventually, I was just thinking about it, and I think this was on the 4th of July: what if we tried it? And I started thinking about how you could design a system that actually comes up with good prompts. Not just a prompt that does the job, but something that's actually optimal. Because as humans, we can only try so many things at once, but the magic of these LLMs is that they're creative and they think faster than we do. In the time that I could write half a prompt, LLMs could write 50 or 100. 65:48 (Speaker D) And what if you could leverage that? Because even if the average prompt isn't very good, you're going to luck into one or two that happen to be exceptional for your task. So I started by doing it with a classifier. I only released this notebook yesterday, just because it's a step on the road. 65:48 (Speaker D) What we ended up using it for was actually something at Other Side, where we needed to build a classifier for something with the personal assistant, and I just wasn't getting good performance out of the prompts I was writing. So I said, f**k it, what if we have the AI try to do this? And I built it so that essentially I describe the task and give it some test cases: true/false test cases. 66:11 (Speaker D) Because the classifier was classifying things as true or false. It was like: classify the statement as true or false. So "New York is in America" would be true. 66:54 (Speaker D) "New York is in Paris" would be false. And I basically created ten or twenty of these test cases, I described the task, and I had GPT generate something like twenty prompts. 66:57 (Speaker D) And surprisingly, the quality of them, just at first glance, was pretty good. It was kind of shocking, considering I'd spent so much time trying to do this manually. Then I basically had each of these prompts tested against each of these test cases, and I plotted the success of each. And it turns out some of them actually outperformed what I did. 66:57 (Speaker D) I was kind of shocked, right? You wouldn't expect that, especially after doing this for years. 67:30 (Speaker A) Just to recap real quick: GPT-4, I assume that's what you're using, generated prompts that actually performed better than Matt Shumer's prompts. And Matt Shumer is the founder of a prompt company with a lot of prompt use cases, going back a long time, from GPT-3 to 4, et cetera. And some of the ones it came up with performed better than yours. 67:52 (Speaker D) Yeah, it was kind of scary. Some of them performed way worse, but the idea is that you're going to luck into something that is better. Maybe two out of twenty will be better, but they're great. 68:02 (Speaker D) So I was so fascinated by this, I was like: how do you take this further? Because classification is one thing, but real prompts, where you're actually having it generate text, those are harder. How do you judge that? You could use GPT-4 to judge them, right? If you have two prompts, and each of them generates a response, and you want to know which is better, you can ask GPT-4. And so I figured we could apply that.
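Here is the shape of that classifier flow in code: a minimal sketch, not the actual gpt-prompt-engineer notebook (that's on GitHub). The model names and prompts are illustrative, and the one-token true/false trick uses tiktoken plus logit bias, the same mechanism Matt brings up again in a few minutes:

```python
# Sketch: GPT-4 drafts candidate prompts, then each candidate is scored
# against labeled true/false test cases using a cheaper model.
import openai
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
# Bias the model so it effectively answers with "true" or "false" only.
LOGIT_BIAS = {str(enc.encode("true")[0]): 100, str(enc.encode("false")[0]): 100}

def generate_candidate_prompts(task: str, n: int = 20) -> list[str]:
    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": f"Write {n} varied system prompts for this task, one per line:\n{task}"}],
        temperature=1.0,  # high temperature: more varied candidates
    )
    # Naive parsing: one candidate prompt per line.
    return [p.strip() for p in resp["choices"][0]["message"]["content"].splitlines() if p.strip()]

def score(prompt: str, cases: list[tuple[str, str]]) -> float:
    hits = 0
    for statement, label in cases:  # label is "true" or "false"
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "system", "content": prompt},
                      {"role": "user", "content": statement}],
            max_tokens=1,
            logit_bias=LOGIT_BIAS,
        )
        hits += resp["choices"][0]["message"]["content"].strip().lower() == label
    return hits / len(cases)

cases = [("New York is in America", "true"), ("New York is in Paris", "false")]
candidates = generate_candidate_prompts("Classify the statement as true or false.")
best = max(candidates, key=lambda p: score(p, cases))
```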
68:29 (Speaker D) Turns out there are some issues with using GPT-4 as a judge, and there are papers written about this: essentially, it tends to favor the response that appears in one position, like the one on the bottom. So just do it twice: flip the order and see if one wins both times. I took that approach and combined it with an ELO-style tournament, where each prompt goes head to head, one on one, and each gets its ELO score bumped up or down based on whether it wins, loses, or draws. 68:53 (Speaker A) Can you give two sentences on ELO scores as a concept? 68:57 (Speaker D) Yeah. I'm actually not super familiar with it; funny enough, I had GPT write the code for that part. But basically, think of it like a ranking system in chess or a video game, where you have two players competing: the one that wins gets their score increased by some amount, and the one that loses gets their score decreased. 69:18 (Speaker D) And it's also weighted based on the previous scores. So if somebody with a high score beats somebody with a very low score, their score won't increase much, because they were very likely to win anyway. It's a weighting system to help figure out what's best, instead of the clear-cut yes or no you can get with classifiers, where there is a right and wrong ground-truth answer. 69:39 (Speaker D) I had each prompt generate a response for a test case, and the opposing prompt, the competition prompt, generate for that same test case. It was a little bit complex, and the model would judge which one was better. And it's expensive, right? It might cost like $20 in GPT calls to get to an answer. But at the end, the prompts again just kind of blew me away. 70:04 (Speaker D) Awesome creativity in them: the words it used, the trigger words. It didn't do what I would do, and in a really good way. 70:10 (Speaker D) And it also opened my eyes to new ways of prompting that I never would have thought of and that just aren't standard. And that's kind of the magic of all this. I think this abstracts away the atomic level of prompts, right? There's a prompt in and of itself, and then a system built around the prompts, with many prompts working together. 70:31 (Speaker D) This makes it so you don't have to guess about whether you have the best prompt for a single atomic part of your system. Where the magic really comes in, then, is how you string these amazing, individually AI-crafted prompts together to make something that actually works really well. 70:46 (Speaker A) And how you robustly build the evaluation system, right? Because the classifier is a simple example of evaluating. But how do you actually scale up the evaluation system so that this could run in loops and generate the best of the best prompts for a task? 71:03 (Speaker D) Exactly. 71:03 (Speaker A) That's also a very interesting piece. How do you think about evaluation going forward? 71:08 (Speaker D) Yeah, so I think it's sort of like this: you could have this thing run in a loop three times, take the three winners, and then have GPT read those winners and say: here are prompts that worked really, really well; here are the test cases where they failed.
Now I want you to write new prompts that take what's good about these but also mitigate the failure cases, and generate a whole new set of prompts. Sort of like evolution; it doesn't have to stop after the first run. 71:37 (Speaker D) It's like: let's learn from what these amazing ones still did wrong, and continue to make this better and better. Obviously, this relies on a relatively large test set. I'm also experimenting with ways to autogenerate the test set, but that's a little bit finicky. 71:50 (Speaker D) But I do think that sort of evolution could lead to some really exceptional prompts. What I found was that even on the first run, I was seeing it outperform myself. For example, there was a classifier we were doing with GPT-4 and logit bias, because it was such a hard challenge, and we were getting something like 90% accuracy. 71:50 (Speaker D) I had it write these prompts with GPT-4, but then I ran them using GPT-3.5, and it got 96%. 72:19 (Speaker A) We've talked about this pattern before, where you can outsource the hard work to GPT-4, but then, once you get really good at prompting, GPT-3.5 is actually very decent at many things, and it's way faster, cheaper, and has a 16K context now that you can use. We've seen this pattern with many folks: if you don't need the full power of GPT-4 (HumanEval for coding, et cetera), you can get very far with GPT-3.5, especially as you're getting better prompts. And now, Matt, you have a recursive prompt-crafter helper here. My next question for you: have you tried anything else? You mentioned GPT-3.5, where you run the prompts. Have you tried them on different models, like Claude maybe, or the open source Llama ones? 73:07 (Speaker D) I actually haven't, just because I wanted to see if this worked. It was just an interesting thing for me, and my time is really focused on Other Side and the personal assistant. But it wouldn't be hard to get Claude in. I suspect Claude prompts would perform better on Claude, and OpenAI prompts would perform better on OpenAI, just because the models respond to prompting very differently. 73:18 (Speaker D) Claude is sort of a more emotional thinker; OpenAI is more of a logical thinker. It's a very simple, not perfect analogy, but I suspect you'd want to stick within the... 73:36 (Speaker A) ...ecosystems, maybe. Not to mention Inflection's Pi, which is a whole different beast. 73:41 (Speaker D) Yeah, that's an interesting one. 73:44 (Speaker A) We discussed Pi a couple of times and I've seen some reactions, but maybe at the end of this, if we have time. Matt, one question I have for you on this, and then I think we'll move on: where can folks find more of this work? Is it open source? What are you looking for in contributions? Give us a wrap-up of the project. 74:07 (Speaker D) Yeah, so you can find it on GitHub. It's called gpt-prompt-engineer. Currently there are two notebooks; it's all done in Jupyter notebook format, so it's pretty easy to edit. One is for the classification system, the other is for the generation system. 74:20 (Speaker D) We're honestly at a point where it works well, so the question is: what do you build around it?
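Here is roughly what that ELO-style tournament looks like in code: a minimal sketch, not the repo's actual implementation. The judge prompt and K-factor are illustrative assumptions; the judge is called twice with the order flipped, which is the position-bias mitigation Matt described:

```python
# ELO-style tournament between two candidate prompts' outputs, with an
# order-swapped GPT-4 judge.
import openai

K = 32  # a standard Elo K-factor

def expected(r_a: float, r_b: float) -> float:
    # Probability that A beats B under the Elo model.
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, score_a: float) -> tuple[float, float]:
    # score_a: 1.0 win, 0.5 draw, 0.0 loss for A.
    e = expected(r_a, r_b)
    return r_a + K * (score_a - e), r_b + K * ((1 - score_a) - (1 - e))

def judge_once(task: str, first: str, second: str) -> str:
    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content":
                   f"Task: {task}\n\nResponse 1:\n{first}\n\nResponse 2:\n{second}"
                   "\n\nWhich response is better? Answer '1', '2' or 'draw'."}],
    )
    return resp["choices"][0]["message"]["content"].strip()

def judge(task: str, a: str, b: str) -> float:
    first = judge_once(task, a, b)
    second = judge_once(task, b, a)  # same pair, flipped order
    wins_a = (first == "1") + (second == "2")
    if wins_a == 2:
        return 1.0  # A won under both orderings
    if wins_a == 0:
        return 0.0
    return 0.5      # split verdict counts as a draw
```

The winners' ratings can then feed the evolution loop Matt described: take the top-rated prompts plus the cases where they failed, and ask the model for a new generation of candidates.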
One thing that's missing is that the classification version only supports true and false labels, but it's not hard to use tiktoken, or whatever it is, to allow it to support arbitrary labels: happy, sad, angry, whatever. That's probably a 20-minute add that, if somebody goes in and does it, opens up a whole new set of use cases. The evolution idea I mentioned before, taking the best prompts, saying here's where they went wrong on these test cases, throwing that back to GPT, having it generate more, and rerunning: that's interesting. 74:45 (Speaker D) The ability to use Claude would be awesome if anybody wants to add that. I could even see it evaluating each prompt on each model, right? Because right now we only generate with GPT-4, and we only evaluate with GPT-3.5. 75:19 (Speaker D) But imagine if you generate half of them with GPT-4 and half with Claude, and then you evaluate each prompt on GPT-4, GPT-3.5, and Claude. 75:27 (Speaker D) And you can see the latency and success rates for each, along with scores. I think all of that would be super interesting. I'm also open to ideas. 75:40 (Speaker D) I'm not really supporting this at all, so if anybody wants to take it and run with it, I am all for that. Also, a shameless plug, since I have an audience here: at Other Side and Hyperwrite, we're really looking for somebody to help on backend, hopefully with security expertise. And if anybody is experienced in training machine learning models, I would love some help there too, because we're doing a lot of LLM training. 75:55 (Speaker A) Just a quick thing to add: now that prompt engineering is automated, the results would likely make a great data set that you can keep adding to and fine-tuning on, especially as GPT-4 fine-tuning is coming soon. So Matt, definitely store everything you generate, with the ELO score and everything, from every GPT Prompt Engineer run; maybe there's a path forward to actually fine-tuning a prompting model, which could be... exactly. Well, yeah, exactly. 76:28 (Speaker D) Imagine taking a prompt, and taking one that has a slightly higher score, and fine-tuning a model to take the initial prompt and output the higher-scoring one. You can do that evolutionarily and continue to get better prompts, in theory. 76:40 (Speaker A) Awesome. So folks, if you want to work in a cool place, Hyperwrite: hit Matt up. And also check out gpt-prompt-engineer on GitHub. Thanks for coming. Feel free to stay and keep commenting and talking with us as we go through a bunch of other updates. 76:57 (Speaker A) Just a quick check with NISten, who promised me to follow Twitter and see if anything new comes up, breaking news as we talk. I haven't seen anything besides the space on X.AI. 77:04 (Speaker A) I'll direct people's attention to the last pinned tweet, from Dr. Jim Fan, about the context length dip. Matt, you also touched on this. It's basically a paper, I think 77:22 (Speaker A) from Stanford, I'm not sure, which figured out that even with longer
context windows, there's a dip in the middle: the model pays the most attention to the beginning and the end of the prompt, and for the details you provide in the middle, there's a dip. 77:39 (Speaker A) And this was also released this week. However, the one thing I said previously I'll repeat here: Claude, and some folks who know about context windows way more than me say this, is actually really good at this, without the dip. 77:54 (Speaker D) Yeah. It's an interesting paper, but I feel like it's sort of saying: hey, if you train on marketing copy, then it's going to be worse at coding. Obviously, right? 78:03 (Speaker D) We do a lot of long context stuff at Other Side; that's actually what I'm focused on right now, training really long context, massive models. And if you train it on data where the context in the middle matters, it is going to be good at that. 78:16 (Speaker A) Interesting. So what you're saying, and I think I've seen this kind of opinion before as well, is that it's just the outcome of the data that was fed in. For blog posts and other places, people want to hook your attention at the beginning and then finish strong. You're saying this dip is potentially an outcome of that, and not necessarily the tech behind it. 78:38 (Speaker D) Yeah, I believe so. I mean, who knows, maybe I'm wrong, but from my experience, and why I gave that analogy before: if you train it to do one thing and then ask it to do another, it's not going to do that other thing as well. And I'm guessing the data set they did this evaluation on didn't have a ton of information in the middle. Part of the reason so few of the language model companies have super long context models, and why it was such a big deal that Anthropic did it, is that a lot of the challenge in training them isn't actually the training; it's the data. 79:08 (Speaker D) Obviously, inference becomes a challenge too: the cost and the overhead there. But the data to do this is really sparse. 79:10 (Speaker D) It's not very available, right? There isn't a standard data set with super long context that has the information in the middle. 79:25 (Speaker D) We actually have been building one at Other Side, and that's what's given me some of the ideas I'm spouting here. But my guess is that part of the reason Anthropic's works is that they focused on the data. The data is really important. 79:38 (Speaker A) Right. 79:39 (Speaker D) I will say: it's not the model, it's the fine-tuning. 79:41 (Speaker A) Yeah. I will say, when I got access to Claude's window, I did a bunch of tests with my Twitter data. I pasted in a bunch of JSON with Twitter IDs, just numbers. And the smaller model, the non-100K one, gave me back results that didn't invent those numbers. 79:57 (Speaker A) The 100K model got lost in the middle and started inventing numbers. I literally saw the difference between the longer-context one and the previous one, and I think it's because it loses some context in the middle. And I need to retry this on the new ones, because they claim this doesn't happen anymore.
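If you want to check the "lost in the middle" effect on whatever model you use, a tiny needle-in-a-haystack harness is enough. This sketch assumes the 2023-era OpenAI Python client; the filler text, the needle, and the model name are illustrative:

```python
# Hide one fact at different depths of a long context and check whether
# the model can still retrieve it.
import openai

FILLER = "The sky was grey and the meeting ran long. " * 200
NEEDLE = "The secret code is 7319."

def recalls(depth: float, model: str = "gpt-3.5-turbo-16k") -> bool:
    cut = int(len(FILLER) * depth)
    context = FILLER[:cut] + NEEDLE + " " + FILLER[cut:]
    resp = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user",
                   "content": context + "\n\nWhat is the secret code?"}],
    )
    return "7319" in resp["choices"][0]["message"]["content"]

# Per the paper, accuracy tends to dip when the needle sits near the middle.
for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(depth, recalls(depth))
```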
80:01 (Speaker A) I want to go to Al, and, yeah, whichever of you raised your hand first, to talk about the context length dip and that paper: if you have read it, if you have thoughts, and if you have noticed this as well. 80:29 (Speaker F) I just had a quick question for Matt about the differences he's found in prompting between, say, Claude and GPT-4. I noticed the prompts aren't really reusable, and maybe you could speak to that in the general case. 80:42 (Speaker A) Yeah, let's end with this question and move on to the other updates we have. Go ahead, Matt. 80:48 (Speaker D) Yeah, sure. It's like talking to two people with two different personalities, right? They're both people, but they respond differently to the ways you prompt them, if you will. Claude is more emotional, I guess, where OpenAI is more logical. 81:03 (Speaker D) And it's hard to pin that down to any one thing, and it's hard to give you techniques based on it, because, again, every use case is very different. But you very clearly have to prompt them differently. Going back to the idea of fine-tuning a prompting model, what would be very interesting is fine-tuning a model that takes an OpenAI prompt and converts it to the idealized version of a Claude prompt, and vice versa. I think that could be very powerful, because there are ways to intuit your way there; it's just hard to distill into a set of rules. 81:29 (Speaker D) One thing I found quite interesting with Claude 2 is that it is insanely resistant to jailbreak attacks. I was able to get it to break, though. 81:44 (Speaker D) Turns out the stupidest method worked: modifying that DAN prompt that's been going around Reddit. But the more nuanced, complex methods that typically work on OpenAI didn't. So I think the model is just qualitatively different. 81:56 (Speaker D) I think it's going to take some time to fully explore it and understand why and how. Still super early days. 82:06 (Speaker A) I love the fact that all of us are getting an intuition about different models and how to approach them. swyx was here before; this is like a specialization of what I think he talked about as the AI engineer. We're starting to understand the differences between these models, the fine little things you can say. 82:11 (Speaker A) And I think it would be very interesting to have a model trained to convert or translate prompts between models so they work the same. I have an idea for not getting locked into the GPT-4 ecosystem with functions: wrapping the GPT-4 API package with something 82:47 (Speaker A) that actually prints the function definitions into the context, because Claude now has a huge context window, and then seeing whether Claude is able, without additional tech, without additional changes to the API, to replicate the outputs of how GPT-4 with functions would do it. That's an idea I'll be testing, hopefully, and I'll talk about it next week. 83:08 (Speaker A) Thanks, Matt. 83:10 (Speaker C) There has been a thing today, maybe yesterday, but anyway: today there's been a model released that generates prompts. By giving it the data, you generate the prompt. I've written about it today on Twitter.
It is so powerful, it is such a cool method, that you can take whatever you have, I don't know, scientific papers, and generate instructions for them. 83:32 (Speaker C) Now you can fine-tune a model that generates scientific papers. You've got jokes? Now you can train a model that becomes funny. 83:35 (Speaker C) You can generate the instruction, convert whatever you want into instructions. It's amazing. One more thing, about the dip-in-the-middle topic. 83:51 (Speaker C) I don't know why it happens. I have no idea how OpenAI trained their models. But if you think about it, many instruction datasets look like this: a paragraph, and before the paragraph you tell the model "please summarize the following"; or, on the contrary, a paragraph and at the end, "what was that about?" 84:10 (Speaker C) So it makes a lot of sense that a model pays a lot of attention to the beginning and the end, because of this. And on the same note, it's very easy to fix, so I wouldn't just point fingers. 84:21 (Speaker C) It's good that they pointed it out, but I think it's, I don't know, a couple of minutes of training; OpenAI could fine-tune for a minute and fix it. 84:28 (Speaker A) I just want to ask Yam: the tweet that I just pinned on top, this was the one you talked about, the instruction generation and the prompt generation? 84:38 (Speaker C) Yeah. 84:39 (Speaker A) Awesome. So folks, definitely feel free to check this out. I haven't seen this before. Do you want to give a couple more words about it? 84:44 (Speaker A) It looks like you wrote a very deep dive. What's the model, like 11B, 3B? 84:54 (Speaker C) Sure, there are two models; put in whatever models you want. Okay, let's go back. You've got a data set of something, emails from your company, for example, and you want a model that will help you write emails. 85:01 (Speaker C) Okay, you can start thinking about how to train this model. Or you can use this, and now generate a text that basically says: "help me write the following email to this person about such and such," followed by the actual email. And all of a sudden you have a data set to train a model, or to fine-tune or whatever, that is extremely tuned to this. So I think it's a very cool technique. 85:40 (Speaker C) It's very powerful, it has a lot of potential. And the trick, in simple words, is training the model what not to say. That's the missing piece here; that's the trick they added. 85:51 (Speaker C) They took instructions and outputs that do not fit, just a different, random output from the data, and trained with a different loss: the model should not say this, because this input with that instruction does not result in this output. That's it. 86:11 (Speaker C) That's the trick. And it works perfectly. Really cool. 86:17 (Speaker A) Awesome. I have some folks who want to come up and ask questions. I think we're almost there in terms of the updates; I'll just briefly run through some more. 86:18 (Speaker A) I don't even have time to go and look for the threads, but if you're not following llama.cpp, follow Georgi Gerganov, one of the greats we have in this space. I think he's single-handedly responsible for so many folks buying MacBooks, because it's incredible how much performance they've been able to squeeze out of Llama, and it's competitive.
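Backing up to Yam's trick for a second: here is a minimal sketch of the dataset construction he describes, generating an instruction for each raw document, then adding mismatched instruction/output pairs to be penalized by a separate loss. The prompt and helper names are illustrative (the paper's exact recipe isn't spelled out in this recap), and it assumes at least two documents:

```python
# Build (instruction, output) pairs from raw documents, plus mismatched
# "what not to say" pairs for a separate penalty loss, per Yam's description.
import random
import openai

def generate_instruction(document: str) -> str:
    # Ask a small, cheap model to invent the request this document answers.
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": "Write the instruction that the following text "
                              f"is a perfect response to:\n\n{document}"}],
    )
    return resp["choices"][0]["message"]["content"]

def build_dataset(documents: list[str]) -> list[dict]:
    pairs = [{"instruction": generate_instruction(d), "output": d, "match": True}
             for d in documents]
    mismatched = []
    for p in pairs:
        other = random.choice([q for q in pairs if q is not p])
        # Same instruction, somebody else's output: train the model, with a
        # different loss, that this is NOT a valid response.
        mismatched.append({"instruction": p["instruction"],
                           "output": other["output"], "match": False})
    return pairs + mismatched
```

Back to the llama.cpp news: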
86:49 (Speaker A) And many people just quantize their models, basically make them smaller, to run on this GGML platform that they have. There are two recent pieces of news from over there. Last week, for those of us who were here, we talked about CFG. 86:58 (Speaker A) I forgot the name for a second: the guidance scale. We talked about the CFG parameter coming over from the diffusion models we know. 87:17 (Speaker A) Like, in Stable Diffusion, you can define how closely the model should follow your prompt when generating the image. Somebody said, I think in a discussion: hey, can we have this CFG control in our LLM generation? CFG is classifier-free guidance. 87:37 (Speaker A) And they did it; it got merged into llama.cpp. So now you can actually pass a CFG control and steer the model. 87:48 (Speaker A) It's almost like a running fine-tune, to an extent: you can push the model to stay closer to, or farther away from, the prompt that you gave it. Contrast this with what we have on the GPT-4 API, which is temperature. 88:01 (Speaker A) And Matt, you mentioned logit bias, right? Where you can ask it not to say certain things. So, contrasting with that, CFG is a different beast, a different control we now have. And GGML just merged it into their platform. 88:18 (Speaker A) Definitely worth checking out. And the second thing, I need to find the tweet: yesterday Georgi was like, oh yeah, by the way, here's a 48% inference speed improvement that somebody just merged in. 88:30 (Speaker A) Have you guys played and tried this? For the 33-billion-parameter Llama model, somebody just merged in roughly a 50% increase in inference speed, just like that. And I find this incredible, because GGML already runs on Raspberry Pis or whatever, iPhones, and now somebody's like, oh yeah, here's a 50% increase in inference speed. 88:41 (Speaker A) And I think NISten, who was here before, was talking about how GGML runs on the iPhone, because iPhones, even from three years ago, have the same Neural Engine as, like, the latest Macs or some such, and this performance boost on GGML also applies to iPhones. So, incredible stuff. And as we hear every week, we keep seeing incredible leaps in speed and performance. 89:15 (Speaker A) Definitely worth checking out GGML and the folks who work on that stuff. GGML community, folks who use llama.cpp: feel free to hop up, raise your hand, and give us more updates from that side. 89:28 (Speaker A) Other than that, I think we'll move on to some more updates, and then we'll take questions. No? Cool. 89:41 (Speaker A) So the next update I have is from the diffusion side, which we cover from time to time. Two things from Stability. 89:46 (Speaker A) We talked about SDXL, the new XL model that can generate 1024x1024 images; we talked last week about the 0.9 weights dropping. 90:01 (Speaker A) SDXL 1.0 is now available in the Stable Diffusion Discord. If you've played with Midjourney before and then looked at Stable Diffusion and thought it's not that great: 90:05 (Speaker A) SDXL 1.0 is really impressive. And besides being really impressive, they plan to release it open source.
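A quick aside to ground that llama.cpp CFG feature from a moment ago: in spirit, you run the model with and without the prompt (or with a negative prompt) and push the output distribution toward the prompted run. This is a pure-numpy illustration of the logit arithmetic, not the actual llama.cpp code:

```python
# Classifier-free guidance applied to next-token logits.
import numpy as np

def cfg_logits(cond: np.ndarray, uncond: np.ndarray, scale: float) -> np.ndarray:
    # scale = 1.0: ordinary sampling. scale > 1.0: stick closer to the
    # prompt. scale < 1.0: drift away from it.
    return uncond + scale * (cond - uncond)

cond = np.array([2.0, 0.5, -1.0])    # logits with the prompt
uncond = np.array([1.0, 1.0, 0.0])   # logits without it
print(cfg_logits(cond, uncond, 1.5))
```

Back to Stable Diffusion: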
So we're going to see a bunch of folks fine-tuning LoRAs and specific versions for specific things. 90:16 (Speaker A) And I think it's incredible. If you want to play with those models and you haven't yet, go to the Stable Diffusion Discord, hit up that bot, and let us know how incredibly different it is. And we're waiting for the SDXL 1.0 90:47 (Speaker A) weights to drop. And I will mention this every day until the year mark: it's been less than a year since Stable Diffusion. 90:57 (Speaker A) It's been less than a year. I remember, I think it was August '22 when they actually dropped the full open source model. Less than a year. 91:12 (Speaker A) And we've seen just such incredible progress. So, like Matt said before, it's really hard to keep up, but it's also really hard to internalize just how far we've come, with these incredible leaps and changes every week. And again, to plug this ThursdAI space: 91:21 (Speaker A) this is why we're here, every ThursdAI, talking about everything that's changed and updated. And the other thing I want to mention: I see Art in the audience. 91:28 (Speaker A) If you've played with SDXL, feel free to raise your hand and come up. The other thing they released: I don't know if you guys are familiar with ClipDrop. Stability AI bought ClipDrop as a company and started pushing that interface alongside their Dream Studio interface. 91:49 (Speaker A) ClipDrop is a way simpler interface, and today they released something called Stable Doodle. Stable Doodle is, I don't know if folks in the audience remember this meme, "how to draw an owl." 91:51 (Speaker A) Step one: draw a circle. Step two: draw some eyes. And step three: draw the rest of the f*****g owl. 92:06 (Speaker A) And then you have a beautiful owl painting at the end. This is now the go-to test for how these doodle models work. I pinned my attempt at it; definitely check out the ClipDrop Doodle thing. It's really fun to play with. So those are the updates from the diffusion world. 92:10 (Speaker D) Hey, real quick. I was just looking at the repository for ComfyUI, and then I saw that, I don't know how to say his name, SkalskiP is in here. So I just wanted to come on and say: hey, this is incredible. 92:24 (Speaker D) This is what we've been talking about for months now, right? This node-based interface, if you will; there are just infinite possibilities. I just wanted to listen, but thanks. 92:35 (Speaker A) For bringing me up. 92:36 (Speaker D) This is really cool, man. I was just... thanks for bringing up ComfyUI. 92:42 (Speaker A) I feel guilty at not being up to date on every single possible thing. I know it's impossible; I really try. ComfyUI has been on my list to try, but then Claude 2 was released, and Code Interpreter was released. ComfyUI seems like the thing we want, man. 92:42 (Speaker A) I think Stability, when they tried to bring up Dream Studio, talked about a node-based thing where you can pipe models into other models, add filters, et cetera. ComfyUI, for folks who have tested it out, looks like exactly that. And I definitely want to agree with Art: 93:16 (Speaker A) it's something to watch and maybe try. Because AUTOMATIC1111, even though it's super advanced and has been there since the beginning of Stable Diffusion, is just a s**t show of a UX. Just horrible, horrible. I'm sorry, guys.
93:30 (Speaker A) I've built a web UI before AUTOMATIC1111 existed. It's really hard to get Gradio to do what you want. It's really hard to maintain a good UX product with many, many people contributing and many, many things changing under your feet. 93:45 (Speaker A) So it's really not their fault, but it's a s**t show to get started with. And ComfyUI seems like a fresh, clean start. So definitely, if you're playing with this, test it out and let us know. 93:55 (Speaker A) Max, you have your hand raised, and you've played with SDXL. Give us some of your thoughts. 94:01 (Speaker I) Yeah, I have played with it through the website, in Dream Studio. I'm lately working with a company that makes toys for kids; they want to start incorporating AI. And one of our concerns, since we want to generate images for kids, is something that would probably freak them out: two things that diffusion models have been lacking. 94:27 (Speaker I) One is the ability to paint complicated or intricate shapes, like hands. SDXL is not better at it. 94:40 (Speaker I) The other one is this concept called concept bleeding: diffusion models tend to mix objects that are similar in shape or form. It's not good at that either. Now, I was reading the paper from Stability, or the report. They claim they are outperforming Midjourney in five of seven categories. Midjourney 5.1, right? 95:12 (Speaker A) Just to make sure: Midjourney has since released a new version, because everyone moves at the same pace, but yeah, they compared to Midjourney 5.1. Yeah. 95:20 (Speaker I) Well, this is an internal report released by Stability. It's a paper, so it might have some credibility, I don't know. I like the results. It's very close to Midjourney, but I think it is still one or two steps behind, in my opinion. 95:36 (Speaker I) What will be different is what you mentioned, Alex: once they release the weights and we can see LoRAs on top of this, I'm expecting to see the results we can get, because that is probably what is going to position this model a step above Midjourney. But not yet. That's my opinion. 95:58 (Speaker A) Yeah, definitely. And thanks for that; I love folks coming up and sharing their opinions about these things. 96:05 (Speaker A) Thanks, Max. Or, I guess, I now know your real name, but I'm not sure if I should use it. 96:10 (Speaker I) Yeah, totally, you can use it: I'm Juan, Spanish, living in Mexico, and I like these things. 96:17 (Speaker A) We appreciate you coming up. On the topic of UIs that we've mentioned: some folks released Pinocchio. They call it the AI browser. And I want to highlight this because I want to give you practical tips. Janae, I think, is coming in with some breaking news. 96:28 (Speaker A) I don't know if Janae wants to come up, or can, but if you can, feel free to come up and tell us; there's some news from Bard. Until we talk about Bard, on the topic of UIs for these things: you guys know we're mostly focused on the LLM side and the engineering side, less on the diffusion side, but we sometimes have love for both. Pinocchio is a tool that you can download so you don't have to deal with the terminal or a bunch of other stuff; it unifies all of them. 97:08 (Speaker A) It's really nice. Check out the Pinocchio AI browser. I think it's open source.
97:12 (Speaker A) You download it once, it's cross-platform (Mac, PC, et cetera), and then you're able to download llama.cpp, and also Stable Diffusion. And then, fairly quickly, without knowing how to code, without going through the terminal, without installing packages (folks here know that installing packages is a pain we all share and all hate), without doing any of that, that's the promise: you are able to pipe Llama outputs into Stable Diffusion. 97:38 (Speaker A) So, Yam previously mentioned the model that generates prompts, and Yam and Matt were talking about methods of generating prompts for LLMs. But we also know there are models fine-tuned specifically to generate prompts for diffusion models. And this Pinocchio browser actually allows you to run an LLM and then pipe the output into a Stable Diffusion model and see the result. I think it's incredible that this exists and is downloadable. 98:07 (Speaker A) I haven't tried it yet. If you in the audience, or somebody on stage, have tried Pinocchio, please raise your hand; I want to bring you up to talk about your experience with it. 98:19 (Speaker A) And if we haven't, I want to bring it to our attention so that next week we're able to talk about it. It's added to my list, like ComfyUI, of things I haven't tried yet. 98:29 (Speaker A) Anybody used Pinocchio yet? No? Cool. I wanted to get Cocktail Peanut, the guy who wrote it. 98:36 (Speaker A) If you're in the audience, feel free to raise your hand. I don't think you are, but feel free to follow the thread; he goes fairly deep. 98:44 (Speaker A) And feel free to try Pinocchio by next week, then come up and talk about the differences between it and running AUTOMATIC1111. All right, folks, thanks everyone for coming to another ThursdAI space. 98:58 (Speaker A) Hope this has been helpful for a bunch of you. We tried a few new things here: we tried to give updates, but also deep-dive into a conversation with Matt. And it looks, from the reactions here, like maybe this is worth putting down on paper and sending out as an email, for those of you who want to sign up and don't have the time to listen to two-hour spaces. So I'll definitely try, at least, to do that. 99:19 (Speaker A) I want to thank a few folks on stage who have joined consistently and provide a lot of signal. Follow Yam; he has great insights into models and training and different things. Al in the audience: thanks, always, for coming up. 99:33 (Speaker A) Junaid is running the Denver meetup, and if you're in the Denver area, feel free to join us next week. Thanks for coming; haven't seen you in a while, buddy. 99:45 (Speaker A) Juan, sorry, yeah, Juan: great. Max and Lentil have recently been joining us. 99:51 (Speaker A) It's been great. We have some more folks in the audience who are regulars, and we invite you to also be regulars, come up, and talk on ThursdAI. I will say this one thing: tag me in anything that's new. 100:01 (Speaker A) I would love that. And help promote the message to other folks; if you did like the space, that really helps more folks find this. For those folks whose questions I didn't get to, I apologize.
I'm trying to keep this as a balance between a high-signal thing and letting everybody ask questions as well. 100:22 (Speaker A) Last thing I'll say is about myself: I consult a little bit. I stay up to date so you don't have to; that's my tagline. 100:29 (Speaker A) If your company needs consultancy from somebody who's up to date on everything, I try to be that guy. Feel free to tap me in the DMs. And, yeah, ThursdAI folks: keep tagging us in everything that's new, and we'll try to cover it next week. 100:34 (Speaker A) With that, I thank all of you. Thanks for coming. Thanks for giving us two and a half hours of your attention. 100:34 (Speaker A) I really appreciate it. Attention is scarce and very important, and I really thank everybody who gave us two and a half hours. Thank you, folks. 101:00 (Speaker A) Hey, Alex, we really appreciate you. 101:04 (Speaker B) Thanks, Alex. 101:05 (Speaker H) Thanks for doing a good space and keeping us on track, actually. 101:09 (Speaker A) Yeah, thank you. 101:10 (Speaker D) Yeah, Alex, definitely want to kind of 101:13 (Speaker A) give our thanks to you as well 101:15 (Speaker E) for curating an awesome space. 101:17 (Speaker D) I think I'm definitely not the only one that gets a lot of good signal out of this. And I know a lot of hard work goes into keeping yourself up to 101:27 (Speaker A) date so that you can share it 101:28 (Speaker E) with all of us. 101:29 (Speaker D) So, just on my own behalf: thank you. And I'm sure that is echoed by 101:34 (Speaker E) a lot of people on stage and in the audience. 101:36 (Speaker A) Humbled, man. Thank you. I appreciate you. Thank you, folks. Have a nice Thursday, and see you next week. This is a public episode. If you’d like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe