
ThursdAI - The top AI news from the past week

Latest episodes

Aug 10, 2023 • 16min

ThursdAI Aug 10 - Deepfakes get real, OSS Embeddings heating up, Wizard 70B tops the charts and more!

Hey everyone, welcome to yet another ThursdAI update! As always, I'm your host, Alex Volkov. Every week, ThursdAI is a Twitter space with a panel of experts, guests and AI enthusiasts who join to keep up with the incredibly fast pace of AI updates, learn together and listen to subject matter experts on several of the topics. Pssst, this podcast is now available on Apple, Spotify and everywhere via RSS, and a new long-form, raw and uncut, full spaces recording podcast is coming soon!

ThursdAI is supported by readers, and I promised my wife I'd ask: if you find this valuable, why not upgrade your subscription so I can keep this going, get better equipment and produce higher quality shows?

I started noticing that our update spaces split into several themes, so I figured I'd start separating the updates by theme as well. Do let me know in the comments if you have feedback, a preference, or specific things to focus on.

LLMs (Open Source & Proprietary)

This section covers updates pertaining to Large Language Models, proprietary (GPT-4 & Claude) and open source ones, APIs and prompting.

Claude 1.2 instant in Anthropic API (source)

Anthropic has released a new version of Claude Instant, their very fast 100K-context model. It's a very capable model that's now better at code tasks and, most of all, very, very fast! Anthropic is also getting better at giving access to these models, so if you've waited on their waitlist for a while and still don't have access, DM me (@altryne) and I'll try to get you API access as a member of the ThursdAI community.

WizardLM-70B V1.0 tops OSS charts (source)

WizardLM 70B from WizardLM is now the top dog in open source AI, featuring the same license as LLaMa and much, much better code performance than base LLaMa 2; it's now the top-performing code model that also does other LLMy things. Per friend of the pod and finetuner extraordinaire Teknium, this is the best HumanEval (coding benchmark) we've seen in a LLaMa-based open source model 🔥

Also from Teknium btw: a recent evaluation of the Alibaba Qwen 7B model we talked about last ThursdAI actually showed that LLaMa 7B is a bit better. However, Qwen should also be evaluated on tool selection and agent use, and we're waiting for those metrics to surface and will update!

Embeddings Embeddings Embeddings

It seems that in open source embeddings, we're now getting state of the art models (read: require no internet access) every week! In just the last few months:
- Microsoft open-sourced E5
- Alibaba open-sourced General Text Embeddings
- BAAI open-sourced FlagEmbedding
- Jina open-sourced Jina Embeddings

And now we have a new benchmark, MTEB, and a new leaderboard from Hugging Face (who else?) to always know which model is currently leading the pack, with a new winner from this week: BGE (large, base and small, the small one just 140MB).

Embedding models are very important for many AI applications: RAG (retrieval augmented generation) products, semantic search and vector DBs. The faster, smaller and more offline they are, the better the whole field of AI tools we're going to get, including much more capable, offline agents. 🔥 Worth noting that text-embedding-ada-002, the OpenAI embedding API, is now ranked 13 on the above MTEB leaderboard!
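To make this concrete, here's a minimal sketch of running one of these open source embedding models locally for semantic search with the sentence-transformers library. The model ID and the query instruction follow the BGE model cards on Hugging Face at the time of writing; the documents and query are made up for illustration.

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer

# BGE comes in large/base/small variants; swap the ID to trade quality for size.
model = SentenceTransformer("BAAI/bge-large-en")

docs = [
    "WizardLM-70B tops the open source leaderboards this week.",
    "Play.ht 2.0 can clone a voice from a three second sample.",
]
# The BGE model card suggests prefixing retrieval queries with a short instruction.
query = "Represent this sentence for searching relevant passages: which model leads the OSS charts?"

doc_emb = model.encode(docs, normalize_embeddings=True)
query_emb = model.encode(query, normalize_embeddings=True)

# With normalized embeddings, the dot product is cosine similarity.
scores = doc_emb @ query_emb
best = scores.argmax()
print(docs[best], float(scores[best]))
```

Because the model runs locally, a snippet like this works fully offline once the weights are downloaded, which is exactly what makes these releases interesting for RAG pipelines and agents.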
Open Code Interpreter 👏

While we're on the agents topic, we had the privilege to chat with a new friend of the pod, Shroominic, who told us about his open source project called codeinterpreter-api, an open source implementation of Code Interpreter. We had a great conversation about this effort, the community push, and this open version's ability to install new packages, access the web, run offline and be driven by multiple open source LLMs, and we expect to hear more as this project develops! If you're not familiar with OpenAI Code Interpreter, we talked about it at length when it just came out here, and it's probably the best "AI Agent" that many folks have access to right now.

Deepfakes are upon us!

I want to show you this video, and you tell me: if you saw it somewhere other than an AI newsletter, would you have been able to tell it's AI generated? This video was generated automatically by HeyGen when I applied to the waitlist, and then I registered again and tried to get AI Joshua to generate an ultra realistic ThursdAI promo vid haha. I've played with many tools for AI video generation and never saw anything come close to this quality, and I can't wait for this to launch!

While this is a significant update for many folks in terms of how good deepfakes can look (and it is! Just look at it: the reflections, the quality, the lip movement is perfect, just incredible), it isn't the only progress data point in this space. Play.ht announced version 2.0, which sounds incredibly natural, increased model size 10x and the dataset to more than 1 million hours of speech across multiple languages, accents, speaking styles and emotions, claims sub-1s latency, and can fake your voice with a sample of only… 3 seconds! 🤯

So have you and your loved ones chosen a code word to authenticate over the phone? Or switched to a verifiable communication style? While those of us with multiple accents don't yet have to worry, everyone should stop trusting any video or voice sample from now on; it's just inevitable that all of it will be deepfaked, and we should start coming up with ways to authenticate content.

If you made it this far, and any of the above was new/important to you, why not support this pod/newsletter/community? If you'd like to sponsor us more directly, please ping me at altryne [at] gmail.com. I'm also open to consulting and, if you're a great company, Developer Relations positions :)

Finally, we talked for a whopping 2 hours on the space, and that whole conversation can be heard on our Zealous page, which has transcripts, AudioGrams of key moments, and space summarizations! The long-form space recordings can be added to your podcatcher separately if you'd prefer the "ThursdAI raw feed" by using this RSS link, and it will come as its own podcast very soon! Thanks to our friends at Zealous.

Thank you, Alex Volkov.
Host, ThursdAI - Recaps of the most high signal AI weekly spaces
CEO @ Targum.video
AI Consultant with free slots (Lets Talk)

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
Aug 4, 2023 • 26min

ThursdAI Aug 3 - OpenAI, Qwen 7B beats LLaMa, Orca is replicated, and more AI news

Hi, today's episode is published on a Friday; it's been a busy week with at least 4 Twitter spaces, countless DMs and research!

OpenAI announces UX updates

* Example prompts: No more staring at a blank page!
* Suggested replies: ChatGPT automatically synthesizes follow-up questions, then you just click a button.
* GPT-4 by default: When starting a new chat as a Plus user, ChatGPT will remember your previously selected model!
* Upload multiple files: Uploading multiple files is now supported in the Code Interpreter beta for all Plus users.
* Stay logged in: You'll no longer be logged out every 2 weeks, and if you do get logged out, there's a sweet new welcome page!
* Keyboard shortcuts: Work faster with shortcuts. Try ⌘ (Ctrl) + / to see the complete list.

ThursdAI - I stay up to date so you don't have to

Alibaba releases Qwen-7B

* Trained with high-quality pretraining data. Qwen-7B is pretrained on a self-constructed, large-scale, high-quality dataset of over 2.2 trillion tokens. The dataset includes plain text and code, and covers a wide range of domains, both general and professional.
* Strong performance. In comparison with models of similar size, Qwen-7B outperforms the competitors on a series of benchmark datasets evaluating natural language understanding, mathematics, coding, etc.
* Better support of languages. The new tokenizer, based on a large vocabulary of over 150K tokens, is more efficient than other tokenizers. It is friendly to many languages and helps users further finetune Qwen-7B to extend understanding of a particular language.
* Support of 8K context length. Both Qwen-7B and Qwen-7B-Chat support a context length of 8K, which allows inputs with long contexts.
* Support of plugins. Qwen-7B-Chat is trained with plugin-related alignment data, and thus is capable of using tools, including APIs, models, databases, etc., and is capable of playing as an agent.

This is an impressive jump in open source capabilities, less than a month after the LLaMa 2 release!
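If you want to kick the tires on Qwen-7B-Chat yourself, the Hugging Face model card shows a chat-style interface roughly along these lines. This is a hedged sketch based on that card at the time of writing (the exact arguments may have changed since); `trust_remote_code` is needed because Qwen ships its own modeling and tokenizer code with the checkpoint.

```python
# pip install transformers accelerate tiktoken
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen-7B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",        # place layers on GPU(s) if available
    trust_remote_code=True,   # Qwen ships custom modeling/tokenizer code
).eval()

# Qwen's custom code exposes a chat() helper that keeps conversation history for you.
response, history = model.chat(tokenizer, "Give me a one line summary of ThursdAI.", history=None)
print(response)
```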
GTE-large, a new embedding model, outperforms OpenAI's ada-002

If you've used any "chat with your documents" app or built one, or have used a vector database, chances are you've used OpenAI's ada-002; it's the most common embedding model (it turns text into embeddings for vector similarity search). This model is now ousted by an open source (read: free) one called GTE-large, with improvements on top of ada across most parameters!

OpenOrca 2 preview

Our friends from AlignmentLab, including Teknium and LDJ, discussed the upcoming release of OpenOrca 2! If you're interested in the kind of finetuning these folks do, we had a special interview w/ NousResearch on the pod a few weeks ago. OpenOrca tops the charts as the best performing 13B model 👏

HyperWrite releases a personal assistant

You know how much we love agents on ThursdAI, and we're waiting for this field to materialize (I personally am waiting for an agent to summarize all the links and screenshots for this summary, and… we're not there yet!). But we're getting close, and our friends from HyperWrite have released their browser-controlling agent on ThursdAI. Talk about a full day of releases! I absolutely love the marketing trick they used, where one of the examples of how it works is "upvote us on Product Hunt"; it actually did work for me, and found out that I had already upvoted.

Superconductor continues

I was absolutely worried that I wouldn't make it to this ThursdAI, or wouldn't know what to talk about, because, well, I've become a sort of host, information hub and interviewer of folks about LK-99. Many people around the world seem interested in its properties, replication attempts, and understanding this new and exciting thing. We talked about this briefly, but if it interests you (and I think it absolutely should), please listen to the recording below.

ThursdAI - See ya next week! Don't forget to subscribe, and if you are already subscribed and get value, upgrading will help me buy the proper equipment to make this a professional endeavor and pay for the AI tools! 🫡

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
Jul 30, 2023 • 50min

🧪 LK99 - The superconductor that can change the world, and the K-drama behind it!

This is a free preview of a paid episode. To hear more, visit sub.thursdai.news

First of all, let me address this from the get go: I'm not a material scientist! I am pretty good at finding information in Twitter's incredibly noisy info stream (hey, this is how I bring you AI updates every ThursdAI). Since LK-99 is potentially groundbreaking and revolutionary, I've compiled a Twitter list of everyone I found credible, interested and a source of new information, and there are now over 1.5K followers to this list alone!

Since this is clearly interesting to a lot of you, I reached out to a few prominent people on this list and asked them to join a Twitter space, to try and stitch together an update on the current state of LK-99, replication attempts, history and lore, as it stands a week after the original papers' release. If you found this interesting, and you're the type of person who wants to stay up to date, feel free to subscribe and keep this Substack alive!

First of all, let's do some level setting. Superconductors are real, we've used them in MRI machines for example, but the currently available superconductors need extremely low temperatures and high pressure to, well… work, and the promise of a room temperature, ambient pressure superconductor is the holy grail of energy use. For a breakdown of what superconductors are, and what they could mean for the world, I strongly recommend this thread from Andrew Cote (published presciently a full two weeks before the LK-99 paper) or watch this incredible breakdown:

July 22nd, the LK-99 arXiv day!

On July 22nd, two papers describing the "world's first room temperature superconductor" were uploaded to arXiv:

2307.12008 - Sukbae Lee, Ji-Hoon Kim, Young-Wan Kwon (submitted by Kwon)

and after 2 hours and 20 minutes another paper was uploaded:

2307.12037 - Sukbae Lee, Jihoon Kim, Hyun-Tak Kim, Sungyeon Im, SooMin An, Keun Ho Auh (submitted by Hyun-Tak Kim)

You may notice that the first two authors on both papers are Sukbae Lee and Ji-Hoon Kim; in fact, LK stands for Lee and Kim, and the 99 in the LK-99 name stands for 1999, the year they started research on this.

You may also notice that YW Kwon, who submitted the first paper, is not included on the second one and, in fact, is no longer part of the Quantum Energy Research Institute (aka QCentre), where he was the CTO (he's no longer listed on the site).

If this shakes out and the superconductor is replicated, there's definitely going to be a Netflix series on the events that led YW Kwon to release the paper after he was no longer affiliated with QCentre, and with limited information. So let's try to connect the dots. A LOT of this connecting happened on the ground by Seo Sanghyeon and his friends, and was translated by me.
Their original coverage has a LOT of details and is available in Korean here.

Let's go back to the 90s

On the LinkedIn page of Ji-Hoon Kim (the page turned blank shortly before I wrote this), JH Kim showed that he started working on this back in 1999, when they estimated they had a material that contained a "very small amount of superconductivity". Together with Sukbae Lee, in 2018 they established QCentre to complete the work of their Professor Emeritus of Chemistry at Korea University, the late Choi Dong-Sik (1943-2017), who apparently first proposed the LK-99 material (following the 1986 bonanza of the discovery of high temperature superconductors by IBM researchers).

Fast forward to 2017: a wish expressed in a last will and testament starts everything again

Professor Choi passed away, and in his will requested follow-up research on ISB theory and LK-99. Quantum Energy Research Institute is then established by Lee and Kim (LK), and they continue their work on this material. In 2018 there's a potential breakthrough, which could have been an accident that led to the discovery of the process behind LK-99. Here's a snippet of Seo Sanghyeon explaining this:

Kwon Young-Wan, the ex-CTO

Kwon is a Research Professor at Korea University & KIST, the third author on the first arXiv paper, and its submitter. He was previously the CTO, but at the time the paper went to arXiv he had not been affiliated with QCentre for "some months", according to an interview with Lee. He uploads a paper naming only 3 authors (Lee, Kim and himself), and then surprisingly presents LK-99 research at the MML2023 international conference held in Seoul a few days later. We haven't yet found a video recording, however a few reports mention him asking for an interpreter, and talking about bringing samples without a demonstration or proper equipment, which is important to note.

Enter Hyun-Tak Kim

H.T. Kim is probably the most cited and well-known professor in academia among the folks involved. See his Google Scholar profile, with a D-index of 43, 261 publications and 11,263 citations. He's a heavy hitter, and is the submitter and a listed author of paper number 2, submitted to arXiv 2 hours and 20 minutes after paper number 1 above. In the second paper, he's listed as the third author (and the submitter to arXiv), and his contribution is acknowledged like so:

An author, Hyun-Tak Kim (H. T. Kim),'s knowledge on mechanisms of both superconductivity and the metal-insulator (gap-nogap) transition highly contributed to writing the mechanism part. The knowledge was acquired over 20 years by processes of performing national projects including project [Grant 2017-0-00830] funded by Institute for Information and Communications Technology Promotion (IITP) in MSIT of Korea government in ETRI. H. T. Kim left ETRI on Nov. of 2022.

In the first paper, H.T. is not acknowledged, and is only mentioned in reference no. 52, pointing to his paper from 2021.

Ok, enough about the people, Alex! Does the rock levitate?

In January, the QCentre YouTube channel uploaded an unlisted video that showed magnetic properties of LK-99, and another video, with partial levitation, is widely shared on social media. The partial levitation shown is attributed to the Meissner effect and is supposed proof of room temperature superconductivity. However, these two videos are inconclusive and are not enough for us to take QCentre's claims at face value.
The scientific community, having been stung by a recent incident surrounding a supposed room temperature superconductor where the evidence was apparently falsified (Dias et al.), is not so easily swayed. Adding to that, the mess around the multiple papers showing different theories, the lack of peer review or independent replication, the surprise publication, and a rushed follow-up publication all make people wonder: what is going on here? This doesn't seem like a fabricated attempt.

Summary of replication attempts so far (Sun, Jul 30)

Given the importance of this discovery and the "relative" triviality of replication (common enough materials, a process that is not extremely complex, but kids, do not try this at home), we can bet that "furnaces in solid-state materials labs around the world have been cooking yesterday and today to try to reproduce" [Science Magazine].

We have reports from China that supplies of lead apatite are running dry as many labs quietly try to replicate. Additional reports come from India, where Dr. V.P.S. Awana, the chief scientist at CSIR-NPL, and his team are trying to replicate, with results expected as early as tomorrow (Monday, Jul 31); he has been emailing with Lee.

In addition to this, we've had Andrew McCalip from Varda Space, who has been live-tweeting and Twitch-streaming his "Meissner effect or bust" campaign to reproduce LK-99 while the world watches (Andrew joined the space as well) and provides ideas, materials and an outpouring of support for this gung-ho, almost cowboy effort. We've also heard from folks at MIT who claimed that professors who want to remain anonymous, and who went to MML2023, are also in contact with the team and are trying to test the material.

Replication failure is… not a failure?

Discussing the replication attempts with experts on stage, we all concluded that there are likely 2 ways for the world to know whether LK-99 is a superconductor:

* Replication succeeds and scientists analyze the replicated sample
* The QCentre team provides a sample, and some very smart independent folks put it under a microscope, run a magnetism analysis and a bunch of other measurements, and confirm that it's a superconductor at room temperature

While we wait for either of those, I encourage you to check out the resources, the space recording, and the list of folks I've collected to stay in the loop!

Here's a list of relevant links:
* Paper 1 DOI
* Paper 2 Arxiv
* Paper 3 Arxiv
* New Scientist Interview
* ChosunBiz Interview (Korean)
* Yonhap Interview (Korean)
* Twitter List

And the list of folks who participated in the space, give them a follow:
* Alex Volkov (@altryne)
* Seo Sanghyeon (@sanxiyn)
* Ate-a-Pi (@8teAPi)
* Andrew McCalip (@andrewmccalip)
* Andrew Cote (@Andercot)
* Ely Rabani (@radsci)
* Robotbeat (@Robotbeat)
* Marsh Ray (@marshray)
* Ben (@BenShindel)
* Ken Condon (@KenCondon1)
* Jesus (@jesuslares_me)
* Danielle Fong (@DanielleFong)

For your convenience, attached is an AI transcription of the space with speakers and timestamps (may be off by a few minutes):

[00:02:40] Alex Volkov (@altryne): Hello. Hello, everyone. There's a lot of you here, and I wanna welcome a shoot for up on stage while we wait for a few more guests, and then we can get started. Thank you so much for taking the time joining us. as you're as interested as all of us in this very exciting, very confusing, very potentially groundbreaking news. So I wanna introduce 2 folks up on stage 2 folks up on stage already, and bringing up another one just now. And hey, Andrew.
Hey.[00:03:18] Alex Volkov (@altryne): Hey, How are you guys?[00:03:23] Ben (@BenShindel):Doing well. How are you?[00:03:27] Alex Volkov (@altryne): A little bit you know, the palms are a little bit sweaty. This is a insane turnout. Twitter is indeed a public space on because that we have. And, hopefully, spaces or two spaces, whatever they call it now, will hold. And I only invite Sam here to speak as well. Hey, Tobias. How are you?[00:03:51] Ate-a-Pi (@8teAPi):I'm good. I'm good. So good to good to, you know, hear from you guys in person, Alex. Thanks for putting the space together.[00:04:00] Alex Volkov (@altryne): Thirdly. Andrew, we're gonna introduce Andrew, but many folks who are here already follow you and and follow your work. How how's your evening going, Andrew?[00:04:12] Andrew McCalip (@andrewmccalip):Lee, this has been a wild ride. Thanks for putting all this together. It's gonna be great to get all the information in one place for the first time. This is my first time experiencing the full volume of the Internet, and just been a a lot of fun to see all the positivity around the progress.[00:04:29] Alex Volkov (@altryne): That's great. So I'll do my best that, you know, Mother think this. I will maybe preface this that I am not a scientist. Many of the terms that we'll hear today in the space I've heard for the first time a couple of days ago. What I am is a Twitter for many, many years, and I have collected a a list of folks who I I personally wanted to follow to kinda see the updates as they roll out, and we've seen many, many things roll out very quick. with a lot of confusion and different replication attempts from different places. And I just compiled the list for myself. I started following.[00:05:08] Alex Volkov (@altryne): 8 to buy had incredible incredible content diving into the the timeline. I found I I wanna introduce thank you. Am I saying this right? I think you need to hit the the mute button in a mute. If this is your first time talking on RESTASIS. let me know if you're able to do that. And if not, we'll try to solve this. And out as I was collecting folks, And I I started seeing that Andrew started doing their application attempts and even doing Twitch.[00:05:46] Seo Sanghyeon (@sanxiyn):Can you hear me?[00:05:47] Alex Volkov (@altryne): Can you hear me? We can hear you. Hey, Sam Kim. How are you?[00:05:57] Seo Sanghyeon (@sanxiyn):It it it's the noon in South Korea, and I'm fine.[00:06:01] Alex Volkov (@altryne): the afternoon. Right?[00:06:03] Seo Sanghyeon (@sanxiyn):It's 1. Yes. Yes. It's the 1 PM.[00:06:06] Alex Volkov (@altryne): Awesome. And so I was just doing an introduction maybe as you were telling up, you maybe not heard some of it. However, folks in the audience who followed this kind of thread and how we came to be here I have a a thread that I'll post on top here that has all the folks from the Twitter list that I forgot. And San Kyung and his his team is basically the reason for the space. Me and Nathan kind of found Sunqun. Am I saying Sunqun correctly? Is that is that the right way to say this?[00:06:41] Seo Sanghyeon (@sanxiyn):My name is. Your your, yeah, your pronunciation is not actually not.[00:06:48] Alex Volkov (@altryne): Okay. I'll I'll turn my best to put months at the at the right names. And so we both me and 8 to 5, a a 34 in Saint Kyung, who's in Seoul currently, and definitely speaks the language we don't speak, and so there's a lot of insight and translation. 
And so, yeah, I guess we'll will get started, so feel free to present yourself, and then talk a little bit about your last few days and how you came around getting in this topic. and then how kinda what you found so far.[00:07:28] Seo Sanghyeon (@sanxiyn):I I didn't really expect to to speak.[00:07:30] Alex Volkov (@altryne): That's okay. That's okay.[00:07:32] Seo Sanghyeon (@sanxiyn):That's put me put me on the spot. Yeah.[00:07:34] Alex Volkov (@altryne): I don't wanna put you on the spot, but give us maybe a brief summary.[00:07:44] Ate-a-Pi (@8teAPi):Maybe maybe do you do you want me to help Sanyon?[00:07:47] Seo Sanghyeon (@sanxiyn):Yes, please. Okay. You you have read my right top, so maybe maybe you can explain what's going on.[00:07:57] Ate-a-Pi (@8teAPi):Okay. So I'm I'm just gonna I'm just gonna just to preface everything, I I'm writing a work of fiction. So all of you guys are just participating in an experiment. So but I'm trying to keep everything to kinda, like, factual and trying to interpret what what is kind of happening on the ground. Right? Shyam is much more factual, and he he has actually been doing a primary source work. So he's been actually digging up the actual Korean language science papers. He's been sitting down with friends They've kinda, you know, summarized and kind of tried to understand what's going on.[00:08:36] Ate-a-Pi (@8teAPi):And he's really the one that's, you know, put together this that that the you know, the the the mentor, you know, whose name, I think, in some transliterations comes out to TS's chair, some Donsick He the mentor was basically in superconductors in this idea of this kind of 1 dimensional super and he had this theory.[00:09:00] Seo Sanghyeon (@sanxiyn):That so the name is che. che. Oh, sure. Yeah. Yeah. Yeah. He was a a professor in the Korean University's Department of Chemistry.[00:09:13] Ate-a-Pi (@8teAPi):Yeah. And and so he he had this idea, this theory, and he had graduate students. and one of those graduate students was Lee, and Lee kind of took up the mantle of this this theory. And then they, you know, tied up with who was an experiment list.[00:09:37] Ate-a-Pi (@8teAPi):And then they kinda discovered this trace this coast of a trace of a material in 1990 And at that point, what happens is having discovered this trace, their path kind of diverge this, and Kim, the experimentalist, goes on to do a masters, not in superconductors. So he does his masters in something else, and then he does the battery materials kind of PhD, and he graduates in 2008.[00:10:12] Ate-a-Pi (@8teAPi):while Lee continues on the superconductor path, does experimental any when he publishes his PhD. It's both a theory and synthesis of superconductors. And then he graduates, and then he he goes to work as a science adjunct professor, which we which we just found out. Like, a computer science adjunct professor, and he's there for about, you know, 4, 5 5 years. He doesn't publish. And and I'm guessing at this point, he kinda gets, like, you know, cashier out of out of academia completely, and he sets up a consulting firm, basically, Q Center.[00:10:50] Ate-a-Pi (@8teAPi):And they start taking on consulting work. And and then, again, the timeline is a little bit unclear on whether or not they continue to work on on the on on the product on what they discovered. 
And what happens then is in 2017, Chey Dongksik passes.[00:11:18] Ate-a-Pi (@8teAPi):And as he passes, he he gets his former students together, and he asked them to finish off what they started to find this superconducting material that they saw a ghost of a trace of in 1999. And he passes, and they have no money. basically. Song Young has done, again, primary source research, and, you know, the the office space is basically, like, like, a two story building, you know, somewhere in the you know, in in Seoul. It's a very modern kind of office. They don't have much money.[00:11:57] Ate-a-Pi (@8teAPi):My guess my guess is that they need Kim. because KIM is the experimentalist, and I'm guessing also that none of the theory works at this point. The only thing that they have to go on is that they actually did find something in 1999. And Kim, I'm guessing, is also quite practical because he didn't do he didn't pursue the superconductors for the PhD. Right? Because he's quite practical, he's like, dude, you get me money. I'll join you. You don't have money. I'm not joining you for your wild goose, Jason. Right?[00:12:36] Ate-a-Pi (@8teAPi):So Lee goes out and he recruits Kwan. And Kwan is kind of like you know, he's he's a US PhD. He has a research university, you know, position. recruit them, and they get funding. And I think I think Sam Young, you were you were saying that Kwon is the one on the, you know, National Science Foundation of Korea's like you know, list, like, grant. Right? I I think that's what you said.[00:13:08] Seo Sanghyeon (@sanxiyn):So the paper mentions the public grant from South Korea. called the National Resource Foundation, which is like National Science Foundation in United States. And Korn is listed as a primary invest mitigate our PI, if then.[00:13:25] Ate-a-Pi (@8teAPi):Right?[00:13:26] Alex Volkov (@altryne): Mhmm.[00:13:27] Ate-a-Pi (@8teAPi):Yeah. Yeah. That's right. Okay. So he he's the PI. So they recruit him as the PI, and Jade Kim, who is, you know, Lee's partner, basically leaves his very comfortable position as a research director in a hearing aid test.[00:13:44] Seo Sanghyeon (@sanxiyn):Yeah.[00:13:44] Alex Volkov (@altryne): Yeah. Yes.[00:13:45] Seo Sanghyeon (@sanxiyn):Yes. Yeah. Hearing aid Yeah. I Or the eye test there? Yeah. Yeah. For the ISER tech and in manufacture, the battery is specialized for the hearing aid. code. It is a medical device. They have a different standard from other batteries. And company a small business in South Korea, but seems competitive worldwide.[00:14:13] Alex Volkov (@altryne): So he leaves his let me let me -- Yeah. Go ahead. Just real quick and to give folks a quick summary. The main paper that we saw the explosion from that was published on July 22nd, so a week and and almost a day we're, like, almost 8 days into this. The three people that you you just said, besides the first professor, Choi or chair or Troy and or several places write it separately. So the the three people, SoftBank, Jihoon Kim, which is the LK in LK 99, right, Lee and Kim. And the third person you just mentioned is Young Wan, Kwan. Yes.[00:14:52] Alex Volkov (@altryne): Those are the the 3 authors on the paper that kind of was published on our side out of the blue. 8 days ago. Please continue.[00:15:03] Ate-a-Pi (@8teAPi):Right. And then so at this at this point, they're in 2017, And, you know, Lee goes out and does the fundraising. He recruits Kwan, who's the research professor, Kwon is basically he's on the paper. 
He he's he's the principal investigator on the grant, but he's still a professor at university. So he's basically, I'm guessing, like, a day a day in the, you know, in the office at Q Center, very modest place. I think the grand size is pretty small, and they get this ESR machine.[00:15:41] Ate-a-Pi (@8teAPi):And again, from what I can tell, the ESR machine only came knows how to use it. Because none of the other people are actually synthetic, you know, synthesis people. They're all like theory guys, Kuan is a physicist. And Kim himself, JH Kim himself, he's looking for something which you have to know what you're looking for, right? Because that's what he says in his LinkedIn. He's like, I'm looking for some if you don't know what you're looking for, then forget about it. Right?[00:16:19] Ate-a-Pi (@8teAPi):But he he knows what he's looking for, and they refine, they refine, and they refine, and he keeps doing experiments. He keeps refining the experiment, and he goes through, like, a 1000 iterations. And somehow, starting in 2018, somehow, By the middle of 2018, they find it. So that that's a surprising thing for me because they've I I I suspect they they've been working on it you know, before or, you know, Jay and Lee had a breakthrough on their theory, so they knew how to narrow the workspace down. But somehow in at the end of the day, Kim is the one grinding.[00:16:58] Ate-a-Pi (@8teAPi):Through that 1000 experiments, finally, to get, you know, a sample that works.[00:17:03] Seo Sanghyeon (@sanxiyn):And then they start by -- No. No.[00:17:05] Alex Volkov (@altryne): No.[00:17:05] Ate-a-Pi (@8teAPi):No.[00:17:05] Alex Volkov (@altryne): No.[00:17:05] Seo Sanghyeon (@sanxiyn):No. No. No. No. No. No? So so besides the two papers, there is a paper published in April returning query. And In their own words, they describe what what prompted their breakthrough in 2018.[00:17:27] Seo Sanghyeon (@sanxiyn):and it said that so so they are putting the material in a quartz tube And because they called it to best courts to cancel and Brooke, And the material left after the breaking of the glass was had the property they wanted. So so it was an accidental discovery.[00:18:02] Ate-a-Pi (@8teAPi):So can can you repeat that? Like, they what what happened? They put it in the quartz tube, and the quartz tube broke accidentally?[00:18:10] Seo Sanghyeon (@sanxiyn):Yes.[00:18:10] Alex Volkov (@altryne): Yes. Yes.[00:18:11] Seo Sanghyeon (@sanxiyn):I see. And and And that what's the breakthrough in 2018? I see. It's what I'm saying.[00:18:19] Alex Volkov (@altryne): Yeah. I just wanna confirm what I hear. The breaking of the course you led to the incidental discovery. This is this is the the breakthrough as it's written in the first paper in Korea? Yes. Yes. Okay. So I'll just call ASAP, I'll just give it back for some logistics. Folks, if you look up on on top of the space, there's a few tweets we're pinning. And as we go along, we're gonna add some information on top of this. The 3rd the third we pin from dystopian breaker has a link to the original kind of Korean paper. So please go ahead, Datapai.[00:18:54] Seo Sanghyeon (@sanxiyn):So so quick -- Okay. point.[00:18:56] Alex Volkov (@altryne): Yeah.[00:18:56] Ely Rabani (@radsci):Go ahead. Go ahead. This this could be important because, you know, as as soon as you expose it to the atmosphere, your getting hydration. And hydration, you know, might be harmful, might be helpful. 
From this, like, little account, it seems like it it it either didn't do anything or was helpful. But, like, no what temperature it was at when it broke, and and things like that could could actually be really pertinent.[00:19:30] Ate-a-Pi (@8teAPi):Yeah. So, absolutely, like so it's not they he does do the 1000 experiments, but the 1000 experiments, whether that gets him there or not, at one point in the experiment, the quartz tube breaks, that gets them there. They get lucky. Right? So they get they get lucky. And then after that, things proceed pretty quick They isolate they isolate it, and then they they get the crystallization. They start working on the papers. They start on the patents, and they start also trying to figure out the chemical vapor deposition process. They seem to have made some way some headway on the chemical vapor deposition process.[00:20:06] Ate-a-Pi (@8teAPi):And then, you know, sometime around September 2021, something start happening. Quant takes a position, sabbatical at, I think, Seoul University at that point. I'm not sure whether that means he's putting more time in the office or not. And then that fast forwards to yeah. Go go ahead, Sunggham.[00:20:33] Seo Sanghyeon (@sanxiyn):No. No.[00:20:33] Alex Volkov (@altryne): No.[00:20:33] Ate-a-Pi (@8teAPi):You go ahead. Okay. So that fast forward about March 2023 when basically the international patent has been filed. And Kuan leaves the team at this time. I'm not sure when Kim comes on board. That's not very to me at what point Yum Tuck comes on board.[00:20:57] Ate-a-Pi (@8teAPi):So I'm guessing it's after the nature, the nature paper gets dinged in 2020, And and and, you know, the the other thing that strikes me also is that every single person on the team is very aware of every single hoax in superconductors to date. Right? They they they all know the space well, They've seen every single hoax before. They know they know what the hoaxes look like. They know what to look for. They know what diamagmatism is. So I I I don't think yeah.[00:21:29] Seo Sanghyeon (@sanxiyn):Go ahead. So the date is So the day before the yesterday, Andrew McCully posted on his Twitter the translation of the Korean paper at Doctor Lloyd. Is that correct? And can can you so so how did you translate and can Can you say something about it?[00:21:59] Alex Volkov (@altryne): Andrew, I think he's Frank to you. So I can just ring to you. You posted a translated paper also. Right?[00:22:08] Andrew McCalip (@andrewmccalip):Yes. Now that was just a machine translation from Google. That was just a very cursory translation.[00:22:19] Seo Sanghyeon (@sanxiyn):Okay.[00:22:19] Ate-a-Pi (@8teAPi):So in basically, quantity is team in March, and then you have the kind of papers being released, you know, haphazardly. The next the next point that of them is that they had started releasing the papers app as early, like, late last week.[00:22:42] Alex Volkov (@altryne): And and then and then we have -- And by the way, I think it's it's important to highlight by Kwan, the guy who's no longer affiliated with with QCenter. Like, this this sole endeavor a business venture that's funded for for this for this purpose. Kwan is no longer affiliated with that. 
We've seen Sankyo posted an interview in Korea from Friday where I think both of the and Kim say that Kwan, the guy who published the first paper, is no longer affiliated.[00:23:12] Alex Volkov (@altryne): there were some speculation as to maybe the limit of three people on the paper is the limit of the Nobel Prize or 2 or 3 authors. I don't have this confirmed, but this is speculation going around. And it's important to note like, both of them say that the paper was not ready when it was released, and it was released by Juan, the guy who left the first paper. 2 hours later, 2 than 20 minutes later, another paper gets released in the in the same archive with, I wouldn't say, 5 authors. not including Kwan. Right?[00:23:48] Ate-a-Pi (@8teAPi):So Lee -- Yeah. And -- The user the the user name is TumTuk team, the the college professor from, you know, Virginia is the username who who pushes the r archive paper at that Yeah.[00:24:04] Seo Sanghyeon (@sanxiyn):Chantakim is a big name with the 18 days of 45, and If you look at the paper, there is an error message in Korean saying that Bloomberg could not be found. It is a neutral error message when you did the some of the typesetting wrong.[00:24:27] Seo Sanghyeon (@sanxiyn):And You just don't probably see the room temperature, sugar conductor paper with the error deaths that had to bookmark cannot be found if you are following if you are in not in emergency.[00:24:52] Alex Volkov (@altryne): So so it does feel to us at least from the summary so far that the paper that Quang released has different information than than the second paper, and the second paper feels like it was released in the Harry and included more people that currently work at Q Center, including Hyundai Kim. And Sonja, owner, you this question. You mentioned his h h score or something score. Can can you explain the importance of that score for him talking?[00:25:20] Seo Sanghyeon (@sanxiyn):creates someone else to the explanation.[00:25:24] Ate-a-Pi (@8teAPi):Okay. So so the h score is, you know, because we have a web web savvy audience here. It's kind of like a page rank for, you know, researchers. It shows you how influential how influential the researcher was, and so a higher score means that more people have been citing your paper.[00:25:45] Ben (@BenShindel):Go ahead, Ben. Yeah. More precisely. So, like, an h index of, say, 40 means you have 40 papers that each have 40 citations or more. That's a little tricky to understand. So, like, if I get another paper that has only 30 citations, it won't affect my h index at all. I have to get a 41st paper that has 41 citations to to to make it rise.[00:26:07] Alex Volkov (@altryne): So I think it's it's safe to say HUNTAKIM, the guy who submitted the second paper, potentially haphazardly. Correct? Like, we're we're we're saying there's 2 hours after the first one. So likely prompted by these events is a well well sighted very well sighted scientist with a very high kind of confidence score. It's not like a random person of the street that decide that there's now a superconductor of room temperature and, you know, verified it.[00:26:41] Seo Sanghyeon (@sanxiyn):Okay. Sorry for being side tracked, but I just checked the the motion related to Korean paper or not to talk through it by Andrew. And on the page 5, we clearly said that the quartz tube was destroyed due to internal pressure during rapid cooling of reaction and etcetera. So I think, in fact, nobody really read ready carefully. 
It is it is just there about the quartz tube once destroyed.[00:27:19] Ate-a-Pi (@8teAPi):Yeah. So I think I think it's yeah. Definitely, like, probably the the rest of us are are are not very close readers. of of that paper.[00:27:29] Seo Sanghyeon (@sanxiyn):So so We can we can continue on after the upload to the archive.[00:27:42] Ate-a-Pi (@8teAPi):Indeed. So okay. So they they they it goes into our our archive, and then all of the events of the last week happen you know, I don't think any of us expected any of the events to happen. So we've all just been kind of, like, following along and seeing what happens next. I had no idea that there was a metallics conference in South Korea, and I I definitely had, like, no idea that you know, one of the authors would show up there, and it gets posted on Twitter. And so and then and then Seung Young points it out on the FM Korea Football message board.[00:28:20] Ate-a-Pi (@8teAPi):And so we translate, you know, what the audience reaction was in in in a bad translation to get -- So -- -- whatever message was across.[00:28:30] Alex Volkov (@altryne): -- mind let me interject here because this is around the that I found out about this. Alex, frozen coffee. Alex, I forgot his nickname. We invited him here. He posted a a very long Twitter thread that got the attention of the algorithm and then boosted of this room template ambin pressure, superconductor paper from Korea. I think he only started talking about the first paper, and then after the second paper also came out. And I think at this point, or somewhere around there. Andrew, you found out about this. What what did you first hear about, you know, Twitter drama around LK 90 Right?[00:29:08] Alex Volkov (@altryne): And, Andrew, feel free to at least produce you know, introduce yourself officially and BARDA and how you're interacting with this.[00:29:16] Andrew McCalip (@andrewmccalip):Yeah. So I was just cruising the Internet at night, and this came across. I think my my Twitter feed And so I I'm incredibly curious. This is something that has been a bit of a a hobby for me. And so I was always interested in superconductors, so it it caught my attention. I'm a mechanical engineer. So full disclosure. I am not a subject matter expert. I am simply an aerospace engineer that has a lot of curiosity and some assets at his disposal.[00:29:50] Andrew McCalip (@andrewmccalip):And so reading this paper, it it struck me just the simplicity of of the process. And so I realized that I probably had the ability to replicate with full fidelity, the process that was described in the paper. And so that within about 30 minutes, I I realized I should simply start down this road that Twitter was already picking up at the time.[00:30:21] Andrew McCalip (@andrewmccalip):There's some conversations going back and forth and the it was the classic scenario where on every superconductor discussion, there is the same conversation that happens over and over again. And this synthesis appeared so simple that it seemed that the most expedient thing was to simply test it physically. And so my my work is very receptive of of after hours projects. I'm I'm known as the the guy that has really aggressive hobbies, let's say.[00:30:57] Andrew McCalip (@andrewmccalip):And so I'm always in the back doing something interesting with materials or automation. So within 30 minutes of reading the paper, I had ticked off orders to various chemical suppliers. I've reached out to overseas vendors. to try to procure a couple of the the elements. 
And so it was just kind of an offhand comment that I made on Twitter and and then the ball really started rolling, and I realized that everyone wanted to see this this made.[00:31:32] Andrew McCalip (@andrewmccalip):And so it was just supposed to be a a a fun little project, but I was really overwhelmed by the the response. Everyone wanted to to see this done. I think there's this incredible curiosity, there's this incredible drive. People wanna see, like, incredible things happen for the the the human race. And so something if this magnitude pops up, everyone's motivated to drop everything and investigate. And I think that's where we're at.[00:32:08] Alex Volkov (@altryne): And I think you met the algorithm at the right place where folks were excited about the future and think this could bring a lot of changes around the future, and you started saying, hey. You know? Here's a here's a direct approach. Let's try to replicate this. And I I wanna just highlight the fact the the materials involved in creating this. And the process, some folks say and please talk about this. Some folks say that has been an attempt at a hoax, it wouldn't be as simple. They wouldn't have released a simple instruction manual kind of quote, unquote simple that many labs around the work they replicate given the materials and and the right equipment. Right?[00:32:48] Ely Rabani (@radsci):So -- Yeah.[00:32:48] Alex Volkov (@altryne): So -- -- straightforwardness of this potentially shows some stuff.[00:32:51] Ely Rabani (@radsci):So this this is a good time for for a PSA. I mean, I know that that Andrew is well aware of this, and and and many of peep of the people who've been following it. But in case anybody who's listening isn't. The these compounds in vapor form at any rate are are highly talked music, and you you have to know lab safety. If you're gonna start trying to experiment with them, you need things like, a glove box and, you know, all kinds of PPE, a fume hood, everything else. Taking risks with this kind of thing is just really not worth it.[00:33:31] Alex Volkov (@altryne): I I I can't stress that. Absolutely. Don't try this at home.[00:33:36] Andrew McCalip (@andrewmccalip):kids definitely. Yeah. Absolutely. There's a lot of chatter in the beginning in the first couple hours about this can be replicated in a garage And, you know, I thought it was interesting. I thought maybe we've got the opportunity to to do it safely. we've got all the right equipment. We've got, you know, the the 1,000,000 of dollars of equipment that support our spacecraft business. that allow us to do some of these things safely. And so I thought Twitter wants to live vicariously through somebody why not do this?[00:34:12] Andrew McCalip (@andrewmccalip):I ended up being in sort of an interesting middle ground because I'm not in academia. I'm also not trying to commercialize any part of this tech. really just doing it for fun because it's incredibly interesting. So I've got no skin in the game except for making this work in a transparent manner. and then getting the materials into the hands of the experts.[00:34:34] Andrew McCalip (@andrewmccalip):So I thought if we can leverage some of our equipment and some of our, you know, very smart people that we have, to speed this timeline up, I didn't see anybody in the United States being vocal about trying to do replication there are so many stories coming out of other parts of the world that all the labs, there must be thousands of furnaces burning right now trying to replicate this. 
But I wanted to get material into the hands of some local experts in California.[00:35:09] Andrew McCalip (@andrewmccalip):And so that's really our our goal is, you know, can we can we sort of be the face of of the Internet do this experiment in a safe manner and then help advance the science and be sort of a forcing function to to doing this replication.[00:35:27] Alex Volkov (@altryne): So, Andrew, just before just a a small pause before you continue, I want to ask the other, Andrew, here. The Andrew code, if if you're able to unmute and and and talk us if you're available about the potential reasons why all of Twitter jumped on this. Andrew Kot, you had a thread on room temperature superconductors. About 2 weeks before this, like, almost a permanent is kind of a threat. And could you give us some summary first of all, feel free to introduce yourself, but also some summary of what this means if this replicates, what this means for the world.[00:36:07] Alex Volkov (@altryne): Applications, you know, give us, like, some excitement of what happens if this is an actual ambient pressure in room temperature superconductor? Andrew? Does not look like Andrew is Oh, hey.[00:36:33] Andrew Cote (@Andercot):Sorry. My my audio cut out for a second. I I missed the prompt. Oh, here you are. Let you only -- Sure. Yeah. Thanks. Thanks very much.[00:36:44] Alex Volkov (@altryne): So so folks so so I I explained to folks your thread about MBN, you know, pressure room temperature superconductors that you've offered, what, 2 weeks before the paper came out. And then suddenly, this dropped. And I wanted you to highlight some of the potential applications of superconductors and give us some of the highlights of what happens in this replicating. This is an actual, you know, real thing.[00:37:08] Andrew Cote (@Andercot):Yeah. Sure. So it's kind of a funny thing. Yeah. I put that thread out there 7 weeks before this story broke. You know, just I have worked with this kind of stuff in in a few different areas now, so it's very, you know, superconducting radio frequency cavities are standard technology in accelerator physics to fill these to work in.[00:37:31] Andrew Cote (@Andercot):Like, my first job in physics was actually in a condensed matter lab using a a scanning tunneling microscope to look at, you know, electronic structures of potential high temperature superconductors So this has always been sort of like a holy grail of material science, like sort of a holy grail of applied physics. It's one of these properties it's one of these materials where the bulk properties come from its quantum mechanical behavior. And and, you know, when quantum mechanics and its effects escape the realm of the very tiny, it can really manifest as as magical phenomenon at our scale in the world of the kind of the bulk matter or the big stuff.[00:38:10] Andrew Cote (@Andercot):So, you know, superconductors are used currently today, You know, it's it's they've reached engineering applicability through decades of continuous refinements and improvements. And and some of the biggest things to think about in what lets these things get used in industrial applications is their ability to superconducts at higher and higher temperatures And, also most also importantly, is to operate at higher and higher background magnetic field strengths. 
And so the way to think about this is that a superconductor, it's allowing current to move through it with zero resistance, but it also perfectly spells magnetic fields.[00:38:48] Andrew Cote (@Andercot):And there's an operating point of these materials where it's basically the current density and the temperature and the magnetic field kind of put the bounds or the performance envelope on the material. So some conductors can carry tons of current, but they can't exist in a very high field. And so, you know, those are hard to make as useful. You can use them for carrying, like, electricity, which is awesome, but often what you really wanna do is generate very strong magnetic fields. So I think maybe the most familiar to the most people here would be, like an MRI machine. Right?[00:39:27] Andrew Cote (@Andercot):Magnetic resonance imaging. So the idea there is you're generating very high strength field, and magnetic fields are measured in Tesla, for example. So just for just for context, you know, 3 Tesla is a is a pretty strong field, and that's what is about the strength using an MRI. So, you know, MRIs use these cryogenically cooled magnets, or or they're not I don't think cryogenically cooled. They're actually often just copper, but they do have cooling. But they generate this high strength field, and then, you know, it kind of sets all these little protons in your body spinning and dancing in a little, you know, kind of radiating energy.[00:40:03] Andrew Cote (@Andercot):And then you have a pickup coil, which is like an antenna, and the antenna is trying to pick up that energy and kinda reconstruct what's going on in your body. And this is how we can get, like, a really high detailed, high fidelity, three-dimensional image of what's going on inside someone without any invasive surgery. So it's, like, you know, MRIs are a real kind of amazing breakthrough in medical imaging. Superconductors if they could work without cryogenics would really simplify and make cheaper and more available, high resolution, high fidelity, three d images of people's bodies.[00:40:35] Andrew Cote (@Andercot):not just for making the magnetic fields, but also for picking up the signal emitted by the protons that get put into motion by the field in the first place. So it's kind of, like, one sort of off the shelf example. I think another one that's kind of under the radar, we don't think about it's not just in carrying electricity without resistance, which is useful for long range, like energy transmission, that kind of stuff. But if you look at the national grid, I mean, only 5, 7 percent of energy total, which is still significant, but it's, you know, single digit percentage ends up, you know, burning as weight You're suddenly muffled.[00:41:11] Alex Volkov (@altryne): I don't think yeah. You're suddenly a voice like your -- Oh, better.[00:41:18] Andrew Cote (@Andercot):Now it's better. Okay. Sorry about that. Yeah. So just gonna say so, you know, National Grid Scale Energy Production. Right? So trans transmitting the energy to its endpoint consumption, there's a bit of waste heat along the way. But what's what's also important to think about is how that energy is produced. It's produced also using high strength magnetic fields. And I was looking into this. There's a a experiment where these guys used sort of more modern high temperature superconducting tape to, you know, retrofit a large DC generator then it had, like, a 36 percent power improvement, right, which is pretty substantial. 
That's that's a that's a serious win.[00:41:58] Andrew Cote (@Andercot):Yeah. So there's there's, you know, sort of thousands of places this stuff could be used that would really just, like you know, it would either greatly improve the performance efficiency, reduce the cost, increase the accessibility of what we think of as, like, high technology like MRIs or particle accelerators. But it would also just decrease the cost of basic things like electricity generation and distribution And that's just the beginning. Right? So, you know, this kind of stuff there's a really good analogy here actually with the transistor, you know, for for years, scientists, then electrical engineers and physicists, they had this idea of a transistor. Right?[00:42:35] Andrew Cote (@Andercot):If only we could have some kind of simple, reliable, current model supplier. We could design all these wonderful things. We could design all these different kinds of logic functions and so forth. And so there was this search for the transistor people were searching for something that could do that, and they had anticipated all the places it could be used ahead of time. And it wasn't until at Bell labs, you know, a very kind of funny crossover here. One of the guys that's on the patent for the transistor is John Bardine. and John Bardeen's actually the only guy to win 2 Nobel Prizes. 1 was for the transistor. The other was for the theory of superconductivity, right, which is Barting Cooper Schiffer Theory, BCS.[00:43:14] Andrew Cote (@Andercot):So, again, it's one of it's one of those things where, you know, physicists, scientists, engineers kinda thought about this for a long time, realize this be amazing. And there's been this, you know, really complicated random walk through the configuration space of possible materials, right, which is so high dimensional. There's so many things you can construct. So I think it's I'm very optimistic about the field in general. I think one thing to think about with this particular result there's so much artisanal craft and and mastery that goes into producing these materials in a reliable, consistent way You know, science people don't often recognize. It's a lot of art involved too. Right?[00:43:52] Andrew Cote (@Andercot):Like like, things that are reduced to expert practice us and know how. And so I'd I'd just be cautious on, you know, jumping to conclusions either on this particular result, if it's if it's valid right now. But, also, if some labs can't fail to reproduce it, it doesn't actually rule it out entirely. I I think there's scientists that have traveled to Korea to work with the original authors. I look closely at that. You know, I'd also you know, I my internal odds are kind of like a 1 in 6 chance, this pans out, and it and it could be big.[00:44:21] Andrew Cote (@Andercot):But that doesn't mean that it's the end of the search or the end of the race, and I'm and I'm also optimistic that Getting people to understand what the massive long term and large scale social benefits of this kind of discovery could be could help direct a lot more basic science research towards this field. You know, I think we spend a lot of things on, like, how to make smartphone cameras better and not a lot of things on and not as much as we could spend on things like high temperature superconductors. 
And this is a final example.[00:44:48] Andrew Cote (@Andercot):I mean, so right now, you know, I work as an accelerator engineer at a company building a type of magnetic confinement fusion reactor. The reason the company I work for can exist, and and the reason there is this current boom in nuclear fusion, is because we've engineered these high temperature superconductors to work in higher and higher magnetic fields, at at higher and higher temperatures. And and the big economic breakthrough there came when we could have these superconductors that can work at liquid nitrogen temperatures, right, which is 77 kelvin. And it's a lot cheaper to make liquid nitrogen and run that kind of cryogenics than it is for liquid helium at, like, 4 Kelvin.[00:45:24] Andrew Cote (@Andercot):So, you know, we're already reaping some of the benefits of this sort of tech stack maturing over time. And I think we're really just getting started in terms of, like, the hunt for promising materials. I mean, I'm hoping this results in positive publicity and more effort, more energy, put into the field. I think if this doesn't pan out as the thing, you know, don't give up hope. Right? I mean, this is a long term game. Science proceeds by starts and stops. There's no fundamental physics here that's impossible. Right? There's no physical principle that says this can't work. Right? This isn't like a momentumless or massless propulsion drive like the EM drive.[00:46:04] Andrew Cote (@Andercot):It isn't, like, superluminal neutrinos. Right? Those things kind of break laws of physics. This is very much in the realm of, yeah, physically possible. It seems seems very you know, in my mind, seems likely there could be something out there given the complexity of the state space of electronic structures and given how you know, how large that space of exploration can be. And, yeah, so I think I'm just kind of you know, this is a great time to be interested in material science, to appreciate basic science research and educate ourselves on on how good the future can be. You know, I think there's a lot of narratives right now in society and culture in general that kinda say, like, you know, you know, we we can't solve our way out of our biggest problems today. Right?[00:46:43] Andrew Cote (@Andercot):And and I'm very much on the other side of that debate. I think we can. I think it's through efforts like this. I think it's through people like Andrew at Varda that are willing to do stuff in their backyard or their garage or their factory or their workplace on their extra time. You know? I mean, this is the kind of this is the the let's build mentality. Right? And so I think we can build our way out of the world's greatest problems, and it's fundamental scientific advances like this discovery that could kind of pave the way out of there too. So, yeah, overall, very optimistic.[00:47:11] Andrew McCalip (@andrewmccalip):Andrew? That that's incredibly well said. That is an incredibly well balanced viewpoint. So how would you advise people to absorb the next week of the news cycle? I mean, we're very much on a, you know, "we're back, it's dead, we're back" type of hype cycle. So how do you advise people to think about the results that they're seeing, knowing that this is a very difficult thing to replicate, when, just because a negative result is shown in a lab, that doesn't mean it's not physically possible.[00:47:49] Andrew McCalip (@andrewmccalip):It's very difficult to prove the negative here. 
So tell us how we should absorb the news cycle coming up in the next few days.[00:47:59] Ate-a-Pi (@8teAPi):So I I I I I I might I might say something about that. I think I think this is basically tacit knowledge transfer, and, you know, Kim seems to have been this kind of, like, artisanal, like, you know, experimentalist. So you need people to actually sit there in the lab with this guy, and he needs to demonstrate to them. And they need to pick up and and there might be things that he does, which he didn't write down. That that's the like, my my take on it given that he is the experimentalist. He's the one doing the synthesis on the team.[00:48:38] Ate-a-Pi (@8teAPi):Given that the team seems to have been only, like, 5 or 6 people, is that this guy is maybe the only person in the world who could do this as of, like, you know, 18 months ago. I'm guessing that, you know, he managed to transfer some of that to the JungTux team. So I'm guessing that at least one more team on earth has this now. And I'm guessing that this knowledge transfer is now happening to a couple more people. So you need to see this progress maybe 2 or 3 cycles for, like, a bunch of other people to have learned the skill, and then that's when that's when things get interesting.[00:49:14] Seo Sanghyeon (@sanxiyn):I mean, you don't really need to replicate to to verify this. The team has the working samples; they can just send the samples to the labs around the world.Hey, the rest of the episode is for paid subscribers to thursdai. I encourage you to subscribe or upgrade your subscription to access it, there's almost 2 more hours of in depth conversation, stitching of facts, experts on material science, physics, electrical engineering and MIT folks chiming in. It's really a great space, around 25K folks have listened to it on twitter so far.
Jul 27, 2023 • 19min

🎙️ThursdAI - Jul 27: SDXL1.0, Superconductors? StackOverflowAI and Frontier Model Forum

⏰ Breaking news, ThursdAI is now on Apple Podcasts and in this RSS! So use your favorite pod-catcher to subscribe or hit this button right here. Our friends at Zealous have provided an incredible platform for us to generate these awesome video podcasts from audio or from twitter spaces, so if you prefer a more visual format, our deep thanks to them! P.S - You can find the full 2 hour space with speakers on our Zealous page and on Twitter.

Here's a summary of the main things that happened in AI since last ThursdAI:

🧑‍🎨 Stability.ai releases SDXL 1.0
* Generates 1024px x 1024px stunning images
* Very high photorealism
* Supports hands and text
* Different (simpler?) prompting required
* Fine-tunes very well!
* Supports LORAs, ControlNet, in-painting and outpainting, and the whole ecosystem built around SD
* Refiner is a separate piece that adds high quality detail
* Available on Dreamstudio, Github, ClipDrop and HuggingFace
* Also available with the incredible ComfyUI and can be used in a free Colab!
Image credit goes to Thibaud

Superconductors on Hugging Face? What?
Honestly, this has nothing immediate to do with AI updates, but, if it pans out, it's so revolutionary that it will affect AI also! Here's what we know about LK-99 so far:
* 2 papers released on arXiv (and hugging face haha) in the span of several hours
* Both the first AND the second paper make the extraordinary claim of solving ambient superconductivity
* Ambient pressure and room temp superconductive material called LK-99
* Straightforward process with a clear replication manual and fairly common materials
* Papers lack rigor, potentially due to rushing out or due to fighting for credit for a Nobel prize
* The science is potentially sound, and is being "baked and reproduced in multiple labs" per Science magazine.

Potential effects of room temperature superconductivity on AI: While many places (All?) can benefit from the incredible applications of superconductors (think 1000x batteries), the field of AI will benefit as well if the result above replicates.
* Production of GPUs and CPUs is power-constrained and could benefit
* GPUs/CPUs themselves are power-constrained while running inference
* GPT-4 is great but consumes more power (training and inference) than previous models, making it hard to scale
* Local inference is also power-restricted, so running local models (and local walking robots) could explode with superconductivity
* Quantum computing is going to have a field day if this is true
* So will fusion reactors (which need superconductors to keep the plasma in place)
As we wait for labs to reproduce, I created a twitter list of folks who are following closely, feel free to follow along!

AI agents protocol, discussion and state of agents for July 2023
* Participated in an e2b space with tons of AI builders (Full space and recap coming soon!) 
* Many touted AI agents as a category and discussed their own frameworks
* Folks came up and talked about their needs from the agent protocol proposed by e2b
* Agents need to be able to communicate with other agents/sub agents
* Task payloads, artifacts and task completion can be async (think receiving a response email from a colleague); a rough sketch of what such a payload could look like follows at the end of this recap
* The ability to debug (with timetravel), trace and reproduce an agent run
* Deployment, running and execution environment issues
* Reliability of task finish reporting, and evaluation is hard

Frontier model forum
* OpenAI, Anthropic, Google, and Microsoft are forming the Frontier Model Forum to promote safe and responsible frontier AI.
* The Forum will advance AI safety research, identify best practices, share knowledge on risks, and support using AI for challenges like climate change.
* Membership is open to organizations developing frontier models that demonstrate safety commitment.
* The Forum will focus on best practices, AI safety research, and information sharing between companies and governments.
* Some have expressed concern that this could enable regulatory capture by the "Big LLM" shops that can use their lobbying power to stop innovation.

StackOverflow AI - "The reports of my death have been greatly exaggerated"
Stack Overflow has been in the news lately, when a graphic of its decline in traffic went viral. They have publicly disputed that information, claiming they moved to a different measurement method and didn't update the webpage, but then also… announced Overflow AI!
* AI search and aggregation of answers + ability to follow up in natural language
* Helps drafting questions
* AI answers with a summary, and citations with the ability to "extend" and adjust for your coding level
* VSCode integration!
* Focusing on "validated and trusted" content
* Not only for SO code: Stack Overflow for Teams will also embed other sources (like your company Confluence) and will give you attributed answers and tagging abilities on external content

This has been an insane week in terms of news (👽 anyone?) and superconductors and AI releases! As always, I'm grateful for your attention! Forward this newsletter to 1 friend as a favor to me if you learned something new? Or alternatively, retweet us on twitter for bigger reach! Thank you! See you next ThursdAI (and on Sunday when I release the State Of Agents recap 😅 ) ThursdAI - Get in on this, and share w/ 1 friend 🫡 This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
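Since a lot of the agents discussion above was about what an agent task payload should actually contain, here is a rough, hypothetical sketch of the kind of structure that keeps coming up: async tasks, artifacts, and a trace for debugging. This is not the actual e2b proposal, and all names here are made up for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class TaskStatus(Enum):
    PENDING = "pending"      # submitted, not picked up yet
    RUNNING = "running"      # an agent (or sub-agent) is working on it
    COMPLETED = "completed"  # finished, artifacts attached
    FAILED = "failed"        # finished unsuccessfully


@dataclass
class Artifact:
    name: str       # e.g. "report.md"
    content: bytes  # the produced file or blob


@dataclass
class AgentTask:
    task_id: str
    instruction: str                       # what the agent is asked to do
    parent_task_id: Optional[str] = None   # set when a sub-agent is spawned
    status: TaskStatus = TaskStatus.PENDING
    artifacts: list[Artifact] = field(default_factory=list)
    trace: list[str] = field(default_factory=list)  # step log, for debugging / "time travel"


# Completion can arrive asynchronously, like a reply email from a colleague:
task = AgentTask(task_id="t-1", instruction="Summarize this week's AI news")
task.trace.append("searched twitter for AI news")
task.status = TaskStatus.COMPLETED
task.artifacts.append(Artifact(name="summary.md", content=b"..."))
print(task.status, [a.name for a in task.artifacts])
```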
Jul 23, 2023 • 37min

ThursdAI - Special Episode, interview with Nous Research and Enrico Shippole, fine-tuning LLaMa 2, extending its context and more

Hey there, welcome to this special edition of ThursdAI. This episode is featuring an interview with Nous Research, a group of folks who fine-tune open source large language models to make them better. If you are interested to hear how finetuning an open source model works, dataset preparation, context scaling and more, tune in! You will hear from Karan, Teknium, LBJ from Nous Research and Enrico who worked along side them. To clarify, Enrico is going in depth into the method called Rope Scaling, which is a clever hack, that extends the context length of LLaMa models significantly and his project LLongMa which is an extended version of LLaMa with 8000 token context window. The first voice you will hear is Alex Volkov the host of ThursdAI who doesn’t usually have a lisp, but for some reason, during the recording, twitter spaces decided to mute all the S sounds. Links and acknowledgments: * Nous Research - https://nousresearch.com/ (@nousresearch)* Redmond Puffin 13b - First LLaMa Finetune* LLongMa - LLaMa finetune with 8K context (by Encrico, emozilla and KaioKenDev)* Nous-Hermes-Llama2-13b-GPTQ - Hermes Finetune was released after the recording 🎊Psst, if you like this, why don’t you subscribe? Or if you are subscribed, consider a paid subscription to support #ThursdAIShow transcription with timestamps: Alex Volkov - targum.video (@altryne)[00:00:55] Yeah. That's awesome. So I guess with this, maybe, Karan, if you if you are able to, can you you talk about Nous research and how kind of how it started and what the what are you guys doing, and then we'll dive into the kind of, you know, Hermes and and Puffin and the methods and and all of it.karan (@karan4d)[00:01:16] Absolutely. Nous research. I mean, I I myself and many other of us are just, like, enthusiasts that we're fine tuning models like, you know, GPTJ or GPT 2. And, you know, we all are on Twitter. We're all on Discord, and kind of just found each other and had this same mentality of we wanna we wanna make these models. We wanna kinda take the power back from people like OpenAI and anthropic. We want stuff to be able to run easy for everyone. And a lot of like minds started to show up.karan (@karan4d)[00:01:50] I think that Technium's addition initially to Nous research, Jim, kinda showing up. And himself, I and human working on compiling the Hermes dataset was really what came to attract people when Hermes came out. I think we just have a really strong and robust, like, data curation thesis in terms of that. And I think that have just some of the most talented people who have come to join us and just volunteer and work with us on stuff. And I absolutely must say, I can see in the in the listeners is our compute provider, Redmond AI.karan (@karan4d)[00:02:30] And, you know, none of this none of these models would be possible without Redmond's generous sponsorship for us to be able to deliver these things lightning fast, you know, without making us through a bunch of hoops just a a total total pleasure to work with. So I would I have to shell and say, you know, I highly recommend everyone check out Redmond as because they really make our project possible.Alex Volkov - targum.video (@altryne)[00:02:52] Absolutely. So shout out to Redmond AI and folks give them a follow. They're the the only square avatar in the audience. Go take them out. And, Karan, thanks for that. I wanna just do a mic check for teknium. Teknium. Can you speak now? Can you? Can I hear you?Teknium (e/λ) (@Teknium1)[00:03:08] Yeah. 
My phone died right when you were introducing me earlier.Alex Volkov - targum.video (@altryne)[00:03:10] Yep. What's up, Eric? -- sometimes on Twitter basis. Welcome, Teknium. So briefly, going back to the question, I don't know if you heard it. What besides the commercial license and kind of the the context window, what kind of caught your eye in the LLaMA, at least the base, until you guys started, or have you also, like the other guys, not had a second to play with the base model and dove into fine tuning directly?Teknium (e/λ) (@Teknium1)[00:03:35] Yeah. The only thing that really caught my eye was the chat model and how horribly RLHF'd it was.Alex Volkov - targum.video (@altryne)[00:03:41] Yeah. I've seen some conversations about that and kind of the point of RLHF as well. And okay. So so now that we've introduced Nous research, sorry, I wanna talk to you guys about what you guys are cooking. Right? The we've seen, the the Hermes model before this was, like, loved it as one of the, you know, the best fine tunes that I've seen at least and the the the most performing ones. Could you guys talk about the process to get to the Hermes model, the previous one? and then give us hints about what's coming soon?karan (@karan4d)[00:04:16] Teknium, you got this one. man.Teknium (e/λ) (@Teknium1)[00:04:22] Yeah. It was basically I saw Alpaca, and I wanted to make it like, remake it with GPT 4, and then from there just pretty much exclusively included anything that was GPT 4 only, and that was the beginning of the thesis for that. Going forward, though, We still have a lot of low quality data, I think, in the Hermes data set that can be cleaned out, and then there's a lot of new data sets that have come out that I wanna start merging into there. I also wanna move to something like ChatML or even Vicuna format so that we can do some multi turn stuff. It's not very great at long chat.Alex Volkov - targum.video (@altryne)[00:05:03] Yeah.karan (@karan4d)[00:05:03] Within within within the Hermes dataset, you know, a lot of it is publicly available stuff that's particularly GPT 4. Of course, Teknium's massive GPTeacher dataset. We also have a bunch of GPT 4 data we had generated that we didn't release necessarily just yet, as well as an instruction set that's particularly focused on tasks like Python, transformers, linguistics, very small dataset of that. That's inside Hermes that, you know, we don't really talk about much, but figure that we'll put some exposure to right now on the spaces. And yeah.Alex Volkov - targum.video (@altryne)[00:05:42] That's awesome. And so the previous Hermes was released on top of LLaMA 1, and for many folks, you know, obviously, they couldn't use this for different commercial points. And now that this model released, for the models that you guys release, are you thinking about the license of them? And could you talk about, like, the availability of folks using them in a commercial setting now that, you know, the the back of it is commercially available.LDJ (@Dogesator)[00:06:07] Mhmm. I think we have Puffin licensed as MIT I'll have to doublecheck on our own own model. I think that's right, Karan, right, or Tech?karan (@karan4d)[00:06:18] Yeah. I think so either that or Apache 2.0. Like, if if the base model is commercially usable, you know, the stuff we put out is you're good to go. It's -- Yeah.LDJ (@Dogesator)[00:06:29] So And, like, in our announcements, I put in kind of, you know, one of the main things. It's it's commercially available. the first Nous as far as I think yeah. 
I'm pretty sure it's the first commercially available Nous model that's released, and a big differential data from Hermes is the fact that, like tech was saying, Hermes is pretty much all single turn data. And it's surprisingly can do pretty decent at multiturn conversations when you actually use it. But then puffin is almost kind of, like, a 180 where it's a vast majority really on context multi turn data.LDJ (@Dogesator)[00:07:09] And oh, I think can you guys hear me so? I can hear. Okay. It's just something's up with that. Okay. Yeah. So puffin is a vast majority, multi turn data, GPT 4 specifically, and a lot of it is actually real human conversations with GPT for that go on for some of them 4k 6 k context, like, even all the way up to the max 8 k context length of GPT 4. And then we took those few thousand conversations of real humans interacting with GPT 4. And now after that, I'm not sure if you've A lot of people probably heard of Camel AI.LDJ (@Dogesator)[00:07:46] So they have the physics, biology, chemistry, and mathematics data set. And then within those, there's a bunch of subtopics that you can carry it through. And I just pretty much spent a good few days curating just handpicking the right subtopics, like differential geometry, logic problems, optimization problems, a bunch of different GPT, for examples, and responses from those different subtopics. And then I specifically added those in certain ways to the puffin dataset.Alex Volkov - targum.video (@altryne)[00:08:17] Awesome. So just just looking for the audience maybe. The puffin model that I think the official name is the red redmon puffin 7B or, sorry, 13B. Yes. This is this is the model that you guys fine tuned, and one of the first is maybe not the first fine tune of llama v Two. that's now publicly available, like you said, maybe with MIT license on Huggingspace, and I think you even added the GGML quantized version. Correct? Mhmm. So and so folks can can go and download that and and already start playing with this. And so first of all, thank you for contributing to the open source. That's great to see. And the speed with which you guys are fine tuned on this is also great to see.Alex Volkov - targum.video (@altryne)[00:08:55] And maybe now that we've introduced this, maybe this is like repeating a bit. So could you speak about the the difference so the difference is the in the data set, in the task that you fine tune? Like, what is the actual difference between the Hermes or the Hermes that's coming out and the Puffin model? What would people use them for differently? Is that like that? That's a question.Teknium (e/λ) (@Teknium1)[00:09:21] The profit model definitely be better at multi turn stuff. That's for sure. Yeah.nisten (@nisten)[00:09:28] So if you want to do anything like OpenAI I'll I'll paste the link above the GGML version of it because I I really I'm I'm gonna test it thoroughly, but I I really think because you guys have use GPT 4, high quality, multi turn conversations, then this can have actual, like, practical use for whoever else was to use it either as, like, something that tells you about the documentation on the site or walks a user through. In other words, this should be better than Hermes then in for, like, customer service stuff, which is just one example.nisten (@nisten)[00:10:08] Anyway, yeah, I'm gonna try. 
I'll I'll paste the the link above.karan (@karan4d)[00:10:14] It's it's likely better for production use alongside, like, stuff that you have with, like, a retrieval pipeline, like, with LangChain, etcetera. Like, I I would believe that without to get it, you know, or just talking, of course. But, you know, there is even though, you know, with this LIMA technique of of small examples where we can get, like, a a really good really good model that does really well.karan (@karan4d)[00:10:41] The thing about the Hermes dataset and just its size and the various types of data and topics that are in there, I think you get a totally different like, role play or storytelling experience or completion experience with Hermes. Personally, I feel that way.Alex Volkov - targum.video (@altryne)[00:11:01] Awesome.Teknium (e/λ) (@Teknium1)[00:11:01] So and that. Another thing about the Puffin dataset is that it does go up to, like, 8K, and Enrico here has been doing a ton of work on extending Llama's context.Alex Volkov - targum.video (@altryne)[00:11:13] Right. So I wanna I wanna give an introduction then introduce Enrico and and talk about this real quick. Right? LLaMA version 1 was released with, again, 2,000 tokens in the context window. And then many folks, including KaioKendev and Emozilla, right, and and some other folks, I think, were involved in bringing some of the quote unquote tricks about what eventually ended up being named RoPE scaling, if I'm if I'm not mistaken. And we follow this, and we've talked about it in the previous ThursdAI. And Llama V2 was released with 4000 tokens in the context window.Alex Volkov - targum.video (@altryne)[00:11:52] And, you know, we're now so used to kind of Claude and the 16k GPT-3.5 that 4K didn't seem like a lot. And then many folks were wondering, and, meanwhile, Enrico was working, whether or not the RoPE scaling method would apply to the new LLaMA, and looks like it did. And so I wanna introduce Enrico uh Enrico Shippole. I hope I'm saying this right. Welcome to the stage. Hopefully, you can unmute and and this space works for you. And the second finetune that I saw released was also backed by Nous, the Nous research group, and this was the extended version, what's called LLongMa.Alex Volkov - targum.video (@altryne)[00:12:28] So Enrico, welcome up to the stage, and feel free to introduce yourself, your affiliation with Nous, and LLongMa with with the context window.Enrico Shippole (@EnricoShippole)[00:12:38] Hello. So I'm actually an independent researcher. I'm sponsored by Stability AI, EleutherAI, and a few other different organizations, including Nous now. Awesome. I work with different people like Tanishq from MedARC, Aran Komatsuzaki, who also is from EleutherAI and Duck AI, John Ney from Nomosai. So I I have a I have a lot of affiliation with a bunch of different organizations, including Together. We're starting a project right now with them.Alex Volkov - targum.video (@altryne)[00:13:13] That's that's so great to hear, and so welcome to ThursdAI. And can you talk to us a little bit about kind of the RoPE scaling method and and how how were you able to, like, find this out like this quickly and how the results looked so far? I wasn't able to run this myself. But hopefully, yeah, talk to us aboutEnrico Shippole (@EnricoShippole)[00:13:34] Okay. 
So initially, The the thing is I actually was hoping that both Emozilla, Bowen, and KaioKenDev would have been able to make it, because it was kinda like an equal parts effort on, like, all fronts from each of us. Initially, I had trained some Pathways models at 8,000 context length about 4 months ago based on the XPos paper, which did rotary embedding scaling initially. They were one of the first people that did it. They based their methodology off of Ofir Press's ALiBi.Enrico Shippole (@EnricoShippole)[00:14:11] I would imagine that most people are pretty familiar with Ofir Press and his work on the ALiBi positional bias that's been used in a wide range of models now. So Emozilla and I came into contact based off of the work that he had seen me doing with the PaLM models, scaling those to 8000 context length pretraining, not fine tuning. So what we had initially done is basically take a section of C4 and different data sets that had examples that were all over 8000 context length and pretrained on them packed together,Enrico Shippole (@EnricoShippole)[00:14:50] with a beginning of string and end of string token to help with, like, the attention masking portion of that. After he had seen that, Emozilla actually came into contact with KaioKenDev, I believe Kaiokendev is how you pronounce it. Kaiokendev had also been following Ofir Press's research. He had started working on his own version of scaling the rotary embeddings, I believe based off of both ALiBi and XPos.Enrico Shippole (@EnricoShippole)[00:15:22] And what he found is that by scaling the max position embeddings and the rotary embedding from something like 2048, which you would initially train with, he scaled it up to 8000 or 8192. And he found that by applying, like, an interpolation to the encoding, by scaling basically, like, the positional index in the rotary embedding, that you were able to essentially turn down the frequency window in RoPE by, like, a factor of 0.25.Enrico Shippole (@EnricoShippole)[00:16:01] The scaling depends on the length that you're trying to extrapolate to and the initial context length that the model was trained with. So if you were training with LLaMA 2, which had a context window of 4096, and you wanted to do the linear interpolation positional scaling to something like 8192, then you would use a scaling factor of 0.5. If you were trying to do it from 2048, which is what the original LLaMA was trained with, and you wanted to scale it to 8192, then you would use a scaling factor of 0.25.
I know a lot of the major AI companies had been doing just for my work in in personal research with many of them had been doing staged scaling of the context window during training.Enrico Shippole (@EnricoShippole)[00:17:46] So they would pre train basically, when pre training, they would separate the initial examples from a dataset into multiple stages.Enrico Shippole (@EnricoShippole)[00:17:54] So anything that is under the window of 2048, you'd separate from the initial dataset then you take things between 2048 4096, then 4096, and 8192, and you would basically chunk the data sets into those different parts you'd first initially train on the 2048 chunk of the data, then you would train on the data between 2048 and 4096, and then you would do the same thing from 4096 to 8192, or if you want to scale that to 16k or 32k context length. But what we have shown now with both the meta paper and this thing, you don't even need to go through that extensive pretraining and staged process, you can just go from a context length of 2048 to 8192.Enrico Shippole (@EnricoShippole)[00:18:47] scale the rotary embeddings by whatever type of factor that you want to use. So like I was saying, if you're going from 2048 to 8192, you'd be using a scaling factor of 0.25. It only needs 2 lines of code to be able to do that. In the LLongMa post, I had provided an example of scaling the rotary embeddings. The the code was written by Emozilla or Jeff.Enrico Shippole (@EnricoShippole)[00:19:15] We also came into contact with after all these experiments we then came into contact with Bowen, who had worked a lot about the dynamic NTK scaling with Emozilla, and he had also done NTK by parts which we're we're currently training a lot of models on. So we have the Longma 1 models trained on the open llama series, like the suite of those models that use the linear interpolation scaling.Enrico Shippole (@EnricoShippole)[00:19:45] We now have the llama 2 models or the longma 2 suite, which is what we're calling it, again, trained on the linear interpolation scaling And then we have another suite of models coming out very soon that uses the the NDK by parts dynamic scaling. That was really specialized by Bowen, so I do not wanna speak on his behalf. It'd it'd probably be good to get him to talk about it in another one of these.Alex Volkov - targum.video (@altryne)[00:20:14] Absolutely. So let's get in touch after this and and and and set it up. So Thank you for the a very in-depth kind of explanation because we did cover the the the kind of the RoPE killing and how Kaioken in the image boards are ready to wherever he started this in his blog post, and then how it's gonna rotate it. So it's great to to actually hear from the folks who are doing this. I just for the audience, I've attached Enrico's tweet about LLongMA 2, which is now currently trained at AK contact length.Alex Volkov - targum.video (@altryne)[00:20:47] And and Rico, you told us that we may see even double from the So could you think about the next the next version?Enrico Shippole (@EnricoShippole)[00:20:56] Okay. So the the initial training process of doing this up to a context, like length of 8192, can be due with be done, basically, with deep speed, 02. and activation checkpointing. And you're able to fit the model on a A100 80 gigabyte node. Now, we are working on the process of scaling it both to 16 k and 32 k. 
I know a lot of the major AI companies, just from my work and personal research with many of them, had been doing staged scaling of the context window during training.Enrico Shippole (@EnricoShippole)[00:17:46] So they would pre train basically, when pre training, they would separate the initial examples from a dataset into multiple stages.Enrico Shippole (@EnricoShippole)[00:17:54] So anything that is under the window of 2048, you'd separate from the initial dataset, then you take things between 2048 and 4096, then 4096 and 8192, and you would basically chunk the data sets into those different parts. You'd first initially train on the 2048 chunk of the data, then you would train on the data between 2048 and 4096, and then you would do the same thing from 4096 to 8192, or if you want to scale that to 16k or 32k context length. But what we have shown now with both the Meta paper and this thing, you don't even need to go through that extensive pretraining and staged process, you can just go from a context length of 2048 to 8192.Enrico Shippole (@EnricoShippole)[00:18:47] scale the rotary embeddings by whatever type of factor that you want to use. So like I was saying, if you're going from 2048 to 8192, you'd be using a scaling factor of 0.25. It only needs 2 lines of code to be able to do that. In the LLongMa post, I had provided an example of scaling the rotary embeddings. The the code was written by Emozilla or Jeff.Enrico Shippole (@EnricoShippole)[00:19:15] We also came into contact with after all these experiments we then came into contact with Bowen, who had worked a lot on the dynamic NTK scaling with Emozilla, and he had also done NTK-by-parts, which we're we're currently training a lot of models on. So we have the LLongMa 1 models trained on the OpenLLaMA series, like the suite of those models that use the linear interpolation scaling.Enrico Shippole (@EnricoShippole)[00:19:45] We now have the LLaMA 2 models, or the LLongMa 2 suite, which is what we're calling it, again, trained on the linear interpolation scaling. And then we have another suite of models coming out very soon that uses the the NTK-by-parts dynamic scaling. That was really specialized by Bowen, so I do not wanna speak on his behalf. It'd it'd probably be good to get him to talk about it in another one of these.Alex Volkov - targum.video (@altryne)[00:20:14] Absolutely. So let's get in touch after this and and and and set it up. So Thank you for a very in-depth kind of explanation, because we did cover the the the kind of the RoPE scaling and how Kaioken, on the image boards or wherever he started this, wrote it up in his blog post, and then how it's gonna rotate it. So it's great to to actually hear from the folks who are doing this. I just for the audience, I've attached Enrico's tweet about LLongMA 2, which is now currently trained at 8K context length.Alex Volkov - targum.video (@altryne)[00:20:47] And and Rico, you told us that we may see even double from that. So could you talk about the next the next version?Enrico Shippole (@EnricoShippole)[00:20:56] Okay. So the the initial training process of doing this up to a context length of, like, 8192 can be done, basically, with DeepSpeed ZeRO-2 and activation checkpointing. And you're able to fit the model on an A100 80 gigabyte node. Now, we are working on the process of scaling it both to 16 k and 32 k. 
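[Editor's note: for readers who want to see what the "two lines of code" Enrico mentions above actually amount to, here is a rough sketch of linear positional interpolation applied to LLaMA-style rotary embeddings. It illustrates the idea only, under the assumptions described in the conversation; it is not the actual LLongMa patch, and the function name is made up.]

```python
import torch

def rope_cos_sin(seq_len: int, head_dim: int, base: float = 10000.0, scale: float = 1.0):
    # Standard LLaMA-style rotary embedding frequencies.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    # Linear positional interpolation: shrink the position indices so that a longer
    # sequence is squeezed back into the positional range seen during pretraining,
    # e.g. scale=0.25 maps positions 0..8191 onto 0..2047.75.
    positions = torch.arange(seq_len).float() * scale
    freqs = torch.outer(positions, inv_freq)   # (seq_len, head_dim // 2)
    emb = torch.cat((freqs, freqs), dim=-1)    # (seq_len, head_dim)
    return emb.cos(), emb.sin()

# Original LLaMA context (2048) stretched to 8192 -> scaling factor 2048 / 8192 = 0.25
cos, sin = rope_cos_sin(seq_len=8192, head_dim=128, scale=0.25)
print(cos.shape)  # torch.Size([8192, 128])
```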
And it's just great to see, you know, how many opportunities like this there are where, with open source, the stuff that we're able to now run and gonna iterate on are building on top of each other. They're just incredible. and this is maybe a watershed moment. And I I wanna thank all of you for being here. I wanna kind of let the other folks who usually hear on Thursday, I need to ask you a question or 2 for Nous visitors. Yam and Nisten, if you if you have a question for Nous or for Enrico, go ahead. I I will stay young.Alex Volkov - targum.video (@altryne)[00:25:29] I know you if you have to ask the super deep technical stuff, and the audience will, like it will fly over their heads, I I won't be using the DM with LDJ and and Enrico. But yeah. Of course, the stuff that we haven't covered and is interesting to Nous, feel free, as it pertains to LLaMA 2 it's gonna be very interesting, I think, for everyone.nisten (@nisten)[00:25:47] Just to quickly clarify, you guys fine tuned the plain model. Right? Not the chat one.Teknium (e/λ) (@Teknium1)[00:25:55] Yep. Okay. Yep. The base model. We wouldn't fine-tune that model, the chat one, at all.Alex Volkov - targum.video (@altryne)[00:26:00] Actually, to -- Yeah. The -- -- to maybe continue this, sorry for interrupting. Just one sec. To continue this question, the there are models that were released by Meta, and you have to, like, register and get the email and everything. And then they put some stuff on Hugging Face. And then the the those models were delineated with, like, dash HF. Have you guys used the HuggingFace one or the Meta one, and do you guys know the difference? I heard from somebody that, like, maybe one doesn't work as well, just to inform here. Yeah.Teknium (e/λ) (@Teknium1)[00:26:30] The one on Hugging Face is in FP16 and the original Llama 2 models are in BF16, but we tested the difference between the two models at Carper, and there's such a negligible difference in their quality that it's irrelevant, but we trained on the Hugging Face FP16 ones.Alex Volkov - targum.video (@altryne)[00:26:52] Sorry. Yeah. Karan, for interrupting. Go ahead.karan (@karan4d)[00:26:56] No. All good.Alex Volkov - targum.video (@altryne)[00:26:58] I I totally forgot what -- That's not it. interrupted today. Yes, Randall. Okay. Nisten, if you have a question for Karan to follow-up with feel free, and And if not, then, Yam, if you have anything that you wanna ask the the fine folks from Nous, feel feel free as well.Yam Peleg (@Yampeleg)[00:27:17] Yeah. Sure. First, thank you for what you're doing, guys. You're really making a difference for everyone. There aren't many demos online, so anyone that didn't try Hermes, I highly encourage you to try. I don't know why there aren't any. Okay. I know why there aren't demos, that costs money, but just try it. Okay? And now I got a question because from my experience, if you train on the open datasets of Hermes, you get a significantly lower quality model. No. Now I'm fine I'm fine if you don't release datasets. Don't don't get me wrong.Yam Peleg (@Yampeleg)[00:27:54] Just I wanted to ask, is there anything else besides the data that is different? What what tips can you give for, I don't know, someone else that wants to train a high quality model besides having high quality data.Teknium (e/λ) (@Teknium1)[00:28:08] Everyone understands this. Yeah. The hyperparameters can make a key difference. LDJ knows very well because we had to do a ton of different tests. We don't have our hyperparameters up for the Puffin model. 
But I'm not sure if those are on the model card for Hermes. If they're not, I can put them And Karen your card can probably talk about the Nous datasets that weren't made public.karan (@karan4d)[00:28:38] Yeah. We've got, like, maybe around, like, 50 k items of data, like, versus, like, total 300 k instructions there that are not released. And to be frank with you about 45 k of them is just more GPT 4, like, alpaca style instructions. The 5000 or so, the, like, 4500 them compose this dataset we have we've been working on that, you know, at this point, I'm pretty comfortable talking about a we call it the p dactyl dataset.karan (@karan4d)[00:29:14] I won't speak on everything that's in it, but, essentially, And I don't know if this is the thing that made the big difference, but it's, like, the the one place where I guess you deviate from just using the open datasets more GPT 4 instructions, but it's got some transformers instructions, some linguistics instructions, some calculus 1, instructions, etcetera. It seems to be pretty good.Teknium (e/λ) (@Teknium1)[00:29:41] Also, Yam, do you have links or anything to the models that tried it with just the makeup of the datasets that we're public from Hermes because I haven't actually seen that before.Yam Peleg (@Yampeleg)[00:29:57] And again, can you repeat that?Teknium (e/λ) (@Teknium1)[00:29:58] didn't hear. Do you have any links to the models that trained with just the open datasets from Hermes that you could share with me later?Yam Peleg (@Yampeleg)[00:30:06] No. No. It's just it's just from my experiments -- Oh, okay. -- on training. Pretty much following the same idea of let's take only GPT 4 from all the open datasets, and the the model that you get is is different. for sure. And and it might be that hyperparameters, you know.Teknium (e/λ) (@Teknium1)[00:30:25] Another thing that we did too is pretty extensive, like, cleaning. We did do deduplication. We removed things like a URL. Like, any response that had a URL in it, we removed in case it was gonna like, hallucinated URLs. Instead of, like, maybe 8 different filtering processes too that might have made our data quality higher.LDJ (@Dogesator)[00:30:48] So as an AI language model?nisten (@nisten)[00:30:51] For anybody -- What do you say? -- for anybody in the audience that hyperparameter meters are are just like the settings in the oven. So it it looks here, like, the ingredients were all okay, but yam mess something up, and before selling as a token -- Yeah. -- came out half baked at the model.LDJ (@Dogesator)[00:31:08] So we're gonna have to check that out.LDJ (@Dogesator)[00:31:10] I'm a big proponent personally of hyperparameter optimization being underrated right now, like, in -- Yeah. -- the current space. And that's something I've kind of focused on a lot specifically for things like puffin and just trying to help others around and use some stuff like trying to optimize they're doing, and even just something like like what you just said about the settings for the oven, I mean, double the amount of time you're putting something in the oven, and it's not gonna come out twice as good. It's not even gonna come out 10% as good. It's gonna come worse. You know?LDJ (@Dogesator)[00:31:45] And although it depends, like, what is your baseline for how how much time you're putting it in the oven and all these different variables that kind of are dependent on each other and affect each other. So it's definitely something you kind of have to build an intuition about to some degree. 
And then the other end is really I feel like there has to be more investment and more time and energy invested into actual tools that make hyperparameter optimization easier for people that are doing these things.Yam Peleg (@Yampeleg)[00:32:13] Yeah. Yeah. And the thing is that the models are are really big, so it's really expensive to run them. So you have you have a trade off of how many how much computer you're investing in searching hyperparameters rather than actually using it for training. But but I completely agree So one one last question, actually, too.Teknium (e/λ) (@Teknium1)[00:32:33] Actually, one thing before we go on. Something great about the puffin dataset is that it's just like, 3000 or so examples, I believe. And so it makes tuning a lot less expensive because you can finish the whole training in just a couple of hours. So, like, with Hermes, if we wanted to try full ablations and dozens of them, it would take weeks weeks to do.LDJ (@Dogesator)[00:32:55] Yeah. Yeah. Well, to be fair, it's not like it only takes a couple hours on one GPU. We use a a 100 80 gigabytes. So Yeah. Yeah.Teknium (e/λ) (@Teknium1)[00:33:04] Courtesy of Redman.Alex Volkov - targum.video (@altryne)[00:33:05] Thank you, Redman.Enrico Shippole (@EnricoShippole)[00:33:08] Mhmm. I should also probably clarify that when doing the context length, extrapolation, We're doing it on 1,000,000,000 tokens and 64, 80 gigabyte a 100.Yam Peleg (@Yampeleg)[00:33:20] OOf Mhmm.Alex Volkov - targum.video (@altryne)[00:33:23] Yeah. Yam is getting over excited. Alright, folks. I wanna -- Yeah. Yeah. -- maybe maybe ask her on this one less and we'll move on to the the the regular ThursdI update camera cadence. But I will say that, like, folks from Nous research and and Rick and and some other here. Thank you so much for coming up and giving us kind of the insights into how this actually happens. Lama2 just released, you know, a few days ago, and you guys are already pumping out, like, open source fine tuned models. And it's great to see. And just so you know, there's always a stage for you here to come in and and announce things.Alex Volkov - targum.video (@altryne)[00:33:53] And If you do wanna announce, like, a release or something, maybe just, you know, right now, Karan and and Teknium and some folks, I would love to hear like, when the next Hermes is coming?karan (@karan4d)[00:34:06] Before we say that, I just would like to clarify something about Hermes. So we have the original Hermes dataset on LAMA 2 as something that we will release, but also a sequel to the Hermes dataset, Hermes 2. There will be a distinction between these 2, and you'll see you'll see the the the prior come out first and the latter come out after. But as for release, etcetera, I will absolutely let Technium take the stage with those final words.Teknium (e/λ) (@Teknium1)[00:34:36] So the training is nearly done. At least it was about 2.8 epochs out of 3 a few hours ago. So it might be done already. Before I release it though, unlike puffin, I didn't we wanted it puffing out, like, same day that llama 2 came out, so we didn't run any benchmarks. And we had to put all the compute we had on Hermes immediately after we were done with that. 
So we don't have any compute to do any benchmarks for Puffin until Hermes is done.Teknium (e/λ) (@Teknium1)[00:35:06] But before I release Hermes, I do wanna do, like, a full range of benchmarks and stuff like that to make sure everything's good and have a pretty detailed model card, but that should probably only take the rest of tonight at the most. So probably tomorrow morning would be when Hermes comes out.Alex Volkov - targum.video (@altryne)[00:35:22] That's awesome, folks. And you you heard it here first, and definitely follow Teknium, Karan, Enrico, LDJ, and the rest of, like, Nous Research folks, and stay tuned. Enrico, go ahead.Enrico Shippole (@EnricoShippole)[00:35:34] Yes. I just wanted to to piggyback off of Teknium's comment a little bit. So we did do a pretty extensive evaluation of the LLaMA 2 8K models. We had run different things on perplexity using Gov Report and a couple different other data sets to make sure that the length extrapolation in the context was working properly. We did passkey retrieval. We also did a lot of extensive human evaluation, which took a little bit. I had wanted to get the LLaMA 2 8K models out yesterday, but we decided to push it back one day.Enrico Shippole (@EnricoShippole)[00:36:08] So and what we were doing is we were feeding in research papers and seeing if it could pull out even, like, relevant pieces of information from the context length. And so far, it has been quite successful. So we're we're still running more evals, but the ones so far have shown that there's been, like, no performance degradation, no matter what context length that you're basically using with these extended models.Alex Volkov - targum.video (@altryne)[00:36:32] That sounds great. and now that this this, you know, LLongMa 2 is out and the next versions are gonna come out as well, I'm sure that some other folks will also contribute to this research and tell you, like, from their own experiences and vibes. So, yeah, I wanna thank folks. Again, this has been very illuminating, and very glad to have you. And, obviously, the stage is yours whenever you want to come here, and we appreciate you. And you guys are welcome to stay tuned and kinda chime in to the rest of the updates. And with that, I think, for folks in the audience, we're moving to the next thing.ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber. This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
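[Editor's note: "passkey retrieval", which Enrico mentions as one of their context-extension checks, is easy to picture: bury a random number deep inside filler text and see whether the model can repeat it back. Here is a minimal, illustrative sketch of how such a prompt could be built; this is not their actual eval harness, and the numbers and filler are made up.]

```python
import random

def make_passkey_prompt(n_filler: int = 400) -> tuple[str, str]:
    """Build a long-context passkey test: hide a random key inside filler text."""
    passkey = str(random.randint(10000, 99999))
    filler = "The grass is green. The sky is blue. The sun is yellow. " * n_filler
    prompt = (
        filler
        + f"\nThe pass key is {passkey}. Remember it. {passkey} is the pass key.\n"
        + filler
        + "\nWhat is the pass key? Answer with the number only."
    )
    return prompt, passkey

prompt, key = make_passkey_prompt()
# Send `prompt` to the extended-context model and check whether `key` appears in its reply.
print(len(prompt.split()), key)
```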
Jul 21, 2023 • 15min

ThursdAI July 20 - LLaMa 2, Vision and multimodality for all, and is GPT-4 getting dumber?

ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber. If you'd like to hear the whole 2 hour conversation, here's the link to the twitter spaces we had. And if you'd like to add us to your favorite podcatcher - here's the RSS link while we're pending approval from Apple/Spotify.

Happy LLaMa day!
Meta open sourced LLaMa v2 with a fully commercial license. LLaMa 1 was considered the best open source LLM, and this one can be used for commercial purposes, unless you have more than 700MM monthly active users (no 🦙 for you Google!). Meta has released the code and weights, and this time around, also a fine-tuned chat version of LLaMa v2 to all, and has put them on HuggingFace. There are already (3 days later) at least 2 models that have fine-tuned LLaMa2 that we know of:
* @nousresearch have released Redmond Puffin 13B
* @EnricoShippole, in collaboration with Nous, has released LLongMa, which extends the context window for LLaMa to 8K (and is training a 16K context window LLaMa)
* I also invited and had the privilege to interview the folks from the @nousresearch group (@karan4d, @teknium1, @Dogesator) and @EnricoShippole, which will be published as a separate episode.

Many places already let you play with LLaMa2 for free:
* https://www.llama2.ai/
* HuggingFace chat
* Perplexity LLaMa chat
* nat.dev, replicate and a bunch more!
The one caveat: the new LLaMa is not that great with code (like at all!) but expect this to change soon!

We all just went multi-modal! Bing just got eyes!
I've been waiting for this moment, and it's finally here. We all have access to the best vision + text model, the GPT-4 vision model, via Bing! (and also Bard, but… we'll talk about it) Bing chat (which runs GPT-4) has now released an option to upload (or take) a picture, and add a text prompt, and the model that responds understands both! It's not OCR, it's an actual vision + text model, and the results are very impressive! I personally took a snap of a food-truck side, and asked Bing to tell me what they offer; it found the name of the truck, searched it online, found the menu and printed out the menu options for me! Google's Bard also introduced their Google Lens integration, and many folks tried uploading a screenshot and asking it for code in React to create that UI, and well… it wasn't amazing. I believe it's due to the fact that Bard is using the Google Lens API and was not trained in a multi-modal way like GPT-4 was. One caveat is, the same as text models, Bing can and will hallucinate stuff that isn't in the picture, so YMMV but take this into account. It seems that at the beginning of an image description it will be very precise, but then as the description keeps going, the LLM part kicks in and starts hallucinating.

Is GPT-4 getting dumber and lazier?
Researchers from Stanford and Berkeley (and Matei Zaharia, the CTO of Databricks) have tried to evaluate the vibes and complaints that many folks have been sharing, whether the GPT-4 and GPT-3.5 updates from June had degraded capabilities and performance. Here's the link to that paper and twitter thread from Matei. They have evaluated the 0301 and the 0613 versions of both GPT-3.5 and GPT-4 and have concluded that at some tasks, there's degraded performance in the newer models! Some reported drops as high as 90% → 2.5% 😮 But is there truth to this? 
Well apparently, some of the methodologies in that paper lacked rigor, and the fine folks at AI Snake Oil (Sayash Kapoor and Arvind) have done a great deep dive into that paper and found very interesting things! They smartly separate between capabilities degradation and behavior degradation, and note that on the 2 tasks (Math, Coding) where the researchers noted a capability degradation, their methodology was flawed, and there isn't in fact any capability degradation; rather, a behavior change and a failure to take into account a few examples. The most frustrating one for me was the code evaluation: the researchers scored both the previous models and the new June updated models on "code execution" with the same prompt; however, the new models defaulted to wrapping the returned code with ``` which is markdown code snippets. This could have been easily fixed with some prompting; however, the researchers scored the task based on whether or not the code snippet they get is "instantly executable", which it obviously isn't with the ``` in there. So they haven't actually seen and evaluated the code itself, just whether or not it runs! (A short sketch of the kind of post-processing that would have fixed this follows at the end of this recap.) I really appreciate the AI Snake Oil deep dive on this, and recommend you all read it for yourself and make your own opinion, and don't give in to the hype and scare mongering and twitter thinkfluencer takes.

News from OpenAI - Custom Instructions + Longer deprecation cycles
In response to the developers (and the above paper), OpenAI announced an update to the deprecation schedule of the 0301 models (the ones without functions) and they will keep that model alive for a full year now! Additionally, OpenAI has released "Custom Instructions for ChatGPT", which allows a ChatGPT user to store custom instructions, information and a custom prompt that will be saved on OpenAI's server side and will be appended to every new session of yours with ChatGPT. Think personal details, preferred coding style (you love Ruby and not Python) and other incredible things you can achieve without copy-pasting this to every new session! Don't forget to enable this feature (unless you're in the UK or EU where this isn't available).

Thanks for tuning in, whether you're a newsletter subscriber, twitter space participant, or just someone who stumbled onto this post; if you find this interesting, subscribe and tell your friends! "We stay up to date so you don't have to" is the #ThursdAI motto! 🫡

In other news this week: LangChain has gotten some flak, but they are looking ahead and releasing LangSmith, an observability framework for your agents that does NOT require using LangChain! It looks super cool, and is very useful to track multiple prompts and tokens across agent runs! And the results are share-able so you can take a look at great runs and share yours with friends! Don't forget to share this with your friends and come back next week 🫡

— Alex Volkov This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
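An aside on the code-evaluation flaw described above: here is a minimal, illustrative sketch (not the paper's actual harness; the function names are mine) of the normalization step that was missing, stripping markdown fences before checking whether a model's answer runs.

```python
import re

def extract_code(answer: str) -> str:
    """Strip markdown code fences (```python ... ```) if the model added them."""
    match = re.search(r"```(?:python)?\n(.*?)```", answer, re.DOTALL)
    return match.group(1) if match else answer

def runs(answer: str) -> bool:
    """Score 'directly executable' on the normalized code, not the raw string."""
    try:
        exec(extract_code(answer), {})
        return True
    except Exception:
        return False

print(runs("```python\nprint(2 + 2)\n```"))  # True once the fences are stripped
```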
Jul 14, 2023 • 1h 42min

ThursdAI July 13 - Show recap + Notes

Welcome Friends, to the first episode of ThursdAI recap. If you can’t come to the spaces, subscribing is the next best thing. Distilled, most important updates, every week, including testimony and tips and tricks from a panel of experts. Join our community 👇Every week since the day GPT-4 released, we’ve been meeting in twitter spaces to talk about AI developments, and it slowly by surely created a community that’s thirsty to learn, connect and discuss information. Getting overwhelmed with daily newsletters about tools, folks wanted someone else to do the legwork, prioritize and condense the most important information about what is shaping the future of AI, today! Hosted by AI consultant Alex Volkov (available for hire), CEO of Targum.video, this information-packed edition covered groundbreaking new releases like GPT 4.5, Claude 2, and Stable Diffusion 1.0. We learned how Code Interpreter is pushing boundaries in computer vision, creative writing, and software development. Expert guests dove into the implications of Elon Musk's new XAI startup, the debate around Twitter's data, and pioneering techniques in prompt engineering. If you want to stay on top of the innovations shaping our AI-powered tomorrow, join Alex and the ThursdAI community. Since the audio was recorded from a twitter space, it has quite a lot of overlaps, I think it’s due to the export, so sometimes it sounds like folks talk on top of each other, most of all me (Alex) this was not the case, will have to figure out a fix. Topics we covered in July 13, ThursdAI GPT 4.5/Code Interpreter:00:02:37 - 05:55 - General availability of Chad GPT with code interpreter announced. 8k context window, faster than GPT-4.05:56 - 08:36 - Code interpreter use cases, uploading files, executing code, skills and techniques.08:36 - 10:11 - Uploading large files, executing code, downloading files.Claude V2:20:11 - 21:25 - Anthropic releases Claude V2, considered #2 after OpenAI.21:25 - 23:31 - Claude V2 UI allows uploading files, refreshed UI.23:31 - 24:30 - Claude V2 product experience beats GPT-3.5.24:31 - 27:25 - Claude V2 fine-tuned on code, 100k context window, trained on longer outputs.27:26 - 30:16 - Claude V2 good at comparing essays, creative writing.30:17 - 32:57 - Claude V2 allows multiple file uploads to context window.32:57 - 39:10 - Claude V2 better at languages than GPT-4.39:10 - 40:30 - Claude V2 allows multiple file uploads to context window.X.AI:46:22 - 49:29 - Elon Musk announces X.AI to compete with OpenAI. 
Has access to Twitter data.
49:30 - 51:26 - Discussion on whether Twitter data is useful for training.
51:27 - 52:45 - Twitter data can be transformed into other forms.
52:45 - 58:32 - Twitter spaces could provide useful training data.
58:33 - 59:26 - Speculation on whether X.AI will open source their models.
59:26 - 61:54 - Twitter data has some advantages over other social media data.

Stable Diffusion:
89:41 - 91:17 - Stability AI releases SDXL 1.0 in discord, plans to open source it.
91:17 - 92:08 - Stability AI releases Stable Doodle.

GPT Prompt Engineering:
61:54 - 64:18 - Intro to Other Side AI and prompt engineering.
64:18 - 71:50 - GPT Prompt Engineer project explained.
71:50 - 72:54 - GPT Prompt Engineer results, potential to improve prompts.
72:54 - 73:41 - Prompts may work better on the same model they were generated for.
73:41 - 77:07 - GPT Prompt Engineer is open source, looking for contributions.

Related tweets shared:
https://twitter.com/altryne/status/1677951313156636672
https://twitter.com/altryne/status/1677951330462371840
@Surya - Running GPT2 inside code interpreter
tomviner - scraped all the internal knowledge about the env
Peter got all pypi packages and their descriptions
swyx added Claude to the smol menubar (which we also discussed)
SkalskiP awesome code interpreter experiments repo

See the rest of the tweets shared and listen to the original space here:
https://spacesdashboard.com/space/1YpKkggrRgPKj/thursdai-space-code-interpreter-claude-v2-xai-sdxl-more

Full Transcript:
00:02 (Speaker A) You. First of all, welcome to ThursdAI. We stay up to date so you don't have to. There's a panel of experts on top here that discuss everything. 00:11 (Speaker A) If we've tried something, we'll talk about this. If we haven't, and somebody in the audience tried that specific new AI stuff, feel free to raise your hand, give us your comment. This is not the space for long debates. 00:25 (Speaker A) We actually had a great place for that yesterday. NISten and Roy from Pine, some other folks, we'll probably do a different one. This should be information dense for folks, and this will be recorded and likely posted at some point. 00:38 (Speaker A) So no debate, just let's drop an opinion and discuss the new stuff and kind of continue. And the goal is to stay up to date so you don't have to in the audience. And I think with that, I will say hi to Al and Janae and we will get started. 00:58 (Speaker B) Hi everyone, I'm NISten Tahira. I worked on, well, released one of the first doctor chat bots on the market for Dr. Gupta and scaled it, and now we're working on getting the therapist bot out once we can also pass more testing and get Voice to work in a profitable manner, because we don't really have VC. So at the scale of a few hundred thousand users, the API bills matter quite a bit. 01:31 (Speaker B) So, yeah, these spaces have been pretty helpful because I have some trouble with running a Voice transformer, trying to run it on the browser on WebGPU, and then the person that wrote Transformers.js comes in here and just says, oh yeah, that back end is messed up. Just try blas and synth and stuff. So these have been very interesting and technical spaces. 01:54 (Speaker A) Yeah, we need to get Zenova in here. Zenova is the guy who NISten was referring to. Al, Janae, do you want to give a few words of intro and say hi and then we'll start? Just briefly, please, because I think we need to get going. 02:09 (Speaker C) Sure. Hi, I'm Janae.
02:11 (Speaker D) I'm the resident noob, I started messing around with AI at the beginning of. 02:16 (Speaker E) The year, and I also host the. 02:18 (Speaker D) Denver AI Tinkerers coming up next week. 02:20 (Speaker A) And if you're in Colorado area, greater Denver, please join us. It's going to be a blast. 02:27 (Speaker F) Hi, I'm Al Chang. I'm kind of an old school technologist. Just getting started with the AI again and just here to help. 02:36 (Speaker A) Yeah. All right, folks, so I think we've had a whole space on this. Simon Wilson and me and many, many other folks chimed in. The second this was released. 02:50 (Speaker A) Was that six? Was that Sunday? It's hard to keep track of actual days. Saturday, Saturday, last week, exactly during those spaces, by the way, as we were talking, Chad GPT, Logan and everybody else from OpenAI announced general availability of Chad GPT with code interpreter. So GPT four with code interpreter. 03:12 (Speaker A) And I think we just heard from Matt that even some folks who got access to the slept on it a little bit because it's maybe potentiallybecause of its very horrible name that's really hard to type interpreter and get lost in the R's. But it's an extremely powerful new superpower that we've got. And we've had the whole space talking about use cases that people already had. 03:37 (Speaker A) It was like three days into it and since then I bet that many more people tried it. I think Swyx 20,000 listens to that space, plus the pod. At least people definitely want to hear new use cases, right? 03:53 (Speaker G) Yeah, not much else to add about it. I think it's the feature for Switch. 03:59 (Speaker A) Posted a whole deep dive essay and coined it GPT 4.5 between us friends. And one of the interesting things about it is that we think at least that's where we are currently after playing around with this, is that it's a fine tuned model. So they kept training this on actually running code and executing code. 04:21 (Speaker A) That's what we believe. We don't know, nobody confirmed this and thenthat it's fine tuned from an earlier checkpoint of GBT Four. And so we actually had some folks on spaces talking about that it's less restricted and better like previous times. 04:36 (Speaker A) So it's an interest, I think NISten right. We have some folks who tell us they're using code interpreter without the code part. They just stopped the GPT Four just because it's that model. 04:48 (Speaker A) And I think also they took down the 25 messages per hour restriction on code interpreter. I've had like four hour sessions and it stopped like I didn't saw complaints. 05:03 (Speaker G) So it's just better. 05:06 (Speaker A) It's also fast. I think it's fast because not many people maybe use this by default and this could be the reason for the speed, but it's definitely faster for sure. I think also context window, was it Yam? Somebody summarized the context window and they told us the context window for code interpreter is eight k versus the regular GPD for actually that could be also a kick. 05:29 (Speaker G) You mean Yam copied and pasted. 05:34 (Speaker A) I would encourage you and Yam need to kiss in the cup because Yama isdoing a lot of legwork to take down the stuff that he posted and Yamais working on that and it's very visible and you guys need to do there you go, yam, you need to clear the air. However, Pharrell and Gabriel bring you up as well. And we're going to keep talking about code interpreter because that's what we're here to do. 
NISten and a few other folks and we started cooking with code interpreter. 05:59 (Speaker A) And by cooking I mean we started stretching the complete boundaries of what's possible there. And I think Simon Willison kick started this with the latent space Pod. So for folks who are not following latent space pod, feel free to follow SWIX, his main account, not this hidden one. 05:59 (Speaker A) And SWIX reposted the spaces we had simon Wilson was able to run nodeJS and Dino within code interpreter, even though OpenAg didn't allow for that by uploading like a binary and asking code interpreter to generate. Simon then promptly said they fine tuned the model away from that and we found ways anyway to ask it to do some stuff. I havea thread on how I was able to run a vector DB chroma inside code interpreter. 06:10 (Speaker A) I ran whisper CPP. We saw some folks running GPT-2 inside code interpreter, right? So imagine an Ll GPD Four running another and talking to it. It's like a little brother inside. 06:10 (Speaker A) I personally love that inception. I don't know if the person who ran GPD Two is in the audience as Dan I think was the nickname NISten. I don't know. 07:22 (Speaker A) Surya. 07:23 (Speaker B) Surya. He also wrote the search to PDF plugin for GP Four plugins andhe wrote that in like two days and it's more used than any other enterprise thing, which is pretty hilarious. 07:36 (Speaker A) We need to get surya. 07:38 (Speaker B) Yeah, he just did that as I'm just going to do a search plugins for PDF and it's like the most used. 07:45 (Speaker A) So dope pretty amazing. Again, in that space we've talked about having like a living manual, so to speak, for code interpreter use cases because it's coding. So it covers pretty much everything that we can think of as coders, maybe just in Python, maybe restricted to an environment. And I've been trying to do that with the code interpreter can hashtag and I encourage all of you, let me pin this to the top of the space, to the jumbotron if you have an interesting code interpreter thing and I'll bring up Skalsky P to the stage as well. 08:03 (Speaker A) And Lantos, so many good friends. If you have a very interesting codeinterpreter technique or skill or new thing that people can do without coding skills, please tag with this hashtag so folks can findthis. Otherwise I will cover the main three things the code interpreter gave us besides the new model. 08:42 (Speaker A) One of them is uploading files. And since we've talked, we've noticedthat you can upload up to 250 megabyte files and those can be zips ofother files. So we've uploaded like full models weights. 08:55 (Speaker A) We've uploaded bin files. It's incredible that you can now drag and drop whole directory and have JPT just know about this and read aboutthis. We've uploaded weights in embeddings. 09:08 (Speaker A) You can then obviously execute code in a secure environment, which isagain incredible, and you can download files, you can ask it to actually generate a download for you, which is also super, super cool. Maybe one last thing I'll say before I'll give it to the audience for a few more cool use cases. And folks in the stage, please feel free to raise your hand. 09:21 (Speaker A) I'll get to you in the order that you raise your hand if you have a use case. Some folks built like a built in memory built in brain within code interpreter just to save to a file. 
That's what I try to do with my vector DB and then they download that memory at the end ofevery session and then upload this to the next one and have some likea prompt that reminds the jgpd like to start from that point. 09:50 (Speaker A) So in addition to the context window, they're also having a separate offloaded file persisted memory. So code interpreter incredible. Again. 10:00 (Speaker A) Potentially GPT 4.5. And if you haven't played with this, feel free to if you don't know what to play with, follow the code interpreter can hashtag and let's get to Skowski. 10:11 (Speaker A) What's up, man? 10:14 (Speaker H) Hi, hello. Do you hear me? 10:15 (Speaker A) Yeah, we can hear you fine. 10:19 (Speaker H) Yeah, I've been playing a lot with code interpreter over the past five days, mostly with computer vision use cases because that's what I do. I haven't introduced myself. I'm pretty much doing computer vision full time for the past five years and was focusing on like when I saw that you can input image and video, that was immediately what I was thinking, we need to make it to computer vision. So I wentthrough some low effort tasks. 10:46 (Speaker H) So I managed to run old school computer vision algorithms, face detection, tracking of objects, stuff like that. But I also managed to exploit it a little bit. So you can add yolo object detection models to the list of models that were run in code interpreter. 11:15 (Speaker H) There are some problems with memory management, so I'm not yet fully happy with the result. But yeah, I managed to run it on images and onvideos and the things that are super cool and are kind of like underrated right now, false positive. So when the model detects something that shouldn't be detected, you can really use text to ask code interpreter to filter out false detections. 11:48 (Speaker H) You can just give it your feeling like why that stuff is happening orwhen or where. And it's very good at cleaning the detections, which was kind of like mind blowing for me. And one thing that I noticed that it sucks at is I managed to create an application that counts objects moving on the video when they cross the line. 11:55 (Speaker H) And I didn't use any off the shelf libraries, I just had detector andsay, okay, now draw a line and count objects when they cross the line. It's terrible at that, writing math logic to figure out that something crossed something, we had like ten prompts or twelve prompts exchange and I basically bailed out on that, forget it. So there are some things that blow my mind, but there are something thatprobably not. 12:49 (Speaker A) So folks, feel free to follow Skowski. And also I just pin to the topof the Tweet his brand new awesome code interpreter use cases, git repo, and there's a list, there's a bunch of use cases there. This could also serve as a de facto manual. So feel free to go there at PRS and follow that for updates. 12:52 (Speaker A) And I want to get to Lentos because he seems to be unmuting. What's up, Lentos? 13:12 (Speaker H) I was just going to say I can't follow him because he's blocked me. 13:15 (Speaker C) Sad face. 13:16 (Speaker H) Oh, no, I noticed that, but I'm not sure why. I will undo that. 13:20 (Speaker A) All right, I'm the peacemaker in the status. Please kiss and make up.You two as well. Everybody should get along. 13:26 (Speaker A) Yay. I want to get to some other folks who came up on stage recently.And Gabriel, welcome to talk about code interpreter and your use cases. 
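The persisted "memory" trick Alex describes a little earlier (save notes to a file, download it at the end of the session, re-upload it at the start of the next one) boils down to a few lines of Python you could ask Code Interpreter to run for you. A minimal sketch, assuming a hypothetical memory.json file and assuming uploads land in /mnt/data inside the sandbox:

```python
import json
from pathlib import Path

# Hypothetical file name; the /mnt/data upload path is an assumption about the sandbox.
MEMORY_FILE = Path("/mnt/data/memory.json")

def load_memory() -> dict:
    """Restore notes saved in a previous session, if memory.json was re-uploaded."""
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text())
    return {"facts": [], "todos": []}

def remember(memory: dict, fact: str) -> None:
    """Append a fact worth carrying over to the next session."""
    memory["facts"].append(fact)

def save_memory(memory: dict) -> None:
    """Write the notes back out so they can be downloaded before the session ends."""
    MEMORY_FILE.write_text(json.dumps(memory, indent=2))

memory = load_memory()
remember(memory, "User prefers matplotlib charts with no seaborn styling.")
save_memory(memory)
```

At the end of a session you would ask for memory.json as a download, then next time upload it again with a short prompt like "load memory.json and continue from there".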
13:35 (Speaker A) Jeanette, if you play with this, I would like to hear two more opinions before we move on to the next incredible thing. Yeah. Oh, you guys are talking about let's get together and then June sorry, I should have been explicit about the order. 13:54 (Speaker E) No worries. So I just posted a comment on this space about the message cap on a conversation. So even though in the UI, it still says 25 messages per 3 hours, if you look at the network request, youcan see that. And I posted this, it's actually 100 messages per 3 hours now. 14:12 (Speaker E) And I don't know if they're scaling that up and down as demand increases and decreases, or they're just trying to trick people into conserving their messages, but it's definitely been on 100 for a little while now. Can you confirm same thing you can see in the network? 14:32 (Speaker A) Can you confirm the same for the regular mode, or do you think the regular mode is still restricted? Well. 14:41 (Speaker E) Based on just the fact that there's only one message cap, they don't have message cap per model. So I think it's just consistent across all the GP four models. And that's also my experience in the last it's been a little while now. It's probably at least a couple of weeks that it's been higher. 14:51 (Speaker E) And same thing we discussed, I think, on Saturday about the context window. And you can also see it in the API that the context window iseight K for plugins and code interpreter, and it's 4K for the base GPT four model. 15:16 (Speaker A) That's awesome. Like suicide. Better in every single way. 15:22 (Speaker D) Yeah. 15:23 (Speaker A) Awesome. Thanks. 15:24 (Speaker E) Yeah. In terms of use cases I can share, I've been digging around a lot in the code interpreter, and I was really trying to hone in on why are the packages that are installed there, the Python packages inthe environment? Why are they there? Some of them seem really random,and some of them make a lot of sense. And they released it, saying it's for, basically data analysis. And a lot of them make sense for that, but some of them are just really wild, like the ML packages. 15:54 (Speaker A) And the Gabriel folks in the audience. If you look up at the jumbo tone where we pin Tweets two Tweets before there's a Tweet by Peter Zero Zero G, who actually printed all the packages and asked GPT Fourto kind of summarize what they do. So if you have no idea about the potential capabilities of what it can do, feel free to pin that tweetfor yourself. And then it has a bunch of descriptions of what's possible. 16:11 (Speaker A) So go ahead. Gabriel. Yeah, cool. 16:28 (Speaker E) Yeah, I've done the same kind of thing with just a short yeah, I got it to do a four word description for each one. So if you're looking for a really short description of each package, I'll post that tweet.And if you're looking for a long one, I think Peters is great. And what you can see there is that there are packages for web development, right? There's Fast API, there's Flask, there's a bunch of other packages for Web development. 16:40 (Speaker E) And besides the fact that there's no network access, which obviously other people using it might be turning it on, but it was just interesting to me. My perspective is that OpenAI has been using this internally throughout all their teams for development and testing it internally, but probably also using it pretty consistently. They probably have access to the Internet. 17:14 (Speaker A) Yeah, I'm sure they have access to. 
17:15 (Speaker E) The Internet and they can install new packages. But I think they alsohave the ability, instead of uploading files and downloading files, they have the ability to just mount persist memory, I don't think, topersist. I think they just mount their local working directory on their computer right wherever they're working. So they have their active directory where they have their project, and they just mount that and give the code interpreter access to the whole directory withtheir whole repo of their project. 17:48 (Speaker C) Yeah. 17:48 (Speaker E) And then Chat Gvt is just writing code to the working directory and reading from there and it can explore their whole project. We can do that now by uploading, you can zip your whole project and upload the whole thing zipped and have it unzipped. And then it can kind of explore your whole project. But then once it makes some changes, you want to commit them, you have to ask it to zip the whole thing back, download it and upload it. 17:48 (Speaker E) And then I think what they're able to do is more of like a kind of peer programming thing where the developer makes some changes and then Chat GPT makes some changes and they're kind of working together. This is taking it one step further. I don't know if they have this or not, but it would be super. 18:29 (Speaker A) Cool in the realm of updates unless there is no speculation. But I would love to explore this more with you in the next stage because this applies to open source and how people already saw somebody tag us after the last space and said, hey, I'll build this open source. Iwould love to pin this to the top of the space. However, I want to move on to new space and then move on to other updates. 18:51 (Speaker A) Sorry to interrupt, but thanks. I think that the collaborative, persistent code superpower that probably maybe at some point will come to us as well. Plus the internet access is like another ten x I want to get to Skowskin and lent us and I think we'll move on to Claude. 19:08 (Speaker A) Thanks Gabriel. 19:11 (Speaker H) Yeah, I have a question. I'm not really sure guys, if you notice thatI was obviously experimenting with PyTorch because I needed it for computer vision. I noticed that the PyTorch version that is installedin the environment actually pre compiled to work with CUDA. So it's aGPU version of PyTorch. 19:31 (Speaker H) Even though that in the environment you don't have access to GPU, youonly have CPU. So I'm curious guys, what you think about that. Why isthat? Any ideas? 19:42 (Speaker A) Ideas that just come from what Gabriel just said? Likely we're getting the same Kubernetes container. However, the open AI folks have like unlimited stuff. They probably also have CUDA that would make sense right there is probably connected to a GPU as well, but that's just an idea. Lantos, I want to get to you and then we'll moveon to Claude. 20:02 (Speaker A) Folks and folks in the audience, feel free to hit the little right button on the bottom left looks like a little message and leave comments through commenting as well. Moving on to Claude V Two. Folksin the audience and folks on stage, feel free to hit up the emojis plus one. 20:19 (Speaker A) Minus one if you have tried Claude V two if you like it and you haven't liked it. I'm going to cover this anyway because I think somebody called me, I think Roy from Python called me a Cloud V Two fanboy yesterday and I first got offended and I told him that I'm just a fanboy for 24 hours. 
Before that I was a code interpreter fanboy, and then I figured with myself whether or not I am a fanboy of Claude V2. 20:43 (Speaker A) And yeah, I am, and Swyx told me to relax, and in fact I invited him here to be the red blanket on the other side of the list. Anthropic, the company that we can definitely consider number two after OpenAI. I think that's fair in terms of quality. 21:02 (Speaker A) Have long released Claude versions, and they made some waves when they released Claude with the 100K context window. Now they have released Claude V2, and let me paste some Claude, sorry, pin some Claude thingies in the jumbotron, sorry. However, Claude V2 released with multiple stuff, and I want to focus on two things, and I think we'll cover the UI first and then we're going to talk about the model itself, UI wise and product wise. My hot take, and I'll pin this to the top. 21:38 (Speaker A) Unfortunately we won't debate this, but I love you, all of you. Is that as products, Claude V2 right now beats ChatGPT as a product. My mom can go into two websites and she'll prefer one versus the other one. 21:51 (Speaker A) Or my friends that don't know AI, aren't as plugged in as we are; theirs is free. And I think Claude V2 beats GPT-3.5, which is also free, and the 100K context window, with the model being trained on 200K, unleashes a bunch of use cases that were not possible before. 22:12 (Speaker A) It just frees you up. If you heard Skowski just say the limitations of code interpreter, a bunch of these limitations stem from the 8K context window. 22:13 (Speaker A) If you print a bunch within the code that you're doing, code interpreter sometimes forgets what you guys talked about 20 minutes ago. And the 100K context window also means a long, long conversation history with the model. And I think it's really great. 22:37 (Speaker A) Not to mention that you can drag and drop full books in there. Those books need to be in like one or two files, and they still don't accept zip files. And I'm planning to release an extension soon that does this for us and unifies them into single files. 22:51 (Speaker A) So hopefully by next week we'll have some updates. However, once you upload that much, or you can upload like a transcript or a podcast, you can do a bunch of stuff, because Claude V2 is also better trained on code and we saw a significant jump in... wait, I'm switching to the model, so let me get back to the UI. The UI allows you to upload files. 23:09 (Speaker A) The UI has a Command K interface, which I personally love. I hit Command K on every website and see if they support it. You can just start a new chat real quick. 23:21 (Speaker A) It doesn't have Share, but it's definitely a refreshed and free UI. It's called Claude AI and that's the URL, and if you haven't tried it, definitely try it. Comments about just the product side and the UI side before we move to the model? Anybody play with this? Anybody like it? Anybody love the upload files feature? I would love to hear hands and comments. 23:42 (Speaker A) Go ahead, Matt. 23:44 (Speaker D) A bit of a weird thing, but what I've noticed is it's actually quite frustrating if you want to paste text in. It actually, if it's over a certain length, will paste in as a file. Little small thing. Hopefully they'll change it, but it is really annoying because then you can't edit it. ChatGPT does do that much better, but I generally agree with you that overall the product experience on Claude is. 24:03 (Speaker A) Significantly the new one. The fresh coat of paint they released for us.
I will say that Claude so far was kind of a hidden gem, that only folks who got access to the API actually got access to their UI, and that UI was very restricted, and folks who have access to the Claude API know what I'm talking about. I think that UI is still around. 24:22 (Speaker A) It still shows your history. It's like very restrictive. It's not as cool as this, it's not as sleek as this. 24:27 (Speaker A) So we like Claude AI, definitely a plus. Check it out. Now, let's talk about the model behind this UI, because that model also changed, and several incredible things changed with it. 24:38 (Speaker A) First of all, they released a new model, same price as the previous one. We love to see this. Please everybody, including OpenAI, continue giving the same price and cheaper and cheaper down the line. 24:41 (Speaker A) We love to see this. Second of all, they claim it's been fine tuned on several things. One of them is code. 24:54 (Speaker A) And we actually saw a bump in the evaluation called Human Eval, which is a set of questions that OpenAI released, and I think the bump was from like 55% to 78%, which I think beats 3.5 and is not there compared to GPT-4. Correct? 25:14 (Speaker C) Yeah, and it beats GPT-4 on pass@1, on the first try, not GPT-4 that is allowed to refine and fix it, but on the first trial. Yeah, by a little bit. 25:33 (Speaker A) So, news to me, and thank you for joining in. The pass numbers is how many times it's able to reflect upon the answers and improve them. 25:43 (Speaker C) The pass@k is kind of what I meant. By reflection it's even stronger, GPT-4. If GPT-4 sees the exception, it can come up with a solution. So this is not in the Human Eval test, but if you use GPT-4 this way, you get to ninety something percent, which I think is more realistic if you think about it. No programmer writes the whole code in one go. 26:10 (Speaker C) You write it intuitively, fix bugs and so on. And also in code interpreter, you see it. But it is remarkable to see state. 26:19 (Speaker A) Of the art on the first pass, and it's significantly better in code. And I suggest folks who previously tried Claude and weren't impressed to try it as well. An additional crazy thing that they've trained on is the 100K context window, and they've actually trained, they claim, on a 200K context window, so twice as much as the previous round. And we follow this one guy, Ofir Press, the guy behind Self-Ask with Search and the guy behind ALiBi, the ability to extend context windows. 26:55 (Speaker A) He just defended his PhD and he talked about context windows, and he was impressed with the way they presented and the way they showed their loss curve. And so this could be, we saw the paper maybe this week, the folks saw the paper where performance dips in the middle, there's like less attention in the middle than at the beginning and at the end. 27:03 (Speaker A) And it looks like that's not the case for Claude as well. So I suggest you try the huge context window, and Al, you have your hand raised, and then we'll talk about some other model changes. 27:26 (Speaker F) Yeah, I would talk a little bit about, I used Claude about a month and a half ago to win Best Solo Hacker at the Craft Ventures hackathon, the David Sacks one. Yeah, it had like 200 entries, but it's exceptionally good at creative writing and also like comparing and contrasting. I don't think people have really taken advantage of what the context window is capable of doing. It's more than just loading single files in.
27:53 (Speaker F) So what I did for the project was I loaded these large legislative bills, these like 50 page unreadable bills, and turned them into relatable narratives. So one of the things that Claude can do is you can adopt a persona. So a lot of times with summaries, summaries just compress the text that you see, but you can tell it to say, write 1000 words from a social conservative point of view, or a bus driver's point of view, or a social liberal point of view. 28:21 (Speaker F) And what that does is it takes all of its knowledge about the outside world and gives you not a summary, but it gives you essentially an essay about the practical effects of something like a bill. I've actually been working with the idea of reading a book and having it tell you what I would have learned from this, because that's actually probably what you're more interested in. What it can do in terms of comparing and contrasting large essays is exceptional. 28:51 (Speaker F) So you could have it say, write 2000 words from a social conservative point of view, 2000 words from a social liberal point of view, and then have it contrast the essays, which is something that would be very difficult for a human to do. So you get to give it multiple files and have it just give you a more balanced approach, so you get rid of some of the bias that comes in. 29:18 (Speaker A) My dream, my go-to dream project that I never get to, is to create this for Twitter as like a Chrome extension, that I can select a bunch of tweets and then say, remove the bias from this and just give me the debiased version of all of this. Yeah, completely. Like the cross-reference ability of Claude, because of this context window, is incredible for many, many use cases. 29:41 (Speaker F) Yeah, I would say that so far it's not as good as GPT-4 for certain things. But that context window is fantastic. And I would say a lot of people that are using embeddings and retrieval, you can actually just put the whole thing in the context window and ask questions to that, and then you have a baseline to compare your results from it. Most people, if they're chatting to a website or something like that, you actually can just put the whole thing in there as opposed to trying to chunk it up and do questions, and you'll see that your results are much better that way. 29:51 (Speaker F) And for most people, that would be good enough. 30:17 (Speaker A) So the additional thing that Claude was trained on, they've talked about the output tokens, just the number of output tokens, how much Claude is able to generate. And they've said that previous models, I don't know if the same about GPT, I haven't seen numbers on GPT-4, but they've said that previous Claude models were focused on shorter outputs, just as they were trained. And this latest model was trained to output up to 4000 tokens in output. 30:47 (Speaker A) This is added to the fact that they also fine tuned it and trained it to output JSON files, complete JSON files as responses, which we as engineers, we waited for this, and OpenAI gave us functions via, kind of, here you go, there's the function interface. And we love the function interface. The function interface kind of locks us down to the OpenAI ecosystem. 31:04 (Speaker A) And it's great to see another model that's like very close to state of the art in Human Eval that also is now fine tuned to respond in full, intact JSONs. And those JSONs can be 4000 tokens in length. Any thoughts on these?
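The "complete JSON responses" point above is easy to sketch in code. This is not an official recipe, just an illustration; call_claude() is a hypothetical helper standing in for whichever Claude API client you use:

```python
import json

def call_claude(prompt: str) -> str:
    """Hypothetical helper: send the prompt to Claude 2 via your API client and return the raw text."""
    raise NotImplementedError

def extract_entities(transcript: str) -> dict:
    # Ask for JSON only; Claude 2 is described above as tuned to return long, complete JSON responses.
    prompt = (
        "Extract every named entity from the transcript below and rate the speaker's "
        "sentiment toward each on a -1.0 to 1.0 scale. Respond with a single JSON object "
        'shaped like {"entities": [{"name": str, "sentiment": float}]} and nothing else.\n\n'
        f"<transcript>\n{transcript}\n</transcript>"
    )
    raw = call_claude(prompt).strip()
    # Strip markdown code fences in case the model wraps otherwise valid JSON anyway.
    raw = raw.removeprefix("```json").removeprefix("```").removesuffix("```").strip()
    return json.loads(raw)
```

The fence-stripping line is there because models sometimes wrap otherwise valid output in ``` markdown fences, the same gotcha that tripped up the "code execution" evaluation discussed in last week's recap.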
31:28 (Speaker F) Yeah, I can confirm on it being able to write large amounts of output. I mean, I was having it write like 2000, 3000 word like sort of essays and outputs and it was fine with that. 31:40 (Speaker A) Yes. And I think it's I'm going to. 31:45 (Speaker B) Stick with GPT Four myself. But this might be pretty useful for just dumping in an entire code base, given the 100k context window and then getting some reviews and stuff, and then maybe moving some of the stuff. 32:02 (Speaker A) Once I stop posting status and build that chrome extension that you upload the zip and it flatlines it to one file and then upload it, then we'd be able to do, like, a proper comparison, because code interpreter can take zip files and then extract them. Oh, one difference that I want to for folks in the audience, GPD Four with code interpreter allows you to upload zip files, et cetera. We talkedabout this. It does not load them into context window, right? So there's like eight k context window. 32:30 (Speaker A) The files that you upload are not automatically in the context window. The model doesn't it has to write Python code that actually prints the files. And it usually does like the first few lines, hint,hint. 32:30 (Speaker A) The folks in the audience who get my drift. But it doesn't usually read all the unless you specifically ask it to and Claude does. So everything you upload to, Claude goes directly to the immediate working memory of the complex window. 32:38 (Speaker A) And that's a major difference to watch out for and also take care of.Go ahead. 33:00 (Speaker C) I would like to ask everyone before I say my opinion, what do you think about it in comparison to GPT Four about the performance? What do you think? 33:10 (Speaker A) I would like comments from folks who actually use both and did the comparison. And before I get to folks, please raise your hand to answer. I want to call out SWIX's small menu bar which allows you to actually Swyx. Can you give us like a brief two minutes on the menu bar thing? 33:28 (Speaker G) Yeah, well, you don't have to choose. Just run it all the time on every single chat. So it's a little electron app that runs in the menu bar. And I've been maintaining it and I just added Cloud Two this week. 33:42 (Speaker G) Cloud Two is not super stable yet. Sometimes it will fail to submit the button. So you just have to retry manually to submit the button. 33:50 (Speaker G) But yeah, it's a great way to a B test models, but then also just amplify every question with between four to five different chat models with the answers. So I've been trying it. It's up to you if you want. 34:07 (Speaker A) To. 34:10 (Speaker C) Find it. 34:14 (Speaker A) With the announcements, if you can. Yeah, awesome. Yeah, just basically and maybe for instance, you don't have to stop using, you don't have to choose. So I think the last thing that we need to acknowledge it's, Claude, is the multilinguality. 34:28 (Speaker A) So they actually focused on showing us how much better, like, the newones from previous ones, and they posted blue scores, Bleu scores, clock Two is significantly better at languages than the previous versions. I think, to answer your question, I think it's close to GPDFour, if not better at some things. Hebrew goes fluently, and usuallyHebrew is not that great. 34:57 (Speaker A) Russian and Ukrainian that I use also go fluently. 
And that part is really good with a lot of context because you sometimes need to do a lot of translation, or at least I need to do a lot of translation. 35:11 (Speaker C) Yeah, multilinguality works great. I was surprised. Absolutely. What I think if you just compare the two on the same prompt, the same question, I have a feeling that GPT Four is slightly better, but I just don't have an example to tell you. 35:31 (Speaker C) Okay, here I don't know, it's a strange situation, but I really wanted to ask you, like, what did you try and work better here and there? 35:38 (Speaker A) So here's my use case that GPT Four currently cannot do. Yesterday, Lex Friedman interviewed Israel's Prime Minister Benjamin Netanyahu in one of the weirdest turns of history this podcast was, and given that I know kind of who Benjamin Netanyahu is from, before I decided to not listen to this, I decided to use the tools that we have at ourdisposal. So I ran this through Whisper with Diarization. So I have, like, a very nice transcript of who's talking. 36:10 (Speaker A) When I took that, I just dumped this as a text file. And I agree withMatt, it's a little bit annoying that Claude turns whatever you pasteinto like, a little text file uploads. That because you can't edit it. 36:21 (Speaker A) However, I uploaded that transcript directly to Cloud, and then I asked it to do sentiment analysis, entity extraction, and sentiment analysis and entity extraction. Something that if I'd asked GPT code interpreter, it would probably write some Python code to do this, andQuad just kind of did it. And I haven't seen GPT Four being able to do this for bigger files. 36:38 (Speaker A) And once I could just let me just this point. I continued by saying, hey, because of the new coding abilities of Quad, I asked it like, hey, print me a Python file that dumps whatever table of topics he mentioned and sentiment, negative, positive, dump it into a word cloud. That's something the code interpreters can actually do and show you. 37:03 (Speaker A) But I asked it from Quad because previously Claude was s**t at codingand it gave me Python files that ran from the first time. I didn't have to change anything, there was no bugs. And then showed me a wordcloud of everything that was mentioned by BB in that podcast and it all took like maybe seven minutes. 37:11 (Speaker A) And I don't know if for bigger complex windows, GPT Four can currently do this. Go ahead, Al. 37:28 (Speaker F) Yeah, I've actually been putting a lot of transcripts for podcasts inthere and you can actually have the because it seems so much about the speakers and it knows about the speakers, you can actually have them continue a discussion about things that they didn't actually discuss. Yeah, so it's like you can have it say, okay, well, what aresome topics they disagreed on and then some things that they didn't cover? Tangentially, you can just have it give you another two minutes of interview and it does a pretty reasonable job, especially with public figures that it actually has a lot of their background on. So it's pretty interesting. 38:01 (Speaker A) And not to mention free, ngbt Four needs a $20 a month payment and quality is free. 38:08 (Speaker F) That's a good point, too. For those of you that have eval keys, you'll notice that they're actually not charging you for them, so youcan actually go on as long as you want. The limitation is that you can only do one request per organization. 
So if it's just a single person, they only charge you basically when you start deploying for commercial purposes. 38:21 (Speaker F) So that's something that people may not have realized. 38:32 (Speaker A) So I think we've covered everything right, trained on 200K context, which they can enable tomorrow for us, and we'll get like two X. It'sgoing to be insane. There is some stuff that they have in Cloud in a tropic called Constitution AI, so they have a mix of Rlhf access and Constitution AI. So they're working on their model to actually be more helpful, but also more safe and less jail breakable. 38:57 (Speaker A) They talked at length about this. We talked about human evil better and same price and free playground. I think we've covered most of it.39:03 (Speaker A) So anything else about Quad that we haven't covered, feel free to raise your hand and tell us, and if not, I think we can move on. Whatdo you guys think? 39:17 (Speaker G) I'll mention briefly, did you talk about the multiple file uploads? 39:21 (Speaker A) No, go ahead. 39:24 (Speaker G) So I think it's just an interesting way difference between co interpreter and Claude code interpreter. You can only upload one file, right? But it can be a zip file with multiple files in Zion. Soit's de facto multiple files, but then you can only run code on that.Whereas what Cloud here is doing is something slightly different, which is to me is interesting, which is you can upload multiple files, it just reads the file straight into the context and it's using that 100K context to synthesize answers. 39:24 (Speaker G) So you can do, for example, PDF A and PDF B and give me a comparison between the two of them or synthesize knowledge across them. And I think that is something that code interpreter cannot do because code interpreter will only run code across files. So I think that's noteworthy. 40:15 (Speaker G) It's called genuinely coming up with one new thing that is not copying chat GBT and good for them. 40:23 (Speaker A) Yeah. And unfortunately no zip allowed. But we're going to fix this with an extension and hopefully talk about this next week. I want to say hi to Weather Report. 40:33 (Speaker A) Feel free to chime in. Sorry you raised your hand open to come up before. So if you have a comment about code interpreter, we've moved past it, but if you have a comment about Claude, feel free to tell uswhat's up with the report. 40:46 (Speaker A) Actually, I had only one thing about code interpreter that in the previous space I talked about that there was a hypothesis I had aboutcode interpreter, which. 40:56 (Speaker B) Is to use it as a huddle because it's recorded. 40:59 (Speaker A) We'll move on and let's talk about code interpreter next time. I think that some folks are saying that their audio is glitching and sothey're not able to and I want to see if I think Joseph has comment about code interpreter. Joseph Polak. We'll give him a second to log in and then I think we'll move on to other updates because we have many other things to talk about. 41:29 (Speaker A) What's up, Joseph? Welcome to stage. 41:31 (Speaker G) Hi there, folks. 41:33 (Speaker A) Thanks for taking my question. I didn't even know all about that codeinterpreter stuff with the file. 41:40 (Speaker G) So I'm really happy to have heard it. About Cloud, though. 41:46 (Speaker A) For Cloud. Well, I'm still on waitlist. First of all, it's free now. You can access it right now. 41:53 (Speaker A) Cloud AI. 
There's no waitlist anymore unless you live in the States and you'll have to get a VPN. Okay, I'll definitely check that out. 42:03 (Speaker A) My question was about using Cloud and actually code interpreter through API. Do you think that's ever going to exist or if it's coming so clogged API? But I think that's waitlisted. I have talked with Claude folks and they said the waitlist is now going faster. 42:24 (Speaker A) So they are ready to get more people in. I think because of the new safety updates, they're less afraid. So definitely apply for the waitlist on quads account. 42:35 (Speaker A) Code interpreter is not available via API, and we've seen some folks who hack it together with like, I think a browser plugin that proxy something. Sweets I don't know if you remember the unofficial quote unquote code interpreter API and it's how to access this, but it's not available in the official OpenAI APIs as of yet. We haven't seen them. 42:56 (Speaker G) No. For the record, there's no unofficial code interpreter API. There's the browser side thing that we are trying to but nobody's made any. 43:07 (Speaker D) Adapter for it yet. 43:08 (Speaker G) I think you can, if you want, using puppeteer. 43:12 (Speaker A) I would not recommend definitely, if anything, there was some folks that tagged us and I need to go and find this that they're working onlike an open source version of code interpreter that uses laws and stuff. And that one this will likely be the way forward. If you do want something programmatic that has code interpret capabilities, go ahead. NISten. 43:35 (Speaker B) There's also Chatbot UI on GitHub. So yeah, for the other people thatare hacking something together, I'll wait until there is something public before, because then. 43:45 (Speaker D) We don't know everything. 43:47 (Speaker G) Open source is going to be worse. Because you are missing the model. 43:51 (Speaker A) Yeah, because we think that it's fine tuned on actually knowing how to run code. Right. That's kind of the highlight that we get with from the less space. We think it's smarter because of that. 44:01 (Speaker A) And one of the main things again, sorry, going back to code number just real quick, it is able to then fix itself and ask itself, oh, oops, I made a mistake. Let me try again. Matt, I saw you unmute yourself. 44:13 (Speaker A) Feel free to go ahead. 44:16 (Speaker D) Well, yeah, just a quick thing. So from what I know, openi will be offering fine tuning relatively soon. So at that point, you theoretically could go and fine tune your own code interpreter like Model, even if they don't offer it, which is going to you. 44:31 (Speaker A) Can also theoretically not that we would recommend, but theoreticallyright now you could start distilling some stuff from code interpreterby asking it questions. Generate code and store it to a file. Ask it to download and then quote, unquote, generate the data set. But not that you should, but you can theoretically as well, so that when it'stime to fine tune, you have some data set. 44:52 (Speaker D) Yeah, theoretically. I don't know if a shared GBT currently supports those types of conversations, but if it does, I'm sure that's going to happen really soon. 45:00 (Speaker G) I don't think it's maintained because chat GPT itself well, I want tospeak for share GBT. I know, Steven, but I can help you move the conversation back to cloud. 45:11 (Speaker A) Yes, please. Let's move back to cloud. Thank you. 
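For the purely theoretical "distill a dataset from code interpreter" idea Alex and Matt describe just above, the collected prompt/code pairs would eventually need to live in a fine-tuning-style dataset. A minimal sketch with made-up example data, writing one JSON record per line; the exact schema a given fine-tuning endpoint expects may differ:

```python
import json

# Made-up examples: prompts sent to Code Interpreter paired with the code it wrote back.
examples = [
    {
        "prompt": "Load sales.csv and plot monthly revenue.",
        "completion": "import pandas as pd\nimport matplotlib.pyplot as plt\n# ...",
    },
]

with open("code_interpreter_distilled.jsonl", "w") as f:
    for ex in examples:
        # One JSON object per line; adjust the shape to whatever your fine-tuning pipeline wants.
        record = {
            "messages": [
                {"role": "user", "content": ex["prompt"]},
                {"role": "assistant", "content": ex["completion"]},
            ]
        }
        f.write(json.dumps(record) + "\n")
```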
45:14 (Speaker G) So just between the how many people are listening to this chat anyway? I think it's like 60 people. Email support@anthropic.com for the Cloud API. 45:26 (Speaker A) Yes, email them, state your use case and they'll likely get you in and you can use SWIX's menu bar to actually kind of run them in parallel with the megaprom feature. Megapron super prompt, what is itcalled? I think SWIX dropped. There is like one prompt that you type and then it all goes to both to all the models. I want to recognize some folks in the audience. 45:50 (Speaker A) Hey, feel free to regime if you. 45:52 (Speaker D) Want to come up. 45:52 (Speaker A) Obviously, I saw some other Euro I saw in the audience. Max AI. Welcome, Dexter. There's a bunch of folks who are usually here and it's great to see, and I think we're moving on to a very spicy one. 46:06 (Speaker A) What do you guys think about Xai? So I'm pasting the summary of the people. Elon Musk and a bunch of other folks have announced X. AI they're essentially answer to OpenAI. 46:22 (Speaker A) We've all seen Elon kind of talk about safety and talk about helping open Xai and then could not be open since then. He talked about truthGPT at some point. And finally they announced Xai as we were talking.46:37 (Speaker A) By the way, I have an application from Xai which they're going to have spaces tomorrow to go deep into deeper into Xai. But so far there's not a lot of detail. There are some details about the folks who work there. 46:50 (Speaker A) So they have folks who wrote the Adam Optimizer. There are other folks thoughts about Xai before we get to hear what they do. Obviously, there's no product yet. 46:59 (Speaker A) I don't think they've started training. The one thing that I will sayis that they will have premium access to Twitter, obviously, because Twitter is now rebranded.com Xai. After closing down the APIs and closing down the scraping for Twitter, xai will now have a data set that's insane to train on Twitter. 47:21 (Speaker A) And we wish them, quote, unquote, good luck. I would love to hear from folks on stage. What do you think about the announcement, the direction, the people? And we're going to wait for tomorrow to actually hear them talk. 47:24 (Speaker A) I know. NISten, you have some ideas if you want to share to get started. 47:40 (Speaker B) Well, this is more of an old lady babushko opinion that's just talking about stuff. I found it interesting that they went from, whatwas it? Base GPT through street taking on GPT four and this entire competition to doing something more noble like dedicating it to be better at math and discovering new things in physics. So the way I see that, that's pretty noble. But at the same time, I feel like that's a result of having problems hiring in order to be competitive with the other ones. 48:26 (Speaker B) So, yeah, this will be interesting. But the way I see the whole set up right now is, as the kids say, it's pretty mid, in my opinion. 48:39 (Speaker A) As the kids you don't use with that. I will say that we will see tomorrow from their space. They're probably going to use Elon's Cloudto maybe try to hire and it's probably harder now to hire because everybody knows how quick they're getting fired and how much. It's not like super fun to work for X, but we're in for a nice ride because they do have access to the cross pollination from Tesla as well, right? 
So if they have big questions, tesla does have a few good folks still, even after Andre Capati left, and so they'd be ableto ask them for assistance. 49:20 (Speaker A) There's obviously the whole Dodgy thing in play, which we can I don'tthink we have time to talk about Dodgy, and it's not new, but there could be something there. Gabriel, you wanted to come up? Maybe you have. Yeah, go ahead. 49:33 (Speaker A) Gabriel. 49:34 (Speaker E) Yeah, I was just going to say about Xai, I mean, you mentioned Twitter's data, and I'd be interested in hearing other people on the stage opinion on this because recently there's been a lot of work done on quality of data over quantity of data. And of course, Elon also has a ton of GPUs. Reportedly, he's bought tens of thousands of GPUs. So that's definitely important in building these big models. 49:58 (Speaker E) But I'd be interested in hearing from people on the stage if they think Twitter's data and the kind of data that Twitter has is actually going to be really powerful for training good models. 50:11 (Speaker A) Anybody wants to take this? 50:13 (Speaker F) Yeah, I'll take a little of it. One of the things that Twitter has that other people don't is that people are actually debating issues. So I think that's one of the reasons why he's really focused on the idea of Twitter being a source of truth and being sort of unrestricted so that you're not just following like, one thread, you watch the narratives being debated and he has access to all that. 50:35 (Speaker A) Data and community notes. And it's really hard to scrape. Like, I don't think it's API ball at all. It's not super simple to scrape at all. 50:42 (Speaker A) I want to get yum before I think Matt wanted to unmute and go and then yum. If Matt, you still want to chime in and then yum. 50:53 (Speaker D) Yeah, I mean, nothing too much to add here. I think the community notes are very interesting as a way to sort of like, reduce hallucinations. I think one of the things that they're going to want to do heavily is invest in sort of filtering that data set because there's a lot of great stuff on Twitter. There's a lot of crap on Twitter. 51:07 (Speaker A) A lot of yeah. 51:09 (Speaker D) And the more of that that seeps in, the worse the model is going to perform. Obviously, scale is important, but data quality is incredibly, incredibly important and the scale kind of doesn't negatebad data quality. So I think if they do one thing right, it's going to have to be getting the sort of filtering of the data set down. Butthey do have a ton of incredibly high quality data. 51:27 (Speaker A) Yes, I think Yam was next and then we have a few folks wanted to comein. I think Pharrell wanted to come up. So yam. And then pharrell. 51:34 (Speaker A) And then Gabriel. 51:37 (Speaker C) I just want to say, of course, if you just take Twitter data and start training your model, you can expect it to be average Twitter, which is not what you want. What you can do, which is a gold mine, isto transform this data or just rephrase it as other forms. And this just makes the data a gold mine because Twitter does have very high quality content here and there. Absolutely. 52:05 (Speaker C) If you can, and transform it and rephrase it to a different form if you want an example. So the paper textbooks are all you need. Basically, they just take data and make it into a tutorial, make it into a textbook, like perfect, clean and everything. 52:22 (Speaker C) It is very easy to do, and you don't need a powerful model to do that. 
You don't need chachi PT. You can use it to do it with a small model. 52:30 (Speaker C) I'm currently doing off the record, I'm currently doing it myself in a large model I'm training. It doesn't it doesn't matter matter anyway. It's a gold mine. 52:43 (Speaker C) What I'm saying, it's a gold mine. 52:45 (Speaker D) About Twitter. 52:46 (Speaker A) An additional thing before I get to Farrell and then gabriel additional thing. NISten I talked about yesterday at length in our late night line cook space. That's not going to be scheduled. If you guys are on, feel free to join that one. 53:00 (Speaker A) Twitter Spaces is also a gold mine. Transcribing Twitter spaces and seeing all the reaction emojis that they have in real time. Like the space that Elon ran with RFK Jr. For example, if you know in the audience who are actual people instead of bots, and you're able to get like emoji reactions in real time, that's a definite, definite, very high signal kind of training set that they have and almost nobody else has. 53:25 (Speaker A) And through how to get Pharrell, you are next, I think. And then gabriel yeah, I wonder what. 53:30 (Speaker D) The relation is and how useful the Twitter data will be for their goal of building a sort of math reasoning machine. Right. Also, do weknow if they're open source, as in truly open source or not? 53:49 (Speaker A) No, we don't know yet. Hopefully tomorrow we'll be able to answer questions. However, we've seen Elon take Twitter's algorithm to open source, and now he's like, boasting this comparatively competitive advantage versus something like Threads. He's saying, like, hey, opensource. 54:07 (Speaker A) If you go to Threads, you're under the Zucks influence algorithm. So there is definitely an attempt to open source from their side, but wedon't know anything about that beyond that. Gabriel. 54:17 (Speaker A) And then Johnny. 54:20 (Speaker C) Yeah. 54:22 (Speaker E) First of all, I think it's funny that Elon's s**t posting is polluting his data set. I would say that. 54:34 (Speaker A) By the way, if there's anybody with the option to detect S**t posting, it's them, right? They're going to be able to build a model.Understand, this is s**t post. This is like somebody who made an effort to give us clean information. But sorry, go ahead. 54:49 (Speaker E) Yeah, that's exactly my point that I was going to make, that Elon wason this crusade before he bought Twitter. And this is kind of why he got forced into buying Twitter, because he was going after the bots and he made a big deal about the bots. And I think they spent a lot of resources on figuring out what's good content and what's bought content. And another thing is that we each are kind of experiencing adifferent Twitter, right? Because we're within whether it's an ML Twitter or Israel based Twitter, and there's many different communities and their Twitter is very good at segmenting those communities and figuring out which content belongs to what community.54:55 (Speaker E) And they'll have the ability, I think, to segment this data and trainmany different models that are good at different things because they're in a literature community or in an ML community or MMA community or whatever. 55:37 (Speaker A) I actually saw a map of like 5 million, 7 million tweets all embeddedin Nomic Xai Atlas. I don't know if you guys follow Nomic, they just recently announced like a 17 million round A, by the way. So kudos toNomic good friends. 
Andre, the GPT for all team, and they have like an embedded map before the API was shut down that they were able to siphon, et cetera. 56:00 (Speaker A) And Gabriel, what you're saying is actually visible in the embedding map. You can actually see those tweets and then different areas of the political Twitter. There was a journalist Twitter until all of the journalists started living there's like a bunch of different pockets of Twitter that we don't get exposed to, not to mention the different languages. 56:20 (Speaker A) There's a whole Japanese Twitter that's like insane. And people go super, super hard. And translating is easy. 56:26 (Speaker A) We talked about Cloud being able to translate. So they have a bunch of very interesting data. And I think Zuck is also going after that data with Threads. 56:31 (Speaker A) And I think this is the reason why we'll see Threads getting continued work and we'll see a lot of investment from their side. Butto compare to Threads, and we talked about this yesterday, is that Twitter has back history and a lot of historical data that they can train others. Threads is fairly new as well. 56:54 (Speaker A) So definitely a bunch of interesting data sets. Johnny and then Lentil. Hey. 57:00 (Speaker H) So one I think about when I think about the data from Twitter that ispotentially lacking and some of the other data sets is colloquial language. Because what Twitter has that Facebook doesn't have and a lot of other things don't have, especially from what you're talking about, like historic, is the way that people actually interact with each other. You know what I mean? 57:26 (Speaker A) Not only that, how it evolved as well, right throughout exactly. 57:35 (Speaker H) To be honest, I think the data sets from earlier is probably better and stronger because it's just gotten out of hand. But I agree with what I'm not sure it was Yam or who said the filtering because all right, this is black box, it's not open source. Elon has not been shyabout his kind of response to what he perceives as wokism and all of that stuff. I'll be super curious. 57:36 (Speaker H) I mean, there's a big team on this, but I will be super curious to see what that bears out in the actual model. Because, God, there's equal parts or more parts disinformation on Twitter than there is information. So if we're talking about source of truth, that rings some alarm bells for me, for me personally. 58:21 (Speaker H) So those are just my thoughts. 58:29 (Speaker A) Yeah. Thanks, johnny Lentil. Go ahead. And then Gabriel. 58:33 (Speaker A) Let's finish on the Gabriel and then we'll move on to the next topic.58:36 (Speaker H) Cool. 58:37 (Speaker A) Yes. 58:37 (Speaker H) So I think it's going to be hugely bullish for this data. And from the perspective of relating idea space and people and the relations between those, I think that's probably going to be more of a goat information than conversation because you can build so much from that. Like dating this is just one like a dating thing. Or finding people, finding brain power compute, that's going to be huge. 58:40 (Speaker H) And to touch on the open sourceness of the data, I think not open sourcing it at some point is going to be hugely politically bad for Elon to do. 59:23 (Speaker A) That'S. 59:23 (Speaker H) My thoughts on that. 59:24 (Speaker A) Awesome. Thanks, Lance. Gabriel, let's end up and then, Matt, we're going to talk about some interesting stuff. 59:31 (Speaker E) Yeah, just on the kind of data. 
I think for those of us who ran, like, the early versions of LLaMa before they got fine tuned in all kinds of ways, you run it, and especially the smaller models, you put in a prompt and it spits out some generic Facebook type of content. It sounds like a Facebook post of, like, a 15 year old or something like that. That shows what you get when you use all this kind of unfiltered data. 59:59 (Speaker E) But I think the interesting thing is that LLaMa was then fine tuned in many different ways and some really powerful models are built on top of it. So I think in some sense almost any data is valuable in the sort of pretraining stages, and maybe you need really high quality data for the fine tuning, but I think that big volume might be really useful, maybe not the most economical. 60:21 (Speaker A) So I want to wrap up with why they potentially have, like, a leg up versus not a leg up. We definitely know that Twitter was used to train other models that we currently use. We know this for a fact. This was the reason why Elon and Sam Altman, who used to be friends, are no longer friends, s**t posting about them. 60:40 (Speaker A) And the current models we use do use this data set, but it's old for them. It's no longer, like, recent and relevant. 60:40 (Speaker A) And we know for a fact that Twitter is significantly biased, and probably the best place in the world for uncovering news as it happens, before the bias sets in, before the narrative sets in, before folks get their marching orders from MSNBC or from the other side on how to think about things. Twitter is really good at talking about issues as they arise, the second they arise. And I think that on its own is going to teach the models a very great deal. 61:16 (Speaker A) Naval Ravikant, if you guys follow Naval, he always said Twitter makes him a better writer. So we definitely also know that tweets, in short form, condense information better. And if their model trains on that, obviously taking all the precautions we talked about before, bots and s**t posting, et cetera, if they're able to actually get this into the model, likely their model will be more up to date and more fine tuned to reactions. 61:20 (Speaker A) So with that, I want to close. We'll see about xAI. It's definitely exciting, right? We're potentially getting another big one, potentially an open source one. 61:20 (Speaker A) So we'll see. I'm going to wrap up this update and I think the next one I want to move on to. Matt, let me know if you're still around, if you want to cover it. 61:20 (Speaker A) So we have Matt, who introduced himself in the beginning. So I'll let you do this quickly again, maybe, and then we're going to talk about the stuff that's rising in GitHub stars, which I think is super cool. And I invite you to give us a little bit of an interview about this. 62:16 (Speaker A) Go ahead, Matt. 62:17 (Speaker D) Yeah, sure. So I'll try to summarize it a bit better than the last time. A lot of practice. But very long story short, co-founder and CEO of OthersideAI, creator of HyperWrite, and a number of other things. Basically, we've been around for a number of years now. 62:30 (Speaker D) We're one of the first companies in the space working with LLMs. The goal always has been to build a personal assistant that scales to everybody, just like a real human personal assistant, but at scale, way cheaper, digital. The tech wasn't there at the beginning.
So we built other products to sort of learn and gather resources, whether that's users, revenue, a bunch of other things. 62:50 (Speaker D) What we do today: today we are actually building that personal assistant. So an AI that can operate a computer, any software, to do what a human can do on pretty much anything. 62:53 (Speaker D) So it'll help you with your tasks. It's very simple. Today it's a Chrome extension that lets you sort of, like, control Chrome just by sort of talking to it. 62:53 (Speaker D) So you could say, go order me a pizza, or go send this person an email, or go filter my email, or anything else. It works okay today. The idea is that over time, it's going to get a lot better, a lot cheaper, a lot faster, to the point where six months from now, a year from now, it might actually be as good as, if not better than, a human on many tasks. But that being said, while I work on this, I also like to learn about getting the most out of these technologies, because they're so fast moving and you really have to stay on top of it to be effective, or you. 63:34 (Speaker A) Can come every week and stay up to date with us together. But yeah, go ahead. 63:40 (Speaker D) Exactly. I mean, a lot of what I do to learn, really, is just build things that I find interesting, and I find that often, even if I'm not expecting it, a lot of those learnings do translate to stuff we're doing at Otherside. So this sort of just came out of that. Happy to sort of dive into the project, or if you want to sort. 63:56 (Speaker A) Of stop me, let's pause here for a second and I'll just tell folks that I pinned Matt's tweet from a couple of days ago with the introduction. Since then you got a few thousand stars, I think, on GitHub, and we're going to talk about the GPT Prompt Engineer project and the different reasons why Matt and folks wrote this and what it's here to serve. So maybe give us an introduction to GPT Prompt Engineer and what kind of made you come up with this and how it works. Yeah, go deep, man. 64:29 (Speaker A) Sure. Yeah. 64:30 (Speaker D) So forgive the rambling in advance. Essentially, I find prompt engineering so fun. I've been doing it pretty much every day for everything, honestly, to the point of excess, from what I would do for work to having it decide what I'm making for dinner, for years now. And as I've gone through this process, sort of, like, learning how to use these models, it's become very clear that, especially as these models evolve, there's no best practice for anything. 64:54 (Speaker D) Prompts change, ways to prompt change. Something that works for one task might not work for a very similar task. And the only way to sort of get out of that is to sort of get an intuition of the model and try a lot of things, but that doesn't always work perfectly. 65:01 (Speaker D) And also, you don't really know kind of what works and what doesn't. Even when you're trying things, right, you have to do it sort of, like, in a very scientific way, but there's no real right answer to anything. It's kind of like alchemy. 65:18 (Speaker D) So I started to think, and I think this was right when GPT-4 came out, I was using GPT-4 pretty often to just ideate prompts. I would say, here's what I'm trying to do. 65:20 (Speaker D) I would say, write a prompt for me, and I would use the ideas from that to help me improve my own prompts, and that actually got a lot of interest. We ended up building a sort of thing similar to that into the HyperWrite platform.
At the time it was really cool, but really wasn't something that would replace what I do every day, which is really hardcore prompting. 65:43 (Speaker D) Eventually I was just sort of thinking about it, and I think this was on the 4th of July, I was just sitting there kind of thinking, what if we tried it? And I started thinking about how you could design a system that actually comes up with good prompts. Not just a prompt that does the job, but something that's actually optimal, because as humans, right, we can only try so many things at once. But the magic of these LLMs is they're creative and they think faster than we do. In the time that I could write half a prompt, LLMs could write 50 or 100. 65:48 (Speaker D) And what if you could leverage that? Because even if the average prompt isn't very good, you're going to luck into one or two that happen to be exceptional for your task. So I started by doing it actually with a classifier. I only released this notebook yesterday just because it's like a step on the road. 65:48 (Speaker D) And what we ended up using it for was actually something at Otherside where we needed to build a classifier for something with the personal assistant. And I just wasn't getting good performance out of the prompts that I was writing. So I said f**k it, what if we have the AI try to do this? And I built this so that essentially I describe the task, I give it some test cases, so I'll give it some true/false test cases. 66:11 (Speaker D) Because the classifier was classifying things as true or false. It was like, classify the statement as true or false. And if it was like, New York is in America, it would be true. 66:54 (Speaker D) If it was, New York is in Paris, it would be false. And I basically created like ten or 20 of these test cases. I described the task and I had GPT generate something like, I think, 20 or so prompts. 66:57 (Speaker D) And surprisingly, the quality of them, just at first glance, was pretty good, right? It was kind of shocking considering I spent so much time trying to do this manually. Then what I did was I just basically had each of these prompts tested against each of these test cases. And I plotted sort of the success of each, and it turns out some of them actually outperformed what I did. 66:57 (Speaker D) I was kind of shocked, right? Like you wouldn't expect that, especially doing this for years. 67:30 (Speaker A) Just to recap real quick on this: GPT-4, I assume that's what you're using, generated prompts that actually performed better than Matt Shumer's prompts. And Matt Shumer is the founder of a prompt company with a lot of prompt use cases for a long time, from GPT-3 to 4, et cetera. And some of the ones that it came up with performed better than yours. 67:52 (Speaker D) Yeah, it was kind of scary. Some of them performed way worse. But the idea is that you're going to sort of luck into something that is better. Maybe two out of 20 will be better, but they're great. 68:02 (Speaker D) So I was sort of just so fascinated by this, I was like, how do you take this further? Because classification is one thing, but real prompts, where you're actually having it generate text, those are harder. How do you judge that? You could use GPT-4 to judge them, right? If you have two prompts and you have each of them generate something, and they give you their responses and you want to know which is better, you can ask GPT-4. And so I figured we could apply that.
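To make the classifier workflow Matt describes a bit more concrete, here is a minimal sketch of the same idea, not the actual gpt-prompt-engineer code: describe the task, give a handful of true/false test cases, have the model propose candidate prompts, then score every candidate against the cases. The `chat(system, user)` helper is a placeholder for whatever LLM client you use, and the test values are made up.

```python
# Minimal sketch of the classifier flow described above; not the real
# gpt-prompt-engineer code. `chat(system, user)` stands in for any LLM call
# that returns a string (e.g. a chat-completion request).
from typing import Callable, List, Tuple

def generate_candidate_prompts(chat: Callable[[str, str], str],
                               task_description: str, n: int = 20) -> List[str]:
    """Ask the LLM to propose n candidate prompts for the task."""
    system = "You write system prompts for a true/false classifier."
    user = (f"Task: {task_description}\n"
            f"Write {n} different candidate prompts, one per line.")
    return [line.strip() for line in chat(system, user).splitlines() if line.strip()]

def score_prompt(chat: Callable[[str, str], str], prompt: str,
                 test_cases: List[Tuple[str, bool]]) -> float:
    """Fraction of true/false test cases the candidate prompt gets right."""
    correct = 0
    for statement, label in test_cases:
        answer = chat(prompt, f"Statement: {statement}\nAnswer true or false.")
        predicted = answer.strip().lower().startswith("true")
        correct += int(predicted == label)
    return correct / len(test_cases)

def best_prompt(chat, task_description, test_cases):
    """Generate candidates, score each one, return (accuracy, prompt) of the winner."""
    candidates = generate_candidate_prompts(chat, task_description)
    return max((score_prompt(chat, p, test_cases), p) for p in candidates)

# Test cases in the spirit of Matt's example:
TEST_CASES = [("New York is in America", True), ("New York is in Paris", False)]
```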
68:29 (Speaker D) Turns out there are some issues with that, and there are some papers written about this, where essentially it'll sort of favor the one that's on the bottom. So just do it twice, flip the order, and see if one wins. And I took that approach and I sort of combined it with sort of like an Elo-style tournament, where essentially you have each of them go head to head, like one on one, and each of them gets their Elo score either bumped up or down based on whether they win, lose or draw. 68:53 (Speaker A) Can you give two sentences on Elo scores as a concept? Yeah. 68:57 (Speaker D) I'm actually not super familiar with it. Funny enough, I had GPT write the code for that part, but basically think of it like a ranking system in a video game. Yeah, think of it like a ranking system in chess or a video game, where you have two people competing and the one that wins gets their score increased by X, and the one that loses gets their score decreased by X. 69:18 (Speaker D) And it's also sort of, like, weighted based on the previous scores. So if somebody that has a high score beats somebody with a very low score, their score won't increase that much, because they're very likely going to win. So it's sort of just like a weighting system to help figure out what's the best, instead of just sort of getting a clear-cut yes, this is right, or no, this isn't, which is what you can do with classifiers, because there is a right and a wrong ground truth answer. 69:39 (Speaker D) I just had each prompt sort of generate for a test case, and the sort of opposite prompt, the competition prompt, would generate for that test case. So it was a little bit complex, and they would have the model judge which one was better. And it's expensive, right? It might cost like $20 in GPT calls to get to an answer, but it turns out at the end, the prompts again were just kind of blowing me away. 70:04 (Speaker D) Awesome creativity in them. Like the words it used, the trigger words, it didn't do what I would do. And in a really good way. 70:10 (Speaker D) And it also opened up my eyes to sort of, like, new ways of prompting that I never would have thought of and just sort of, like, aren't standard. And that's kind of the magic of all this. I think that this sort of abstracts away the sort of atomic level of prompts, right? You talk about prompts as sort of a prompt in and of itself, and then a system built around the prompts, with many prompts kind of working together. 70:31 (Speaker D) This makes it so that you don't have to guess about, do I have the best prompt for this single atomic part of our system? Where the magic really comes in, then, is how do you string these amazing, individually crafted by AI prompts together to make something that actually works really well. 70:46 (Speaker A) And how do you robustly build the evaluation system, right? Because the classifier is a simple example of evaluating, because maybe you know the answer, et cetera. But how do you actually scale up the evaluation system such that this could potentially run in loops and then generate the best of the best prompts for a task? 71:03 (Speaker D) Exactly. 71:03 (Speaker A) That's also, like, a very interesting piece. How do you think about evaluation going forward? 71:08 (Speaker D) Yeah, so I think it's sort of like that, where you could have this thing run in the loop three times and take the three winners, and then have GPT read those winners, right, and be like, here are prompts that worked really, really well. Here are the test cases where they failed.
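A quick aside before Matt continues with the evolution idea: here is a rough sketch of the Elo-style tournament and the order-flipping trick he just described, under the same placeholder-LLM assumption as the sketch above. For each pair of prompts, both generate an output for a test case, a judge model is asked twice with the order flipped to reduce position bias, and Elo ratings are updated after each match. The K-factor and starting rating are arbitrary illustration values, not numbers from the project.

```python
# Sketch of the Elo-style tournament with order-flipped judging; illustrative
# values only (the K-factor and starting rating are arbitrary).
import itertools
import random

K = 32            # arbitrary Elo K-factor
START_ELO = 1200  # arbitrary starting rating

def expected(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of A against B."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, score_a: float):
    """score_a is 1 for an A win, 0.5 for a draw, 0 for a loss."""
    e_a = expected(r_a, r_b)
    return r_a + K * (score_a - e_a), r_b + K * ((1 - score_a) - (1 - e_a))

def judge(chat, test_case: str, out_a: str, out_b: str) -> float:
    """Ask the judge twice with the order flipped to reduce position bias."""
    def first_wins(first: str, second: str) -> bool:
        verdict = chat(
            "You compare two responses and answer only '1' or '2'.",
            f"Task: {test_case}\nResponse 1: {first}\nResponse 2: {second}\nWhich is better?")
        return verdict.strip().startswith("1")
    a_wins_round1 = first_wins(out_a, out_b)
    a_wins_round2 = not first_wins(out_b, out_a)
    if a_wins_round1 and a_wins_round2:
        return 1.0
    if not a_wins_round1 and not a_wins_round2:
        return 0.0
    return 0.5  # the two orderings disagree, so call it a draw

def tournament(chat, prompts, test_cases):
    """Round-robin: every prompt pair generates for a random case and gets judged."""
    ratings = {p: START_ELO for p in prompts}
    for p_a, p_b in itertools.combinations(prompts, 2):
        case = random.choice(test_cases)
        score_a = judge(chat, case, chat(p_a, case), chat(p_b, case))
        ratings[p_a], ratings[p_b] = update(ratings[p_a], ratings[p_b], score_a)
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)
```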
Now I want you to write new prompts that take what's good about these but also mitigate the failure cases, and generate a whole new set of prompts. Sort of like evolution. It really doesn't just have to stop at one point in time after the first run. 71:37 (Speaker D) It's like, let's learn from what these amazing ones still did wrong and continue to make this better and better and better. Obviously, this relies on a relatively large test set. I'm also experimenting with ways where you can have the test set autogenerate, but that's a little bit finicky. 71:50 (Speaker D) But I do think that sort of, like, evolution of this could lead to some really exceptional prompts. But what I found was, even on the first run, I was seeing it outperform myself. For example, there was a classifier we were using GPT-4 with logit bias to do, because it was such a hard challenge, and we were getting something like 90% accuracy. 71:50 (Speaker D) I had it do these prompts with GPT-4, but then I had it run them using GPT-3.5, and it got 96%. 72:19 (Speaker A) We've talked about this pattern before, where you can outsource kind of the hard work to GPT-4, but then once you get really good at prompting, GPT-3.5 is actually very decent at many things, and it's way faster, cheaper, and has a 16K context now that you can use. And so we've seen this pattern with many folks: if you don't need the full power of GPT-4, HumanEval for coding, et cetera, you can go far with GPT-3.5 and get very far along, especially as you're getting better prompts. And now, Matt, you have, like, a recursive prompt crafter helper guy that's here. And my next question for you is, have you used anything else? So you mentioned GPT-3.5, where you run the prompts. Have you tried them on different models, like Claude maybe, or the open source LLaMa ones? 73:07 (Speaker D) I actually haven't, just because I wanted to see if this worked. It was sort of just an interesting thing for me, and my time is really focused on Otherside and the personal assistant, but it wouldn't be hard to get Claude in. I suspect Claude prompts would perform better on Claude, OpenAI prompts would perform better on OpenAI, just because the models take prompts very differently. 73:18 (Speaker D) Claude is sort of like a more emotional thinker. OpenAI is more of, like, a logical thinker. It's a very sort of simple, not perfect analogy, but I suspect you'd want to sort of, like, stick within the. 73:36 (Speaker A) Ecosystems, maybe. Not to mention Inflection's Pi, which is like a whole different beast. 73:41 (Speaker D) Yeah, that's an interesting one. 73:44 (Speaker A) We discussed Pi a couple of times and I've seen some reactions, but I don't think, maybe at the end of this, if we have time, Matt, one question I will have for you on this, and I think we'll move on, is where folks can find more of this work. Is it open source? Are you looking for contributions? If you are. And yeah, just give us a wrap-up of this project. 74:07 (Speaker D) Yeah, so you can find it on GitHub. It's called GPT Prompt Engineer. Currently there are two notebooks. It's all done in Jupyter notebook format, so it's pretty easy to edit. One is for the classification system, the other is for the generation system. 74:20 (Speaker D) We're honestly sort of, like, at a point where it works well, so it's like, what do you build around it?
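And a sketch of the evolutionary refinement described above: keep the top-rated prompts, collect the test cases they still fail, and ask the model for a new generation of prompts that fixes those failures before re-running the tournament. Again, this is an illustration rather than code from the repository; `chat` and `tournament` are the placeholder helpers from the earlier sketches, and `failing_cases` is a caller-supplied function.

```python
# Sketch of the evolutionary loop: keep winners, describe their failures,
# ask the LLM for improved prompts, and rerun the tournament from the sketch above.
def evolve(chat, prompts, test_cases, failing_cases, generations: int = 3, keep: int = 3):
    """failing_cases(prompt) should return the test cases that prompt still
    gets wrong, however you measure that for your task."""
    for _ in range(generations):
        ranked = tournament(chat, prompts, test_cases)      # [(prompt, rating), ...]
        winners = [p for p, _ in ranked[:keep]]
        misses = [str(c) for p in winners for c in failing_cases(p)]
        user = ("These prompts scored best:\n" + "\n".join(winners)
                + "\n\nThey still failed on these cases:\n" + "\n".join(misses)
                + "\n\nWrite 10 new prompts that keep what works and fix the failures, "
                  "one per line.")
        reply = chat("You improve prompts for an LLM.", user)
        prompts = winners + [l.strip() for l in reply.splitlines() if l.strip()]
    return tournament(chat, prompts, test_cases)
```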
One thing that's missing is the classification version only supports true and false labels, but it's not hard to use tiktoken, or whatever it is, to allow it to support arbitrary labels like happy, sad, angry, whatever. That's probably, like, a 20-minute add that, if somebody goes in and does that, opens up a whole new set of use cases. The evolution idea that I mentioned before, right? Taking the best prompts and then saying, here's where it went wrong on these test cases, and then throwing it back to GPT and having it generate more and rerunning it, that's interesting. 74:45 (Speaker D) The ability to use Claude would be awesome if anybody wants to add that. I could even see it evaluating each prompt on each model, right? Because right now we only generate with GPT-4. We only evaluate with GPT-3.5. 75:19 (Speaker D) But imagine if you generate half of them with GPT-4, you generate half of them with Claude, and then you evaluate each prompt on GPT-4, GPT-3.5 and Claude. 75:27 (Speaker D) And you can see sort of the latency and success rates for each, along with scores. I think all that would be super interesting. Also, sort of, like, just open to ideas. 75:40 (Speaker D) I'm not really sort of supporting this at all. So if anybody wants to kind of take it and run with it, I am all for that. Also, sort of just like a shameless plug right now, or thing that we're looking for, just because I have an audience here: we at Otherside and HyperWrite are really looking for somebody to help on backend, hopefully with a security set of expertise. And then also, if anybody is experienced in training machine learning models, I would love some help there, because we're doing a lot of LLM training. 75:55 (Speaker A) So just a quick thing, and also to add that now, with the Prompt Engineer that's automated, the results of this would likely generate, like, a great data set that you can add and continue fine tuning on, especially as GPT-4 fine tuning is coming soon. So Matt, definitely store everything you generate, with the Elo score and everything, from a GPT Prompt Engineer run that doesn't know about the rest. Maybe there's going to be a path forward to actually fine tuning a prompting model, which could be exactly. Well, yeah, exactly. 76:28 (Speaker D) Imagine taking a prompt, and taking one that has a slightly higher score, and fine tuning a model to take the initial prompt and then sort of output the one that has a higher score, and you can do that evolutionarily to continue to get better prompts, in theory. 76:40 (Speaker A) Awesome. So folks, if you want to work in a cool place, HyperWrite, hit Matt up, and also check out GPT Prompt Engineer on GitHub. Thanks for coming. Feel free to stay and kind of continue commenting and talking with us as we go through a bunch of other updates that we have. 76:57 (Speaker A) Just a quick check with NISten, who promised me to follow Twitter and see if anything new comes up, breaking news, as we talk. I haven't seen anything besides the space of xAI. 77:04 (Speaker A) I will ask people's attention to the last pinned tweet from Dr. Jim Fan that talks about the context length dip. Matt, you also touched on this context length dip. It's basically a paper, I think. 77:22 (Speaker A) Stanford, I'm not sure, that figured out that even longer
context windows have a dip in the middle, which means that at the beginning of the prompt and at the end of the prompt, the model pays more attention to what you actually asked it to do or the details that you provide, and in the middle there's, like, a dip. 77:39 (Speaker A) And this was also released this week. However, the one thing I said previously I will repeat here: Claude, and some folks who know about context windows way more than me, they say that Claude is actually really good at this, without the dip. 77:54 (Speaker D) Yeah, I feel like, that's saying. It's an interesting paper. I feel like it's sort of saying, like, hey, if you train on marketing copy, then it's going to be worse at coding, obviously. Right. 78:03 (Speaker D) We do a lot of long context stuff at Otherside. That's actually what I'm focused on right now, training really long context massive models. And if you train it on data where there's context in the middle that matters, it is going to be good at that. 78:16 (Speaker A) Interesting. So what you're saying, and I think I've seen this kind of opinion before as well, is it's just the outcome of the data that was fed in. And for blog posts and other places, people want to hook your attention in the beginning and then kind of finish strong. Basically you're saying that this is potentially an outcome of that and not necessarily the tech behind it. 78:38 (Speaker D) Yeah, I believe so. I mean, who knows, maybe I'm wrong, but from my experience, right, why I was giving that analogy before is, like, if you train it to do one thing and then you're asking it to do another, it's not going to do that other thing as well. And I'm guessing the data set that they sort of did this evaluation on was something that didn't have a ton of information at all. Part of the reason that so few of the language model companies have super long context length models, and why it was such a big deal that Anthropic did, is because a lot of the challenge in training them isn't actually in training them, it's in the data. 79:08 (Speaker D) Obviously, inference becomes a challenge. It's the cost and the overhead there. But the data to sort of do this is really sparse. 79:10 (Speaker D) It's not very available. Right. So that's, I think, part of it, right? There's not just, like, a sort of standard data set that has super long context length, that has information in the middle. 79:25 (Speaker D) We actually have been building one at Otherside, and that's sort of given me some of the ideas that I'm sort of spouting here. But my guess is that, for Anthropic, part of the reason theirs works is because they focused on the data. The data is really important. 79:38 (Speaker A) Right. 79:39 (Speaker D) I will say, the model, it's just fine tuning. 79:41 (Speaker A) Yeah. I will say, when I got access to Claude's window, I did, like, a bunch of tests with my Twitter data. I just pasted, like, a bunch of JSON with Twitter numbers, Twitter ID numbers. And the smaller model, the not-100K one, gave me back results that actually didn't invent those numbers. 79:57 (Speaker A) The 100K model got lost in the middle and started inventing those numbers. I literally saw this difference between the longer context one and the previous one, and I thought it's because it loses some context in the middle. And I need to retry this on the new ones, because with the new ones, they claim this doesn't happen.
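Since the "dip in the middle" claim is easy to probe empirically, here is a minimal sketch of the kind of test being discussed: hide a fact at different depths of a long prompt and check where the model stops retrieving it. The `chat` helper is the same placeholder as in the earlier sketches, and the filler text and needle are arbitrary.

```python
# Sketch of a "lost in the middle" probe: bury a fact at different depths of a
# long prompt and check whether the model can still quote it back.
def lost_in_the_middle_probe(chat, depths=(0.0, 0.25, 0.5, 0.75, 1.0), n_filler=2000):
    needle = "The magic number is 7421."  # arbitrary fact to retrieve
    filler = [f"This is filler sentence {i} about nothing in particular."
              for i in range(n_filler)]
    results = {}
    for depth in depths:
        docs = list(filler)
        docs.insert(int(depth * len(docs)), needle)  # 0.0 = start, 1.0 = end
        prompt = "\n".join(docs) + "\n\nWhat is the magic number?"
        answer = chat("Answer using only the context above.", prompt)
        results[depth] = "7421" in answer
    return results  # a False at the middle depths is the dip being described
```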
80:01 (Speaker A) I want to go to Al, and yeah, one of you, I think, raised your hand first, to talk about the context length dip and that paper, if you have read this, if you have thoughts, and if you have noticed this as well. 80:29 (Speaker F) I just had a quick question for Matt about the differences that he found in prompting between, say, Claude and GPT-4. I noticed, like, the prompts aren't really reusable, and maybe you could speak to that in the general case. 80:42 (Speaker A) Yeah, let's end with maybe this question and move on to the other updates that we have. Go ahead, Matt. 80:48 (Speaker D) Yeah, sure. So it's like talking to two people with two different personalities, right? They're both people, but they respond differently to different ways you're sort of prompting them, if you will. Claude is sort of, like, more emotional, I guess, where OpenAI is sort of more logical. 81:03 (Speaker D) And it's hard to sort of pin that down to any one thing, and it's hard to give you sort of, like, techniques based on that, because, again, every use case is very different, but very clearly you prompt them differently. I think, also, talking about the idea of fine tuning a prompting model, what would be very interesting is fine tuning a model that takes an OpenAI prompt and converts it to the idealized version of a Claude prompt, and vice versa. I mean, I think that could be very powerful, because there are ways to sort of intuit your way there. 81:29 (Speaker D) It's just hard to sort of distill into a set of rules. One thing I found, actually, quite interestingly, with Claude 2 is that it is insanely resistant to, sort of, like, jailbreak attacks. So I was able to get it to do it. 81:44 (Speaker D) Turns out the stupidest method worked. It was sort of, like, modifying that DAN prompt that's been going around, like, Reddit. But the more nuanced, sort of, like, complex methods that typically work with OpenAI, they didn't. So I think the model is just qualitatively different. 81:56 (Speaker D) I think it's going to take some time to fully explore it and understand why and how. Still super early days. 82:06 (Speaker A) I love the fact that all of us are getting an intuition about different models and how to approach them, right. And, like, swyx was here before, this is like a specialization of what I think he talked about as an AI engineer. We're getting to start to understand the differences between those, down to the little fine things that you can say. 82:11 (Speaker A) And I think it will be very interesting if you have a model that's trained to actually convert them, or translate them between the models to work the same. I have an idea, where, not to get locked into the GPT-4 ecosystem with the functions, I have an idea of wrapping the GPT-4 API package with something. 82:47 (Speaker A) That will actually kind of print the functions into the context, because Claude now has a huge context window, and then try to see whether or not Claude is able, without additional tech, without additional changes to the API, to replicate the outputs of how GPT-4 with functions would do. And that's going to be an idea I'll be testing, hopefully, and talk about next week. 83:08 (Speaker A) Thanks, Matt. 83:10 (Speaker C) Today, there has been a thing, today, maybe yesterday, but anyway, today there has been a model that generates prompts. By the way, by giving the data, you generate the prompt. I've written about it today on Twitter.
It is so powerful, it is such a cool method, that you can take whatever you have, like, I don't know, scientific papers, and generate instructions for them. 83:32 (Speaker C) Now you can fine tune a model that generates scientific papers. You got jokes? Now you can train a model that becomes funny. 83:35 (Speaker C) You can generate the instructions, convert whatever you want into instructions. It's amazing. One more thing about the dip in the middle thing. 83:51 (Speaker C) I don't know why it happens. I have no idea how OpenAI trained their models. But I think, if you think about it, many instructions are: a paragraph, and before the paragraph, you tell the model, please summarize the following; or, on the contrary, a paragraph and at the end, what was that, something. 84:10 (Speaker C) So it makes a lot of sense that a model pays a lot of attention to the beginning and the end, because of this. And on the same note, it's very easy to fix. So I wouldn't just point fingers. 84:21 (Speaker C) It's good that they pointed it out, but I think it's, like, I don't know, a couple of minutes of training for OpenAI, like, fine tune for a minute and fix it. 84:28 (Speaker A) I just want to ask Yam, the tweet that I just pinned on top, this was the one that you talked about, the instruction generation and the prompt generation? 84:38 (Speaker C) Yeah. 84:39 (Speaker A) Awesome. So folks, definitely feel free to check this out. I haven't seen this. Do you want to give a couple more words about that one? 84:44 (Speaker A) It looks like you wrote, like, a very deep dive. What's the model, like 11B, 3B? 84:54 (Speaker C) Sure. Two models, put into the models whatever you want. Okay, let's go back. You got a data set of something, emails from your company, for example, and you want a model that will help you write emails. 85:01 (Speaker C) Okay, you can start thinking about how to train this model, or you can use this and now generate a text that basically says, help me write the following email to this following person, of something something, and the actual email. And all of a sudden, you have a data set to train a model, or to fine tune or whatever, that is extremely tuned to this. So I think it's a very cool technique. 85:40 (Speaker C) It's very powerful, it has a lot of potential. And the trick, in simple words, is training the model what not to say. That's the missing piece here, that's the trick they added. 85:51 (Speaker C) They took instructions and outputs that do not fit, just a different random output from the data, and trained with a different loss, that the model should not say this, because this input with that instruction does not result in this output. That's it. 86:11 (Speaker C) That's the trick. And it works perfectly and it's really cool. 86:17 (Speaker A) Awesome. I have some folks who want to come up and ask questions. I think we're almost there in terms of the updates. I will just briefly run through some updates. 86:18 (Speaker A) I don't even have time to go and look for the threads, but if you're not following llama.cpp, follow Georgi Gerganov, one of the folks that we have in these spaces. I think he is single-handedly in charge of so many folks trying to get a MacBook, because it's incredible how much performance they've been able to squeeze out of LLaMa. And it's competitive.
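Before moving on, here is a rough sketch of the instruction-generation trick Yam describes above: take outputs you already have (emails, papers, jokes), ask a model to write the instruction that would have produced each one, and you get an instruction-tuning dataset for free. The mismatched-pair part is only sketched as data construction; the special "do not say this" loss from the write-up is not reproduced here, and `chat` is the same placeholder LLM helper as in the earlier sketches.

```python
# Sketch of instruction backtranslation: turn existing outputs into
# (instruction, output) pairs, plus mismatched negative pairs.
import random

def backtranslate_instructions(chat, outputs):
    """For each existing output, ask the LLM to invent the instruction that
    would have produced it."""
    pairs = []
    for out in outputs:
        instruction = chat(
            "You write the instruction that would produce a given text.",
            f"Text:\n{out}\n\nWrite the single instruction this text is answering.")
        pairs.append({"instruction": instruction.strip(), "output": out})
    return pairs

def add_negative_pairs(pairs):
    """Pair each instruction with a different, mismatched output. During
    fine-tuning these would get the separate 'do not say this' treatment
    described above; here we only build the data."""
    negatives = []
    for p in pairs:
        others = [q for q in pairs if q is not p]
        if not others:
            continue
        wrong = random.choice(others)
        negatives.append({"instruction": p["instruction"],
                          "output": wrong["output"],
                          "label": "negative"})
    return negatives
```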
86:49 (Speaker A) And many people just, like, quantize their models, basically make them smaller, to run on this GGML platform that they have. The recent news that I have from over there, there are, like, two pieces of news. Last week, for those of us who were here last week, we talked about CFG. 86:58 (Speaker A) I forgot something. I forgot the guidance scale. And we talked about the CFG parameter moving over from the diffusion models that we know. 87:17 (Speaker A) Like, in Stable Diffusion, you can define how close to your prompt the model should generate the image. Somebody decided, I think in a discussion, somebody said, hey, can we have this control of CFG for our LLM generation? CFG is classifier-free guidance, a guidance scale, something like that. 87:37 (Speaker A) And they did it. The chad Georgi added this to llama.cpp. And so now you can actually kind of pass a CFG control and fine tune it. 87:48 (Speaker A) It's almost like a running fine tune, to an extent. You can push the model to be closer to, or farther away from, the prompt that you have. Contrast this with the stuff that we have on the GPT-4 API, which is temperature. 88:01 (Speaker A) And I think, Matt, you mentioned something too, logit bias, where you can ask it not to say certain things. So contrasting with CFG, it's like a different beast, and we now have a different control. And so GGML just merged this into their platform. 88:18 (Speaker A) Definitely worth checking out. And the second thing is, I need to find the tweet. Yesterday, Georgi was like, oh yeah, by the way, here's the 48% inference speed improvement that somebody just merged in. 88:30 (Speaker A) Have you guys played and tried this? For the 33 billion parameter model of LLaMa, somebody just merged in a 50% increase in inference speed, just like that. And I find this incredible, because GGML already runs many things on Raspberry Pis or whatever, iPhones, and now somebody's like, oh yeah, here's a 50% increase in inference speed. 88:41 (Speaker A) And then, I think NISten was here before, he was talking about GGML running on the iPhone, because iPhones, even from three years ago, have the same neural engine chip as, like, the latest Macs or some such, and this performance boost on GGML also applies to iPhones as well. So, incredible stuff. And as we hear every week, we keep seeing leaps, incredible leaps, in speed and performance. 89:15 (Speaker A) Definitely worth checking out GGML and the folks that work on this stuff. GGML community, folks who use llama.cpp, feel free to hop up and raise your hand and give us more updates from that angle. Junaid. 89:28 (Speaker A) You're a regular at these spaces, but sometimes as a guest as well. Other than that, I think we'll move on to some more updates, and then we'll just have questions. No? Cool. 89:41 (Speaker A) So the next update that I have is from the diffusion side, which we sometimes cover. We don't cover it often, but we do cover it from time to time. So, two things from Stability, Stable Diffusion. 89:46 (Speaker A) We talked about SDXL, the new XL model that can generate 1024px images. We talked last week about the 0.9 weights dropping. 90:01 (Speaker A) SDXL 1.0 is now available in the Stable Diffusion Discord. If you've played with Midjourney before and you looked at Stable Diffusion and thought it's, like, not that great: 90:05 (Speaker A) Stable Diffusion SDXL 1.0 is really impressive. And besides being really impressive, they plan to release this open source.
So we're going to see a bunch of folks fine tune LoRAs and specific versions for specific things. 90:16 (Speaker A) And I think it's, like, incredible. If you want to play with those models and you haven't yet, go to the Stable Diffusion Discord and hit up that bot, and then let us know how incredibly different that is. And we're waiting for the SDXL 1. 90:47 (Speaker A) 0 weights to drop. And I will mention this every day until the year mark: it's been less than a year since Stable Diffusion. 90:57 (Speaker A) It's been less than a year. I remember, I think it was August '22, when they actually dropped the full open source model. Less than a year. 91:12 (Speaker A) And we've seen just such incredible progress. So, like Matt said before, it's really hard to keep up, but it's also really hard to internalize just how far we've come, with those incredible leaps and changes every week. And again, to just plug this ThursdAI space: 91:21 (Speaker A) This is why we're here. Every ThursdAI, talking about everything and anything that's changed and updated. And the other thing that I want to mention, I see Art in the audience, by the way. 91:28 (Speaker A) If you played with SDXL, feel free to raise your hand to come up. The other thing that they released, I don't know if you guys are familiar with ClipDrop. So Stability bought ClipDrop as a company and started implementing that interface, compared to their DreamStudio interface. 91:49 (Speaker A) So ClipDrop is, like, a way simpler interface, and today they released something called Stable Doodle. Stable Doodle is, I don't know if folks in the audience remember this meme, how to draw an owl. 91:51 (Speaker A) Step one, draw a circle. Step two, draw some eyes. And step three is, like, draw the rest of the f*****g owl. 92:06 (Speaker A) And then you have, like, a beautiful owl painting at the end of this. This is now the go-to test for how the Doodle models work. And I pinned my attempt at this, but definitely check out the ClipDrop Doodle thing. It's really fun to play with. So those are, like, the updates from the diffusion world. 92:10 (Speaker D) Hey, real quick. I was just looking at the repository for ComfyUI, and then I saw that, I don't know how to say his name, Scousekip is in here. So I just wanted to come on and say, like, hey, this is incredible. 92:24 (Speaker D) This is what we've been talking about for months now, right? This node-based character codex, if you will, of, like, there's just infinite possibilities. I just want to listen, but thanks. 92:35 (Speaker A) For bringing me up. 92:36 (Speaker D) This is really cool, man. I was just... thanks for bringing up ComfyUI. 92:42 (Speaker A) I feel guilty at not being up to date on every single possible thing. I know it's impossible. I really try, and ComfyUI has been on my list to try, but then Claude was released and Code Interpreter was released. ComfyUI seems like the thing we want, man. 92:42 (Speaker A) I think Stability, when they tried to bring up DreamStudio, they talked about, like, a node-based thing where you can pipe models to other models, you can add filters, et cetera. ComfyUI, for folks who have tested it out, it looks like that's it. And I definitely want to agree with Art. 93:16 (Speaker A) It's something to watch out for and maybe try, because Automatic1111, even though it's, like, super advanced and has been there from the beginning, since Stable Diffusion, it's just, like, a s**t show of a UX. Just, like, horrible, horrible. I'm sorry, guys.
93:30 (Speaker A) I've built a web UI before Automatic. It's really hard to get Gradio to do as much as you want. It's really hard to maintain a good UX product with many, many people contributing, with many, many things changing under your feet. 93:45 (Speaker A) So it's really not their fault, but it's a s**t show to get started with. And ComfyUI seems like a fresh, clean start. So definitely, if you're playing with this, test this out and let us know. 93:55 (Speaker A) Max, you have your hand raised, and you've played with SDXL. Give us some of your thoughts. 94:01 (Speaker I) Yeah, I have played through the website, in DreamStudio. So I'm lately working with a company that makes toys for kids. They want to start incorporating AI. And one of my concerns, as we're working with them, is: okay, we want to generate images for kids; something that is going to probably freak them out is two things that diffusion models have been lacking. 94:27 (Speaker I) One is the ability to paint things like complicated shapes or intricate shapes, like hands. SDXL is not better at it. 94:40 (Speaker I) Another one is this concept of what is named, like, concept bleeding, which is that diffusion models tend to mix objects that are similar in shape or form. It is not good at that either. Now, I was reading the paper from Stability, or the report. They claim they are outperforming Midjourney in five of seven categories. Now, Midjourney 5.1, right? 95:12 (Speaker A) Just to make sure: Midjourney has since then released a new version also, because we're all moving at the same pace, but yeah, they compared to Midjourney 5.1. Yeah. 95:20 (Speaker I) Well, now, this is a report internally released by Stability. It's a paper, it might have some credibility, I don't know. I like the results. It's very close to Midjourney, but I think it is still one or two steps behind, in my opinion. 95:36 (Speaker I) What is different is what you have mentioned, Alex. Once they release the weights and we can see LoRAs for this, I'm expecting to see the results that we can get, because probably that is what is going to position this model, like, a step above Midjourney, but not yet. This is my opinion. 95:58 (Speaker A) Yeah, definitely. And thanks for that. And I love folks coming up and sharing their opinion about these things. I will say, on the top. 96:05 (Speaker A) Thanks, Max. Or, I guess, I know your new name, but I'm not sure if I can, if I should. 96:10 (Speaker I) Yeah, totally, totally, have at it, in my view. I'm Juan, Spanish, living in Mexico, and I like these things. 96:17 (Speaker A) We appreciate you coming up here. On the topic of UIs that we've mentioned, somebody, or some folks, released Pinokio. They call this the AI browser. And I want to highlight this because I want to give you practical tips. Junaid, I think, is coming in with some breaking news. 96:28 (Speaker A) I don't know if Junaid wants to come up or can, but if you can, feel free to come up and tell us; there's some news from Bard. Until we talk about Bard, on the topic of UIs for those things, and you guys know we're mostly focused on the LLM side and the engineer side, less on the diffusion side, but we sometimes have love for both, the above tool, which you can download and not deal with the terminal, not deal with a bunch of stuff, unifies all of them. 97:08 (Speaker A) It's really nice. Check out the Pinokio AI browser. I think it's open source.
97:12 (Speaker A) You download this once, it's cross-platform, Mac, PC, et cetera, and then you're able to download llama.cpp, and then you're able to also download Stable Diffusion. And then fairly quickly, without knowing how to code, without going through the terminal, without installing packages, and folks here know that installing the packages is, like, a whole pain we all share and we all hate, without doing all of that, that's the promise that they have, you are able to pipe LLaMa outputs into Stable Diffusion. 97:38 (Speaker A) So Yam previously mentioned kind of the model that can do this, and Yam and Matt were talking about a method of generating prompts for LLMs, but also, we know that there are models, prompts to actually generate prompts for diffusion, and they're trained and fine tuned in different ways to generate diffusion prompts. Right, and this Pinokio browser is actually allowing you to run, like, an LLM and then pipe the output into a Stable Diffusion model and then see the output of that. I think it's incredible that this exists and is downloadable. 98:07 (Speaker A) I haven't tried this yet. If you in the audience, or somebody on stage, have tried Pinokio, please raise your hand. I want to bring you up and talk about Pinokio and your experience with this. 98:19 (Speaker A) And if we haven't, I want to bring this to our attention so that next week we're able to talk about this. This is added to my list of things, like ComfyUI, that I haven't tried yet. 98:29 (Speaker A) Anybody use Pinokio yet? No? Cool. I wanted to get Cocktail Peanut, the guy who wrote Pinokio. 98:36 (Speaker A) If you're in the audience, feel free to raise your hand. I don't think you are, but feel free to follow the thread. He goes fairly deep. 98:44 (Speaker A) And feel free to use and try Pinokio by next week and then come up next week and talk about the differences between this and running Automatic1111. All right, folks, thanks everyone for coming to another ThursdAI space. 98:58 (Speaker A) Hope this has been helpful for a bunch of you. We tried a few new things here. We tried to give updates, but also deep dive into a conversation with Matt, and it looks from the reactions here that maybe this is worth putting down on paper and sending out an email, for those of you who want to maybe sign up for this and don't have the time to listen to two-hour spaces. So I'll definitely try at least to do that. 99:19 (Speaker A) I want to thank a few folks on stage who have joined consistently and are providing a lot of signal. Yam, follow Yam, he has great insights into models and training and different things. Al in the audience, thanks always for coming up. 99:33 (Speaker A) Junaid is running the Denver meetup, and if you're in the Denver area, feel free to join us next week. Thanks for coming. Haven't seen you in a while, buddy. 99:45 (Speaker A) Juan, sorry. Yeah, I think, Juan, great. Maxi and Lentos have recently been joining us. 99:51 (Speaker A) It's been great. We have some more folks in the audience who are regulars, and we invite you to also be regulars and come up and talk on ThursdAI. I will say this one thing: tag me in anything that's new. 100:01 (Speaker A) I would love that. And help promote the message for other folks. If you did like the space, this also really helps for more folks to get to the bottom of this. 100:01 (Speaker A) For those folks whose questions I didn't get to, I apologize.
I'm trying to keep this as a balance of a high signal thing versus letting everybody ask questions as well. 100:22 (Speaker A) Last thing I'll say is about myself, a little bit: I consult. I stay up to date so you don't have to. That's my tagline. 100:29 (Speaker A) If you're in a company that needs consultancy from somebody who's up to date on everything, I try to be that guy. Feel free to tap me in the DMs. And, yeah, ThursdAI folks, keep tagging us in everything that's new. We're going to try to cover it next week. With that, 100:34 (Speaker A) I thank all of you. Thanks for coming. Thanks for giving us two and a half hours of your attention. 100:34 (Speaker A) I really appreciate it. Attention is sparse and very important, and I really thank everybody who gave us, like, two and a half hours. Thank you, folks. 101:00 (Speaker A) Hey, Alex, we really appreciate you. 101:04 (Speaker B) Thanks, Alex. 101:05 (Speaker H) Thanks for doing a good space and keeping us on track, actually. 101:09 (Speaker A) Yeah, thank you. 101:10 (Speaker D) Yeah, Alex, definitely want to kind of. 101:13 (Speaker A) Give our thanks to you as well. 101:15 (Speaker E) For curating an awesome space. 101:17 (Speaker D) I think I'm definitely not the only one that gets a lot of good signal out of this. And I know a lot of hard work goes into keeping yourself up to. 101:27 (Speaker A) Date so that you can share it. 101:28 (Speaker E) With all of us. 101:29 (Speaker D) So, just on my own behalf, thank you. And I'm sure that is echoed by. 101:34 (Speaker E) A lot of people on stage and in the audience. 101:36 (Speaker A) Humble man, thank you. I appreciate you. Thank you, folks. Have a nice Thursday, and bye, see you next week. This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
