
ThursdAI - The top AI news from the past week

Latest episodes

Oct 5, 2023 • 1h 28min

📅 ThursdAI Oct 4 - AI wearables, Mistral fine-tunes, AI browsers and more AI news from last week

The podcast highlights Google adding Bard to Google Assistant, the launch of Reka AI's multi-modal Yasa-1, the integration of AI in browsers with Arc Max, and the OpenOrca fine-tune of Mistral 7B. They also discuss voice-based AI assistants, AI voice cloning, and the importance of local LLMs. Furthermore, they explore the advantages of using browsers as a platform for developers and the potential of AI assistants in the real world.
Sep 29, 2023 • 1h 41min

📅🔥ThursdAI Sep 28 - GPT4 sees, speaks and surfs, Cloudflare AI on GPUs, Mistral 7B, Spotify Translates, Meta AI everywhere, Qwen14B & more AI news from this INSANE week

GPT4 from OpenAI can see, speak, and listen. Apple rumors and on-device inference are discussed, as well as OpenAI voice cloning tech used in Spotify. Meta AI announces integration of the EMU image model into AI agents. Cloudflare AI partners with HuggingFace and launches the new Vectorize DB. The Mistral 7B model from MistralAI and the release of the Qwen 14B model. Discussions about treating digital property as physical property and challenges in adopting new LLM models.
Sep 22, 2023 • 1h 9min

📆 ThursdAI Sep 21 - OpenAI 🖼️ DALL-E 3, 3.5 Instruct & Gobi, Windows Copilot, Bard Extensions, WebGPU, ChainOfDensity, RemeberAll

This podcast covers the latest AI updates, including OpenAI's DALL-E 3 art model, Windows Copilot by Microsoft, and Bard extensions from Google. They also discuss the significance of staying up to date with AI developments, the disappointment with Google's AI-powered extensions, and controversial opinions on compression papers. Additionally, they talk about building and running a GGML model with WebGPU and their experience at Geoffrey Hinton's AI lab.
Sep 17, 2023 • 55min

📅 ThursdAI - Special interview with Killian Lucas, Author of Open Interpreter (23K Github stars in the first week) 🔥

Killian Lucas, creator of Open Interpreter, discusses his open source project that lets you run code via AI models like GPT-4, or local models like Llama, on your own machine. They explore the capabilities and use cases of Open Interpreter, including web-based tools, multi-modal models, and imagination unlock. The podcast also highlights the significance of community support and the future of language model programming.
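As a quick taste of how that works, here is a minimal usage sketch based on the project's 2023-era Python API; treat the exact interface (the `interpreter` module and the `auto_run` flag) as assumptions and check the Open Interpreter README for the current one.

```python
# Minimal Open Interpreter sketch, assuming the early 2023 API
# (installed via `pip install open-interpreter`).
import interpreter

interpreter.auto_run = False  # ask for confirmation before executing generated code

# The model writes and runs code locally to accomplish the task:
interpreter.chat("Summarize the CSV files in my Downloads folder")
```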
Sep 15, 2023 • 1h 32min

🔥 ThursdAI Sep 14 - Phi 1.5, Open XTTS 🗣️, Baichuan2 13B, Stable Audio 🎶, Nougat OCR and a personal life update from Alex

This is a free preview of a paid episode. To hear more, visit sub.thursdai.news

Hey, welcome to yet another ThursdAI 🫡 This episode is special for several reasons, one of which is a personal life update (you'll have to listen to the episode to hear it 😉). It's also the first time I took on the mountainous challenge of fixing, editing and "video-fying" (is that a word?) our whole live recording! All 3 hours of it were condensed, sliced, sound-improved (X audio quality is really dogshit) and uploaded for your convenience. Please let me know what you think! Premium folks get access to the full podcast in audiogram format, and a full transcription with timestamps and speakers; here's a sneak preview of how that looks, why not subscribe? 😮

TL;DR of all topics covered

* Open Source LLM
  * Microsoft Phi 1.5 - a tiny model that beats other 7B models (with a twist?) (Paper, Model) - a quick local-inference sketch follows below
  * Baichuan2 7B / 13B - a bilingual (cn/en) model with a highly crafted approach to training (Paper, Github)
* Big Co LLMs + API updates
  * Nothing major this week
* Voice & Audio
  * Stable Audio 🎶 - a new music generation model from Stability AI (Website)
  * Coqui XTTS - an open source multilingual text-to-speech model for training and generating a cloned voice (Github, HuggingFace)
* AI Art & Diffusion
  * Würstchen v2 - a new super quick 1024px diffusion model (Announcement, Demo, Github)
  * DiffBIR - Towards Blind Image Restoration with Generative Diffusion Prior (Announcement, Demo, Github)
* Tools
  * Nougat from Meta - open source OCR model that accurately scans books with heavy math/scientific notation (Announcement, Github, Paper)
  * GPT4All Vulkan from Nomic - run LLMs on ANY consumer GPU, not just NVIDIA (Announcement)
  * Nisten's AI ISO disk (Announcement)

And here are timestamps and chapter/discussion topics for your convenience:

[00:05:56] Phi 1.5 - 1.3B parameter model that closely matches Falcon & LLaMa 7B
[00:09:08] Potential data contamination with Phi 1.5
[00:10:11] Data contamination unconfirmed
[00:12:59] Tiny models are all the rage lately
[00:16:23] Synthetic dataset for Phi
[00:18:37] Are we going to run out of training data?
[00:20:31] Breaking news - Nougat - OCR from Meta
[00:23:12] Nisten - AI ISO disk
[00:29:08] Baichuan 7B - an immaculate Chinese model
[00:36:16] Unique loss terms
[00:38:37] Baichuan bilingual and multilingual dataset
[00:39:30] Finetunes of Baichuan
[00:42:28] Philosophical questions in the dataset
[00:45:21] Let's think step by step
[00:48:17] Is breath-related text in the original dataset?
[00:50:27] Counterintuitive prompting for models with no breath
[00:55:36] Idea spaces
[00:59:59] Alex - Life update about ThursdAI
[01:04:30] Stable Audio from Stability AI
[01:17:23] GPT4ALL Vulkan
[01:19:37] Coqui.ai releases XTTS - an open source TTS - interview with Josh Meyer
[01:30:40] Summary

Here's a full video of the pod, and a full transcription. As always, 🧡 thank you for being a paid subscriber; this really gives me the energy to keep going, get better guests, release dope podcast content, and host 3-hour spaces and then spend 7 hours editing 🔥
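If you want to poke at Phi 1.5 yourself (the sketch referenced in the TL;DR above), here is a quick transformers sketch; the Hub id `microsoft/phi-1_5` and the `trust_remote_code` requirement follow the release-time model card, so verify them against the current one.

```python
# Quick local-inference sketch for Phi 1.5 with Hugging Face transformers.
# The model id and trust_remote_code flag follow the 2023 model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)

# Phi 1.5 was trained heavily on code, so code prompts show it off well:
inputs = tok('def print_primes(n):\n   """Print all primes up to n."""', return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0]))
```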
Sep 10, 2023 • 54min

🔥🎙️ ThursdAI Sunday special - Extending LLaMa to 128K context window (2 orders of magnitude) with YaRN [Interview with authors]

This is a free preview of a paid episode. To hear more, visit sub.thursdai.news

Happy Sunday everyone, I am very excited to bring you this interview with the folks who took LLaMa 2 and made it LLoooooongMa! They extended the LLaMa 2 context window from 4,000 to a whopping 128,000 tokens (Yarn-Llama-2-13b-128k on Hugging Face). These folks also came up with a paper called YaRN (Efficient Context Window Extension of Large Language Models) and showed that YaRN not only requires 10x fewer tokens to create these long contexts, but also 2.5x fewer training steps! And the models generalize, so there's now no need to collect extremely long sequences (think book-length sequences) for the models to understand those context lengths. (A toy sketch of the positional-interpolation idea behind RoPE scaling follows at the end of this summary.)

I also decided to do something different (which took me half of Sunday, so I can't promise and am not committing to this format): for the premium subscribers, you can now watch this interview with running karaoke-style subtitles and improved audio! This will be uploaded to Youtube in a week, but aren't you glad you subscribed and are getting this first? Here's a teaser preview.

And here are the chapters for your convenience (the only thing that's AI generated 😂):

0:00 - Introduction
3:08 - Discussion of extending LLaMa 2's context length from 4,000 tokens to 128,000 tokens using the YaRN method
8:23 - Explanation of RoPE scaling for positional encodings in transformers
13:21 - How the RoPE scaling idea allows for longer context through positional interpolation
18:51 - Using in-context learning to train models on shorter sequences but still handle long contexts
25:18 - Sourcing long-form data like books to train 128k token models
31:21 - Whether future models will natively support longer contexts
37:33 - New model from Adept with 16k context using RoPE scaling
42:46 - Attention is quadratic - need better algorithms to make long context usable
49:39 - Open source community pushing state of the art alongside big labs
52:34 - Closing thoughts

As always, the full (manually edited) transcription (and this time a special video version!) is reserved for the premium subscribers. I promise it'll be worth it, so why not... y'know? Skip a cup of coffee from SB and support ThursdAI?
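To make the RoPE-scaling chapters concrete, here is a toy numpy sketch of plain positional interpolation, the baseline idea that YaRN refines; this is an illustration of the concept, not the authors' code, and the dimension/base values are generic RoPE defaults.

```python
import numpy as np

def rope_angles(num_positions: int, dim: int = 128, base: float = 10000.0,
                scale: float = 1.0) -> np.ndarray:
    """Rotary-embedding rotation angles for each (position, frequency) pair.

    scale < 1 compresses positions (plain positional interpolation), so a
    model trained on 4k tokens sees 128k positions squeezed into the range
    it already knows. YaRN refines this by treating frequency bands differently.
    """
    inv_freq = 1.0 / base ** (np.arange(0, dim, 2) / dim)  # per-pair frequencies
    positions = np.arange(num_positions) * scale
    return np.outer(positions, inv_freq)  # shape: (num_positions, dim // 2)

native, target = 4_096, 131_072
angles = rope_angles(target, scale=native / target)
assert angles.max() <= native  # no position exceeds the familiar 4096-token range
```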
Sep 7, 2023 • 29min

ThursdAI Sep 7 - Falcon 180B 🦅 , 🔥 Mojo lang finally here, YaRN scaling interview, Many OSS models & more AI news

Hey y'all, welcome to yet another ThursdAI, this is Alex coming at you every ThursdAI, including a live recording this time! Which was incredible: we chatted about Falcon 180B, had a great interview at the end with 3 authors of the YaRN scaling paper and LLongMa 128K context, had 3 breaking news items in the middle, MOJO🔥 was released, Adept released a LLaMa-comparable OSS model, and friend of the pod @reach_vb showed an open ASR leaderboard on Hugging Face! We also covered an incredible tiny model called StarCoder 1B that was finetuned by a friend of the pod (who joined the space to talk to us about it!).

As always, you can listen to the whole 3-hour long-form conversation (raw, unedited) on our Zealous page (and add it to your podcatcher via this RSS), and this short-form pod is available on Apple, Spotify and everywhere.

ThursdAI - Hey, if you enjoy these, how about subscribing for real? Would love to do this full time! Every paid subscriber is like a dear friend 🧡

TL;DR of all topics covered

* Open Source LLM
  * Falcon 180B announced by TIIUAE (Announcement, Demo)
  * YaRN scaling paper - scaling LLaMa to 128K context (link)
  * OpenHermes-13B from @teknium1 (link)
  * Persimmon-8B from Adept.AI (link)
  * Starcoder-1B-sft from @abacaj (link)
* Big Co LLMs + API updates
  * OpenAI first ever Dev conference (link)
  * Claude announces a $20/mo Claude Pro tier (link)
  * Modular releases Mojo🔥 with a 68,000x improvement over Python (link)
* Vision
  * Real-time deepfakes with FaceFusion (link)
  * HeyGen released AI avatars and AI video translation with lipsync (link, translation announcement)
* Voice
  * Open ASR (automatic speech recognition) leaderboard from HuggingFace (link)
* Tools
  * LangChain Hub (re)launched
  * Open Interpreter (Announcement, Github)

Open Source LLM

🦅 Falcon 180B - The largest open source LLM to date (Announcement, Demo)

The folks at the Technology Innovation Institute have open sourced the huge Falcon 180B and put it up on Hugging Face. Having previously open sourced Falcon 40B, the folks from TIIUAE have given us a huge model that beats (base) LLaMa 2 on several evaluations, if only slightly, by a few percentage points. It's huge: it was trained on 3.5 trillion tokens, weighs above 100GB as a file, and requires 400GB for inference.

Some folks were not as impressed with Falcon's performance, given that its parameter count is 2.5x that of LLaMa 2 (and it likely took longer to train) while its benchmarks are just a few percentage points higher. It also boasts an embarrassingly low context window of just 2K tokens, and code was just 5% of its dataset, even though we already know that more code in the dataset makes models smarter!

Georgi Gerganov is already running this model on his M2 Ultra because he's the GOAT, and co-host of ThursdAI spaces, Nisten, was able to run this model CPU-only with just 4GB of RAM 🤯 We're waiting for Nisten to post a Github on how to run this monstrous model on just a CPU, because it's incredible! Given the Apache2 license and the fine-tuning community's excitement about improving these open models, it's an incredible feat, and we're very happy that this was released!
The complete open sourcing also matters in terms of geopolitics. This model was developed in the UAE, while in the US the export of A100 GPUs to the Middle East was banned, and folks are talking about regulating foundational models. A release of a model of this size and parameter count coming out of the United Arab Emirates, for free, is definitely going to add to the discussion of whether to regulate AI, open source, and the fine-tuning of huge models!

YaRN scaling LLaMa to 128K context window

Last week, just in time for ThursdAI, we posted about the release of Yarn-Llama-2-13b-128k, a whopping 32x improvement in context window size on top of the base LLaMa, from the folks at Nous Research, Enrico Shippole, and @theemozilla, with the help of EleutherAI. This week they released the YaRN: Efficient Context Window Extension of Large Language Models paper, which uses Rotary Position Embeddings to stretch the context windows of transformer-attention-based LLMs significantly.

We had friends of the pod Enrico Shippole, theemozilla (Jeff) and Bowen Peng on the twitter space, and a special interview with them will be released on Sunday. If you're interested in work on scaling and stretching context windows, definitely subscribe for that episode, it was incredible!

It's great to see that their work is already applied in several places, including CodeLLaMa (which was released with 16K-100K context). The problem is now compute: context windows can be stretched, and the models are able to generalize from smaller datasets, such that the next models are predicted to ship with effectively unbounded context windows, limited mostly by your hardware's memory.

Persimmon-8B from AdeptAI (announcement, github)

AdeptAI, the company behind Act-1 (a foundational model for AI agents that does browser driving), whose co-founders include original transformers paper authors, have dropped a ThursdAI surprise: a fresh (read: not a LLaMa clone) model! They released a completely open source model called Persimmon-8B, with a full Apache 2 license, a 16K context window (using custom RoPE scaling methods), and some interesting inference speedups with C++. A very interesting 8B model that can fit on most consumer hardware, with additional tricks and a huge context window, is definitely welcome! Another interesting point: they have 70K unused embeddings for multimodal extensions! Can't wait to see what that's about!

Starcoder-1B-sft - a tiny model that's great at code

Anton Bacaj (@abacaj) has finetuned StarCoder to achieve some incredible results for such a tiny model! Remember the first item, a whopping 180B-parameter Falcon? Well, this is just a 1B-parameter model, finetuned on a 65K-sample dataset of code, that's outperforming Falcon, LLaMa 2, Palm-2 (and Persimmon) on coding tasks, and it runs on your device so fast that it's hard to read! It boasts an incredible 39% on HumanEval and 31% on MBPP (Anton reran and updated the MBPP score later) and can run locally. Friend of the pod Xenova has already ported this model to transformers.js, and it'll soon run in your browser!
OpenHermes-13B from @teknium1 (link)

Our friend Teknium1 (whom we interviewed a few weeks ago) releases OpenHermes on top of LLaMa 2, but this time it's a completely open model and dataset, marking the first time that a Hermes model has been open! OpenHermes was trained on 242,000 entries of primarily GPT-4 generated data, from open datasets across the AI landscape, including:

* GPTeacher - General Instruct, Roleplay v1, Roleplay v2, and Code Instruct datasets, by Teknium
* WizardLM (v1, evol_instruct 70k), by WizardLM Team/nlpxucan
* Airoboros GPT-4 (v1.0), by JonDurbin
* Camel-AI's domain expert datasets, by the Camel-AI Team
* CodeAlpaca, by Sahil2801
* GPT4-LLM and Unnatural Instructions, by Microsoft

Check it out folks!

Big Co LLM + API updates

Modular finally ships Mojo 🔥 (Announcement)

I just knew that Mojo would finally be shipped during ThursdAI, and in fact this was a great #BreakingNews moment on twitter spaces! Modular, and its co-founder Chris Lattner (author of LLVM, MLIR, Swift and many other things), have finally released their Mojo 🔥 language for AI. Mojo 🔥 is like Python++: it includes strong types and full interoperability with the Python ecosystem, it is able to run basic vanilla Python, and it has so, so much more in it. But the main thing Modular is claiming is a whopping 68,000x improvement over vanilla Python! You didn't misread: a 68,000x improvement, when using all the Modular inference compilers, Mojo virtualization tricks and compilation improvements. It's incredible.

The beauty of Mojo is that it meets developers where they are and allows them to adopt new features to achieve high performance gradually. By combining the best of dynamic and static languages, Mojo can deliver performance up to 68,000 times faster than Python today. That's quite a leap! If you want to delve deeper into Mojo's origin story, you can find more information in their documentation. But for now, let me highlight a few key benefits that Mojo offers:

Firstly, Mojo allows you to write everything in one language, merging the usability of Python with the systems programming features that typically require developers to rely on C, C++, or CUDA. This means that both research and deployment teams can work within a common codebase, streamlining the workflow from research to production.

Secondly, Mojo unlocks Python's performance potential. While Python is widely used, it may not be the best tool for high-performance or specialized hardware tasks. Mojo bridges that gap by enabling high performance on CPUs and providing support for exotic accelerators like GPUs and ASICs. With Mojo, you can achieve performance levels on par with C++ and CUDA.

Thirdly, and this is a big one, Mojo seamlessly integrates with the entire Python ecosystem. You can leverage the extensive library collection available in Python while making use of Mojo's features and performance benefits. This means you can easily combine libraries like NumPy and Matplotlib with your Mojo code - talk about flexibility!

Finally, Mojo allows you to upgrade your AI workloads effortlessly. By tightly integrating with the Modular AI Engine, Mojo empowers you to extend your AI workloads with custom operations, including pre-processing and post-processing operations as well as high-performance mathematical algorithms. You can even integrate kernel fusion, graph rewrites, shape functions, and more.
Mojo is all about expanding the possibilities! Mojo's playground has been around since May and I have a deep dive here, but you should really watch Chris Lattner's 3+ hour interview covering everything from why they chose to be a Python superset to why he thinks the community will pick it up; it's an incredible watch and will make you excited about Mojo!

WebGPU ships with support for FP16 in Chromium

Chrome shipped WebGPU back in April of '23, after years of development. It allows high performance 3D graphics (and of course, transformers inference) in the browser and on the web! However, for model inference, GPU access is not enough; you also need to be able to run smaller models. Well, one way to make models smaller is to run them in fp16 format: by cutting the precision of the weights in half, we can use much smaller (read: compressed) models with a slight loss in accuracy (a tiny sketch of the fp16 size math appears at the end of this recap). Friends of the pod Nisten and Xenova (transformers.js author) gave us an update that new fp16 support has shipped in Chromium nightly, allowing much, much smaller models to be run client-side!

OpenAI first dev conference (Announcement)

OpenAI announced their first developer-focused conference, happening in SF on November 6th! In-person only (with the keynote streamed to all), and they also said that they won't do any model announcements like GPT-5 😂 But we'll all expect at least a few API updates!

Vision

FaceFusion 1.1.0 - a deepfake faceswapper (Announcement, Github)

We all know deepfakes are here, I mean, don't we? But did you know that it's now super easy to face-swap your face into an image or a video? FaceFusion does just that: an incredibly fast way to deepfake someone's face into an image or a video with a few clicks. It works on CPU (I couldn't make it work on GPU, but it's possible) and shows incredible results! Want to enjoy Steve Buscemi dancing around as Harry Styles? 3 clicks and 10 minutes and you get this 🔥 Friend of the pod CocktailPeanut has made it incredibly easy to install, with just 1 click, via his pinokio.computer app, which I use and love! FaceFusion also has a webcam mode that can deepfake any image onto a webcam stream for a lot of fun on zoom calls! (which I wasn't able to test for some reason)

HeyGen launches their deep AI face creator

Many of us have used 11Labs to clone voices, but what if you could clone both the voice AND the image of a person, with just 2 minutes of their recording? That's what HeyGen claims to be able to do, and we've previously reported that their incredibly realistic AI avatar generation from videos/images + voice really blew us away. HeyGen just launched their service, and you can sign up and get a few minutes for free. Here's a sample (with the CEO's avatar; they couldn't make my own due to some launch-day errors). The video you see on top is just that: the CEO of HeyGen, thanking you for reading this week's ThursdAI!

Voice

ASR leaderboard + new top ASR model from Nvidia

I love doing ThursdAI, and one of the things I love most is folks sending me stuff they worked on, and then coming to ThursdAI to chat about it. Friend of the pod Vaibhav (VB) Srivastav, an incredible dev rel at HuggingFace focusing on audio, has shipped a new Open ASR (automatic speech recognition) leaderboard on HuggingFace! It shows the top ASR models like Whisper and a newcomer, Nvidia FastConformer, which I didn't even know existed, and which is now topping Whisper for English speech-to-text tasks!
HuggingFace leaderboards like these are definitely a boon for the open source industry: they let all of us easily select open source models, and they also let the open source community start racing towards the top, while we all benefit!

Tools

Open Interpreter (Announcement, Github)

One tool that I've used this week, and it's incredible, is Open Interpreter from @heyitskillian. It's incredibly easy to install and run, behaves like OpenAI Code Interpreter (renamed to Advanced Data Analysis) but on your computer, and is able to do things like control your apps, lower the volume, edit images/files and tons more:

pip install open-interpreter

And that's it! Give it a try (you have to approve each command that it runs). It's a great agent, and hopefully we'll get Killian to chat with us about it on the next ThursdAI!

LangChain Hub has launched (link)

If you're into LangChain, and even if you aren't, it's undeniable the weight LangChain carries in the AI engineer industry! They have a connector for everything, tons of folks use them, and they have raised a bunch of funding. They have just launched their new LangChain Hub and it's exciting! Many folks are sharing their best prompts and their ways of working with LangChain there, with upvotes and shareable links!

Also worth noting, our friends swyx and Alessio from Latent Space recently released an episode with Harrison on Latent Space, and it's WELL worth listening to (and reading), as swyx did a deep dive into LangChain, its naysayers, and everything in between! Check it out below.

Thank you, see you next time (with some incredible personal news I'll have to share). ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber. This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
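As promised in the WebGPU item above, here is a tiny sketch of the fp16 size math: the same weights take half the bytes, and the small rounding error is the accuracy cost. This is a generic PyTorch illustration, not anything specific to the Chromium implementation.

```python
# fp32 -> fp16: same tensor, half the memory, small rounding error.
import torch

w32 = torch.randn(1024, 1024, dtype=torch.float32)
w16 = w32.half()

print(w32.element_size() * w32.nelement())  # 4_194_304 bytes
print(w16.element_size() * w16.nelement())  # 2_097_152 bytes: half the memory
print((w32 - w16.float()).abs().max())      # tiny per-weight rounding error
```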
Aug 25, 2023 • 1h 8min

ThursdAI Aug 24 - Seamless Voice Model, LLaMa Code, GPT3.5 FineTune API & IDEFICS vision model from HF

Hey everyone, this week has been incredible (isn't every week?), and as I'm writing this, I had to pause and go check out breaking news about LLaMa Code, which was literally released on ThursdAI as I'm writing the summary! I think Meta deserves their own section in this ThursdAI update 👏

A few reminders before we dive in: we now have a website (thursdai.news) which has all the links to Apple, Spotify, and full recordings with transcripts, and will soon have a calendar you can join to never miss a live space!

This whole thing wouldn't have been possible without Yam, Nisten, Xenova, VB, Far El, LDJ and other expert speakers from different modalities who join and share their expertise from week to week, and there's a convenient way to follow all of them now!

TL;DR of all topics covered

* Voice
  * Seamless M4T model from Meta (demo)
* Open Source LLM
  * LLaMa Code from Meta
* Vision
  * IDEFICS - a multi-modal text + image model from Hugging Face
* AI Art & Diffusion
  * 1 year of Stable Diffusion 🎂
  * Ideogram
* Big Co LLMs + API updates
  * GPT 3.5 fine-tuning API
* AI Tools & Things
  * Cursor IDE

Voice

Seamless M4T - a multilingual, multi-tasking, multimodal voice model

To me, the absolute most mindblowing news of this week was Meta open sourcing (not fully, not commercially licensed) SeamlessM4T. This is a multilingual model that takes speech (and/or text) and can generate the following:

* Text
* Speech
* Translated text
* Translated speech

All in a single model! For comparison's sake, it takes a whole pipeline with Whisper and other translators in targum.video, not to mention much bigger models, and I don't even generate speech! This incredible news got me giddy and excited so fast, not only because it simplifies and unifies so much of what I do into one model, makes it faster, and opens up additional capabilities, but also because I strongly believe in the vision that language barriers should not exist, and that's why I built Targum. Meta apparently also believes in this vision, and gave us an incredible new power unlock: a model that understands 100 languages and works multilingually without effort.

Language barriers should not exist.

Definitely check out the discussion in the podcast, where VB from the open source audio team at Hugging Face goes deeper into the exciting implementation details of this model.

Open Source LLMs

🔥 LLaMa Code

We were patient and we got it! Thank you Yann! Meta releases LLaMa Code, a LLaMa fine-tuned on coding tasks, including "fill in the middle" completion tasks, which are what Copilot does: not just autocompleting code, but taking into account what surrounds the code it needs to generate (see the toy sketch at the end of this section).

Available in 7B, 13B and 34B sizes, the largest model beats GPT-3.5 on HumanEval, which is a benchmark for coding tasks (you can try it here). In an interesting move, they also separately released Python-finetuned versions, for Python code specifically. Another incredible thing: it supports a 100K context window of code, which is a LOT of code. However, that's unlikely to be very useful in open source because of the compute required.

They also give us instruction fine-tuned versions of these models and recommend using them, since those are finetuned to be helpful to humans rather than to just autocomplete code. Boasting impressive numbers, this is of course just the beginning; the open source community of finetuners is salivating! This is what they were waiting for. Can they finetune these new models to beat GPT-4?
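Here is the toy fill-in-the-middle sketch referenced above; the model id and the `<FILL_ME>` sentinel follow the release-time Hugging Face transformers docs for CodeLlama, and are worth verifying against the current model card.

```python
# Fill-in-the-middle with CodeLlama via transformers: the model conditions on
# the code BEFORE and AFTER the hole, not just the prefix.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-7b-hf"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = '''def remove_non_ascii(s: str) -> str:
    """ <FILL_ME>
    return result
'''
input_ids = tok(prompt, return_tensors="pt").input_ids
out = model.generate(input_ids, max_new_tokens=128)

# Everything generated after the prompt is the model's guess for the hole.
filling = tok.batch_decode(out[:, input_ids.shape[1]:], skip_special_tokens=True)[0]
print(prompt.replace("<FILL_ME>", filling))
```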
🤔 Nous update

Friends of the pod LDJ and Teknium1 are releasing the latest 70B model in their Nous Hermes family 👏

* Nous-Puffin-70B

We're waiting on metrics, but it potentially beats ChatGPT on a few tasks! Exciting times!

Vision & Multi-Modality

IDEFICS, a new 80B model from HuggingFace, was released after a year's effort, and it is quite, quite good. We love vision multimodality here on ThursdAI; we've been covering it since we saw that GPT-4 demo! IDEFICS is an effort by Hugging Face to create a foundational model for multimodality, and it is currently the only visual language model of this scale (80 billion parameters) that is available in open access. It's made by fusing the vision transformer CLIP-ViT-H-14 and LLaMa 1; I bet a LLaMa 2 version is coming soon as well! And the best thing: it's openly available and you can use it in your code with the Hugging Face transformers library! It's not perfect, of course, and can hallucinate quite a bit, but it's quite remarkable that we get these models weekly now, and this is just the start!

AI Art & Diffusion

Stable Diffusion is 1 year old 🎂

Has it been a year? Wow. For me personally, Stable Diffusion is what started this whole AI fever dream. SD was the first model I actually ran on my own GPU, the first model I learned how to... run, and use without relying on APIs. It made me way more comfortable with juggling models, learning what weights were, and well, here we are :) I now host a podcast and have a newsletter, and I'm part of a community of folks who do the same, train models, discuss AI engineer topics and teach others! Huge thank you to Emad, the Stability AI team, my friends there, and everyone else who worked hard on this. Hard to imagine how crazy of a pace we've been on since the first SD 1.4 release, and how incredibly realistic the images are now compared to what we got then and got excited about!

Ideogram joins the AI art race

Ideogram, a new text-to-image model from ex-Googlers (announcement), is the new kid on the block. Not open source (unless I missed it), it boasts significant text-rendering capabilities and really great image quality. It also has a remix ability, and it is available from the web, unlike... MidJourney!

Big Co LLMs + API updates

OpenAI pairs with Scale AI to let enterprises fine-tune and run fine-tuned GPT-3.5 models!

This is an interesting time for OpenAI to dive into fine-tuning, as open source models inch closer and closer to GPT-3.5 on several metrics with each week. Reminder: if you fine-tune a GPT-3.5 model, you need to provide your own data to OpenAI, but then you also have to pay them for essentially hosting a model just for you, which means it's not going to be cheap. Use as much prompting as humanly possible before you consider fine-tuning; you may be able to solve your task much better and cheaper (a minimal sketch of the fine-tuning API flow appears at the end of this recap).

Agents

The most interesting thing to me in the world of agents actually came from an IDE! I installed Cursor, the new AI-infused VSCode clone, imported my VSCode settings, and off we went!
It can use your own GPT-4 keys if you don't want to send them your code or pay, it embeds your whole repo for easy import and code understanding, and it does so much more, like adding a button to every error in the console to "debug", and a "new AI project" feature, which builds you a template just by typing a few words!

Our friends Alessio and swyx have interviewed the founder of Cursor on their podcast; a strong recommendation to check that episode out! After using Cursor for just a few days, I don't want to go back to VSCode, and I'm even considering... maybe pausing my Copilot subscription 🤯

That's all for today folks! I wish you all a great week, and we'll see you in the next ThursdAI 🫡

Thank you for reading ThursdAI - Recaps of the most high signal AI weekly spaces. This post is public, so feel free to share it with a friend? Let's get to 1K readers 🔥 This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
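And here is the minimal fine-tuning sketch referenced in the GPT-3.5 section above; it uses the 2023-era openai Python client (v0.x), so the exact call names may differ in newer client versions.

```python
# Minimal GPT-3.5 fine-tuning flow with the 2023-era openai Python client.
# Training data is a JSONL file with one {"messages": [...]} chat example per line.
import openai

# 1) Upload the training file.
f = openai.File.create(file=open("train.jsonl", "rb"), purpose="fine-tune")

# 2) Start the job; you pay for training tokens, then for usage of the
#    hosted custom model afterwards.
job = openai.FineTuningJob.create(training_file=f.id, model="gpt-3.5-turbo")
print(job.id)

# 3) Once the job finishes, the resulting model id can be used like any
#    other chat model in openai.ChatCompletion.create(...).
```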
Aug 20, 2023 • 52min

🎙️ThursdAI - LLM Finetuning deep dive, current top OSS LLMs (Platypus 70B, OrctyPus 13B) authors & what to look forward to

This is a free preview of a paid episode. To hear more, visit sub.thursdai.news

Brief outline for your convenience:

[00:00] Introduction by Alex Volkov
[06:00] Discussing the Platypus models and data curation process by Ariel, Cole and Nathaniel
[15:00] Merging Platypus with the OpenOrca model by Alignment Lab
* Combining strengths of Platypus and OpenOrca
* Achieving a state-of-the-art 13B model
[40:00] Mixture of Experts (MoE) models explanation by Prateek and Far El
[47:00] Ablation studies on different fine-tuning methods by Teknium

Full transcript is available for our paid subscribers 👇 Why don't you become one?

Here's a list of folks and models that appear in this episode; please follow all of them on X:

* ThursdAI co-hosts - Alex Volkov, Yam Peleg, Nisten Tajiraj
* GarageBaind - Ariel, Cole and Nataniel (platypus-llm.github.io)
* Alignment Lab - Austin, Teknium (Discord server)
* SkunkWorks OS - Far El, Prateek Yadav, Alpay Ariak (Discord server)
* Platypus2-70B-instruct
* Open Orca Platypus 13B

I am recording this on August 18th, which marks the one-month birthday of the LLaMa 2 release from Meta. It was the first commercially licensed large language model of its size and quality, and we want to thank the great folks at Meta AI: Yann LeCun, BigZuck and the whole FAIR team. Thank you guys. It's been an incredible month since it was released.

We saw a Cambrian explosion of open source communities who make this world better, even since LLaMa 1. For example, llama.cpp by Georgi Gerganov is such an incredible example of how an open source community comes together: this one guy, over a weekend, took the open source weights and made them run on CPUs, and much, much faster.

Mark Zuckerberg even talked about this, how amazingly the open source community has adopted LLaMa, and how Meta is now also adopting many of those techniques and developments back, to run their own models cheaper and faster. And so it's been exactly one month since LLaMa 2 was released, and literally every ThursdAI since then, we have covered a new state-of-the-art open source model, all based on LLaMa 2, that topped the open source model charts on Hugging Face.

Many of these top models were fine-tuned by Discord organizations of super smart folks who just like to work together in the open and open source their work. Many of them are great friends of the pod: Nous Research, with whom we had a special episode a couple of weeks back; Teknium1, who seems to be part of every org; and Alignment Lab and GarageBaind, the last few folks topping the charts.

I'm very excited not only to bring you an interview with Alignment Lab and GarageBaind, but also to give you a hint of two additional very exciting efforts that are happening in some of these Discords. I also want to highlight how many of these folks do not have data science backgrounds. Some of them do, so we had a few PhDs or PhD-student folks, but some of them studied all of this at home with the help of GPT-4, and some of them even connected via the ThursdAI community and space, which I'm personally very happy about.

So this special episode has two parts. In the first part we're going to talk with Ariel, Cole and Nataniel, currently known as GarageBaind. Get it? bAInd, GarageBaind, because they're doing AI in their garage.
I love it. 🔥 They are now holding the record for the best performing open source model, called Platypus2-70B-Instruct. Then, joining them is Austin from Alignment Lab, the authors of OpenOrca, also a top performing model, and we'll talk about how they merged and joined forces to train the best performing 13B model, called Open Orca Platypus 13B, or OrctyPus 13B.

This 13B-parameter model comes very close to the base LLaMa 70B. So, I will say this again: just one month after LLaMa 2 was released by the great folks at Meta, we now have a 13-billion-parameter model, which is way smaller and cheaper to run, that comes very close to the performance benchmarks of a way bigger, very expensive to train and run 70B model. And I find it incredible. And we've only just started; it's been a month.

In the second part you will hear about two additional efforts. One is run by Far El, Prateek and Alpay from the SkunksWorks OS Discord, and it is an effort to bring everyone an open source mixture of experts model; you'll hear about what mixture of experts is. The other is run by friend of the pod Teknium, previously a chart-topper himself with the Nous Hermes models and many others, to figure out once and for all which of the fine-tuning methods is the most efficient, fast and cheap to run.

You will hear several mentions of LoRAs, which stands for Low-Rank Adaptation: basically, methods of keeping the huge weights of LLaMa and other models frozen while retraining, fine-tuning and aligning some specific parts of them with new data, a method we know from the diffusion world. It's now being applied to the LLM world and showing great promise in how fast, easy and cheap it makes fine-tuning these huge models, with significantly less hardware cost and time (a minimal LoRA setup sketch follows at the end of this post). Specifically, Nataniel Ruiz, the guy who helped Ariel and Cole train Platypus, and the co-author of DreamBooth, StyleDrop and many other diffusion methods, mentioned that it takes around five hours on a single A100 GPU to fine-tune the 13B-parameter model. That is, if you can find an A100 GPU, around $10. That's incredible.

I hope you enjoy listening and learning from these great folks, and please don't forget to check out our website at thursdai.news for all the links, socials and podcast feeds.
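Here is the minimal LoRA setup sketch mentioned above, using the Hugging Face peft library; the rank, alpha and target modules are illustrative defaults, not the actual Platypus recipe.

```python
# Minimal LoRA setup sketch with Hugging Face peft; hyperparameters here are
# illustrative defaults, NOT the Platypus training recipe.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-hf")

config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling applied to the update
    target_modules=["q_proj", "v_proj"],  # attach adapters to attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)  # base weights stay frozen
model.print_trainable_parameters()    # typically well under 1% of the 13B weights
```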
Aug 17, 2023 • 17min

ThursdAI Aug 17 - AI Vision, Platypus tops the charts, AI Towns, Self Alignment 📰 and a special interview with Platypus authors!

Hey everyone, this is Alex Volkov, the host of ThursdAI; welcome to yet another recap of yet another incredibly fast-paced week. I want to start with a ThursdAI update: we now have a new website, http://thursdai.news, and a new dedicated twitter account, @thursdai_pod, as we build up the ThursdAI community and brand a bit more.

As always, a reminder that ThursdAI is a weekly X space, a newsletter, and 2 (!) podcasts: the short form (Apple, Spotify) and the unedited long-form space recordings (RSS, Zealous page) for those who'd like the nitty-gritty details (and are on a long drive somewhere).

Open Source LLMs & Finetuning

Honestly, the speed with which LLaMa 2 fine-tunes are taking over state-of-the-art performance is staggering. We literally talk about a new model every week that's topping the LLM benchmark leaderboard, and it hasn't even been a month since LLaMa 2's release day 🤯 (July 18 for those who are counting).

Enter Platypus 70B (🔗)

Platypus 70B-instruct is currently the highest-ranked open source LLM, with other Platypus versions ranking high as well. We've had the great pleasure to chat with new friends of the pod Ariel Lee and Cole Hunter (and long time friend of the pod Nataniel Ruiz, co-author of DreamBooth and StyleDrop, which we've covered before) about this incredible effort to fine-tune LLaMa 2, the open dataset they curated and released as part of this effort, and how quick and easy it is to train a (smaller, 13B) version of Platypus: just 5 hours on a single A100 GPU, roughly $6 on Lambda 🤯

We had a great interview with GarageBaind, the authors of Platypus, and we'll be posting that on a special Sunday episode of ThursdAI, so make sure you are subscribed to receive that when it drops.

Open Orca + Platypus = OrctyPus 13B? (🔗)

We told you about OpenOrca just last week, from our friends at @alignment_lab, and not only is Platypus the best performing 70B model, the open source community also comes through with an incredible merge, collaborating to bring you the best 13B model, which is a merge between OpenOrca and Platypus. This 13B model is now very close to the original LLaMa 70B on many of the metrics, LESS THAN A MONTH after the initial open sourcing. It's quite a remarkable achievement and we salute the whole community for this immense effort 👏 Also, accelerate! 🔥

Join the SkunksWorks

Speaking of fast-moving things: in addition to the above interview, we had a great conversation with folks from the so-called SkunksWorks OS Discord, namely Far El, Prateek Yadav, Alpay Ariak, Teknium and Alignment Labs, and our recurring guest hosts Yam Peleg and Nisten covered two very exciting community efforts, both happening within the SkunksWorks Discord.

The first effort is called MoE, Open Mixture of Experts, which is an open source attempt at replicating the mixture-of-experts architecture widely attributed as the reason GPT-4 is so much better than GPT-3.

The second effort is called Ablation Studies, which Teknium is leading to understand, once and for all, what the best, cheapest and highest-quality way to fine-tune open source models is: QLoRA, a full finetune, or LoRAs.

If you're interested in any of these, either by helping directly or by providing resources such as GPU compute, please join the SkunksWorks Discord. They will show you how to participate, even if you don't have prior fine-tuning knowledge!
And we'll keep you apprised of the results once they release any updates!

Big Co LLMs + API updates

In our Big Co corner, we start with an incredible paper from Meta AI, announcing:

Self-Alignment with Backtranslation method + Humpback LLM - Meta AI

Summarized briefly (definitely listen to the full episode and @yampeleg's detailed overview of this method): it's a way for an LLM to be trained, in an unsupervised way, on high quality datasets it creates for itself, starting from a small amount of "seed" data from a high quality dataset. Think of it this way: fine-tuning a model requires a lot of "question → response" data in your dataset, and backtranslation proposes "response → question" dataset generation, coming up with novel ways of asking "what would a potential instruction be that would make an LLM generate this result?" This results in a model that effectively learns to learn better and to create its own datasets without humans (well, at least human labelers) in the loop. (A toy sketch of this idea appears at the end of this recap.) Here is some more reading material on X for reference.

OpenAI new JS SDK (X link)

OpenAI has partnered with Stainless API to release a major new version 4 of their TS/JS SDK, with the following incredible DX improvements for AI engineers:

* Streaming responses for chat & completions
* Carefully crafted TypeScript types
* Support for ESM, Vercel edge functions, Cloudflare workers, & Deno
* Better file upload API for Whisper, fine-tune files, & DALL·E images
* Improved error handling through automatic retries & error classes
* Increased performance via TCP connection reuse
* Simpler initialization logic

The most exciting part for me: it's now very easy to get started with AI projects and get streaming on the incredible Cloudflare Workers platform (Targum is part of the first Cloudflare Workers launchpad, but is not affiliated; we're just superfans 🫶).

Vision & Multi-Modality

There's been some really cool stuff happening in computer vision and multi-modal AI recently. First up, a new method called 3D Gaussian Splatting shows an incredibly clear and smooth way to generate 3D scenes from just a few images. Compared to neural radiance fields (NeRFs), Gaussian splatting produces much smoother results, without the grainy voxel artifacts NeRFs often have. However, it achieves this improved quality without sacrificing the speed and performance of NeRFs. So Gaussian splatting gives a big boost in realism compared to NeRF renderings, while maintaining real-time speeds in cleaning up those "clouds".

Supervision from Roboflow (and Piotr)

By the way, our own friend of the pod and AI vision expert @skalskiP (who reviewed Gaussian Splatting for us) is also having a crazy ThursdAI week, with his open source library called Supervision, a computer vision toolkit that is trending #2 on Github 👏

Apple stepping up their Vision (not the headset) Transformer game

Apple has open sourced ml-fastvit, their general-purpose vision transformer model, which they claim runs at ~1ms on mobile devices, with code and pre-trained weights available on Github 🔥 This is great to see from Apple ML teams: not only are they open sourcing, they're also preparing all of us for the world of spatial computers (Vision Pro coming, remember?),
and many new computer-vision-heavy apps will be available at those incredible speeds. This is also great for on-device inference, running these models in node / on the edge (as friend of the pod @visheratin demonstrated with WebAI).

Additional updates included Nvidia releasing a web playground for NeVa, their MLLM (multimodal LLM, get used to seeing this term everywhere), which you can play with here, and link-context learning for MLLMs.

Agents

OpenAI also announced that Global Illumination is joining OpenAI. That team is CEO'd by the creator of the Instagram stories algorithm and feed contributor, and the team is behind a massive open-world Minecraft clone. Will we see OpenAI release agents into that world? We know that they are working on agents.

A16Z - AI Town (🔗)

Speaking of agents roaming free and interacting, we covered the open sourcing of Smallville just last week, and now we see a new open source framework called AI Town, for letting agents roam and interact with each other, from Andreessen Horowitz's AI division. AI Town (Github) is a web framework written in TypeScript, built to run, get customized, and work with different LLMs (even open source ones) in mind, and you can see the AI agents running around in a live demo here.

This ThursdAI was so packed with great information that it's really worth listening to the whole recording; you can do this on our Zealous page, RSS and on twitter (all those links can always be found on thursdai.news).

If you found this valuable, join our community and let your friends know! This is a great way to support us, as well as participate in the discussion on social; tag #thursdAI on anything you feel is worthwhile for us to summarize. This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
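Here is the toy sketch of the backtranslation idea referenced above. The real Humpback method uses a seed-finetuned LLaMa plus self-curation of the generated pairs; the 2023-era GPT-4 call below is just a stand-in to show the direction reversal.

```python
# Toy sketch of instruction backtranslation: turn unlabeled RESPONSES into
# (instruction, response) training pairs. Stand-in model, not the paper's setup.
import openai

def guess_instruction(response_text: str) -> str:
    """Ask a model: what instruction would make an LLM produce this text?"""
    out = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": "Write the instruction that the following text is a "
                       "good answer to:\n\n" + response_text,
        }],
    )
    return out.choices[0].message.content

# Any high-quality web document can become a training pair this way; the paper
# then has the model filter its own pairs before fine-tuning on them.
corpus = ["Gaussian splatting renders a scene by blending many 3D gaussians..."]
pairs = [(guess_instruction(doc), doc) for doc in corpus]
```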
