ThursdAI - The top AI news from the past week

From Weights & Biases: join AI Evangelist Alex Volkov and a panel of experts to cover everything important that happened in the world of AI over the past week
Oct 26, 2023 • 1h 40min

📅 ThursdAI Oct-26, Jina Embeddings SOTA, Gradio-Lite, Copilot crossed 100M paid devs, and more AI news

Bo Weng, author of Jina Embeddings V2, discusses the latest updates and initiatives in AI. Topics include open source LLMs, Hugging Face's text embeddings, the Data Provenance Initiative, and Gradio Lite. The podcast also features an interview with Abubakar, Xenova, and Yuichiro from Gradio, discussing Gradio's effect on the open source LLM ecosystem and the integration of Transformers.js with Gradio Lite. This episode covers a wide range of topics in the AI field in an entertaining and informative way.
Oct 20, 2023 • 1h 30min

🔥 ThursdAI Oct 19 - Adept Fuyu multimodal, Pi has internet access, Mojo works on macs, Baidu announces ERNIE in all apps & more AI news

Hey friends, welcome to ThursdAI Oct 19. Here's everything we covered, plus a little deep dive after the TL;DR for those who like extra credit. ThursdAI - If you like staying up to date, join our community.

Also, here's the reason the newsletter is a bit delayed today: I played with Riffusion to try and get a cool song for ThursdAI 😂

ThursdAI October 19th

TL;DR of all topics covered:

* Open Source MLLMs
  * Adept open sources Fuyu 8B - a multimodal model trained to understand charts and UI (Announcement, Hugging Face, Demo)
  * Teknium releases Open Hermes 2 on Mistral 7B (Announcement, Model)
  * NEFTune - a "one simple trick" to get higher quality finetunes by adding noise (Thread, Github)
  * Mistral is on fire; most fine-tunes are on top of Mistral now
* Big CO LLMs + APIs
  * Inflection Pi got internet access & a new therapy mode (Announcement)
  * Mojo 🔥 works on Apple silicon Macs and has LLaMa.cpp-level performance (Announcement, Performance thread)
  * Anthropic Claude.ai rolled out to an additional 95 countries (Announcement)
  * Baidu announces ERNIE 4, a multimodal foundational model integrated with many applications (Announcement, Thread)
* Vision
  * Meta is decoding brain activity in near real time using non-intrusive MEG (Announcement, Blog, Paper)
  * Baidu YunYiduo drive - use text prompts to extract precise frames from video, summarize videos, transcribe and add subtitles (Announcement)
* Voice & Audio
  * Near real-time voice generation with play.ht - under 300ms (Announcement)
  * I'm having a lot of fun with AirPods + chatGPT voice (X)
  * Riffusion - generate short songs with sound and singing (Riffusion, X)
* AI Art & Diffusion
  * Adobe releases Firefly 2 - lifelike and realistic images, generative match, prompt remix and prompt suggestions (X, Firefly)
  * DALL-E 3 is now available to all chatGPT Plus users (Announcement, Research paper!)
* Tools
  * LMStudio - a great and easy way to download models and run them on M1, straight on your Mac (Download)
* Other
  * ThursdAI is adhering to the techno-optimist manifesto by Pmarca (Link)

Open source MLLMs

Welcome to the multimodal future with Fuyu 8B from Adept

We've seen and covered many multimodal models before, and in fact, most models will start being multimodal soon, so get ready to say "MLLMs"... or until we come up with something better. Most of them so far have been pretty heavy - IDEFICS was 80B parameters, etc. This week we received a new 8B multimodal model with great OCR abilities from Adept, the same folks who gave us Persimmon 8B a few weeks ago. In fact, Fuyu is a type of persimmon tree (we see you, Adept!)

On the podcast I talked about having two separate benchmarks for myself: one for chatGPT or any multimodal model coming from a huge company, and another for open source/tiny models. Given that Fuyu is a tiny model, it's quite impressive! Its OCR capabilities are impressive, and the QA is really on point (as is the captioning).

An interesting thing about the Fuyu architecture: because it doesn't use the traditional vision encoders, it can scale to arbitrary image sizes and resolutions, and it's really fast (large image responses under 100ms).

Additionally, during the release of Fuyu, Arushi from Adept authored a thread about how bad visual QA evaluation datasets are, which... they really are, and I hope we get better ones!
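By the way, if you want to kick the tires on Fuyu yourself, here's a minimal inference sketch using the Hugging Face transformers integration. The model id is adept/fuyu-8b; the image path and prompt are placeholder examples, and the processor API may shift, so treat this as a sketch rather than gospel:

```python
# pip install transformers accelerate torch pillow
from transformers import FuyuProcessor, FuyuForCausalLM
from PIL import Image

processor = FuyuProcessor.from_pretrained("adept/fuyu-8b")
model = FuyuForCausalLM.from_pretrained("adept/fuyu-8b", device_map="auto")

# Fuyu feeds raw image patches straight into the decoder (no separate vision
# encoder), which is why arbitrary image sizes and resolutions work.
image = Image.open("chart.png")  # placeholder: any chart/UI screenshot
prompt = "Answer the following question based on the image: what is the highest value?\n"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=32)

# Decode only the newly generated tokens, not the prompt
answer = processor.batch_decode(output[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(answer[0])
```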
NEFTune - a "1 weird trick" of adding noise to embeddings that makes models better (announcement thread)

If you guys remember, a "this one weird trick" was discovered by KaiokenDev back in June to extend the context window of LLaMa models, which then turned into RoPE scaling and YaRN scaling (which we covered in a special episode with the authors). Well, now we have a similar "1 weird trick": by just adding some noise to the embeddings at training time, model performance can grow by up to 25%! The results vary per dataset, of course; however, considering how easy it is to try, it's literally as simple as doing this in your forward pass:

```python
def forward(self, x):
    if self.training:
        return self.orig_embed(x) + noise  # small uniform noise, training time only
    return self.orig_embed(x)
```

We should be happy that "free lunch" tricks like this exist. Notably, we had a great guest, Wing Lian, maintainer of Axolotl (a very popular tool that streamlines fine-tuning, and they add everything that's super cool and beneficial for finetuners to their library), chime in and say that in his tests, and among the Discord folks, they couldn't reproduce some of these claims, so it remains to be seen how far this "trick" scales and what else needs to be done here. Similarly, back when the context-extension trick was discovered, there was a lot of debate about its effectiveness from Ofir Press (author of ALiBi, another context scaling method), and further iterations of the trick made it into a paper and a robust method, so this development is indeed exciting!

Mojo 🔥 now supports Apple silicon Macs and has LLaMa.cpp-level performance!

I've been waiting for this day! We've covered Mojo from Modular a couple of times, and it seems the promise behind it is starting to materialize. Modular promises an incredible, unbelievable 68,000x boost over vanilla Python, and it's been great to see that develop.

Today (October 19) they released Mojo support for Apple silicon, which most developers use; it's native, and you can use it right now via the CLI. A friend of the pod, Aydyn Tairov, hopped on the live recording and talked to us about his Llama.🔥 project (Github), which he ported to Apple silicon, showing incredible, LLaMa.cpp-like performance without crazy optimizations! Aydyn collected many LLaMa implementations, including LLaMa.cpp, LLaMa.c by Karpathy and many others, included his LLama.mojo (or Llama.🔥), and saw that the Mojo one comes very, very close to LLaMa.cpp and significantly beats the Rust, Go and Julia examples (on specific baby-llama models). The Mojo future is bright, and we'll keep updating with more, but for now, go play with it!

Meta is doing near-real-time brain → image research! 🤯

We've talked about fMRI (and EEG) signals being translated to diffusion imagery before, and this week Meta showed that while fMRI-to-imagery is pretty crazy on its own, using something called MEG (non-invasive magnetoencephalography) they can generate, and keep generating, images based on brain signals in near real time! I don't have a LOT to say about this topic, besides the fact that as an Aphant (I have Aphantasia) I can't wait to try this on myself and see what my brain actually "sees".

Baidu announces ERNIE 4 and a bunch of AI-native products, including maps, drive, autonomous ride-hailing and more
Baidu has just wrapped up its biggest conference of the year, Baidu World, where it announced a new version of its foundational model called ERNIE 4, which is multimodal (of unknown size) and is now integrated into quite a few of their products, many of which are re-imagined with AI. A few examples beyond a basic LLM chat-like interface: a revamped map experience with an AI assistant (with voice) built in to help you navigate and find locations; a new office management app called InfoFlow that handles appointments and time slots, and apparently even does travel booking; and an AI "Google Drive"-like product called YunYiduo, which can find video content based on what was said and when, pinpoint specific frames, summarize, and do a bunch of other incredible AI stuff. Here's a translated video of someone interacting with YunYiduo and asking for a bunch of stuff, one request after another. Disclosure: I don't know if the video is edited or in real time.

Voice & Audio

Real-time voice for agents is almost here; chatGPT voice mode is powerful

I've spent maybe 2 hours this week with chatGPT in my ear, using the new voice mode + AirPods. It's almost like... being on a call with chatGPT. I started talking to it in the store, asking for different produce to buy for a recipe, then drove home and asked it to "prepare" me for the task (I don't usually cook this specific thing), and then during my cooking, I kept talking to it, asking for next steps. With the new iOS, voice mode shows up as a live activity, and you can pause it and resume without opening the app. It was literally present in my world, without me having to watch the screen or type. It's a completely new paradigm of interaction when you don't have to type anymore, or pick up a screen and read, and it's wonderful!

Play.ht shows off impressive <300ms voice generation for agents

After spending almost 2 hours talking to chatGPT, I was thinking: why aren't all AI assistants like this? And the answer was, well... generating voice takes time, which takes you out of your "conversation flow". And then today, play.ht showed off a new update to their API that generates voice in <300ms, and that voice can be a clone of your own, with your accent and all. We truly live in unprecedented times. I can't wait for agents to start talking and seeing what I see (and remembering everything I heard, via Tab or Pendant or Pin).

Riffusion is addictive - generate song snippets with life-like singing!

We've talked about music generation before; however, Riffusion is a new addition and is now generating short song segments with VOICE! Here are a few samples. Honestly, I've procrastinated writing this newsletter because it's so fun to generate these, and I wish they went for longer!

AI Art & Diffusion

Adobe releases Firefly 2, which is significantly better at skin textures, realism, and hands. Additionally, they've added a style transfer feature, which is wonderful: upload a picture of something with a style you'd like, and your prompt will be generated in that style. It works really, really well. The extra detail on the skin is just something else. Though I did cherry-pick this example (the other hands were a dead give-away), the hands are getting better across the board! Plus they have a bunch of prompt features, like prompt suggestions and the ability to remix other creations; it's really quite developed at this point.

Also: DALL-E 3 is now available to 100% of Plus and Enterprise users. Have you tried it yet? What do you think?
Let me know in the replies!

That's it for October 19. If you're into AI engineering, make sure you listen to last week's podcast, where swyx and I recapped everything that happened on stage and off it at the seminal AI Engineer Summit. And make sure to share this newsletter with your friends who like AI! For those who are 'in the know', the emoji of the week is 📣 - please DM or reply with it if you got all the way here 🫡 - and we'll see you next week (when I will have some exciting news to share!)

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
Oct 13, 2023 • 1h 29min

A week of horror, an AI conference of contrasts

In this podcast, Miguel, a participant at the AI.engineer event, discusses the contrasting experiences of attending the event while dealing with the horror of rocket attacks in Israel. The podcast also covers topics such as building a better future, the importance of exploring boundaries in AI engineering, discussions and workshops from the AI conference, a recap of the conference and keynote presentations, civility online, AI threats, coding and UI/UX agencies, the conference review and design process of Copilot, dislike for people in coding, and an announcement of the upcoming AI Engineer World's Fair.
Oct 5, 2023 • 1h 28min

📅 ThursdAI Oct 4 - AI wearables, Mistral fine-tunes, AI browsers and more AI news from last week

The podcast highlights Google adding Bard to Google Assistant, the launch of Reka AI's multimodal Yasa-1, the integration of AI in browsers with Arc Max, and Mistral's OpenOrca 7B. They also discuss voice-based AI assistants, AI voice cloning, and the importance of local LLMs. Furthermore, they explore the advantages of using browsers as a platform for developers and the potential of AI assistants in the real world.
Sep 29, 2023 • 1h 41min

📅🔥ThursdAI Sep 28 - GPT4 sees, speaks and surfs, Cloudflare AI on GPUs,Mistral 7B, Spotify Translates, Meta AI everywhere, Qwen14B & more AI news from this INSANE week

GPT4 from OpenAI can see, speak, and listen. Apple rumors and on-device inference are discussed, as well as the OpenAI voice cloning tech used in Spotify's translations. Meta AI announces the integration of its EMU image model into AI agents. Cloudflare AI partners with HuggingFace and launches a new Vectorize DB. Also covered: the Mistral 7B model from MistralAI and the release of the Qwen 14B model, plus discussions about treating digital property as physical property and the challenges of adopting new LLM models.
Sep 22, 2023 • 1h 9min

📆 ThursdAI Sep 21 - OpenAI 🖼️ DALL-E 3, 3.5 Instruct & Gobi, Windows Copilot, Bard Extensions, WebGPU, ChainOfDensity, RememberAll

This podcast covers the latest AI updates, including OpenAI's DALL-E 3 art model, Windows Copilot by Microsoft, and Bard extensions from Google. They also discuss the significance of staying up to date with AI developments, the disappointment with Google's AI-powered extensions, and controversial opinions on compression papers. Additionally, they talk about building and running a GGML model with WebGPU and their experience at Geoffrey Hinton's AI lab.
Sep 17, 2023 • 55min

📅 ThursdAI - Special interview with Killian Lucas, author of Open Interpreter (23K GitHub stars in the first week) 🔥

Killian Lucas, creator of Open Interpreter, discusses his open source project that lets you run code via AI models like GPT-4, or local models like Llama, on your own machine. They explore the capabilities and use cases of Open Interpreter, including web-based tools, multi-modal models, and imagination unlock. The podcast also highlights the significance of community support and the future of language model programming.
Sep 15, 2023 • 1h 32min

🔥 ThursdAI Sep 14 - Phi 1.5, Open XTTS 🗣️, Baichuan2 13B, Stable Audio 🎶, Nougat OCR and a personal life update from Alex

This is a free preview of a paid episode. To hear more, visit sub.thursdai.news

Hey, welcome to yet another ThursdAI 🫡 This episode is special for several reasons. For one, I shared a personal life update (you'll have to listen to the episode to hear it 😉), but also, this is the first time I took on the mountainous challenge of fixing, editing and "video-fying" (is that a word?) our whole live recording! All 3 hours of it were condensed, sliced, sound-improved (X audio quality is really dogshit) and uploaded for your convenience. Please let me know what you think! Premium folks get access to the full podcast in audiogram format, plus a full transcription with timestamps and speakers. Here's a sneak preview of how that looks - why not subscribe? 😮

TL;DR of all topics covered

* Open Source LLM
  * Microsoft Phi 1.5 - a tiny model that beats other 7B models (with a twist?) (Paper, Model)
  * Baichuan 7B / 13B - a bilingual (cn/en) model with a highly crafted approach to training (Paper, Github)
* Big Co LLMs + API updates
  * Nothing major this week
* Voice & Audio
  * Stable Audio 🎶 - a new music generation model from Stability AI (Website)
  * Coqui XTTS - an open source multilingual text-to-speech model for training and generating a cloned voice (Github, HuggingFace) - see the quick cloning sketch at the end of this post
* AI Art & Diffusion
  * Würstchen v2 - a new super-quick 1024 diffusion model (Announcement, Demo, Github)
  * DiffBIR - Towards Blind Image Restoration with Generative Diffusion Prior (Announcement, Demo, Github)
* Tools
  * Nougat from Meta - open-source OCR model that accurately scans books with heavy math/scientific notation (Announcement, Github, Paper)
  * GPT4All Vulkan from Nomic - run LLMs on ANY consumer GPU, not just NVIDIA (Announcement)
  * Nisten's AI ISO disk - Announcement

And here are timestamps and chapter/discussion topics for your convenience:

[00:05:56] Phi 1.5 - 1.3B parameter model that closely matches Falcon & LLaMa 7B
[00:09:08] Potential data contamination with Phi 1.5
[00:10:11] Data contamination unconfirmed
[00:12:59] Tiny models are all the rage lately
[00:16:23] Synthetic dataset for Phi
[00:18:37] Are we going to run out of training data?
[00:20:31] Breaking news - Nougat - OCR from Meta
[00:23:12] Nisten - AI ISO disk
[00:29:08] Baichuan 7B - an immaculate Chinese model
[00:36:16] Unique loss terms
[00:38:37] Baichuan bilingual and multilingual dataset
[00:39:30] Finetunes of Baichuan
[00:42:28] Philosophical questions in the dataset
[00:45:21] Let's think step by step
[00:48:17] Is breath-related text in the original dataset?
[00:50:27] Counterintuitive prompting for models with no breath
[00:55:36] Idea spaces
[00:59:59] Alex - life update about ThursdAI
[01:04:30] Stable Audio from Stability AI
[01:17:23] GPT4All Vulkan
[01:19:37] Coqui.ai releases XTTS - an open source TTS - interview with Josh Meyer
[01:30:40] Summary

Here's a full video of the pod, and a full transcription. As always, 🧡 thank you for being a paid subscriber - this really gives me the energy to keep going, get better guests, release dope podcast content, and have 3-hour spaces and then spend 7 hours editing 🔥
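Bonus for the readers: since XTTS is fully open, voice cloning is a few lines of Python with the Coqui TTS package. This is a rough sketch from memory of the v1 release - the model name and the reference wav are assumptions, so check Coqui's docs before copying:

```python
# pip install TTS
from TTS.api import TTS

# Load the multilingual XTTS model (name as of the v1 release; may differ)
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v1")

# Clone a voice from a short reference clip, then speak new text with it
tts.tts_to_file(
    text="Hey, welcome to yet another ThursdAI!",
    speaker_wav="my_voice_sample.wav",  # placeholder: ~6s recording of the target voice
    language="en",
    file_path="cloned_output.wav",
)
```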
Sep 10, 2023 • 54min

🔥🎙️ ThursdAI Sunday special - Extending LLaMa to 128K context window (2 orders of magnitude) with YaRN [Interview with authors]

This is a free preview of a paid episode. To hear more, visit sub.thursdai.news

Happy Sunday everyone, I am very excited to bring you this interview with the folks who took LLaMa 2 and made it LLoooooongMa! They extended the LLaMa 2 context window from 4,000 to a whopping 128,000 tokens (Yarn-Llama-2-13b-128k on Hugging Face). These guys also came up with a paper called YaRN (Efficient Context Window Extension of Large Language Models) and showed that YaRN requires not only 10x fewer tokens to create these long contexts, but also 2.5x fewer training steps! And the models generalize, so there's now no need to collect extremely long sequences (think book-length sequences) for the models to understand those context lengths.

I also decided to do something different (which took me half of Sunday, so I can't promise and am not committing to this format): premium subscribers can now watch this interview with running karaoke-style subtitles and improved audio! This will be uploaded to YouTube in a week, but aren't you glad you subscribed and are getting this first? Here's a teaser preview.

And here are the chapters for your convenience (the only thing that's AI generated 😂):

0:00 - Introduction
3:08 - Discussion of extending LLaMa 2's context length from 4,000 tokens to 128,000 tokens using the YaRN method
8:23 - Explanation of RoPE scaling for positional encodings in transformers
13:21 - How the RoPE scaling idea allows for longer context through positional interpolation
18:51 - Using in-context learning to train models on shorter sequences but still handle long contexts
25:18 - Sourcing long-form data like books to train 128k token models
31:21 - Whether future models will natively support longer contexts
37:33 - New model from Adept with 16k context using RoPE scaling
42:46 - Attention is quadratic - we need better algorithms to make long context usable
49:39 - Open source community pushing state of the art alongside big labs
52:34 - Closing thoughts

As always, the full (manually edited) transcription (and this time a special video version!) is reserved for premium subscribers. I promise it'll be worth it, so why not... y'know? Skip a cup of coffee from SB and support ThursdAI?
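If you want some intuition for the "stretching" we discuss in the interview before you hit play, here's a toy sketch of linear positional interpolation for RoPE - the simpler predecessor idea that YaRN refines. To be clear, this is not the authors' implementation: YaRN additionally scales different frequency bands unevenly and tweaks the attention temperature. This is just the core trick:

```python
import torch

def rope_angles(head_dim: int, max_pos: int, base: float = 10000.0, scale: float = 1.0):
    """Rotary position embedding angles.

    With scale=1.0 this is vanilla RoPE. With scale=32.0 (4k -> 128k),
    positions are compressed back into the numeric range the model saw
    during training, so attention never sees out-of-distribution angles.
    """
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(max_pos).float() / scale  # positional interpolation
    angles = torch.outer(positions, inv_freq)          # (max_pos, head_dim // 2)
    return angles.cos(), angles.sin()

# Vanilla 4k context vs. an interpolated 128k variant (a 32x stretch)
cos_4k, sin_4k = rope_angles(head_dim=128, max_pos=4096)
cos_128k, sin_128k = rope_angles(head_dim=128, max_pos=131072, scale=32.0)
```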
Sep 7, 2023 • 29min

ThursdAI Sep 7 - Falcon 180B 🦅, 🔥 Mojo lang finally here, YaRN scaling interview, Many OSS models & more AI news

Hey y'all, welcome to yet another ThursdAI! This is Alex, coming at you every ThursdAI, including a live recording this time! Which was incredible - we chatted about Falcon 180B, had a great interview at the end with 3 authors of the YaRN scaling paper and LLongMa 128K context, and had 3 breaking news items in the middle: MOJO🔥 has been released, Adept released a LLaMa-comparable OSS model, and friend of the pod @reach_vb showed an open ASR leaderboard on Hugging Face! We also covered an incredible tiny model called StarCoder 1B that was finetuned by a friend of the pod (who joined the space to talk to us about it!)

As always, you can listen to the whole 3-hour long-form conversation (raw, unedited) on our Zealous page (and add it to your podcatcher via this RSS), and this short-form pod is available on Apple, Spotify and everywhere.

ThursdAI - Hey, if you enjoy these, how about subscribing for real? Would love to do this full time! Every paid subscriber is like a dear friend 🧡

TL;DR of all topics covered

* Open Source LLM
  * Falcon 180B announced by TIIUAE (Announcement, Demo)
  * YaRN scaling paper - scaling LLaMa to 128K context (link)
  * OpenHermes-13B from @teknium1 (link)
  * Persimmon-8B from Adept.AI (link)
  * Starcoder-1B-sft from @abacaj (link)
* Big Co LLMs + API updates
  * OpenAI's first ever dev conference (link)
  * Anthropic announces a $20/mo Claude Pro tier (link)
  * Modular releases Mojo🔥 with a 68,000x improvement over Python (link)
* Vision
  * Real-time deepfakes with FaceFusion (link)
  * HeyGen released AI avatars and AI video translation with lipsync (link, translation announcement)
* Voice
  * Open ASR (automatic speech recognition) leaderboard from HuggingFace (link)
* Tools
  * LangChain Hub (re)launched
  * Open Interpreter (Announcement, Github)

Open Source LLM

🦅 Falcon 180B - the largest open source LLM to date (Announcement, Demo)

The folks at the Technology Innovation Institute have open sourced the huge Falcon 180B and put it up on Hugging Face. Having previously open sourced Falcon 40B, the folks from TIIUAE have given us a huge model that beats (base) LLaMa 2 on several evaluations, if only slightly, by a few percentage points. It's huge: it was trained on 3.5 trillion tokens, weighs above 100GB as a file, and requires 400GB for inference.

Some folks were not as impressed with Falcon's performance, given that its parameter count is 2.5x that of LLaMa 2 (and it likely took longer to train) while the benchmarks are just a few percentage points higher than LLaMa. It also boasts an embarrassingly low context window of just 2K tokens, and code was just 5% of its dataset, even though we already know that more code in the dataset makes models smarter!

Georgi Gerganov is already running this model on his M2 Ultra because he's the GOAT, and co-host of ThursdAI spaces Nisten was able to run this model CPU-only with just 4GB of RAM 🤯 We're waiting for Nisten to post a Github on how to run this monstrous model on just CPU, because it's incredible!

However, given the Apache2 license and the fine-tuning community's excitement about improving these open models, it's an incredible feat, and we're very happy that this was released!
The complete open sourcing also matters in terms of geopolitics: this model was developed in the UAE, while the US has banned the export of A100 GPUs to the Middle East and folks are talking about regulating foundational models. A release of this size, coming out of the United Arab Emirates for free, is definitely going to add to the discussion of whether to regulate AI, open source, and the fine-tuning of huge models!

YaRN - scaling LLaMa to a 128K context window

Last week, just in time for ThursdAI, we posted about the release of Yarn-Llama-2-13b-128k, a whopping 32x improvement in context window size on top of the base LLaMa, from the folks at Nous Research, Enrico Shippole and @theemozilla, with the help of EleutherAI. This week, they released the paper, YaRN: Efficient Context Window Extension of Large Language Models, which uses Rotary Position Embeddings to stretch the context windows of transformer-attention-based LLMs significantly.

We had friends of the pod Enrico Shippole, theemozilla (Jeff) and Bowen Peng on the Twitter space, and a special interview with them will be released on Sunday. If you're interested in work on scaling and stretching context windows, definitely subscribe for that episode; it was incredible!

It's great to see that their work is already applied in several places, including CodeLLaMa (which was released with 16K-100K context). The problem now is compute: context windows can be stretched, and the models are able to generalize from smaller datasets, so the next models are predicted to be released with practically unbounded context windows, limited mostly by your hardware's memory.

Persimmon-8B from AdeptAI (announcement, github)

AdeptAI, the company behind Act-1 (a foundational model for AI agents that drives your browser), whose co-founders include original transformers paper authors, has dropped a ThursdAI surprise: a fresh (read: not a LLaMa clone) model! They released a completely open source model called Persimmon-8B, with a full Apache 2 license, a 16K context window (using custom RoPE scaling methods) and some interesting C++ inference speedups. A very interesting 8B model that fits on most consumer hardware, with additional tricks and a huge context window, is definitely welcome! Another interesting point: they have 70K unused embeddings for multimodal extensions! Can't wait to see what that's about!

Starcoder-1B-sft - a tiny model that's great at code

Anton Bacaj (@abacaj) has finetuned StarCoder to achieve some incredible results for such a tiny model! Remember the first item, a whopping 180B-parameter Falcon? Well, this is a 1B parameter model, finetuned on a 65K-sample dataset of code, that outperforms Falcon, LLaMa 2 and Palm-2 (and Persimmon) on coding tasks, and runs on your device so fast that it's hard to read! It boasts an incredible 39% on HumanEval and 31% on MBPP (Anton reran and updated the MBPP score later), and it can run locally. Friend of the pod Xenova has already ported this model to transformers.js, and it'll soon run in your browser!
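Running a 1B code model locally really is just a few lines with transformers. A quick sketch - note the repo id here is my assumption, so grab the real one from Anton's announcement:

```python
# pip install transformers accelerate torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "abacaj/starcoderbase-1b-sft"  # assumed repo id; check the announcement
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "def fibonacci(n: int) -> int:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```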
OpenHermes-13B from @teknium1 (link)

Our friend Teknium1 (whom we interviewed a few weeks ago) releases OpenHermes on top of LLaMa 2, but this time with a completely open model and datasets, marking the first time that Hermes models have been open! OpenHermes was trained on 242,000 entries of primarily GPT-4 generated data from open datasets across the AI landscape, including:

* GPTeacher - General Instruct, Roleplay v1, Roleplay v2, and Code Instruct datasets, by Teknium
* WizardLM (v1, evol_instruct 70k), by the WizardLM Team/nlpxucan
* Airoboros GPT-4 (v1.0), by JonDurbin
* Camel-AI's domain expert datasets, by the Camel-AI Team
* CodeAlpaca, by Sahil2801
* GPT4-LLM and Unnatural Instructions, by Microsoft

Check it out, folks!

Big Co LLM + API updates

Modular finally ships Mojo 🔥 (Announcement)

I just knew that Mojo would finally ship during ThursdAI, and in fact, this was a great #BreakingNews moment on the Twitter spaces! Modular and its co-founder Chris Lattner (author of LLVM, MLIR, Swift and many other things) have finally released their Mojo 🔥 language for AI. Mojo 🔥 is like Python++: it includes strong types and full interoperability with the Python ecosystem, is able to run basic vanilla Python, and has so much more in it. The main thing Modular is claiming is a whopping 68,000x improvement over vanilla Python! You didn't misread this - a 68,000x improvement when using all the Modular inference compilers, Mojo virtualization tricks and compilation improvements. It's incredible.

The beauty of Mojo is that it meets developers where they are and allows them to adopt new features to achieve high performance gradually. By combining the best of dynamic and static languages, Mojo can deliver performance up to 68,000 times faster than Python today. That's quite a leap! If you want to delve deeper into Mojo's origin story, you can find more information in their documentation. But for now, let me highlight a few key benefits that Mojo offers.

Firstly, Mojo allows you to write everything in one language, merging the usability of Python with the systems programming features that typically require developers to rely on C, C++, or CUDA. This means that both research and deployment teams can work within a common codebase, streamlining the workflow from research to production.

Secondly, Mojo unlocks Python's performance potential. While Python is widely used, it may not be the best tool for high-performance or specialized hardware tasks. Mojo bridges that gap by enabling high performance on CPUs and providing support for exotic accelerators like GPUs and ASICs. With Mojo, you can achieve performance levels on par with C++ and CUDA.

Thirdly, and this is a big one, Mojo seamlessly integrates with the entire Python ecosystem. You can leverage the extensive library collection available in Python while making use of Mojo's features and performance benefits. This means you can easily combine libraries like NumPy and Matplotlib with your Mojo code - talk about flexibility!

Finally, Mojo allows you to upgrade your AI workloads effortlessly. By tightly integrating with the Modular AI Engine, Mojo empowers you to extend your AI workloads with custom operations, including pre-processing and post-processing operations as well as high-performance mathematical algorithms. You can even integrate kernel fusion, graph rewrites, shape functions, and more.
Mojo is all about expanding the possibilities! Mojo's playground has been around since May, and I have a deep dive here, but you should really watch the whole thing - over 3 hours on everything from why they chose to be a Python superset to why Chris thinks the community will pick it up. It's an incredible watch and will make you excited about Mojo!

WebGPU ships with support for FP16 in Chromium

Chrome shipped WebGPU back in April '23, after years of development. It allows high-performance 3D graphics (and of course, transformers inference) in the browser and on the web! However, for model inference, GPU access is not enough; you also need to be able to run smaller models. Well, one way to make models smaller is to run them in fp16 format. By essentially cutting the precision of the weight numbers in half, we can use much smaller (read: compressed) models with a slight loss in accuracy. Friends of the pod Nisten and Xenova (transformers.js author) gave us an update that new fp16 support has shipped in Chromium nightlies, allowing much, much smaller models to run client-side!
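To make the "half the bytes" point concrete, here's a tiny PyTorch sketch (GPT-2 is just a stand-in model here) that casts weights from fp32 to fp16 and compares the parameter memory footprint:

```python
# pip install transformers torch
from transformers import AutoModelForCausalLM

def param_megabytes(model) -> float:
    # Bytes used by all weight tensors, in MB
    return sum(p.numel() * p.element_size() for p in model.parameters()) / 1e6

model = AutoModelForCausalLM.from_pretrained("gpt2")  # loads in fp32 by default
print(f"fp32: {param_megabytes(model):.0f} MB")

model = model.half()  # cast every weight tensor to fp16
print(f"fp16: {param_megabytes(model):.0f} MB")  # roughly half the size
```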
OpenAI's first dev conference (Announcement)

OpenAI has announced their first developer-focused conference, happening in SF on November 6th! In-person only (with the keynote streamed to all), and they also said there won't be any model announcements like GPT-5 😂 But we all expect at least a few API updates!

Vision

FaceFusion 1.1.0 - a deepfake faceswapper (Announcement, Github)

We all know deepfakes are here, I mean, don't we? But did you know that it's now super easy to face-swap your face into an image or a video? FaceFusion does just that: an incredibly fast way to deepfake someone's face into an image or a video with a few clicks. It works on CPU (I couldn't make it work on GPU, but it's possible) and shows incredible results! Want to enjoy Steve Buscemi dancing around as Harry Styles? 3 clicks and 10 minutes and you get this 🔥

Friend of the pod CocktailPeanut has made it incredibly easy to install, with just 1 click, via his pinokio.computer app, which I use and love! FaceFusion also has a webcam mode that can deepfake any image onto a webcam stream for a lot of fun on Zoom calls! (which I wasn't able to test for some reason)

HeyGen launches their deep AI face creator

Many of us have used 11Labs to clone voices, but what if you could clone a voice AND an image of a person, with just 2 minutes of their recording? That's what HeyGen claims to be able to do, and we've previously reported that their incredibly realistic AI avatar generation from videos/images + voice really blew us away. HeyGen just launched their service, and you can sign up and get a few minutes for free. Here's a sample (with the CEO's avatar; they couldn't make my own due to some launch-day errors). The video you see on top is just that: the CEO of HeyGen, thanking you for reading this week's ThursdAI!

Voice

ASR leaderboard + a new top ASR model from Nvidia

I love doing ThursdAI, and one of the things I love most is folks sending me stuff they worked on and then coming to ThursdAI to chat about it. Friend of the pod Vaibhav (VB) Srivastav, an incredible dev rel at HuggingFace focusing on audio, has shipped a new Open ASR (automatic speech recognition) leaderboard on HuggingFace! It shows the top ASR models like Whisper, plus a newcomer, Nvidia FastConformer, which I didn't even know existed - and it's now topping Whisper for English speech-to-text tasks!

HuggingFace leaderboards like these are definitely a boon for the open source industry, as they let all of us easily select open source models, and they also let the open source community start racing towards the top while we all benefit!

Tools

Open Interpreter (Announcement, Github)

One tool that I've used this week, and it is incredible, is Open Interpreter from @heyitskillian. It's incredibly easy to install and run, behaves like OpenAI's Code Interpreter (renamed to Advanced Data Analysis) but on your computer, and is able to do things like control your apps, lower the volume, edit images/files and tons more:

pip install open-interpreter

And that's it! Give it a try (you have to approve each command it runs). It's a great agent, and hopefully we'll get Killian to chat with us about it on the next ThursdAI!

LangChain Hub has launched (link)

If you're into LangChain, and even if you aren't, the weight LangChain carries in the AI engineer industry is undeniable! They have a connector for everything, tons of folks use them, and they've raised a bunch of funding. They just launched their new LangChain Hub, and it's exciting! Many folks are sharing their best prompts and ways to work with LangChain on there, with upvotes and shareable links!

Also worth noting: our friends swyx and Alessio from Latent Space recently released an episode with Harrison on Latent Space, and it's WELL worth listening to (and reading), as swyx did a deep dive into LangChain, its naysayers and everything in between! Check it out below:

Thank you, see you next time (with some incredible personal news I'll have to share)

ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber. This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
