Wow, what a week. I think I’ve reached a level where I’m not fazed by incredible weeks or days that happen in AI, but I… guess I still have much to learn!
TL;DR of everything we covered (aka Show Notes)
* Open Source LLMs
    * Mixtral MoE - 8X7B experts dropped with a magnet link again (Announcement, HF, Try it)
    * Mistral 0.2 instruct (Announcement, HF)
    * Upstage Solar 10B - Tops the HF leaderboards (Announcement)
    * Together - Striped Hyena architecture and new models (Announcement)
    * EAGLE - a new decoding method for LLMs (Announcement, Github)
    * Deci.ai - new SOTA 7B model
    * Phi 2.0 weights are finally available from Microsoft (HF)
    * QuiP - LLM quantization & compression (link)
* Big CO LLMs + APIs
    * Gemini Pro access over API (Announcement, Thread)
        * Uses character-based pricing, not tokens
    * Mistral releases its API inference platform - La Plateforme (API docs)
    * Together undercuts Mistral’s Mixtral pricing by 70% and announces an OpenAI-compatible API (see the sketch after this list)
    * OpenAI is open sourcing again - releasing the Weak-to-strong generalization paper and GitHub repo! (announcement)
* Vision
    * Gemini Pro API has vision AND video capabilities (API docs)
* AI Art & Diffusion
    * Stability announces Zero123 - zero-shot image-to-3D model (Thread)
    * Imagen 2 from Google (link)
* Tools & Other
    * Optimus from Tesla is coming, and it looks incredible
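The “OpenAI-compatible API” bit in the Together item above is worth unpacking: compatible means you can keep using the standard OpenAI Python client and simply point it at a different base URL to get Mixtral served there. Here’s a minimal sketch of that; the endpoint URL and model ID are my assumptions from Together’s docs at the time, so double-check them before copying this anywhere.

```python
# Minimal sketch: calling Mixtral through an OpenAI-compatible endpoint.
# The base_url and model name below are assumptions (check Together's current docs);
# the client usage itself is just the standard openai>=1.0 Python SDK.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_TOGETHER_API_KEY",               # a Together key, not an OpenAI one
    base_url="https://api.together.xyz/v1",        # assumed OpenAI-compatible endpoint
)

resp = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # assumed Mixtral model ID on Together
    messages=[{"role": "user", "content": "Explain Mixture of Experts in one sentence."}],
)
print(resp.choices[0].message.content)
```

The nice part is that swapping providers is a two-line change (key and base URL), which is exactly why “OpenAI-compatible” keeps showing up in these announcements.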
This week started on Friday, as we saw one of the craziest single days in the history of OSS AI that I can remember, and I’ve been doing this now for… Jesus, 9 months!
In a single day, we saw a new Mistral model release called Mixtral, a Mixture of Experts (like GPT4 is rumored to be) of 8x7B Mistrals that beats GPT3.5; a completely new architecture that competes with Transformers, called Striped Hyena, from Tri Dao and Together.xyz, plus 2 new models trained with that architecture; a new SoTA 2-bit quantization method called QuiP from Cornell; AND a new 3x faster decoding method for showing tokens to users after an LLM has done its “thinking”.
And the best thing? All those advancements are stackable! What a day!
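If “Mixture of Experts of 8x7B Mistrals” sounds abstract, here’s a toy PyTorch sketch of the core idea: a small router picks a couple of expert feed-forward networks per token and mixes their outputs, so only a fraction of the total parameters run for any given token. The dimensions, gating and expert definitions are illustrative assumptions, not Mixtral’s actual implementation.

```python
# Toy Mixture-of-Experts layer with top-2 routing (illustrative, not Mixtral's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # One small feed-forward network per expert.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        # The router scores each token against each expert.
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x):                                # x: (tokens, d_model)
        logits = self.router(x)                          # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)   # keep the top-k experts per token
        weights = F.softmax(weights, dim=-1)             # normalize over the chosen k
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                    # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(ToyMoE()(tokens).shape)  # torch.Size([10, 64])
```

The point of the routing is that each token only pays the compute cost of its top-k experts, which is how a model can carry 8x7B worth of parameters while spending far less compute per token.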
Then I went to NeurIPS 2023 (which is where I am right now, writing these words!), which I cover at length in the second part of the podcast, but I figured I’d write about it here as well, since it was such a crazy experience.
NeurIPS is the biggest AI/ML conference; I think they estimated 15K people from all over the world attending! Of course this brings many companies to sponsor, set up booths, give out swag and try to recruit!
Of course with my new position at Weights & Biases I had to come as well and experience this for myself!
Many of the attendees are customers of ours, and I was not expecting this amount of love: just an incredible stream of people coming up to the booth and saying how much they love the product!
So I manned the booth, did interviews and live streams, and connected with a LOT of folks, and I gotta say, this whole NeurIPS thing is quite incredible just for the ability to meet people!
I hung out with folks from Google, Meta, Microsoft, Apple, Weights & Biases, Stability, Mistral, HuggingFace, and PhD students and candidates from most of the top universities in the world, from KAIST to MIT, Stanford, Oslo and Shanghai; it’s really a worldwide endeavor!
I also got to meet many of the leading figures in AI, coming up to say hi, shake their hands, introduce myself (and ThursdAI) and chat about what they or their teams released and presented at the conference! Truly an unforgettable experience!
Of course, This Week’s Buzz is that everyone here loves W&B, from the PhD students to literally every big LLM lab! They all came up to us (yes yes, even researchers at Google who kinda low-key hate their internal tooling) and told us how awesome the experience was (besides the xAI folks, Jimmy wasn’t that impressed haha), and of course I got to practice the pitch so many times, since I manned the W&B booth!
Please do listen to the podcast above; there’s so much detail in there that doesn’t make it into the newsletter, as it’s impossible to cover it all, but it was a really fun conversation, including my excited depiction of this week’s NOLA escapades!
I think I’ll end here, ’cause I can go on and on about the parties (there were literally 7 at the same time last night: Google, Stability, OpenAI, Runway, and I’m sure there were a few more I wasn’t invited to!) and about New Orleans food (it’s my first time here; I ate a deep-fried soft-shell crab and turtle soup!), and I still have the poster sessions and workshops to go to! I will report more on my X account and the Weights & Biases X account, so stay tuned for that there, and as always, thanks for tuning in, reading and sharing ThursdAI with your friends 🫡
P.S - Still can’t really believe I get to do this full time now and share this journey with all of you, bringing you all with me to SF, and now NeurIPS and tons of other places and events in the future!
— Alex Volkov, AI Evangelist @ Weights & Biases, Host of ThursdAI 🫡