📅 ThursdAI - Apr 18th - 🎉 Happy LLama 3 day + Bigxtral instruct, WizardLM gives and takes away + Weights & Biases conference update

ThursdAI - The top AI news from the past week

Exploring the Performance of the Llama 3 Model in Different Language Capabilities

This chapter covers the speaker's experience testing and evaluating the newly released Llama 3 model across various language capabilities, with a particular focus on German. Surprising results from different model sets, role play tests, and manual checks on popular models are shared, emphasizing the importance of real-world evaluations. The conversation also covers the challenges a startup founder and CEO faced in deploying the Llama 3 model, testing it against other models, and working on extending the model's capabilities for better performance.

Episode notes

Happy LLama 3 day folks! After a lot of rumors, speculations, and apparently pressure from the big Zuck himself, we finally can call April 18th, 2024, LLaMa 3 day!

I am writing this from the lobby of the Marriott hotel in SF, where our annual conference, Fully Connected, is happening, and I recorded today's episode from my hotel room. I really wanna shout out how awesome it was to meet folks who are listeners of the ThursdAI pod and newsletter subscribers, participate in the events, and give high fives.

During our conference, we had the pleasure to have Joe Spisak, the Product Director of LLaMa at Meta, to actually announce LLaMa3 on stage! It was so exhilarating, I was sitting in the front row, and then had a good chat with Joe outside of the show 🙌

The first part of the show was, of course, LLaMa 3 focused. We had such a great time chatting about the amazing new 8B and 70B models we got, and salivating over the announced but not yet released 400B model of LLaMa 3 😮

We also covered a BUNCH of other news from this week, which was already packed with tons of releases and AI news, and I was happy to share my experience running a workshop focused on LLM evaluations the day before our conference. (If there's interest, I can share my notebooks and maybe even record a video walkthrough, let me know in the comments)

Ok let's dive in 👇

Happy LLama 3 day 🔥

The technical details

Meta has finally given us what we were all waiting for: incredibly expensive (2 clusters of 24K H100s, over 15 trillion tokens) open weights models, the smaller 8B one and the larger 70B one.
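To put those numbers in perspective, here is a rough back-of-the-envelope sketch (illustrative arithmetic, not figures from Meta). It assumes each model saw the full 15T tokens and uses the common "6 × parameters × tokens" rule of thumb for training FLOPs, which is only an approximation; the function names are made up for this example.

```python
# Back-of-the-envelope training scale for the two released sizes.
# Assumption: each model was trained on the full 15T tokens.
TRAINING_TOKENS = 15e12


def tokens_per_parameter(params: float) -> float:
    """How many training tokens each parameter 'saw' on average."""
    return TRAINING_TOKENS / params


def approx_training_flops(params: float) -> float:
    """Rule-of-thumb estimate: about 6 FLOPs per parameter per token."""
    return 6 * params * TRAINING_TOKENS


if __name__ == "__main__":
    for name, params in [("8B", 8e9), ("70B", 70e9)]:
        print(
            f"{name}: {tokens_per_parameter(params):,.0f} tokens per parameter, "
            f"~{approx_training_flops(params):.1e} training FLOPs"
        )
```

Even as a rough estimate, this shows how heavily the small model was trained relative to its size.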

We got both instruction-finetuned and base models, which is great for finetuners, and it's worth mentioning that these are dense models (not a mixture of experts, so all the parameters are used by the model during inference)
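For a feel of what "dense" means in practice, here is a minimal sketch comparing how many parameters are active per token. The Mixtral 8x22B figures (39B active out of 141B) are the ones quoted in the TL;DR further down; the class and function names are hypothetical and only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSize:
    name: str
    total_params_b: float   # total parameters, in billions
    active_params_b: float  # parameters used for each token, in billions

    @property
    def active_fraction(self) -> float:
        """Share of the weights that participate in a single forward pass."""
        return self.active_params_b / self.total_params_b


MODELS = [
    ModelSize("Llama 3 8B (dense)", 8, 8),
    ModelSize("Llama 3 70B (dense)", 70, 70),
    ModelSize("Mixtral 8x22B (MoE)", 141, 39),
]


def describe(model: ModelSize) -> str:
    return (
        f"{model.name}: {model.active_params_b:g}B of "
        f"{model.total_params_b:g}B active ({model.active_fraction:.0%})"
    )


if __name__ == "__main__":
    for model in MODELS:
        print(describe(model))
```

A dense model pays for every weight on every token, while a mixture of experts still has to hold all the weights in memory but only computes with a subset of them.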

It is REALLY good at benchmarks, with the 8B model beating the previous generation (LLaMa 2 70B) on pretty much all benchmarks, and the new 70B is inching up on the bigger releases from the past month or two, like Claude Haiku and even Sonnet!

The only downsides are the 8K context window + non multimodality, but both are coming according to Joe Spisak who announced LLama3 on stage at our show Fully Connected 🔥

I was sitting in the front row and was very excited to ask him questions later!

By the way, Joe did go into details they haven't yet talked about publicly (see? I told you to come to our conference! and some of you did!) and I've been live-tweeting his whole talk + the chat outside with the "extra" spicy questions and Joe's winks haha, you can read that thread here

The additional info

Meta has also partnered with both Google and Bing (take that, OpenAI), inserted LLama 3 into the search boxes of Facebook, Instagram, Messenger and Whatsapp, and deployed it to a new product called meta.ai (you can try it there now). It is now serving LLama 3 to more than 4 billion people across all of those apps, talk about compute cost!

Llama 3 also has a new Tokenizer (that Joe encouraged us to "not sleep on") and a bunch of new security tools like Purple LLama and LLama Guard. TorchTune, the finetuning library the PyTorch team recently released, now supports LLama3 finetuning natively out of the box as well (and integrates Wandb as its first party experiment tracking tool)
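One concrete consequence of a bigger tokenizer vocabulary is a bigger embedding table. A minimal sketch, assuming a vocabulary of 128,256 entries and a hidden size of 4,096 for the 8B model, compared with a 32,000-entry vocabulary at the same hidden size for LLaMa 2 7B. These numbers come from the published model configs, not from this post, so treat them as assumptions.

```python
def embedding_params(vocab_size: int, hidden_size: int) -> int:
    """Parameters in one token-embedding matrix (one row per vocabulary entry)."""
    return vocab_size * hidden_size


# Assumed values, taken from the published model configs (not from this post).
HIDDEN_SIZE = 4096
LLAMA2_VOCAB = 32_000
LLAMA3_VOCAB = 128_256

if __name__ == "__main__":
    old = embedding_params(LLAMA2_VOCAB, HIDDEN_SIZE)
    new = embedding_params(LLAMA3_VOCAB, HIDDEN_SIZE)
    print(f"32K vocab table:  {old / 1e6:.0f}M parameters")
    print(f"128K vocab table: {new / 1e6:.0f}M parameters")
    print(f"Growth: {new / old:.2f}x")
```

If the output projection is not tied to the input embeddings, the same count applies a second time.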

If you'd like more details, directly from Joe, I was live tweeting his whole talk, and am working on getting the slides from our team. We'll likely have a recording as well, and will post it as soon as we have it.

Here's a TL;DR (with my notes for the first time) of everything else we talked about. Given that today is LLaMa day, and I still have to do Fully Connected demos, I will "open source" my notes and refer you to the podcast episode to hear more detail about everything else that happened today 🫡

TL;DR of all topics covered:

* Meta releases LLama 3 - 8B, 70B and later 400B (Announcement, Models, Try it, Run Locally)

* Open Source LLMs

* Meta LLama 3 8B, 70B and later 400B (X, Blog)

* Trained 15T tokens!

* 70B and 8B models released + Instruction finetuning

* 8K context length, not multimodal

* 70B gets 82% on MMLU and 81.7% on HumanEval

* 128K vocab tokenizer

* Dense model not MoE

* Both instruction tuned on human annotated datasets

* Open Access

* The model already uses RoPE

* Bigxtral instruct 0.1 (Blog, Try it)

* Instruct model of the best Apache 2 model around

* Released a comparison chart that everyone started "fixing"

* 🤖 Mixtral 8x22B is Mistral AI's latest open AI model, with unmatched performance and efficiency

* 🗣 It is fluent in 5 languages: English, French, Italian, German, Spanish

* 🧮 Has strong math and coding capabilities

* 🧠 Uses only 39B active parameters out of 141B total, very cost efficient

* 🗜 Can recall info from large documents thanks to 64K token context window

* 🆓 Released under permissive open source license for anyone to use

* 🏆 Outperforms other open models on reasoning, knowledge and language benchmarks

* 🌐 Has strong multilingual abilities, outperforming others in 4 languages

* 🧪 Excellent basis for customization through fine-tuning

* New Tokenizer from Mistral (Docs)

* Focusing on Tool Use with tokens 🔥

* WizardLM-2 8x22B, 70B and 7B (X, HF)

* Released it and then pulled it back from HF and GitHub because it had not cleared Microsoft's toxicity testing

* Big CO LLMs + APIs

* OpenAI gives us Batch API + Assistants API v2

* Batch is 50% of the cost and win win win

* Assistants API V2 - new RAG

* new file search tool

* up to 10,000 files per assistant

* new vector store

* Reka gives us Reka Core (X, Try)

* Multimodal that understands video as well

* 20-person team

* Video understanding is very close to Gemini

* 128K context

* Core has strong reasoning abilities including for language, math and complex analysis.

* 32 languages support

* HuggingFace chat bot is now on iOS

* This week's Buzz

* Me + team led a workshop a day before the conference (Workshop Thread)

* Fully Connected in SF was an incredible success, over 1000 AI attendees + Meta AI announcement on stage 🔥

* PyTorch new TorchTune finetuning library with first class WandB support (X)

* Vision & Video

* Microsoft VASA-1 animated avatars (X, Blog)

* Amazing level of animation from 1 picture + Sound

* Harry Potter portraits are here

* They likely won't release this during Election year

* Looks very good, close to EMO but no code

* 📺 Videos show faces speaking naturally with head movements and lip sync

* 🔬 Researchers are exploring applications in education, accessibility and more

* HuggingFace updates IDEFICS2 8B VLM (X, HF)

* Apache 2 license

* Competitive with 30B models

* 12 point increase in VQAv2, 30 point increase in TextVQA (compared to Idefics 1)

* > 10x fewer parameters than Idefics 1

* Supports image resolution up to 980 x 980+

* Better OCR capabilities (thanks to more than 6TB of OCR pre-training data)

* Adobe shows Firefly video + SORA support (X)

* Voice & Audio

* Rewind AI is now Limitless (X)

* New service & Brand name

* Transcription to you

* Hardware device that looks sleek

* 100 hours

* Privacy support in cloud

* AI Art & Diffusion & 3D

* Stability - Stable Diffusion 3 is here

* Available via API only

* Partnered with Fireworks HQ for the release

* Needs Stability AI membership to use / access $$

* Big step up in composition and notorious issues like hands, "AI faces" etc. (from

* Seems to prefer simpler prompts.

* Way more copyright-friendly. It's hard to get any kind of brands/logos.

* Text is amazing.

* Others

* New AIrChat with amazing transcription is out, come join us in our AI corner there

* Humane AI pin was almost killed by MKBHD review

* Rabbit reviews incoming

That's all for this week, next week we have an amazing guest, see you then! 🫡

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
