ThursdAI December 7th TL;DR
Greetings of the day everyone (as our panelist Akshay likes to sometimes say) and Happy first candle of Hanukkah for those who celebrate! 🕎
I'm writing this newsletter from the back of a Waymo self-driving car in SF, where I'm staying for just a few nights (again) to participate in the Open Source AI meetup, co-organized by Ollama, Nous Research, and Alignment Labs, and hosted by A16Z in their SF office.
This event was the highlight of the trip; it was quite packed in terms of AI talent, and I got to meet quite a few ThursdAI listeners, mutuals on X, and AI celebs.
We also recorded the podcast this week from the arena. Thanks to Swyx and Alessio from the Latent Space pod for hosting ThursdAI this week from their newly built-out pod studio (and apologies everyone for the rocky start and the cutting-out issues; luckily we had local recordings, so the pod version sounds good!)
Google finally teases Gemini Ultra (and gives us Pro)
What a week, folks, what a week. As I was boarding the flight to SF to meet with open source folks, Google announced (finally!) the release of Gemini, their long-rumored, highly performant model, with a LOT of fanfare!
Blog posts authored by Sundar and Demis Hassabis, beautiful demos of never-before-seen capabilities, comparisons to GPT-4V (which the Ultra version of Gemini outperforms on several benchmarks), rumors that Sergey Brin, the guy whose net worth is north of $100B, is listed as a core contributor on the paper, and benchmark reports (somewhat skewed) showing Ultra beating GPT-4 on many coding and reasoning evaluations!
We've been waiting for Gemini for so long that we spent the first hour of the podcast basically discussing it and its implications. We were also fairly disillusioned by the sleight-of-hand tricks Google's marketing department played with the initial launch video, which purportedly shows Gemini as a fully multi-modal AI reacting to a camera feed + user voice in real time. In fact, it quickly became clear (from their developer blog) that it was not video+audio but rather images+text (the same two modalities we already have in GPT-4V), and given some prompting, it's quite easy to replicate most of it (see the sketch below). We also discussed how we again got a tease, and not even a waitlist, for the "super cool" stuff, while getting a GPT-3.5-level model today in the Bard upgrade.
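For instance, here's a rough sketch of how you might approximate one of those "multi-modal" moments with GPT-4V today: grab a frame from a camera feed and send it alongside a text prompt via the OpenAI Python SDK. The frame file and the prompt are made-up placeholders, and this is just my guess at a replication, not anything Google or OpenAI published:

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Encode a single frame grabbed from a camera feed (hypothetical file name)
with open("frame_001.jpg", "rb") as f:
    frame_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Narrate what you see in this frame, Gemini-demo style."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"}},
        ],
    }],
    max_tokens=200,
)
print(response.choices[0].message.content)
```

Run that on a frame every second or two and stitch the replies together, and you get something that looks a lot like the launch video, just not in real time.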
To me, the most mind-blowing demo video was actually one of the other ones in the announcement, which showed Gemini exhibiting agentic behavior: understanding user intent, asking for clarifications, creating a PRD (Product Requirements Document) for itself, and then generating Flutter code to create a UI on the fly, based on what the user asked for! This is pretty wild, as we should all expect Just-In-Time UI to come to many of these big models!
Tune in to the episode if you want to hear more takes, opinions, and frustrations; none of us actually got to use Gemini Ultra, and the experience with Gemini Pro (which is now live on Bard) was, at least for me, underwhelming.
This week's buzz (What I learned in Weights & Biases this week)
I actually had a blast talking about W&B with many in the open source and fine-tuning community this week and last. I've learned that W&B doesn't only help huge companies (like OpenAI, Anthropic, Meta, Mistral, and tons more) train their foundation models; it's widely used by the open source fine-tuning community as well. I met folks like Wing Lian (aka Caseus), maintainer of Axolotl, who uses W&B together with Axolotl, and got to geek out about W&B. I also met Teknium and LDJ (Nous Research, Alignment Labs), and in fact got LDJ to walk me through some of the ways he uses and has used W&B, including how it's used to track model runs, show artifacts in the middle of runs, and run mini-benchmarks and evaluations for LLMs as they finetune (a minimal sketch of that pattern follows below).
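To make that concrete, here's a minimal sketch of that track-log-evaluate loop using the wandb Python SDK. The project name, metrics, schedule, and checkpoint file are all hypothetical stand-ins for illustration, not LDJ's actual setup:

```python
import random
import wandb

# Hypothetical finetuning run: log training loss each step, run a
# mini-benchmark periodically, and save a mid-run checkpoint as an artifact.
run = wandb.init(project="mistral-finetune-demo", config={"lr": 2e-5})

for step in range(100):
    # Stand-in for a real training loss from your finetuning loop
    loss = 2.0 * (0.99 ** step) + random.random() * 0.05
    run.log({"train/loss": loss}, step=step)

    if step % 25 == 0:
        # Stand-in for a small eval suite scored mid-finetune
        run.log({"eval/mini_benchmark_acc": 0.5 + step / 400}, step=step)

# Log a checkpoint as a versioned artifact, visible mid-run in the W&B UI
with open("checkpoint.txt", "w") as f:
    f.write("stand-in for model weights")
artifact = wandb.Artifact("demo-checkpoint", type="model")
artifact.add_file("checkpoint.txt")
run.log_artifact(artifact)
run.finish()
```

Axolotl wires this up for you (if I recall correctly, a `wandb_project` entry in its YAML config is enough to get run tracking), which is part of why it kept coming up in these conversations.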
If you're interested in this, here's an episode of a new "series" where I learn publicly (from scratch), so if you want to learn along with me, feel free to check it out:
Open Source AI in SF meetup
This meetup was the reason I flew into SF; I was invited by dear friends in the open source community and couldn't miss it! The talent density was quite remarkable: Andrej Karpathy (whose video about LLMs I just finished re-watching), Jeremy Howard, folks from Mistral, A16Z, and tons of other startups, open source collectives, and enthusiasts all came together to listen to a few lightning talks, but mostly to mingle, connect, and share ideas.
Nous Research announced that they are now a company (no longer just a Discord collective of ragtag open sourcers!) and that they are working on Forge, a product offering that runs local AI and provides a platform for agentic behavior; it's very much worth keeping an eye on.
I spent most of my time going around, hearing what folks are using (hint: a LOT of Axolotl), what they are finetuning (mostly Mistral), and what the future holds (everyone's waiting for the next Llama or the next Mistral). Funnily enough, there was not a LOT of conversation about Gemini there at all, at least not among the folks I talked to!
Overall this was really, really fun, and of course, being in SF, at least for me, especially now as an AI Evangelist, feels like coming home! So expect more trip reports!
Here's a recap and a few more things that happened this week in AI:
* Open Source LLMs
  * Apple released MLX, a machine learning framework for Apple silicon
  * Mamba, a transformer-alternative architecture from Tri Dao
* Big CO LLMs + APIs
  * Google Gemini beats GPT-4V on a BUNCH of metrics, shows cool fake multimodal demo
    * Demo was embellished per the Google developer blog
    * Multimodal capabilities are real
    * Dense model vs. MoE
    * Multimodal on the output as well
    * For 5-shot, GPT-4 outperforms Gemini Ultra on MMLU
  * AlphaCode 2 is here; Google claims it performs better than 85% of competitive programmers in the world, and it does even better when collaborating with a competitive programmer
  * Long-context prompting for Claude 2 improves recall from 27% to 98% using prompt techniques
  * X.ai finally released Grok to many Premium+ X subscribers (link)
* Vision
  * OpenHermes Vision finally released - something wasn't right there, back to the drawing board
* Voice
  * Apparently Gemini beats Whisper v3! As part of a unified model, no less
* AI Art & Diffusion
  * Meta releases a standalone Emu AI art generator website: https://imagine.meta.com
* Tools
  * JetBrains finally releases their own AI-native companion + subscription
That's it for me this week. This Waymo ride took extra long; it seems that in SF, during the night rush hour, AI is at a disadvantage against human drivers. Maybe I'll take an Uber next time.
P.S. Here's Grok roasting ThursdAI
See you next week, and if you've scrolled all the way down here for the emoji of the week: it's hidden in the middle of the article, so send me that emoji to let me know you read through 😉