ThursdAI December 7th TL;DR
Greetings of the day everyone (as our panelist Akshay likes to sometimes say) and Happy first candle of Hanukkah for those who celebrate!
I'm writing this newsletter from the back of a Waymo self-driving car in SF, as I'm here for just a few nights (again) to participate in the Open Source AI meetup, which was co-organized by Ollama, Nous Research, and Alignment Labs, and hosted by A16Z in their SF office.
This event was the highlight of the trip; it was quite a packed meetup in terms of AI talent, and I got to meet quite a few ThursdAI listeners, mutuals on X, and AI celebs.
We also recorded the podcast this week from the arena; thanks to Swyx and Alessio from the latentspace pod for hosting ThursdAI this week from their newly built-out pod studio (and apologies everyone for the rocky start and the cutting-out issues; luckily we had local recordings, so the pod version sounds good!)
Google finally teases Gemini Ultra (and gives us Pro)
What a week folks, what a week. As I was boarding the flight to SF to meet with open source folks, Google announced (finally!) the release of Gemini, their long-rumored, highly performant model, with a LOT of fanfare!
Blog posts authored by Sundar and Demis Hassabis, beautiful demos of never-before-seen capabilities, comparisons to GPT-4V (which the Ultra version of Gemini outperforms on several benchmarks), rumors that Sergey Brin, the guy whose net worth is north of $100B, is listed as a core contributor on the paper, and benchmark reports (somewhat skewed) showing Ultra beating GPT-4 on many coding and reasoning evaluations!
We've been waiting for Gemini for so long that we spent the first hour of the podcast basically discussing it and its implications. We were also fairly disillusioned by the sleight-of-hand tricks Google's marketing department played with the initial launch video, which purportedly shows Gemini as a fully multimodal AI reacting to a camera feed + user voice in real time, when in fact it quickly became clear (from their developer blog) that it was not video+audio but rather images+text (the same two modalities we already have in GPT-4V), and given some prompting, it's quite easy to replicate most of it. We also discussed how we, again, got a tease, and not even a waitlist, for the "super cool" stuff, while getting a GPT-3.5-level model today in the Bard upgrade.
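Since the demo turned out to be images+text under the hood, you can get surprisingly close with GPT-4V today by sending video frames as still images alongside a prompt. Here's a minimal sketch of that idea (the frame files and prompt are made up for illustration; the API shape is OpenAI's vision chat format):

```python
# Minimal sketch: replicating the "watch and narrate" style demo with GPT-4V
# by sending still frames + text, the same two modalities Gemini's demo used.
# Assumes OPENAI_API_KEY is set; the frame paths are illustrative.
import base64
from openai import OpenAI

client = OpenAI()

def encode_frame(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

frames = [encode_frame(p) for p in ["frame_01.jpg", "frame_02.jpg"]]

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    max_tokens=300,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "These are frames from a live camera. "
                                     "Narrate what the person is doing."},
            *[{"type": "image_url",
               "image_url": {"url": f"data:image/jpeg;base64,{f}"}}
              for f in frames],
        ],
    }],
)
print(response.choices[0].message.content)
```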
To me, the most mind-blowing demo video was actually one of the other ones in the announcement, which showed that Gemini has agentic behavior in understanding user intent: it asks for clarifications, creates a PRD (Product Requirement Document) for itself, and then generates Flutter code to create a UI on the fly, based on what the user asked for! This is pretty wild, as we should all expect Just-In-Time UI to come to many of these big models!
Tune in to the episode if you want to hear more takes, opinions, and frustrations, as none of us actually got to use Gemini Ultra, and the experience with Gemini Pro (which is now live in Bard) was, at least for me, underwhelming.
This week's buzz (What I learned in Weights & Biases this week)
I actually had a blast talking about W&B with many folks from the open source and fine-tuning community this week and last. I learned that W&B doesn't only help huge companies (like OpenAI, Anthropic, Meta, Mistral, and tons more) train their foundational models; it's widely used by the open source fine-tuning community as well. I met folks like Wing Lian (aka Caseus), maintainer of Axolotl, who uses W&B together with Axolotl, and got to geek out about W&B. I also met Teknium and LDJ (Nous Research, Alignment Labs), and in fact got LDJ to walk me through some of the ways he uses (and has used) W&B, including how it's used to track model runs, show artifacts in the middle of runs, and run mini-benchmarks and evaluations for LLMs as they finetune.
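To make that workflow concrete, here's a minimal sketch of what a tracked finetune can look like in W&B; the project name, training step, and mini-eval below are illustrative stubs, not anything from LDJ's actual setup:

```python
# Minimal sketch of a fine-tuning run tracked with W&B: metrics logged per
# step, a mid-run checkpoint artifact, and a mini-eval score over time.
# The training step and eval are stand-ins for real fine-tuning code.
import random
import wandb

def train_step() -> float:       # stand-in for a real training step
    return random.random()

def run_mini_eval() -> float:    # stand-in for a small LLM benchmark
    return random.random()

run = wandb.init(project="mistral-finetune", config={"lr": 2e-5, "epochs": 3})

for step in range(100):
    wandb.log({"train/loss": train_step()}, step=step)

    if step % 25 == 0:
        # Log an intermediate checkpoint as an artifact, mid-run
        artifact = wandb.Artifact(f"checkpoint-step-{step}", type="model")
        with open("checkpoint.txt", "w") as f:  # placeholder checkpoint file
            f.write(f"weights at step {step}")
        artifact.add_file("checkpoint.txt")
        run.log_artifact(artifact)

        # Track a mini-benchmark score as the finetune progresses
        wandb.log({"eval/mini_benchmark": run_mini_eval()}, step=step)

run.finish()
```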
If you're interested in this, here's an episode of a new "series" of me learning publicly (from scratch), so if you want to learn from scratch with me, you're welcome to check it out:
Open Source AI in SF meetup
This meetup was the reason I flew into SF; I was invited by dear friends in the open source community and couldn't miss it! There was such talent density there, it was quite remarkable: Andrej Karpathy, whose video about LLMs I just finished re-watching, Jeremy Howard, folks from Mistral and A16Z, and tons of other startups, open source collectives, and enthusiasts all came together to listen to a few lightning talks, but mostly to mingle, connect, and share ideas.
Nous Research announced that they are now a company (no longer just a Discord collective of rag-tag open sourcers!) and that they are working on Forge, a product offering of theirs that runs local AI and has a platform for agent behavior; it's very much worth keeping an eye on.
I spent most of my time going around, hearing what folks are using (hint: a LOT of Axolotl), what they are finetuning (mostly Mistral), and what the future holds (everyone's waiting for the next Llama or the next Mistral). Funnily enough, there was not a LOT of conversation about Gemini there at all, at least not among the folks I talked to!
Overall this was really really fun, and of course, being in SF, at least for me, especially now as an AI Evangelist, feels like coming home! So expect more trip reports!
Here's a recap and a few more things that happened this week in AI:
* Open Source LLMs
  * Apple released MLX - a machine learning framework for Apple Silicon (see the mini example right after this list)
  * Mamba - a transformers-alternative architecture from Tri Dao
* Big CO LLMs + APIs
  * Google Gemini beats GPT-4V on a BUNCH of metrics, shows a cool (but faked) multimodal demo
    * The demo was embellished, per the Google developer blog
    * The multimodal capabilities are real
    * Dense model vs. MoE
    * Multimodal on the output as well
    * For 5-shot MMLU, GPT-4 still outperforms Gemini Ultra
  * AlphaCode 2 is here; Google claims it performs better than 85% of competitive programmers in the world, and it does even better when collaborating with a competitive programmer
  * Long context prompting for Claude 2.1 shows a 27% to 98% jump in recall by using prompt techniques (see the sketch after this list)
  * X.ai finally released Grok to many Premium+ X subscribers (link)
* Vision
  * OpenHermes Vision finally released - something wasn't right there, back to the drawing board
* Voice
  * Apparently Gemini beats Whisper v3! As part of a unified model, no less
* AI Art & Diffusion
  * Meta releases a standalone Emu AI art generator website: https://imagine.meta.com
* Tools
  * JetBrains finally releases their own AI-native companion + subscription
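A quick taste of the MLX item above: the framework reads a lot like NumPy but evaluates lazily on Apple Silicon's unified memory. This is just a toy example of the public API, not anything from Apple's release notes:

```python
# Tiny MLX taste: arrays live in unified memory, and computation is lazy
# until you evaluate (or print) the result.
import mlx.core as mx

a = mx.random.normal((1024, 1024))
b = mx.random.normal((1024, 1024))
c = a @ b      # builds the computation graph; nothing runs yet
mx.eval(c)     # materializes the result on the GPU
print(c.shape)
```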
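And on the Claude long-context item: Anthropic's reported fix is a single pre-filled sentence at the start of the Assistant turn, which pushes the model to locate the relevant passage before answering. A minimal sketch with the Python SDK (the pre-fill string is the one from their write-up; the document, question, and setup here are illustrative):

```python
# Sketch of the long-context recall trick: pre-fill the Assistant turn so
# Claude starts by quoting the most relevant sentence before answering.
# Assumes ANTHROPIC_API_KEY is set; the document and question are placeholders.
import anthropic

client = anthropic.Anthropic()

long_context = open("long_document.txt").read()  # e.g. a very long transcript
question = "What was the best thing to do in San Francisco?"

response = client.messages.create(
    model="claude-2.1",
    max_tokens=300,
    messages=[
        {"role": "user", "content": f"{long_context}\n\n{question}"},
        # This single pre-filled sentence is the reported fix (27% -> 98% recall):
        {"role": "assistant",
         "content": "Here is the most relevant sentence in the context:"},
    ],
)
print(response.content[0].text)
```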
That's it for me this week. This Waymo ride took extra long, as it seems that in SF, during night rush hour, AI is at a disadvantage against human drivers. Maybe I'll take an Uber next time.
P.S. - here's Grok roasting ThursdAI
See you next week, and if you've scrolled all the way here for the emoji of the week, it's hidden in the middle of the article; send it to me to let me know you read through!