AI-powered
podcast player
Listen to all your favourite podcasts with AI-powered features
DeepMind recently made headlines with significant advancements in AI, focusing on their new model, AlphaProteo, which is designed for protein binder design. This model demonstrates 300 to 300-fold improvements in binding affinities and success rates compared to previous models. These improvements in protein interactions could lead to significant advancements in drug discovery and biomedical applications. The development showcases DeepMind’s continued commitment to harnessing AI for complex biological challenges.
SSI Inc., co-founded by notable figures in the AI community, recently raised an impressive $1 billion to fuel its mission in AI innovation. With a focus on developing generative AI technologies, SSI Inc. aims to build products over two years before public release, allowing them to refine their models and techniques without market pressure. This lengthy development period sets them apart from many competitors, emphasizing quality and detailed research. The substantial fundraise highlights investor confidence in SSI's vision and potential impact on the industry.
The discussion around open source AI continues to gain momentum, especially with new models like YeCoder and the RWKV framework making significant waves in the community. YeCoder showcases enhanced coding capabilities, outperforming established models through a massive context window and innovative training data approaches. Additionally, RWKV’s integration into Windows demonstrates the growing trust in open source solutions for accessible, scalable AI tools. This trend reinforces the importance of open collaboration and transparency in AI development.
This week saw impressive advancements in multimodal AI, with various companies focusing on integrating visuals and language capabilities into their models. Remarkably, Cohere's Command R and R+ models have optimized performance while reducing latency, significant for real-time applications. The models have shown improved results in areas including code generation and structured outputs, inviting more developers to explore their functionalities. These enhancements position multimodal AI as increasingly vital across diverse industries, including software development and content creation.
Entropic made significant strides by offering a new enterprise model with an impressive 500,000 token context window, enhancing the ability to handle extensive data interactions. Additionally, the integration of Claude into Amazon Alexa was announced, marking a shift from Amazon's in-house models to leveraging Entropic's technologies. This transition underscores the growing demand for superior AI capabilities in consumer products. Furthermore, concerns about injected instructions in Claude have surfaced, raising questions about transparency and user trust.
Replit introduced its new AI agent capability, allowing users to streamline coding tasks by generating code based on user prompts. This feature aims to simplify the coding process by allowing users to prototype applications swiftly while managing backend deployment seamlessly. The integration of such AI capabilities within Replit's platform could galvanize more users to adopt coding through a more interactive and less intimidating interface. Such advancements align with the growing trend of using AI to assist developers and make programming more accessible.
Google enhanced its AI Studio by introducing structured outputs and improved function calling capabilities, enabling developers to generate reliable JSON outputs easily. This update signifies a major upgrade for those utilizing Google's APIs, making them more user-friendly and efficient. The structured output feature will be particularly beneficial for developers who require precise data formats for applications. These updates position Google's AI offerings more competitively and support a broader range of development needs.
Welcome back everyone, can you believe it's another ThursdAI already? And can you believe me when I tell you that friends of the pod Matt Shumer & Sahil form Glaive.ai just dropped a LLama 3.1 70B finetune that you can download that will outperform Claude Sonnet 3.5 while running locally on your machine?
Today was a VERY heavy Open Source focused show, we had a great chat w/ Niklas, the leading author of OLMoE, a new and 100% open source MoE from Allen AI, a chat with Eugene (pico_creator) about RWKV being deployed to over 1.5 billion devices with Windows updates and a lot more.
In the realm of the big companies, Elon shook the world of AI by turning on the biggest training cluster called Colossus (100K H100 GPUs) which was scaled in 122 days 😮 and Anthropic announced that they have 500K context window Claude that's only reserved if you're an enterprise customer, while OpenAI is floating an idea of a $2000/mo subscription for Orion, their next version of a 100x better chatGPT?!
TL;DR
* Open Source LLMs
* Matt Shumer / Glaive - Reflection-LLama 70B beats Claude 3.5 (X, HF)
* Allen AI - OLMoE - first "good" MoE 100% OpenSource (X, Blog, Paper, WandB)
* RWKV.cpp is deployed with Windows to 1.5 Billion devices
* MMMU pro - more robust multi disipline multimodal understanding bench (proj)
* 01AI - Yi-Coder 1.5B and 9B (X, Blog, HF)
* Big CO LLMs + APIs
* Replit launches Agent in beta - from coding to production (X, Try It)
* Ilya SSI announces 1B round from everyone (Post)
* Cohere updates Command-R and Command R+ on API (Blog)
* Claude Enterprise with 500K context window (Blog)
* Claude invisibly adds instructions (even via the API?) (X)
* Google got structured output finally (Docs)
* Amazon to include Claude in Alexa starting this October (Blog)
* X ai scaled Colossus to 100K H100 GPU goes online (X)
* DeepMind - AlphaProteo new paper (Blog, Paper, Video)
* This weeks Buzz
* Hackathon did we mention? We're going to have Eugene and Greg as Judges!
* AI Art & Diffusion & 3D
* ByteDance - LoopyAvatar - Audio Driven portait avatars (Page)
Open Source LLMs
Reflection Llama-3.1 70B - new 👑 open source LLM from Matt Shumer / GlaiveAI
This model is BANANAs folks, this is a LLama 70b finetune, that was trained with a new way that Matt came up with, that bakes CoT and Reflection into the model via Finetune, which results in model outputting its thinking as though you'd prompt it in a certain way.
This causes the model to say something, and then check itself, and then reflect on the check and then finally give you a much better answer. Now you may be thinking, we could do this before, RefleXion (arxiv.org/2303.11366) came out a year ago, so what's new?
What's new is, this is now happening inside the models head, you don't have to reprompt, you don't even have to know about these techniques! So what you see above, is just colored differently, but all of it, is output by the model without extra prompting by the user or extra tricks in system prompt. the model thinks, plans, does chain of thought, then reviews and reflects, and then gives an answer!
And the results are quite incredible for a 70B model 👇
Looking at these evals, this is a 70B model that beats GPT-4o, Claude 3.5 on Instruction Following (IFEval), MATH, GSM8K with 99.2% 😮 and gets very close to Claude on GPQA and HumanEval!
(Note that these comparisons are a bit of a apples to ... different types of apples. If you apply CoT and reflection to the Claude 3.5 model, they may in fact perform better on the above, as this won't be counted 0-shot anymore. But given that this new model is effectively spitting out those reflection tokens, I'm ok with this comparison)
This is just the 70B, next week the folks are planning to drop the 405B finetune with the technical report, so stay tuned for that!
Kudos on this work, go give Matt Shumer and Glaive AI a follow!
Allen AI OLMoE - tiny "good" MoE that's 100% open source, weights, code, logs
We've previously covered OLMO from Allen Institute, and back then it was obvious how much commitment they have to open source, and this week they continued on this path with the release of OLMoE, an Mixture of Experts 7B parameter model (1B active parameters), trained from scratch on 5T tokens, which was completely open sourced.
This model punches above its weights on the best performance/cost ratio chart for MoEs and definitely highest on the charts of releasing everything.
By everything here, we mean... everything, not only the final weights file; they released 255 checkpoints (every 5000 steps), the training code (Github) and even (and maybe the best part) the Weights & Biases logs!
It was a pleasure to host the leading author of the OLMoE paper, Niklas Muennighoff on the show today, so definitely give this segment a listen, he's a great guest and I learned a lot!
Big Companies LLMs + API
Anthropic has 500K context window Claude but only for Enterprise?
Well, this sucks (unless you work for Midjourney, Airtable or Deloitte). Apparently Anthropic has been sitting on Claude that can extend to half a million tokens in the context window, and decided to keep it to themselves and a few trial enterprises, and package it as an Enterprise offering.
This offering now includes, beyond just the context window, also a native Github integration, and a few key enterprise features like access logs, provisioning and SCIM and all kinds of "procurement and CISO required" stuff enterprises look for.
To be clear, this is a great move for Anthropic, and this isn't an API tier, this is for their front end offering, including the indredible artifacts tool, so that companies can buy their employees access to Claude.ai and have them be way more productive coding (hence the Github integration) or summarizing (very very) long documents, building mockups and one off apps etc'
Anthropic is also in the news this week, because Amazon announced that it'll use Claude as the backbone for the smart (or "remarkable" as they call it) Alexa brains coming up in October, which, again, incredible for Anthropic distribution, as there are maybe 100M Alexa users in the world or so.
Prompt injecting must stop!
And lastly, there have been mounting evidence, including our own Wolfram Ravenwolf that confirmed it, that Anthropic is prompt injecting additional context into your own prompts, in the UI but also via the API! This is awful practice and if anyone from there reads this newsletter, please stop or at least acknowledge. Claude apparently just... thinks that it's something my users said, when in fact, it's some middle layer of anthropic security decided to just inject some additional words in there!
XAI turns on the largest training GPU SuperCluster Colossus - 100K H100 GPUS
This is a huge deal for AI, specifically due to the time this took and the massive massive scale of this SuperCluster. SuperCluster means all these GPUs sit in one datacenter, drawing from the same power-grid and can effectively run single training jobs.
This took just 122 days for Elon and the XAI team to go from an empty warehouse in Memphis to booting up an incredible 100K H100, and they claim that they will double this capacity by adding 50K H200 in the next few months. As Elon mentioned when they released Grok2, it was trained on 15K, and it matched GPT4!
Per SemiAnalisys, this new Supercluster can train a GPT-4 level model in just 4 days 🤯
XAI was founded a year ago, and by end of this year, they plan for Grok to be the beast LLM in the world, and not just get to GPT-4ish levels, and with this + 6B investment they have taken in early this year, it seems like they are well on track, which makes some folks at OpenAI reportedly worried
This weeks buzz - we're in SF in less than two weeks, join our hackathon!
This time I'm very pleased to announce incredible judges for our hackathon, the spaces are limited, but there's still some spaces so please feel free to sign up and join us
I'm so honored to announce that we'll have Eugene Yan (@eugeneyan), Greg Kamradt (@GregKamradt) and Charles Frye (@charles_irl) on the Judges panel. 🤩 It'll be incredible to have these folks see what hackers come up with, and I'm excited as this comes closer!
Replit launches Agents beta - a fully integrated code → deployment agent
Replit is a great integrated editing environment, with database and production in 1 click and they've had their LLMs trained on a LOT of code helping folks code for a while.
Now they are launching agents, which seems very smart from them, given that development is much more than just coding. All the recent excitement we see about Cursor, is omitting the fact that those demos are only working for folks who already know how to set up the environment, and then there's the need to deploy to production, maintain.
Replit has that basically built in, and now their Agent can build a plan and help you build those apps, and "ship" them, while showing you what they are doing. This is massive, and I can't wait to play around with this!
The additional benefit of Replit is that they nailed the mobile app experience as well, so this now works from mobile, on the go!
In fact, as I was writing this, I got so excited that I paused for 30 minutes, payed the yearly subscription and decided to give building an app a try!
The fact that this can deploy and run the server and the frontend, detect errors, fix them, and then also provision a DB for me, provision Stripe, login buttons and everything else is quite insane.
Can't wait to see what I can spin up with this 🔥 (and show all of you!)
Loopy - Animated Avatars from ByteDance
A new animated avatar project from folks at ByteDance just dropped, and it’s WAY clearer than anything we’ve seen before, like EMO or anything else. I will just add this video here for you to enjoy and look at the earring movements, vocal cords, eyes, everything!
I of course wanted to know if I’ll ever be able to use this, and .. likely no, here’s the response I got from Jianwen one of the Authors today.
That's it for this week, we've talked about so much more in the pod, please please check it out.
As for me, while so many exciting things are happening, I'm going on a small 🏝️ vacation until next ThursdAI, which will happen on schedule, so planning to decompress and disconnect, but will still be checking in, so if you see things that are interesting, please tag me on X 🙏
P.S - I want to shout out a dear community member that's been doing just that, @PresidentLin has been tagging me in many AI related releases, often way before I would even notice them, so please give them a follow! 🫡
ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.
ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.
Listen to all your favourite podcasts with AI-powered features
Listen to the best highlights from the podcasts you love and dive into the full episode
Hear something you like? Tap your headphones to save it with AI-generated key takeaways
Send highlights to Twitter, WhatsApp or export them to Notion, Readwise & more
Listen to all your favourite podcasts with AI-powered features
Listen to the best highlights from the podcasts you love and dive into the full episode