
ThursdAI - The top AI news from the past week
📅 ThursdAI - ChatGPT-4o back on top, Nous Hermes 3 LLama finetune, XAI uncensored Grok2, Anthropic LLM caching & more AI news from another banger week
Podcast summary created with Snipd AI
Quick takeaways
- OpenAI's recent release of ChatGPT-4o enhances structured outputs and coding capabilities, though shipping two variants has created some confusion among developers.
- Elon Musk's xAI launched Grok 2, which offers superior coding abilities, real-time data access, and a less restricted approach to AI.
- Google's Gemini Live introduces an advanced voice-assistant technology, facilitating continuous conversations and interaction for improved user productivity.
- Anthropic's prompt caching feature allows for substantial cost reductions in AI operations, fostering wider accessibility for developers and complex tasks.
Deep dives
OpenAI's Confusing Releases
OpenAI has made multiple announcements regarding updates to its GPT models, particularly the release of a new version of ChatGPT. They introduced a version known as GPT-4o that enhances structured outputs and coding capabilities. The confusion arises because OpenAI released two ChatGPT-4o variants in quick succession, leaving developers uncertain about which version to use. Despite this, the latest addition has been recognized as the top-performing model across various rankings.
xAI's Grok 2 Release
xAI, led by Elon Musk, has launched Grok 2, showcasing significant advancements considering the company is relatively new. This model reportedly outperforms its predecessor and achieves impressive scores in various evaluations. Grok 2 comes equipped with enhanced features such as improved coding abilities and real-time data access via their APIs, which allows it to stay updated with current events. The model's less restricted approach, with fewer censorship protocols, sets it apart from others in the market.
Google's Gemini Live Introduction
Google has unveiled Gemini Live during an event, marking a significant advancement in voice-assisted technologies. This feature enables continuous conversations and the ability to interrupt during dialogue, thus offering a more interactive user experience. Gemini Live is set to be integrated into Android devices, allowing users to engage with their Google Assistant seamlessly. This integration aims to enhance user productivity and interactive capabilities across various Google applications.
Anthropic's Prompt Caching Feature
Anthropic has introduced a novel feature called prompt caching, allowing developers to save long context conversations for more efficient processing. This functionality promises to significantly reduce costs, making coding and other complex tasks more accessible. With cost reductions up to 90%, this innovation facilitates more extensive and efficient use of AI models, fostering an environment where developers can run operations that were previously financially impractical. The development marks a pivotal shift towards cost-effective AI operations.
Hermes 3 and Its Impact
The co-founder of Nous Research announced the release of Hermes 3, featuring groundbreaking models up to 405 billion parameters, regarded as one of the first full fine-tunes of the Llama 3.1 405B foundation model. This release emphasizes user alignment, granting users the ability to dictate the system prompt, thus enabling customized interactions. With a focus on diverse capabilities, Hermes 3 boasts improvements in tool usage, retrieval-augmented generation (RAG), and role-playing functions. The model's release signifies a major step forward for open-source AI offerings.
Advancements in AI Art Generation
Flux, an image generation model from Black Forest Labs, has been recognized for its outstanding capabilities in creating high-quality visual content. This model not only competes with prominent players like MidJourney but also introduces features such as personalized image generation and text inclusion within images. Users can train the model with individual characteristics to produce tailored results, leading to enhanced creative possibilities in visual outputs. Furthermore, the open-source nature of Flux allows wider accessibility for developers and artists alike.
NVIDIA's Llama 3.1 Minitron Release
NVIDIA has launched Llama-3.1-Minitron, employing new distillation and pruning techniques aimed at yielding smaller, more efficient models. The model reduces parameter count while attempting to maintain performance close to that of larger models. By exploring both width and depth pruning, NVIDIA's efforts represent a significant step in optimizing model size without compromising quality. This advancement highlights the trend toward more accessible AI technologies.
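Minitron's actual recipe scores units by activation-based importance and then "heals" the pruned model with distillation, but the structural idea behind width pruning can be sketched in a few lines (a toy illustration only, not NVIDIA's method):

```python
import numpy as np

# Toy width pruning: score each hidden unit (a column of the weight
# matrix) by L2 norm and keep only the most important ones. NVIDIA's
# Minitron recipe uses activation-based importance plus distillation;
# this shows only the structural surgery.
def width_prune(W: np.ndarray, keep: int) -> np.ndarray:
    importance = np.linalg.norm(W, axis=0)          # one score per unit
    kept = np.sort(np.argsort(importance)[-keep:])  # indices of top units
    return W[:, kept]

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))        # layer with 16 hidden units
W_small = width_prune(W, keep=4)    # keep the 4 highest-norm units
print(W_small.shape)  # (8, 4)
```

Depth pruning is the analogous trick one level up: drop whole layers instead of units, then distill from the original model to recover the lost quality.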
Runway's Gen 3 Turbo Launch
Runway has released Gen 3 Turbo, an updated version of its video generation model that boasts remarkably faster response times, capable of generating videos in under 30 seconds. This advancement in video technology enhances user experience, particularly for those engaging in creative projects. The introduction of this model in a free tier also democratizes high-quality video generation, providing opportunities for a broader range of users. The improvements underscore the rapid evolution of AI-driven content creation tools.
Look, these crazy weeks don't seem to stop, and though this week started out a bit slower (while folks were waiting to see how the speculation about certain red-berry-flavored conspiracies is shaking out), the big labs are shipping!
We've got space uncle Elon dropping an "almost-gpt4" level Grok-2, that's uncensored, has access to real time data on X and can draw all kinds of images with Flux, OpenAI announced a new ChatGPT 4o version (not the one from last week that supported structured outputs, a different one!) and Anthropic dropping something that makes AI Engineers salivate!
Oh, and for the second week in a row, ThursdAI live spaces were listened to by over 4K people, which is very humbling, and awesome because, for example, today Nous Research announced Hermes 3 live on ThursdAI before the public heard about it (and I had a long chat w/ Emozilla about it, very well worth listening to)
TL;DR of all topics covered:
* Big CO LLMs + APIs
* xAI releases Grok-2 - frontier level Grok, uncensored + image gen with Flux (𝕏, Blog, Try It)
* OpenAI releases another ChatGPT-4o (and tops LMsys again) (X, Blog)
* Google showcases Gemini Live, Pixel Bugs w/ Gemini, Google Assistant upgrades ( Blog)
* Anthropic adds Prompt Caching in Beta - cutting costs by up to 90% (X, Blog)
* AI Art & Diffusion & 3D
* Flux now has support for LORAs, ControlNet, img2img (Fal, Replicate)
* Google Imagen-3 is out of secret preview and it looks very good (𝕏, Paper, Try It)
* This week's Buzz
* Using Weights & Biases Weave to evaluate Claude Prompt Caching (X, Github, Weave Dash)
* Open Source LLMs
* NousResearch drops Hermes 3 - 405B, 70B, 8B LLama 3.1 finetunes (X, Blog, Paper)
* NVIDIA Llama-3.1-Minitron 4B (Blog, HF)
* AnswerAI - colbert-small-v1 (Blog, HF)
* Vision & Video
* Runway Gen-3 Turbo is now available (Try It)
Big Companies & LLM APIs
Grok 2: Real Time Information, Uncensored as Hell, and… Flux?!
The team at xAI definitely knows how to make a statement, dropping a knowledge bomb on us with the release of Grok 2. This isn't your uncle's dad joke model anymore - Grok 2 is a legitimate frontier model, folks.
As Matt Shumer excitedly put it
“If this model is this good with less than a year of work, the trajectory they’re on, it seems like they will be far above this...very very soon” 🚀
Not only does Grok 2 have impressive scores on MMLU (beating the previous GPT-4o on their benchmarks… from MAY 2024), it even outperforms Llama 3 405B, proving that xAI isn't messing around.
But here's where things get really interesting. Not only does this model access real time data through Twitter, which is a MOAT so wide you could probably park a rocket in it, it's also VERY uncensored. Think generating political content that'd make your grandma clutch her pearls or imagining Disney characters breaking bad in a way that’s both hilarious and kinda disturbing all thanks to Grok 2’s integration with Black Forest Labs Flux image generation model.
With an affordable price point ($8/month for X Premium, including access to Grok 2 and their killer MidJourney competitor?!), it’ll be interesting to see how Grok’s "truth seeking" (as xAI calls it) model plays out. Buckle up, folks, this is going to be wild, especially since all the normies now have the power to create political memes that look VERY realistic, within seconds.
Oh yeah… and there’s the upcoming Enterprise API as well… and Grok 2’s made its debut in the wild on the LMSys Arena, lurking incognito as "sus-column-r" and is now placed on TOP of Sonnet 3.5 and comes in as number 5 overall!
OpenAI's latest ChatGPT is back at #1, but it's all very confusing 😵💫
As the news about Grok-2 was settling in, OpenAI decided to, well… drop yet another GPT-4o update on us. While Google was hosting their event, no less. Seriously, OpenAI? I guess they like to one-up Google's new releases (they also kicked Gemini from the #1 position after only 1 week there)
So the model that ran as anonymous-chatbot on LMSys for the past week was also released in the ChatGPT interface, and is now the best LLM in the world according to LMSys and other folks: #1 at math, #1 at complex prompts and coding, and #1 overall.
It is also available for us developers via API, but... they don't recommend using it? 🤔
The most interesting thing about this release is that they don't really know how to tell us why it's better; they just know that it is, qualitatively, and that it's not a new frontier-class model (i.e., not 🍓 or GPT-5)
Their release notes on this are something else 👇
Meanwhile it's been 3 months, and the promised Advanced Voice Mode is only in the hands of a few lucky testers so far.
Anthropic Releases Prompt Caching to Slash API Prices By up to 90%
Anthropic joined DeepSeek's game of "Let's Give Devs Affordable Intelligence" this week, rolling out prompt caching with up to 90% cost reduction on cached tokens (yes, NINETY… 🤯). For those of you new to all this technical sorcery:
Prompt caching allows the inference provider to save users money by reusing repeated chunks of a long prompt from cache, cutting both price and time to first token. It's especially beneficial for long-context (>100K) use-cases like conversations with books, agents with a lot of memory, 1000 examples in a prompt, etc.
We covered caching before with Gemini (at Google I/O) and last week with DeepSeek, but IMO this is a better implementation from a frontier lab: it's easy to get started with, it manages the timeout for you (unlike Google), and it's a no-brainer to implement.
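For the curious, a minimal sketch of what a cached request looks like: a `cache_control` block on the last chunk of the system prompt marks everything up to that point as cacheable. The model id and beta header below are my assumptions from the launch-era announcement, so check Anthropic's current docs before copying:

```python
# Sketch of an Anthropic prompt-caching request body. The cache_control
# block marks the long prefix (here, a transcript) as cacheable;
# repeat calls with the identical prefix hit the cache.
def build_cached_request(transcript: str, question: str) -> dict:
    return {
        "model": "claude-3-5-sonnet-20240620",  # assumed launch-era id
        "max_tokens": 1024,
        "system": [
            {"type": "text",
             "text": "You answer questions about podcast transcripts."},
            {"type": "text",
             "text": transcript,
             # everything up to and including this block gets cached
             "cache_control": {"type": "ephemeral"}},
        ],
        "messages": [{"role": "user", "content": question}],
    }

# With the Python SDK this would be sent roughly like:
#   client = anthropic.Anthropic()
#   client.messages.create(
#       extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
#       **build_cached_request(transcript, "Who was the guest?"),
#   )
```

The key detail is that caching is prefix-based: only calls that repeat the exact same prefix (same system blocks, same order) get the discount.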
And, you'll definitely want to see the code to implement it all yourself, (plus Weave is free!🤩):
"In this week's buzz category… I used Weave, our LLM observability tooling, to super quickly evaluate how much cheaper Claude Caching from Anthropic really is. I did a video of it and I posted the code… If you're into this and want to see how to actually do this… how to evaluate, the code is there for you" - Alex
The 90% price drop for cached calls is ridiculous: Haiku basically becomes FREE, and cached Claude costs about what Haiku does, $0.30 per 1M tokens. For context, I took 5 transcripts of 2-hour podcast conversations, amounting to ~110,000 tokens overall, was able to ask questions across all this text, and it cost me less than $1 (see the video above)
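To sanity-check that figure, here's a back-of-envelope input-token cost calculation. The prices are my assumption of the launch-era Claude 3.5 Sonnet rates (cache writes cost ~25% more than normal input, cache reads ~90% less; output tokens are ignored here), so treat the exact dollars as illustrative:

```python
# Assumed launch-era Claude 3.5 Sonnet rates, $ per million tokens
INPUT_PER_MTOK = 3.00
CACHE_WRITE_PER_MTOK = 3.75   # +25% on the first (cache-writing) call
CACHE_READ_PER_MTOK = 0.30    # -90% on every subsequent call

def caching_cost_usd(cached_tokens: int, question_tokens: int, n_queries: int) -> float:
    """First query writes the big prefix to cache; the rest read it."""
    write = cached_tokens / 1e6 * CACHE_WRITE_PER_MTOK
    reads = (n_queries - 1) * cached_tokens / 1e6 * CACHE_READ_PER_MTOK
    questions = n_queries * question_tokens / 1e6 * INPUT_PER_MTOK
    return write + reads + questions

def uncached_cost_usd(cached_tokens: int, question_tokens: int, n_queries: int) -> float:
    """Same workload with the full prefix re-sent at normal rates."""
    return n_queries * (cached_tokens + question_tokens) / 1e6 * INPUT_PER_MTOK

# ~110K tokens of podcast transcripts, 15 questions of ~50 tokens each
print(f"cached:   ${caching_cost_usd(110_000, 50, 15):.2f}")
print(f"uncached: ${uncached_cost_usd(110_000, 50, 15):.2f}")
```

Under those assumed rates the cached run lands under a dollar while re-sending the transcripts every time costs several times more, which matches the experience above.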
Code Here + Weave evaluation Dashboard here
AI Art, Diffusion, and Personalized AI On the Fly
Speaking of mind blowing, Flux took over this week, thanks in no small part to Elon strategically leveraging their tech in Grok (and everyone reminding everyone else, that it's not Grok creating images, it's Flux!)
Now, remember, the REAL magic happens when code meets open source: “Flux now has support for LORAs, ControlNet, img2img…" meaning developers have turned those foundational tools into artistic wizardry. With as little as $5 and a few pictures, “You can train the best image model on your own face.” 🤯 (Seriously folks, head over to Fal.ai, give it a whirl… it’s awesome)
Now if you combine the LORA tech with ControlNet tech, you can get VERY creepy very fast (I'm using my own face here but you get the idea), here's "me" as the distracted boyfriend meme, and the girlfriend, and the distraction 😂 (I'm sorry you had to see this, AI has gone too far! Shut it all down!)
If seeing those creepy faces on screen isn't for you (I totally get that) there’s also Google IMAGEN 3, freshly escaped from secret preview and just waiting for you to unleash those artistic prompts on it! Google, despite being… Google, somehow figured out that a little competition does a lab good and rolled out a model that’s seriously impressive.
Runway Video Gets a "Turbocharged" Upgrade🚀🚀🚀
Ever tried those jaw-dropping text-to-video generators but groaned as you watched those seconds of video render painfully slowly?😭 Well Runway, creators of Gen 3, answered our prayers with the distilled turbocharged version that churns out those visuals in a blink 🤯🤯🤯 .
What's truly cool is they unlocked it for FREE tier users (sign up and unleash those cinematic prompts right now!), letting everyday folks dip their toes in those previously-unfathomable waters. Even the skeptics at OpenBMB (Junyang knows what I'm talking about…) had to acknowledge that their efforts with MiniCPM V are impressive, especially the smooth way it captures video sequences better than models even twice its size 🤯.
Open Source: Hermes 3 and The Next Generation of Open AI 🚀
NousResearch Dropped Hermes 3: Your New Favorite AI (Yes Really)
In the ultimate “We Dropped This On ThursdAI Before Even HuggingFace”, the legendary team at NousResearch dropped the hottest news since Qwen decided to play math God: Hermes 3 is officially here! 🤯
“You’re about to get to use the FIRST big finetune of LLama 3.1 405B… We don’t think there have been finetunes,” announced Emozilla, who’s both co-founder and resident master wizard of all things neural net, “And it's available to try for free thanks to Lambda, you can try it out right here” (you’re all racing to their site as I type this, I KNOW it!).
Not ONLY does this beauty run ridiculously smooth on Lambda, but here’s the real TL;DR:
* Hermes 3 isn’t just 405B; there are 70B and 8B versions dropping simultaneously on Hugging Face, ready to crush benchmarks and melt your VRAM (in a GOOD way… okay maybe not so great for your power bill 😅).
* On benchmarks, it beats Llama 3.1 Instruct on a few evals and loses on some, which is quite decent, given that the Meta team did an amazing job with their instruct finetuning (and probably spent millions of dollars on it, too)
* Hermes 3 is all about user alignment, which our open source champion Wolfram Ravenwolf summarized beautifully: “When you have a model, and you run it on your system, IT MUST BE LOYAL TO YOU.” 😈
Hermes 3 does just that, with incredibly precise control via its godlike system prompt: “In Hermes 3 the system prompt is KING,” confirmed Emoz. It’s so powerful that the 405B version was practically suffering existential angst in their first conversation… I read that part out loud during the space, but here you go: this is their first conversation, and Emozilla goes into why they think this happened in our chat, which is very much worth listening to
This model was trained on a bunch of data sources that they will release in the future, and includes tool use, plus a slew of tokens you can add to the system prompt that trigger abilities in the model: chain of thought, a scratchpad (think, then rethink), citing from sources for RAG purposes, and a BUNCH more.
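Past Hermes releases have used ChatML-style chat formatting, so putting yourself in the KING seat via the system prompt looks roughly like this. The template here is an assumption carried over from earlier Hermes models, and the capability-triggering tokens mentioned above are documented in the technical report, so verify both there before relying on this:

```python
# ChatML-style prompt assembly, the template earlier Hermes releases
# used (confirm against the Hermes 3 report). The system prompt is
# where Hermes 3 puts the user in control.
def to_chatml(system: str, turns: list[tuple[str, str]]) -> str:
    parts = [f"<|im_start|>system\n{system}<|im_end|>"]
    for role, text in turns:
        parts.append(f"<|im_start|>{role}\n{text}<|im_end|>")
    parts.append("<|im_start|>assistant\n")  # cue the model to respond
    return "\n".join(parts)

prompt = to_chatml(
    "You are an assistant loyal to the user. Think step by step "
    "before answering.",
    [("user", "Summarize this week's AI news in one sentence.")],
)
print(prompt)
```

In practice you'd let the tokenizer's built-in chat template do this for you; the point is that whatever you put in that system slot is what the model treats as law.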
The technical report is HERE and is worth diving into as is our full conversation with Emozilla on the pod.
Wrapping Things Up… But We’re Just Getting Started! 😈
I know, I KNOW, your brain is already overflowing but we barely SCRATCHED the surface…
We also dove into NVIDIA's research into new pruning and distilling techniques, TII Falcon’s attempt at making those State Space models finally challenge the seemingly almighty Transformer architecture (it's getting closer... but has a way to go!), plus AnswerAI's deceptively tiny Colbert-Small-V1, achieving remarkable search accuracy despite its featherweight size and a bunch more...
See you all next week for what’s bound to be yet another wild AI news bonanza… Get those download speeds prepped, we’re in for a wild ride. 🔥
This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe