
ThursdAI - The top AI news from the past week
📆 ThursdAI Turns Two! 🎉 Gemma 3, Gemini Native Image, new OpenAI tools, tons of open source & more AI news
Podcast summary created with Snipd AI
Quick takeaways
- The podcast celebrates its two-year anniversary, highlighting its evolution from a Twitter Space to a popular AI news platform with a dedicated community.
- Google's Gemma 3 and the multilingual EuroBERT model showcase significant advancements in AI accessibility and localization for diverse global needs.
- OpenAI's new Responses API and agent SDKs highlight their commitment to improving user experiences and expanding capabilities in intelligent applications.
Deep dives
Celebrating Two Years of ThursdAI
The episode commemorates the second anniversary of ThursdAI, a community-focused platform dedicated to discussing advancements in AI. Originally started as a Twitter Space, it gained traction when the hosts, Alex Volkov and Travis, realized the demand for deeper discussions around the GPT-4 release. As listeners expressed their desire for a podcast format to accommodate those who could not join live sessions, the hosts pivoted to recording and distributing episodes on various podcast platforms. This evolution has fostered a thriving community where AI enthusiasts and experts regularly engage with the latest developments.
Exciting Developments in AI Models
Several new AI models were discussed, including Google's Gemma 3, which is notable for its multimodal capabilities, including support for video and natural language inputs. This model is claimed to run efficiently on far less hardware than its competitors, opening avenues for broader accessibility. In addition, a multilingual encoder family named EuroBERT was introduced, specifically designed for European languages, emphasizing the need for better localization in AI applications. These advancements reflect a significant shift toward more inclusive and practical AI tools for diverse global needs.
Innovations in Cohere’s Command-A Model
Cohere announced the release of Command-A, a model that boasts strong multilingual capabilities and is particularly efficient for enterprise applications. Command-A is notable for its large context size of 256K tokens, making it ideal for extensive document understanding and generation tasks. The model's ability to run effectively on only two GPUs is a significant improvement compared to similar models that typically require much more hardware. This efficiency, combined with robust security measures, positions Command-A as a competitive option for businesses looking to integrate AI into their workflows.
OpenAI's Enhanced API Ecosystem
OpenAI has unveiled a new Responses API and a set of tools aimed at improving agent functionalities, streamlining the integration of tasks like web search and computer use. The new API offers a simpler interface for text generation and natural language processing, while the existing Chat Completions API will remain supported for developers who prefer it. Along with the introduction of agent SDKs, these updates emphasize OpenAI's commitment to enhancing user experience and expanding capabilities for building intelligent applications. The release underscores the growing trend of combining different AI modalities to create cohesive solutions.
Continued Growth of the AI Community
The episode highlights the growing engagement of the AI community, with active participation evident through live discussions and audience feedback. Viewers were encouraged to share the podcast with friends, reflecting the show's aim to democratize AI knowledge and lower the entry barriers for newcomers. Listeners were recognized for their role in the show's success, as regular audience interaction has shaped the direction and content of the episodes. This ongoing dialogue is fundamental to the show's mission of keeping everyone informed about the rapid advancements in artificial intelligence.
Gemini’s New Features: Deep Research and Image Generation
Google launched several groundbreaking features through its Gemini platform, including a free Deep Research tool that allows users to conduct extensive searches across multiple websites. This feature not only enhances the research experience but also showcases Gemini's ability to integrate multimodal outputs, such as generating and modifying images based on user instructions. The introduction of native image generation capabilities means users can interact with visual content more dynamically, which is expected to change how people create and utilize digital media. These innovations position Google as a strong contender in the competitive landscape of AI-driven tools.
LET'S GO!
Happy second birthday to ThursdAI, your favorite weekly AI news show! Can you believe it's been two whole years since we jumped into that random Twitter Space to rant about GPT-4? From humble beginnings as a late-night Twitter chat to a full-blown podcast, Newsletter and YouTube show with hundreds of thousands of downloads, it's been an absolutely wild ride!
That's right, two whole years of me, Alex Volkov, your friendly AI Evangelist, along with my amazing co-hosts, trying to keep you up-to-date on the breakneck speed of the AI world.
And what better way to celebrate than with a week PACKED with insane AI news? Buckle up, folks, because this week Google went OPEN SOURCE crazy, Gemini got even cooler, OpenAI created a whole new Agents SDK and the open-source community continues to blow our minds. We’ve got it all - from game-changing model releases to mind-bending demos.
This week I'm also on the Weights & Biases company retreat, so it's TL;DR first and then the newsletter. Honestly, I'll start embedding the live show here in the Substack from now on - we're getting so good at it that I barely have to edit lately, and there's a LOT to show you guys!
TL;DR and Show Notes & Links
* Hosts & Guests
* Alex Volkov - AI Evangelist at Weights & Biases (@altryne)
* Co-hosts - @WolframRvnwlf @ldjconfirmed @nisten
* Sandra Kublik - DevRel at Cohere (@itsSandraKublik)
* Open Source LLMs
* Google open sources Gemma 3 - 1B to 27B, 128K context (Blog, AI Studio, HF)
* EuroBERT - multilingual encoder models (210M to 2.1B params)
* Reka Flash 3 (reasoning) 21B parameters is open sourced (Blog, HF)
* Cohere Command A 111B model - 256K context (Blog)
* Nous Research Deep Hermes 24B / 3B Hybrid Reasoners (X, HF)
* AllenAI OLMo 2 32B - fully open-source GPT-4-class model (X, Blog, Try It)
* Big CO LLMs + APIs
* Gemini Flash generates images natively (X, AI Studio)
* Google Deep Research is now free in the Gemini app and powered by Gemini Thinking (Try It, no cost)
* OpenAI released a new Responses API plus Web Search, File Search and Computer Use tools (X, Blog)
* This week's Buzz
* The whole company is at an offsite in Oceanside, CA
* W&B held an internal MCP hackathon with cool projects - we're launching an MCP server soon!
* Vision & Video
* Remade AI - 8 LoRA video effects for WanX (HF)
* AI Art & Diffusion & 3D
* ByteDance Seedream 2.0 - a native Chinese-English bilingual image generation foundation model (Blog, Paper)
* Tools
* Everyone's talking about Manus (manus.im)
* Google AI Studio now supports YouTube understanding via link dropping
Open Source LLMs: Gemma 3, EuroBERT, Reka Flash 3, and Cohere Command-A Unleashed!
This week was absolutely HUGE for open source, folks. Google dropped a BOMBSHELL with Gemma 3! As Wolfram pointed out, this is a "very technical achievement," and it's not just one model, but a whole family ranging from 1 billion to 27 billion parameters. And get this – the 27B model can run on a SINGLE GPU! Sundar Pichai himself claimed you’d need "at least 10X compute to get similar performance from other models." Insane!
Gemma 3 isn't just about size; it's packed with features. We're talking multimodal capabilities (text, images, and video!), support for over 140 languages, and a massive 128k context window. As Nisten pointed out, "it might actually end up being the best at multimodal in that regard" for local models. Plus, it's fine-tuned for safety and comes with ShieldGemma 2 for content moderation. You can grab Gemma 3 on Google AI Studio, Hugging Face, Ollama, Kaggle – everywhere! Huge shoutout to Omar Sanseviero and the Google team for this incredible release and for supporting the open-source community from day one! Colin, aka Bartowski, was right: "The best thing about Gemma is the fact that Google specifically helped the open source communities to get day one support." This is how you do open source right!
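Want to kick the tires yourself? Here's a minimal local sketch using Hugging Face transformers - I'm assuming the model id google/gemma-3-1b-it for the text-only 1B instruct variant (the larger checkpoints follow the same pattern) and a transformers release recent enough to ship Gemma 3 support:

```python
# Minimal sketch: chat with Gemma 3 locally via transformers.
# Assumption: "google/gemma-3-1b-it" is the HF id of the 1B instruct model.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="google/gemma-3-1b-it",
    device_map="auto",  # put weights on a GPU if one is available
)

messages = [{"role": "user", "content": "Give me one sentence of AI news hype."}]
output = generator(messages, max_new_tokens=128)
# The pipeline returns the full chat; the last message is the model's reply.
print(output[0]["generated_text"][-1]["content"])
```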
Next up, we have EuroBERT, a new family of multilingual encoder models. Wolfram, our European representative, was particularly excited about this one: "In European languages, you have different characters than in other languages. And, um, yeah, encoding everything properly is, uh, difficult." Ranging from 210 million to 2.1 billion parameters, EuroBERT is designed to push the boundaries of NLP in European and global languages. With training on a massive 5 trillion-token dataset across 15 languages and support for 8K context tokens, EuroBERT is a workhorse for RAG and other NLP tasks. Plus, how cool is their mascot?
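Since EuroBERT is an encoder, the obvious use is embeddings for retrieval. Here's a hedged sketch - the model id EuroBERT/EuroBERT-210m, the trust_remote_code flag, and the 768-dim output are assumptions based on how encoder checkpoints typically ship on the Hub:

```python
# Sketch: mean-pooled sentence embeddings from EuroBERT for RAG-style retrieval.
# Assumptions: HF id "EuroBERT/EuroBERT-210m"; model code lives on the Hub.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "EuroBERT/EuroBERT-210m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

sentences = ["Bonjour le monde", "Hallo Welt", "Hello world"]
batch = tokenizer(sentences, padding=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, dim)

# Average token vectors, ignoring padding, to get one vector per sentence.
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # e.g. torch.Size([3, 768])
```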
Reka Flash 3 - a 21B reasoner with an Apache 2.0 license, trained with RLOO
And the open source train keeps rolling! Reka AI dropped Reka Flash 3, a 21 billion parameter reasoning model with an Apache 2.0 license! Nisten was blown away by the benchmarks: "This might be one of the best like 20B size models that there is right now. And it's Apache 2.0. Uh, I, I think this is a much bigger deal than most people realize." Reka Flash 3 is compact, efficient, and excels at chat, coding, instruction following, and function calling. They even used a new reinforcement learning technique called REINFORCE Leave One-Out (RLOO). Go give it a whirl on Hugging Face or their chat interface – chat.reka.ai!
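If RLOO is new to you, the core idea fits in a few lines: sample k completions per prompt, then score each one against the average reward of the other k-1 samples instead of a learned value baseline. A toy sketch of the advantage computation (illustrative only - not Reka's actual training code):

```python
# RLOO (REINFORCE Leave-One-Out) advantage sketch: each sample's baseline
# is the mean reward of its k-1 sibling samples for the same prompt.
import torch

def rloo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """rewards: (num_prompts, k) rewards for k sampled completions each."""
    k = rewards.shape[1]
    leave_one_out_mean = (rewards.sum(dim=1, keepdim=True) - rewards) / (k - 1)
    return rewards - leave_one_out_mean

rewards = torch.tensor([[1.0, 0.0, 0.5, 0.5]])
print(rloo_advantages(rewards))
# tensor([[ 0.6667, -0.6667,  0.0000,  0.0000]])
```

These advantages then weight the log-probabilities of the sampled tokens in a plain REINFORCE update - no separate critic network needed.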
Last but definitely not least in the open-source realm, we had a special guest, Sandra (@itsSandraKublik) from Cohere, join us to announce Command-A! This beast of a model clocks in at 111 BILLION parameters with a massive 256K context window. Sandra emphasized its efficiency, "It requires only two GPUs. Typically the models of this size require 32 GPUs. So it's a huge, huge difference." Command-A is designed for enterprises, focusing on agentic tasks, tool use, and multilingual performance. It's optimized for private deployments and boasts enterprise-grade security. Congrats to Sandra and the Cohere team on this massive release!
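If you want to poke at Command A over the API, a call through Cohere's Python SDK looks roughly like this - the model id command-a-03-2025 is my assumption, so double-check Cohere's docs:

```python
# Sketch: one chat turn with Command A via Cohere's v2 Python SDK.
# Assumption: "command-a-03-2025" is the API model id for Command A.
import cohere

co = cohere.ClientV2(api_key="YOUR_COHERE_API_KEY")

res = co.chat(
    model="command-a-03-2025",
    messages=[{"role": "user", "content": "Summarize this clause in plain English: ..."}],
)
print(res.message.content[0].text)
```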
Big CO LLMs + APIs: Gemini Flash Gets Visual, Deep Research Goes Free, and OpenAI Builds for Agents
The big companies weren't sleeping either! Google continued their awesome week by unleashing native image generation in Gemini Flash Experimental! This is seriously f*****g cool, folks! Sorry for my French, but it’s true. You can now directly interact with images, tell Gemini what to do, and it just does it. We even showed it live on the stream, turning ourselves into cat-confetti-birthday-hat-wearing masterpieces!
Wolfram was right, "It's also a sign what we will see in, like, Photoshop, for example. Where you, you expect to just talk to it and have it do everything that a graphic designer would be doing." The future of creative tools is HERE.
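For the tinkerers: here's roughly what native image generation looks like through the Gemini API with the google-genai Python SDK. Treat it as a sketch - the model name gemini-2.0-flash-exp and the response_modalities config are my reading of the launch docs, not gospel:

```python
# Sketch: ask Gemini Flash to respond with BOTH text and an image.
# Assumptions: model "gemini-2.0-flash-exp"; response_modalities config key.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Draw a cat in a birthday hat with confetti, then describe it.",
    config=types.GenerateContentConfig(response_modalities=["Text", "Image"]),
)

for part in response.candidates[0].content.parts:
    if part.inline_data is not None:  # image bytes come back inline
        with open("birthday_cat.png", "wb") as f:
            f.write(part.inline_data.data)
    elif part.text is not None:
        print(part.text)
```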
And guess what else Google did? They made Deep Research FREE in the Gemini app and powered by Gemini Thinking! Nisten jumped in to test it live, and we were all impressed. "This is the nicest interface so far that I've seen," he said. Deep Research now digs through HUNDREDS of websites (Nisten’s test hit 156!) to give you comprehensive answers, and the interface is slick and user-friendly. Plus, you can export to Google Docs! Intelligence too cheap to meter? Google is definitely pushing that boundary.
Last-second addition - Allen Institute for AI released OLMo 2 32B, their biggest open model yet
Just as I'm writing this, friend of the pod Nathan from the Allen Institute for AI announced the release of a FULLY OPEN OLMo 2 32B, which includes weights, code, dataset, everything - and apparently it beats GPT-3.5 and GPT-4o mini, plus leading open-weight models like Qwen and Mistral.
Evals look legit, but more than that, this is an Apache 2.0 model with everything in place to advance open AI and open science!
Check out Nathan's tweet for more info, and congrats to the Allen team for this awesome release!
OpenAI's new Responses API and Agents SDK with Web Search, File Search and Computer Use tools
Of course, OpenAI wasn't going to let Google have all the fun. They dropped the new Responses API alongside an Agents SDK - a whole new way to build with OpenAI, designed specifically for the agentic era we're entering. They also released three built-in tools: Web Search, Computer Use, and File Search. The Web Search tool is self-explanatory – finally, built-in web search from OpenAI!
The Computer Use Tool, while currently limited in availability, opens up exciting possibilities for agent automation, letting agents interact with computer interfaces. And the File Search Tool gives you a built-in RAG system, simplifying knowledge retrieval from your own files. As always, OpenAI is adapting to the agentic world and giving developers more power.
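Here's a minimal sketch of the Responses API with the built-in web search tool - the tool type string web_search_preview is taken from the launch docs as I understood them, so verify against OpenAI's API reference:

```python
# Sketch: one Responses API call with OpenAI's built-in web search tool.
# Assumption: "web_search_preview" is the launch-era tool type string.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "web_search_preview"}],
    input="What did Google open source this week?",
)
print(response.output_text)  # convenience accessor for the text output
```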
Finally, in the big-company space, Nous Research released PORTAL, their new Inference API service. Now you can access their awesome models, like Hermes 3 Llama 70B and DeepHermes 3 8B, directly via API. It's great to see more open-source labs offering API access, making these powerful models even more accessible.
This Week's Buzz at Weights & Biases: Offsite Hackathon and MCP Mania!
This week's "This Week's Buzz" segment comes to you live from Oceanside, California! The whole Weights & Biases team is here for our company offsite. Despite the not-so-sunny California weather (thanks, storm!), it's been an incredible week of meeting colleagues, strategizing, and HACKING!
And speaking of hacking, we had an MCP hackathon! After last week’s MCP-pilling episode, we were all hyped about Model Context Protocol, and the team didn't disappoint. In just three hours, the innovation was flowing! We saw agents built for WordPress, MCP support integrated into Weave playground, and even MCP servers for Weights & Biases itself! Get ready, folks, because an MCP server for Weights & Biases is COMING SOON! You'll be able to talk to your W&B data like never before. Huge shoutout to the W&B team for their incredible talent and for embracing the agentic future! And in case you missed it, Weights & Biases is now part of the CoreWeave family! Exciting times ahead!
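To give you a taste of why MCP servers hack together so quickly: here's a toy server built with the official MCP Python SDK's FastMCP helper. To be clear, this is NOT the upcoming W&B server - the tool and its return value are purely hypothetical:

```python
# Toy MCP server sketch using the official Python SDK's FastMCP helper.
# The "best_run" tool and its payload are hypothetical examples.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-experiments")

@mcp.tool()
def best_run(metric: str = "val_loss") -> dict:
    """Return the (made-up) best experiment run for a given metric."""
    return {"run_id": "run-42", "metric": metric, "value": 0.0123}

if __name__ == "__main__":
    mcp.run()  # serve over stdio so MCP clients (agents) can connect
```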
Vision & Video: LoRA Video Effects and OpenSora 2.0
Moving into vision and video, Remade AI released 8 LoRA video effects for WanX! Remember WanX, Alibaba's video model? Now you can add crazy effects like "squish," "inflate," "deflate," and even "cakeify" to your videos using LoRAs. It's open source and super cool to see video effects becoming trainable and customizable.
And in the realm of open-source video generation, OpenSora 2.0 dropped! This 11 billion parameter model claims state-of-the-art video generation trained for just $200,000! They’re even claiming performance close to Sora itself on some benchmarks. Nisten checked out the demos, and while we're all a bit jaded now with the rapid pace of video AI, it's still mind-blowing how far we've come. Open source video is getting seriously impressive, seriously fast.
AI Art & Diffusion & 3D: ByteDance's Bilingual Seedream 2.0
ByteDance, the folks behind TikTok, released Seedream 2.0, a native Chinese-English bilingual image generation foundation model. It excels at text rendering, cultural nuance, and human preference alignment, boasting "powerful general capability," "native bilingual comprehension ability," and "excellent text rendering." It's designed to understand both Chinese and English prompts natively, generating high-quality, culturally relevant images. The examples look stunning, especially its ability to render Chinese text beautifully.
Tools: Manus AI Agent and Google AI Studio YouTube Links
Finally, in the tools section, everyone's buzzing about Manus, a new AI research agent. We gave it a try live on the show, asking it to do some research. The UI is slick, and it seems to be using Claude 3.7 behind the scenes. Manus creates a to-do list, browses the web in a real Chrome browser, and even generates files. It's like Operator on steroids. We'll be keeping an eye on Manus and will report back on its performance in future episodes.
And Google AI Studio keeps getting better! Now you can drop YouTube links into Google AI Studio, and it will natively understand the video! This is HUGE for video analysis and content understanding. Imagine using this for support, content summarization, and so much more.
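The Gemini API appears to expose the same trick programmatically - here's a heavily hedged sketch with the google-genai SDK, assuming the API accepts a YouTube URL as file_data the way AI Studio accepts a dropped link:

```python
# Sketch: ask Gemini to summarize a YouTube video by URL.
# Assumption: the API accepts a YouTube link via file_data like AI Studio does.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=types.Content(parts=[
        types.Part(file_data=types.FileData(file_uri="https://www.youtube.com/watch?v=...")),
        types.Part(text="Summarize this video in three bullet points."),
    ]),
)
print(response.text)
```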
PHEW! What a week to celebrate two years of ThursdAI! From open source explosions to Gemini's visual prowess and OpenAI's agentic advancements, the AI world is moving faster than ever. As Wolfram aptly put it, "The acceleration, you can feel it." And Nisten reminded us of the incredible journey, "I remember I had early access to GPT-4 32K, and, uh, then... the person for the contract that had given me access, they cut it off because on the one weekend, I didn't realize how expensive it was. So I had to use $180 worth of tokens just trying it out." Now, we have models that are more powerful and more accessible than ever before.
Thank you to Wolfram, Nisten, and LDJ for co-hosting and bringing their insights every week.
And most importantly, THANK YOU to our amazing community for tuning in, listening, and supporting ThursdAI for two incredible years! We couldn't do it without you. Here's to another year of staying up-to-date so YOU don't have to! Don't forget to subscribe to the podcast, YouTube channel, and newsletter to stay in the loop. And share ThursdAI with a friend – it's the best birthday gift you can give us! Until next week, keep building and keep exploring the amazing world of AI! LET'S GO!
This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe