The episode begins with the host expressing enthusiasm for the latest advancements in AI, particularly focusing on DeepSeek, a groundbreaking model that has captured widespread attention. Notably, attendees at AI events are now discussing DeepSeek, highlighting its significant impact on the AI community. The conversation touches on how DeepSeek has transcended the usual AI circles, gaining recognition among the general public as well. The excitement surrounding DeepSeek emphasizes its status as a major event in AI this week.
The hosts plan to conduct a meta-analysis of public reactions to DeepSeek as it continues to dominate the news cycle. They discuss previous high-profile releases that similarly created hype but lacked sustained interest. There's acknowledgment that while DeepSeek's release has garnered attention, there's still a broader context to consider regarding its implications and performance comparisons with other models. Reflecting on reactions from major companies, the hosts anticipate additional insights as various industry players respond to DeepSeek's launch.
A central theme discussed is the growing need for reasoning models in the AI landscape, as evidenced by an initiative that released a large-scale reasoning dataset. This dataset includes 114,000 chain-of-thought examples designed to assist in developing and enhancing reasoning capabilities in AI models. The community's demand for improved reasoning performance is highlighted, alongside the acknowledgment that many models need better foundational data. As reasoning models become more prevalent, the value of dedicated datasets like this one will continue to rise.
The episode details the release of Mistral Small 2501, a model with impressive parameters and open-source licensing, which has been well-received in the community. There's emphasis on Mistral finally adding comparative evaluations that include previously overlooked models, showcasing their commitment to community feedback. The importance of community engagement and open-source collaboration is underscored, as Mistral's decision reflects a growing trend within the AI development space. With a focus on transparency, Mistral aims to foster trust and facilitate further advancements.
Discussions around the rise of AI agents highlight the rapid developments in this field, with tools like OpenAI's Operator gaining traction. Although there's a sense of excitement, the hosts caution that many of these agents still struggle to fully deliver on their promises. Upcoming frameworks intended to streamline the user experience are explored, indicating a shift towards more advanced agent capabilities. The investment in agent technology represents a significant step forward, although challenges remain in their implementation and performance.
Voice and audio AI developments are prominent in the episode, with updates on models such as YuE 7B, which can generate full songs and has recently shifted to an Apache 2.0 license, opening up commercial use. The versatility of YuE 7B in creating music across genres showcases the growing sophistication of audio generation models. Additionally, the introduction of Riffusion Fuzz, a free music generator, provides more accessible options for users interested in music AI. These advancements signal a vibrant ecosystem rapidly evolving to meet user demands.
The hosts discuss significant corporate announcements, including updates from major companies like Meta concerning their AI infrastructure and commitments to advancing their models. Statements from executives emphasize the importance of substantial investments into AI technologies, highlighting competition from models like DeepSeek. The interactions between different companies, their models, and evolving APIs reflect the competitive landscape where companies strive to maintain their innovation edge. Insights from these corporate strategies indicate a strong move towards more integrated and powerful AI systems.
The podcast concludes with a forward-looking perspective, anticipating further announcements and developments that align with the rapid advancements seen this week. Notably, expectations around the launch of O3-Mini and other models keep the audience engaged. The hosts also reflect on the week's trends, emphasizing the importance of community feedback, open-source collaboration, and the significant progress made in AI reasoning and audio. As the conversation wraps up, the enthusiasm for future AI innovations remains palpable.
Hey folks, Alex here 👋
It's official: grandmas (and the entire stock market) now know about DeepSeek. If you've been living under an AI rock, DeepSeek's new R1 model just set the world on fire, rattling Wall Street (causing the biggest monetary loss for any company, ever!) and rocketing to #1 on the iOS App Store. This week's ThursdAI show took us on a deep (pun intended) dive into the dizzying whirlwind of open-source AI breakthroughs, agentic mayhem, and big-company cat-and-mouse announcements. Grab your coffee (or your winter survival kit if you're in Canada), because in true ThursdAI fashion, we've got at least a dozen bombshells to cover: everything from a brand-new Mistral to next-gen vision models, new voice synthesis wonders, and big moves from Meta and OpenAI.
We're also talking "reasoning mania," as the entire industry scrambles to replicate, dethrone, or ride the coattails of the new open-source champion, R1. So buckle up, because if the last few days are any indication, 2025 is officially the Year of Reasoning (and quite possibly the Year of Agents, or both!).
Open Source LLMs
DeepSeek R1 Discourse Crashes the Stock Market
One-sentence summary: DeepSeek's R1 "reasoning model" caused a frenzy this week, hitting #1 on the App Store and briefly sending NVIDIA's stock plummeting in the process ($560B drop, the largest monetary loss for any stock, ever).
Ever since DeepSeek R1 launched (our technical coverage last week!), the buzz has been impossible to ignore; everyone from your mom to your local barista has heard the name. The speculation? DeepSeek's new architecture apparently cost only $5.5 million to train, fueling the notion that high-level AI might be cheaper than Big Tech claims. Suddenly, people wondered if GPU manufacturers like NVIDIA might see shrinking demand, and the stock indeed took a short-lived 17% tumble. On the show, I joked, "My mom knows about DeepSeek; your grandma probably knows about it, too," underscoring just how mainstream the hype has become.
Not everyone is convinced the cost claims are accurate. Even Dario Amodei of Anthropic weighed in with a blog post arguing that DeepSeek's success strengthens the case for stricter AI export controls.
Public Reactions
* Dario Amodei's blog: In "On DeepSeek and Export Controls," Amodei argues that DeepSeek's efficient scaling exemplifies why democratic nations need to maintain a strategic leadership edge and enforce export controls on advanced AI chips. He sees Chinese breakthroughs as proof that AI competition is global and intense.
* OpenAI distillation evidence: OpenAI mentioned it found "distillation traces" of GPT-4 inside R1's training data. Hypocrisy or fair game? On ThursdAI, the panel mused that "everyone trains on everything," so perhaps it's a moot point.
* Microsoft reaction: Microsoft wasted no time, swiftly adding DeepSeek to Azure, further proof that corporations want to harness R1's reasoning power, no matter where it originated.
* Government reaction: David Sacks, the incoming US AI & Crypto czar, discussed DeepSeek's alleged "distillation" (using the term somewhat loosely), and President Trump was asked about it as well.
* API outages: DeepSeek's own API has gone in and out this week, apparently hammered by demand (and possibly DDoS attacks). Meanwhile, GPU clouds like Groq are showing up to serve R1 at 300 tokens/second, for those who must have it right now.
We've seen so many bad takes on the topic, from seething cope to gross misunderstandings (government officials confusing the iOS app with the open-source models) to outright conspiracy theories claiming the $5.5M figure was a psyop. The fact of the matter is that DeepSeek R1 is an incredible model. Just a week later, it is already powering multiple products and experiences (more on this below), while pushing everyone else to compete (and give us reasoning models!).
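Part of why adoption has been so fast is that R1 is exposed through plain OpenAI-compatible APIs, both by DeepSeek itself and by the GPU clouds now hosting it. Here's a minimal sketch against DeepSeek's own endpoint (base URL, model id, and the separate reasoning_content field follow their published API docs; swap in another provider by changing the base URL and model name):

```python
# Minimal sketch: querying DeepSeek R1 through an OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",          # your DeepSeek (or hosting provider) key
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",       # DeepSeek's model id for R1
    messages=[{"role": "user", "content": "How many r's are in 'strawberry'?"}],
)

# R1 returns its chain of thought separately from the final answer.
print(response.choices[0].message.reasoning_content)  # the "thinking" trace
print(response.choices[0].message.content)            # the final answer
```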
Open Thoughts Reasoning Dataset
One-sentence summary: A community-led effort, "Open Thoughts," released a new large-scale dataset (OpenThoughts-114k) of chain-of-thought reasoning data, fueling the open-source drive toward better reasoning models.
Worried about having enough labeled "thinking" steps to train your own reasoner? Fear not. The OpenThoughts-114k dataset aggregates chain-of-thought prompts and responses, 114,000 of them, for building or fine-tuning reasoning LLMs. It's now on Hugging Face for your experimentation pleasure. The ThursdAI panel pointed out how crucial these large, openly available reasoning datasets are. As Wolfram put it, "We can't rely on the big labs alone. More open data means more replicable breakthroughs like DeepSeek R1."
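If you want to poke at the data yourself, here's a quick look using the Hugging Face datasets library (dataset id as listed on the Hub; inspect the schema rather than assuming field names, since they can change between versions):

```python
# Load OpenThoughts-114k and inspect its schema before training on it.
from datasets import load_dataset

ds = load_dataset("open-thoughts/OpenThoughts-114k", split="train")
print(ds)                 # features and row count
example = ds[0]
print(example.keys())     # check field names before wiring up a fine-tune
```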
Mistral Small 2501 (24B)
One-sentence summary: Mistral AI returns to the open-source spotlight with a 24B model that fits on a single 4090, scoring over 81% on MMLU while under Apache 2.0.
Long rumored to be "going more closed," Mistral AI re-emerged this week with Mistral-Small-24B-Instruct-2501, an Apache 2.0 licensed LLM that runs easily on a 32GB VRAM GPU. That 81% MMLU accuracy is no joke, putting it well above many 30B-70B competitor models. It was described as "the perfect size for local inference and a real sweet spot," noting that for many tasks, 24B is "just big enough but not painfully heavy." Mistral also finally started comparing themselves to Qwen 2.5 in official benchmarks, a big shift from their earlier reluctance, which we applaud!
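If you want to try it at home, a minimal local-inference sketch with Hugging Face transformers is below. The 4-bit quantization is my assumption here: it's what lets a 24B model squeeze onto a single 24GB card like a 4090 (full bf16 weights want closer to 48GB).

```python
# Sketch: Mistral-Small-24B-Instruct-2501 locally, 4-bit via bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-Small-24B-Instruct-2501"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)

messages = [{"role": "user", "content": "Why is 24B a sweet spot for local inference?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```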
Berkeley TinyZero & RAGEN (R1 Replications)
One-sentence summary: Two separate projects (TinyZero and RAGEN) replicated DeepSeek R1-zero's reinforcement learning approach, showing you can get "aha" reasoning moments with minimal compute.
If you were wondering whether R1 is replicable: yes, it is. Berkeley's TinyZero claims to have reproduced the core R1-zero behaviors for $30 using a small 3B model. Meanwhile, the RAGEN project aims to unify RL + LLM + Agents with a minimal codebase. While neither replication is at R1-level performance, they demonstrate how quickly the open-source community pounces on new methods. "We're now seeing those same 'reasoning sparks' in smaller reproductions," said Nisten. "That's huge."
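Part of why these replications are so cheap is that R1-zero-style training needs no learned reward model, just rule-based checks on verifiable tasks. A sketch of that idea (the tags and scores below are illustrative, not TinyZero's actual code):

```python
# Sketch of an R1-zero-style rule-based reward: score format compliance plus
# an exact match against a verifiable ground-truth answer. Illustrative only.
import re

def reward(completion: str, ground_truth: str) -> float:
    score = 0.0
    # Format reward: reasoning and answer must be wrapped in the expected tags.
    if re.search(r"<think>.*?</think>\s*<answer>.*?</answer>", completion, re.DOTALL):
        score += 0.1
    # Accuracy reward: the extracted answer must match the verifiable target.
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match and match.group(1).strip() == ground_truth.strip():
        score += 1.0
    return score
```

Plugged into an RL loop (GRPO/PPO style), this sparse signal alone turns out to be enough to elicit longer chains of thought and the "aha" self-correction behavior.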
Agents
Codename Goose by Block (X, Github)
One-sentence summary: Jack Dorsey's company Block released Goose, an open-source local agent framework that lets you run keyboard automation on your machine.
Ever wanted your AI to press keys and move your mouse in real time? Goose does exactly that with AppleScript, memory extensions, and a fresh approach to "local autonomy." On the show, I tried Goose, but found it occasionally "went rogue, trying to delete my WhatsApp chats." Security concerns aside, Goose is significant: it's an open-source playground for agent-building. The plugin system includes integrations with Git, Figma, a knowledge graph, and more. If nothing else, Goose underscores how hot "agentic" frameworks are in 2025.
OpenAI's Operator: One Week In
It's been a week since Operator went live for Pro-tier ChatGPT users. "It's the first agent that can run for multiple minutes without bugging me every single second," I said on the show. Yet it's still far from perfect: captchas, login blocks, and repeated confirmations hamper tasks. The potential, though, is enormous. "I asked Operator to gather my X.com bookmarks and generate a summary. It actually tried," I shared, "but it got stuck on three links and needed constant nudges." Simon Willison added that it's "a neat tech demo" but not quite a productivity boon yet. Next steps? Possibly letting the brand-new reasoning models (like O1 Pro Reasoning) do the chain-of-thought under the hood.
I also got tired of opening hundreds of tabs for Operator, so I wrapped it in a native macOS app with native notifications and the ability to launch Operator tasks via a Raycast extension. If you're interested, you can find it on my GitHub.
Browser-use / Computer-use Alternatives
In addition to Goose, the ThursdAI panel mentioned browser-use on GitHub, plus numerous code interpreters. So far, none blow minds in reliability. But 2025 is evidently "the year of agents." If you're itching to offload your browsing or file editing to an AI agent, expect to tinker, troubleshoot, and yes, babysit. The show consensus? "It's not about whether agents are coming, it's about how soon they'll become truly robust," said Wolfram.
Big CO LLMs + APIs
Alibaba Qwen2.5-Max (& Hidden Video Model) (Try It)
One-sentence summary: Alibaba's Qwen2.5-Max stands toe-to-toe with GPT-4 on some tasks, while also quietly rolling out video-generation features.
While Western media fixates on DeepSeek, Alibaba's Qwen team quietly dropped the Qwen2.5-Max MoE model. It clocks in at 69% on MMLU-Pro (beating some OpenAI and Google offerings) and comes with a 1-million-token context window. And guess what? The official chat interface apparently does hidden video generation, though Alibaba hasn't publicized it on the English-language internet.
On the Chinese AI internet, this video generation model is called Tongyi Wanxiang, and it even has its own website. It supports first-and-last-frame video generation, looks really, really good (they have a gallery up there), and it even generates audio together with the video!
One gallery example was an img2video generation, and the movements look really natural!
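Back on the text side, Qwen2.5-Max is reachable programmatically through Alibaba Cloud's OpenAI-compatible endpoint. A minimal sketch; the base URL and snapshot name below are taken from the DashScope docs as of this week, so double-check both before relying on them:

```python
# Sketch: calling Qwen2.5-Max via Alibaba Cloud's OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DASHSCOPE_API_KEY",
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
response = client.chat.completions.create(
    model="qwen-max-2025-01-25",   # the Qwen2.5-Max snapshot at time of writing
    messages=[{"role": "user", "content": "Write me one MMLU-Pro style question."}],
)
print(response.choices[0].message.content)
```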
Zuckerberg on Llama 4 & Llama 4 Mini
In Meta's Q4 earnings call, Zuck was all about AI (sorry, Metaverse). He declared that Llama 4 is in advanced training, with a smaller "Llama 4 Mini" finishing pre-training. More importantly, a "reasoning model" is in the works, presumably influenced by the mania around R1. Some employees had apparently posted on Blind asking "Why are we paying billions for training if DeepSeek did it for $5 million?", so the official line is that Meta invests heavily for top-tier scale.
Zuck also doubled down on "glasses are the perfect form factor for AI," which I somewhat agree with: I love my Meta Ray-Bans, I just wish they were integrated more deeply into iOS.
He also boasted about their HUGE datacenter, called Mesa, spanning the size of Manhattan, which is being built for the next step of AI.
(Nearly) Announced: O3-Mini
Right before the ThursdAI broadcast, rumors swirled that OpenAI might reveal O3-Mini, presumably GPT-4's "little cousin" at a fraction of the cost. Then... silence. Sam Altman had mentioned they would be bringing o3-mini by the end of January, but maybe the R1 craziness made them keep working on it and training it a bit more? 🤔
In any case, we'll cover it when it launches.
This Week's Buzz
We're still holding the #1 spot on SWE-bench Verified with W&B Programmer, and our CTO, Shawn Lewis, chatted with friends of the pod swyx and Alessio about it! (Give it a listen.)
We have two upcoming events:
* AI.engineer in New York (Feb 20-22). Weights & Biases is sponsoring, and I will broadcast ThursdAI live from the summit. If you snagged a ticket, come say hi; there might be a cameo from the "Chef."
* Toronto Tinkerer Workshops (late February) at the University of Toronto. The Canadian AI scene is hot, so watch for sign-ups (we'll add them to the show next week).
Weights & Biases also teased more features for LLM observability (Weave) and reminded folks of their new suite of evaluation tools. "If you want to know if your AI is actually better, you do evals," Alex insisted. For more details, check out wandb.me/weave or tune into the next ThursdAI.
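For the curious, the barrier to entry with Weave really is about one decorator. A minimal sketch (the project name and stub function are illustrative):

```python
# Sketch: decorate any function with @weave.op and every call (inputs,
# outputs, latency) is traced to your W&B project for inspection.
import weave

weave.init("thursdai-demos")  # illustrative project name

@weave.op()
def answer(question: str) -> str:
    # Call your model of choice here; a stub keeps the example self-contained.
    return f"Echo: {question}"

answer("Is my AI actually better?")  # shows up as a trace in the Weave UI
```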
Vision & Video
DeepSeek - Janus Pro - multimodal understanding and image gen unified (1.5B & 7B)
One-sentence summary: Alongside R1, DeepSeek also released Janus Pro, a unified model for image understanding and generation (like GPT-4's rumored image abilities).
DeepSeek apparently never sleeps. Janus Pro is MIT-licensed, 7B parameters, and can both parse images (via SigLIP) and generate them (via LlamaGen). The model outperforms DALL·E 3 and SDXL on some internal benchmarks(!), though at a modest 384×384 resolution.
NVIDIAâs Eagle 2 Redux
One-sentence summary: NVIDIA re-released the Eagle 2 vision-language model with 4K resolution support, after mysteriously yanking it a week ago.
Eagle 2 is back, boasting a multi-expert architecture, 16k context, and high-res video analysis. Rumor says it competes with big 70B-param vision models at only 9B. But it's overshadowed by Qwen2.5-VL (below). Some suspect NVIDIA is aiming to outdo Meta's open-source hold on vision, just in time to keep GPU demand strong.
Qwen 2.5 VL - SOTA oss vision model is here
One-sentence summary: Alibaba's Qwen 2.5 VL model claims state-of-the-art in open-source vision, including 1-hour video comprehension and "object grounding."
The Qwen team didn't hold back: "It's the final boss for vision," joked Nisten. Qwen 2.5 VL uses advanced temporal modeling for video and can handle complicated tasks like OCR or multi-object bounding boxes.
Featuring advances in precise object localization, video temporal understanding, and agentic computer-use capabilities, this is going to be the model to beat!
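Here's a minimal image-Q&A sketch with the 7B Instruct checkpoint via transformers. This assumes a transformers version recent enough to ship the Qwen2.5-VL classes (plus the qwen-vl-utils helper package), and the image URL is a placeholder:

```python
# Sketch: image Q&A with Qwen2.5-VL-7B-Instruct in transformers.
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(model_id, device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "https://example.com/receipt.jpg"},  # placeholder
        {"type": "text", "text": "OCR this receipt and list each item with its price."},
    ],
}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
images, videos = process_vision_info(messages)  # resolves image/video inputs
inputs = processor(text=[text], images=images, videos=videos,
                   padding=True, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(out[:, inputs.input_ids.shape[1]:],
                             skip_special_tokens=True)[0])
```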
Voice & Audio
YuE 7B (Open "Suno")
Ever dream of building the next pop star from your code editor? YuE 7B is your ticket. This model, now under Apache 2.0, supports chain-of-thought creation of structured songs, multilingual lyrics, and references. It's slow to infer, but it's arguably the best open music generator so far.
What's more, they changed the license to Apache 2.0 just before we went live, so you can use YuE everywhere!
Riffusion Fuzz
Riffusion, a new competitor to paid audio models like Suno and Udio, launched "Fuzz," offering free music generation online until GPU meltdown.
If you want to dabble in "prompt to jam track" without paying, check out Riffusion Fuzz. Will it match the emotional nuance of premium services like 11 Labs or Hailuo? Possibly not. But hey, free is free.
Tools (that have integrated R1)
Perplexity with R1
In the perplexity.ai chat, you can choose "Pro with R1" if you pay for it, harnessing R1's improved reasoning to parse results. For some, it's a major upgrade to "search-based question answering." Others prefer it to paying for O1 or GPT-4.
I always check whether Perplexity knows what the latest episode of ThursdAI was, and this is the first time it gave a very good summary! I legit used it to research the show this week. It's really something.
Meanwhile, Exa.ai also integrated a "DeepSeek Chat" for your agent-based workflows. Like it or not, R1 is everywhere.
Krea.ai with DeepSeek
Our friends at Krea, an AI art tool aggregator, also hopped on the R1 bandwagon for chat-based image searching or generative tasks.
Conclusion
Key Takeaways
* DeepSeek's R1 has massive cultural reach, from #1 apps to spooking the stock market.
* Reasoning mania is upon us: everyone from Mistral to Meta wants a piece of the logic-savvy LLM pie.
* Agentic frameworks like Goose, Operator, and browser-use are proliferating, though they're still baby-stepping through reliability issues.
* Vision and audio get major open-source love, with Janus Pro, Qwen 2.5 VL, YuE 7B, and more reshaping multimodality.
* Big Tech (Meta, Alibaba, OpenAI) is forging ahead with monster models, multi-billion-dollar projects, and cross-country expansions in search of the best reasoning approaches.
At this point, it's not even about where the next big model drop comes from; it's about how quickly the entire ecosystem can adopt (or replicate) that new methodology.
Stay tuned for next week's ThursdAI, where we'll hopefully see new updates from OpenAI (maybe O3-Mini?), plus the ongoing race for best agent. Also, catch us at AI.engineer in NYC if you want to talk shop or share your own open-source success stories. Until then, keep calm and carry on training.
TLDR
* Open Source LLMs
* DeepSeek Crashes the Stock Market: Did $5.5M training or hype do it?
* Open Thoughts Reasoning Dataset OpenThoughts-114k (X, HF)
* Mistral Small 2501 (24B, Apache 2.0) (HF)
* Berkeley TinyZero & RAGEN (R1-Zero Replications) (Github, WANDB)
* Allen Institute - Tulu 405B (Blog, HF)
* Agents
* Goose by Block (local agent framework) - (X, Github)
* Operator (OpenAI) â One-Week-In (X)
* Browser-use - oss version of Operator (Github)
* Big CO LLMs + APIs
* Alibaba Qwen2.5-Max (+ hidden video model) - (X, Try it)
* Zuckerberg on Llama 4 & "Reasoning Model" (X)
* This Week's Buzz
* Shawn Lewis interview on Latent Space with swyx & Alessio
* We're sponsoring the upcoming ai.engineer summit in NY (Feb 19-22), come say hi!
* After that, we'll host 2 workshops with AI Tinkerers Toronto (Feb 23-24); make sure you're signed up to Toronto Tinkerers to receive the invite (we sold out quickly last time!)
* Vision & Video
* DeepSeek Janus Pro - 1.5B and 7B (Github, Try It)
* NVIDIA Eagle 2 (Paper, Model, Demo)
* Alibaba Qwen 2.5 VL (Project, HF, Github, Try It)
* Voice & Audio
* YuE 7B (Open Suno) - (Demo, HF, Github)
* Riffusion Fuzz (free for now)
* Tools
* Perplexity with R1 (choose Pro with R1)
* Exa integrated R1 for free (demo)
* Participants
* Alex Volkov (@altryne)
* Wolfram Ravenwolf (@WolframRvnwlf)
* Nisten Tahiraj (@nisten)
* LDJ (@ldjOfficial)
* Simon Willison (@simonw)
* W&B Weave (@weave_wb)