Latent Space: The AI Engineer Podcast — Practitioners talking LLMs, CodeGen, Agents, Multimodality, AI UX, GPU Infra and all things Software 3.0

Latest episodes

6 snips
Dec 22, 2024 • 57min

2024 in Vision [LS Live @ NeurIPS]

In this engaging discussion, Isaac Robinson and Peter Robicheaux from Roboflow share insights on the latest trends and groundbreaking papers in computer vision for 2024. They highlight the shift towards video-based models like 'Sora' and advancements in real-time object detection. Vik Korrapati, founder of Moondream, presents challenges in developing vision language models and introduces a compact, pruned model. Together, they explore how these innovations can reshape the landscape of computer vision and enhance pre-trained model efficiencies.
44 snips
Dec 21, 2024 • 52min

The State of AI Startups [LS Live @ NeurIPS]

The opening keynote at NeurIPS 2024 dives into the evolving landscape of AI startups and investment trends. Experts discuss transformative developments and the shift towards open-source models. They analyze advancements in video modalities and challenges in scaling businesses. The conversation touches on integrating multimodal data in enterprises, the impact of declining AI costs, and the growing interest in consumer-focused AI startups. Exciting insights highlight the future opportunities for innovation and competition in the AI industry.
178 snips
Dec 13, 2024 • 1h 7min

Windsurf: The Enterprise AI IDE - with Varun and Anshul of Codeium AI

Varun Mohan, CEO of Codeium AI, leads the charge in revolutionizing developer tools with his innovative IDE, Windsurf. He discusses the challenges faced by traditional platforms like VS Code and introduces the advanced features of Windsurf, such as its intuitive interface and enhanced user experience. Varun reveals insights from user feedback post-launch and emphasizes the importance of balancing automation with human input. He also shares Codeium's evolution and strategies for catering to both individual developers and enterprises while maintaining their commitment to free access.
70 snips
Dec 10, 2024 • 7h 8min

Generative Video WorldSim, Diffusion, Vision, Reinforcement Learning and Robotics — ICML 2024 Part 1

Regular tickets are now sold out for Latent Space LIVE! at NeurIPS! We have just announced our last speaker and newest track, friend of the pod Nathan Lambert who will be recapping 2024 in Reasoning Models like o1! We opened up a handful of late bird tickets for those who are deciding now; use code DISCORDGANG if you need it. See you in Vancouver!

We’ve been sitting on our ICML recordings for a while (from today’s first-ever SOLO guest cohost, Brittany Walker), and in light of Sora Turbo’s launch (blogpost, tutorials) today, we figured it would be a good time to drop part one which had been gearing up to be a deep dive into the state of generative video worldsim, with a seamless transition to vision (the opposite modality), and finally robots (their ultimate application).

Sora, Genie, and the field of Generative Video World Simulators

Bill Peebles, author of Diffusion Transformers, gave his most recent Sora talk at ICML, which begins our episode:

* William (Bill) Peebles - SORA (slides)

Something that is often asked about Sora is how much inductive biases were introduced to achieve these results. Bill references the same principles brought by Hyung Won Chung from the o1 team - “sooner or later those biases come back to bite you”.

We also recommend these reads from throughout 2024 on Sora.

* Lilian Weng’s literature review of Video Diffusion Models
* Sora API leak
* Estimates of 100k-700k H100s needed to serve Sora (not Turbo)
* Artist guides on using Sora for professional storytelling

Google DeepMind had a remarkably strong presence at ICML on Video Generation Models, winning TWO Best Paper awards for:

* Genie: Generative Interactive Environments (covered in oral, poster, and workshop)
* VideoPoet: A Large Language Model for Zero-Shot Video Generation (see website)

We end this part by taking in Tali Dekel’s talk on The Future of Video Generation: Beyond Data and Scale.

Part 2: Generative Modeling and Diffusion

Since 2023, Sander Dieleman’s perspectives (blogpost, tweet) on diffusion as “spectral autoregression in the frequency domain” while working on Imagen and Veo have caught the public imagination, so we highlight his talk:

* Wading through the noise: an intuitive look at diffusion models

Then we go to Ben Poole for his talk on Inferring 3D Structure with 2D Priors, including his work on NeRFs and DreamFusion.

Then we investigate two flow matching papers - one from the Flow Matching co-authors - Ricky T. Q. Chen (FAIR, Meta)

And how it is implemented in Stable Diffusion 3 with Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

Our last hit on Diffusion is a couple of oral presentations on speech, which we leave you to explore via our audio podcast

* NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
* Speech Self-Supervised Learning Using Diffusion Model Synthetic Data

Part 3: Vision

The ICML Test of Time winner was DeCAF, which Trevor Darrell notably called “the OG vision foundation model”.

Lucas Beyer’s talk on “Vision in the age of LLMs - a data-centric perspective” was also well received online, and he talked about his journey from Vision Transformers to PaliGemma.

We give special honorable mention to MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark.

Part 4: Reinforcement Learning and Robotics

We segue vision into robotics with the help of Ashley Edwards, whose work on both the Gato and the Genie teams at DeepMind is summarized in Learning actions, policies, rewards, and environments from videos alone.

Brittany highlighted two poster session papers:

* Behavior Generation with Latent Actions
* We also recommend Lerrel Pinto’s On Building General-Purpose Robots
* PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs

However we must give the lion’s share of space to Chelsea Finn, now founder of Physical Intelligence, who gave FOUR talks on

* "What robots have taught me about machine learning"
* developing robot generalists
* robots that adapt autonomously
* how to give feedback to your language model
* special mention to PI colleague Sergey Levine on Robotic Foundation Models

We end the podcast with a position paper that links generative environments and RL/robotics: Automatic Environment Shaping is the Next Frontier in RL.

Timestamps

* [00:00:00] Intros
* [00:02:43] Sora - Bill Peebles
* [00:44:52] Genie: Generative Interactive Environments
* [01:00:17] Genie interview
* [01:12:33] VideoPoet: A Large Language Model for Zero-Shot Video Generation
* [01:30:51] VideoPoet interview - Dan Kondratyuk
* [01:42:00] Tali Dekel - The Future of Video Generation: Beyond Data and Scale.
* [02:27:07] Sander Dieleman - Wading through the noise: an intuitive look at diffusion models
* [03:06:20] Ben Poole - Inferring 3D Structure with 2D Priors
* [03:30:30] Ricky Chen - Flow Matching
* [04:00:03] Patrick Esser - Stable Diffusion 3
* [04:14:30] NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
* [04:27:00] Speech Self-Supervised Learning Using Diffusion Model Synthetic Data
* [04:39:00] ICML Test of Time winner: DeCAF
* [05:03:40] Lucas Beyer: “Vision in the age of LLMs - a data-centric perspective”
* [05:42:00] Ashley Edwards: Learning actions, policies, rewards, and environments from videos alone.
* [06:03:30] Behavior Generation with Latent Actions interview
* [06:09:52] Chelsea Finn: "What robots have taught me about machine learning"
* [06:56:00] Position: Automatic Environment Shaping is the Next Frontier in RL

Get full access to Latent Space at www.latent.space/subscribe
206 snips
Dec 2, 2024 • 1h 39min

Bolt.new, Flow Engineering for Code Agents, and >$8m ARR in 2 months as a Claude Wrapper

In this discussion, Itamar Friedman, CEO of Qodo, and Eric Simons, CEO of StackBlitz, dive into the rapidly evolving world of AI-powered software development. They explore the impressive rise of Bolt.new, achieving over $8 million ARR in just two months. The duo highlights the transformation of StackBlitz and the role of AI in streamlining code generation. They also discuss challenges in platform development, the significance of user experience, and the exciting tools being developed to empower non-engineers in web development.
308 snips
Nov 28, 2024 • 1h 11min

The new Claude 3.5 Sonnet, Computer Use, and Building SOTA Agents — with Erik Schluntz, Anthropic

Erik Schluntz, a member of technical staff at Anthropic and former CTO of Cobalt Robotics, dives deep into the advancements of Claude 3.5 Sonnet. He discusses how this model transformed coding agents and tackled challenging benchmarks like SWE-Bench, improving performance dramatically. Erik also explores the complexities of building multimodal tools and the importance of user-friendly design. He shares insights on enhancing AI memory, the significance of real-world applications, and the future of cloud technology in automation.
34 snips
Nov 25, 2024 • 58min

Why Compound AI + Open Source will beat Closed AI

Lin Qiao, Co-founder and CEO of Fireworks AI, shares insights into the dynamic world of open-source AI. She discusses the challenges and growth of her startup, emphasizing adaptability in a volatile environment. Lin highlights the advantages of open-source over closed source, arguing for its superior scalability and innovation potential. The conversation also delves into the evolution of PyTorch and the concept of Compound AI, showcasing how diverse modalities can enhance user experiences. Ultimately, Lin stresses the importance of community feedback in driving AI advancements.
141 snips
Nov 15, 2024 • 1h 10min

Agents @ Work: Lindy.ai

Flo Crivello, founder of Lindy.ai and former Uber tech specialist, shares insights on transforming AI workflows. He discusses the evolution from Lindy 1.0's complex interfaces to the user-friendly Lindy 2.0, emphasizing visual workflow design. The conversation dives into the challenges of AI customer support, innovations in evaluation tools, and the intense competition in the AI space. Flo also touches on the impact of remote work on company culture and the role of community in thriving within vertical and horizontal AI solutions.
36 snips
Nov 11, 2024 • 1h

Agents @ Work: Dust.tt

Stanislas Polu, Co-founder and CEO of Dust.tt, shares insights from his journey through tech giants like Stripe and OpenAI, where he honed his skills in mathematical reasoning. He reveals the inspiration behind Dust's evolution from a developer framework to user-friendly AI assistants. Discussions delve into the blending of creativity and formal verification in problem-solving, the strategic development of LLM products, and the challenges of integrating AI in fraud solutions. Stan also contrasts vertical and horizontal approaches in AI, emphasizing their impact on growth and innovation.
29 snips
Nov 1, 2024 • 41min

In the Arena: How LMSys changed LLM Benchmarking Forever

Anastasios Angelopoulos and Wei-Lin Chiang, both PhD students at UC Berkeley, lead the Chatbot Arena—a pioneering platform for AI evaluation. They discuss the evolution of crowdsourced benchmarking and the philosophical challenges of measuring AI intelligence. Emphasizing the limitations of static benchmarks, they advocate for user-driven assessments. The duo also tackles human biases in evaluations and the significance of community engagement, showcasing innovative strategies in AI red teaming and collaboration, all aimed at refining how language models are compared.
