Deep Papers

Latest episodes

Jan 14, 2025 • 25min

Training Large Language Models to Reason in Continuous Latent Space

The discussion highlights recent advancements in AI, including NVIDIA's innovations and a new platform for robotics. A standout topic is the groundbreaking Coconut method, which allows large language models to reason in a continuous latent space, breaking away from traditional language constraints. This innovative approach promises to enhance the efficiency and performance of AI systems, making reasoning more fluid and adaptable. Stay tuned for insights into the interconnected future of AI!
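
For a concrete sense of the mechanism, here is a minimal sketch of a Coconut-style loop, assuming a Hugging Face causal LM; it illustrates the idea from the paper, not the authors' implementation:

```python
# Sketch: latent reasoning in the spirit of Coconut. Instead of decoding a
# token at each reasoning step, feed the final hidden state back in as the
# next input embedding (GPT-2's hidden size equals its embedding size, so
# this type-checks; a real setup would train the model for this regime).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tok = AutoTokenizer.from_pretrained("gpt2")

inputs = tok("2 + 3 * 4 =", return_tensors="pt")
embeds = model.get_input_embeddings()(inputs["input_ids"])

for _ in range(4):  # four latent "thought" steps, no tokens emitted
    out = model(inputs_embeds=embeds, output_hidden_states=True)
    last_hidden = out.hidden_states[-1][:, -1:, :]    # (1, 1, hidden_dim)
    embeds = torch.cat([embeds, last_hidden], dim=1)  # continuous thought as next input

logits = model(inputs_embeds=embeds).logits[:, -1, :]  # switch back to language mode
print(tok.decode([int(logits.argmax())]))
```
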
Dec 23, 2024 • 29min

LLMs as Judges: A Comprehensive Survey on LLM-Based Evaluation Methods

Explore the fascinating world of large language models as judges. Discover their benefits over traditional methods, including enhanced accuracy and consistency. Delve into the various evaluation methodologies and the crucial role human evaluators play. Learn about techniques for improving model performance and the applications in summarization and retrieval-augmented generation. The discussion also highlights significant limitations and ethical concerns, emphasizing the need for audits and domain expertise to ensure responsible AI use.
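
For a flavor of what an LLM judge looks like in practice, here is a minimal sketch; the prompt, rubric, and choice of judge model are illustrative assumptions, not prescriptions from the survey:

```python
# Minimal LLM-as-judge sketch: ask a model to grade a candidate answer
# against a rubric and return a score with a short rationale.
from openai import OpenAI

client = OpenAI()

def judge(question: str, answer: str) -> str:
    prompt = (
        "You are an impartial judge. Score the answer from 1 (poor) to 5 "
        "(excellent) for correctness and completeness. Reply with the score, "
        "then one sentence of rationale.\n\n"
        f"Question: {question}\nAnswer: {answer}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical judge model
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic grading improves consistency
    )
    return resp.choices[0].message.content

print(judge("What is the capital of France?", "Paris is the capital of France."))
```
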
Dec 10, 2024 • 29min

Merge, Ensemble, and Cooperate! A Survey on Collaborative LLM Strategies

Discover how collaborative strategies can enhance the efficiency of large language models. The discussion dives into the survey's three families of methods: merging, ensembling, and cooperation, emphasizing their unique strengths. Learn about the impressive open-source OLMo 2 model and its implications for transparency in AI. The podcast also tackles the use of Pareto frontiers for evaluating performance trade-offs, alongside the importance of reflection phases in multi-step agents to optimize their outputs. Tune in for insights that bridge collaboration and AI advancements!
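
The simplest of these strategies, merging, can be as plain as parameter averaging; here is a toy sketch assuming two fine-tunes of the same base architecture (checkpoint names are hypothetical):

```python
# Uniform "model soup" merge: average corresponding parameters of two
# checkpoints that share an identical architecture and key set.
import torch

def average_merge(state_a: dict, state_b: dict) -> dict:
    return {k: (state_a[k] + state_b[k]) / 2 for k in state_a}

# Toy demo with two one-layer "checkpoints":
a = {"w": torch.tensor([1.0, 2.0])}
b = {"w": torch.tensor([3.0, 4.0])}
print(average_merge(a, b))  # {'w': tensor([2., 3.])}

# In practice (hypothetical file names):
# model.load_state_dict(average_merge(torch.load("ft_a.pt"), torch.load("ft_b.pt")))
```
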
Nov 23, 2024 • 25min

Agent-as-a-Judge: Evaluate Agents with Agents

Discover the innovative 'Agent-as-a-Judge' framework, where agents grade each other’s performance, offering a refreshing take on evaluation. Traditional methods often miss the mark, but this approach promises continuous feedback throughout tasks. Dive into the development of the DevAI benchmarking dataset aimed at real-world coding evaluations. Compare the capabilities of new agents against traditional ones and witness how scalable self-improvement could revolutionize performance measurement!
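
In spirit, the framework scores intermediate steps rather than only the final output; a rough sketch follows (the judge model, prompt, and trajectory are invented for illustration, and the paper's actual pipeline is richer):

```python
# Per-step evaluation in the Agent-as-a-Judge spirit: a judge model checks
# each intermediate output of a worker agent against a requirement, giving
# continuous feedback throughout the task instead of one final grade.
from openai import OpenAI

client = OpenAI()

def judge_step(requirement: str, step_output: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed judge model
        messages=[{
            "role": "user",
            "content": (
                "Does this intermediate output satisfy the requirement?\n"
                f"Requirement: {requirement}\nOutput: {step_output}\n"
                "Answer PASS or FAIL with a one-line reason."
            ),
        }],
    )
    return resp.choices[0].message.content

trajectory = ["wrote data loader", "trained model", "saved metrics to results.json"]
for step in trajectory:
    print(judge_step("produce a results.json with accuracy metrics", step))
```
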
Nov 12, 2024 • 30min

Introduction to OpenAI's Realtime API

We break down OpenAI's Realtime API. Learn how to seamlessly integrate powerful language models into your applications for instant, context-aware responses that drive user engagement. Whether you're building chatbots and dynamic content tools or enhancing real-time collaboration, we walk through the API's capabilities, potential use cases, and best practices for implementation. Learn more about AI observability and evaluation in our course, join the Arize AI Slack community or get the latest on LinkedIn and X.
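
As a starting point, here is a minimal text-only connection sketch using the `websockets` library; the URL, headers, and event names follow the beta documentation at the time of recording, so treat the details as assumptions and check the current docs:

```python
# Connect to the Realtime API over a WebSocket, request one text response,
# and stream the deltas as they arrive.
import asyncio
import json
import os

import websockets  # pip install websockets

async def main():
    url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    # Note: newer websockets releases name this kwarg `additional_headers`.
    async with websockets.connect(url, extra_headers=headers) as ws:
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {"modalities": ["text"], "instructions": "Say hello."},
        }))
        async for message in ws:
            event = json.loads(message)
            if event["type"] == "response.text.delta":
                print(event["delta"], end="", flush=True)
            elif event["type"] == "response.done":
                break

asyncio.run(main())
```
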
Oct 29, 2024 • 47min

Swarm: OpenAI's Experimental Approach to Multi-Agent Systems

Discover the fascinating world of OpenAI's Swarm, an experimental framework designed for managing multi-agent systems. The conversation highlights Swarm's educational focus and simplicity. Learn how multiple agents can collaborate effectively, illustrated by a practical airline customer support example. Explore the synergy between large language models and traditional coding for enhanced adaptability. The podcast also compares Swarm with other frameworks, emphasizing its unique advantages in real-world applications like customer service.
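
Swarm's core abstractions are agents and handoffs via plain Python functions; here is a minimal sketch of the triage pattern from the airline example (agent names and instructions are our own simplification of the repo's code):

```python
# pip install git+https://github.com/openai/swarm.git
# A triage agent hands the conversation off to a refunds agent by returning it
# from a tool function — Swarm's handoff mechanism.
from swarm import Swarm, Agent

client = Swarm()

def transfer_to_refunds():
    """Hand off to the refunds specialist."""
    return refunds_agent  # resolved at call time, after it is defined below

triage_agent = Agent(
    name="Triage Agent",
    instructions="Route the customer to the right department.",
    functions=[transfer_to_refunds],
)
refunds_agent = Agent(
    name="Refunds Agent",
    instructions="Handle refund requests politely and ask for a booking ID.",
)

response = client.run(
    agent=triage_agent,
    messages=[{"role": "user", "content": "I want a refund for my flight."}],
)
print(response.messages[-1]["content"])
```
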
Oct 24, 2024 • 4min

KV Cache Explained

In this episode, we dive into the intriguing mechanics behind why chat experiences with models like GPT often start slow but then rapidly pick up speed. The key? The KV cache. This essential but under-discussed component enables the seamless and snappy interactions we expect from modern AI systems. Harrison Chu breaks down how the KV cache works, how it relates to the transformer architecture, and why it's crucial for efficient AI responses. By the end of the episode, you'll have a clearer understanding of how top AI products leverage this technology to deliver fast, high-quality user experiences. Tune in for a simplified explanation of attention heads, QKV matrices, and the computational complexities they present. Learn more about AI observability and evaluation in our course, join the Arize AI Slack community or get the latest on LinkedIn and X.
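
To ground the idea, here is a toy illustration of the cache itself (single attention head, no batching, made-up dimensions), showing why each new token costs work proportional to the sequence length instead of recomputing the whole prefix:

```python
# Without a KV cache, every generation step recomputes keys and values for the
# entire prefix. With one, only the newest token's K and V are computed and
# appended, and attention reuses everything cached so far.
import torch

d = 64
W_k, W_v = torch.randn(d, d), torch.randn(d, d)
k_cache, v_cache = [], []

def attend(query, new_token_hidden):
    k_cache.append(new_token_hidden @ W_k)  # one new key row per step
    v_cache.append(new_token_hidden @ W_v)  # one new value row per step
    K = torch.stack(k_cache)                # (seq_len, d), grows by one row
    V = torch.stack(v_cache)
    scores = (query @ K.T) / d ** 0.5       # O(seq_len), not O(seq_len^2)
    return torch.softmax(scores, dim=-1) @ V

for _ in range(5):
    h = torch.randn(d)   # stand-in for the newest token's hidden state
    out = attend(h, h)   # self-attention: the new token is also the query
print(out.shape)         # torch.Size([64])
```
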
Oct 16, 2024 • 4min

The Shrek Sampler: How Entropy-Based Sampling is Revolutionizing LLMs

In this byte-sized podcast, Harrison Chu, Director of Engineering at Arize, breaks down the 'Shrek Sampler,' an innovative entropy-based sampling technique that is transforming LLMs. Harrison talks about how this method improves upon traditional sampling strategies by leveraging entropy and varentropy to produce more dynamic and intelligent responses. Explore its potential to enhance open-source AI models and enable human-like reasoning in smaller language models. Learn more about AI observability and evaluation in our course, join the Arize AI Slack community or get the latest on LinkedIn and X.
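
Here is a rough sketch of the core decision rule; the thresholds and branches are illustrative assumptions rather than the actual implementation:

```python
# Entropy-based sampling: measure how uncertain the model is (entropy) and how
# uneven that uncertainty is (varentropy), then pick a decoding strategy.
import torch
import torch.nn.functional as F

def entropy_sample(logits: torch.Tensor, low: float = 0.5, high: float = 3.0) -> int:
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    entropy = -(probs * log_probs).sum()                     # expected surprise
    varentropy = (probs * (log_probs + entropy) ** 2).sum()  # variance of surprise
    if entropy < low:
        return int(logits.argmax())  # confident: take the greedy token
    if entropy > high and varentropy > high:
        # Very uncertain: a fuller sampler might inject a "pause and think"
        # token here; we just fall back to plain sampling in this sketch.
        return int(torch.multinomial(probs, 1))
    return int(torch.multinomial(probs, 1))  # default: standard sampling

print(entropy_sample(torch.randn(50_257)))  # toy vocab-sized logits
```
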
Oct 15, 2024 • 43min

Google's NotebookLM and the Future of AI-Generated Audio

This week, Aman Khan and Harrison Chu explore NotebookLM's unique features, including its ability to generate realistic-sounding podcast episodes from text (but this podcast is very real!). They dive into some technical underpinnings of the product, specifically the SoundStorm model used for generating high-quality audio, and how it leverages residual vector quantization (RVQ) to maintain consistency in speaker voice and tone across long audio durations. The discussion also touches on the ethical implications of such technology, particularly the potential for hallucinations and the need to balance creative freedom with factual accuracy. We close out with a few hot takes and speculate on the future of AI-generated audio. Learn more about AI observability and evaluation in our course, join the Arize AI Slack community or get the latest on LinkedIn and X.
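
For intuition, here is a toy residual vector quantizer, the building block behind the codec tokens SoundStorm generates; the codebook sizes are made up:

```python
# RVQ: each codebook quantizes the residual left over by the previous one,
# so later levels add progressively finer detail to the reconstruction.
import torch

def rvq_encode(x: torch.Tensor, codebooks: list) -> list:
    """x: (d,) vector; codebooks: list of (codebook_size, d) tensors."""
    residual, codes = x.clone(), []
    for cb in codebooks:
        idx = torch.cdist(residual[None], cb).argmin()  # nearest codeword
        codes.append(int(idx))
        residual = residual - cb[idx]  # next level refines what's left
    return codes

codebooks = [torch.randn(256, 16) for _ in range(4)]  # 4 levels, 256 entries each
print(rvq_encode(torch.randn(16), codebooks))         # e.g. [97, 12, 200, 5]
```
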
Sep 27, 2024 • 42min

Exploring OpenAI's o1-preview and o1-mini

OpenAI recently released its o1-preview, which it claims outperforms GPT-4o on a number of benchmarks. These models are designed to think longer before answering and to handle complex tasks better than OpenAI's other models, especially science and math questions. We take a closer look at the latest crop of o1 models, and we also highlight some research our team did to see how they stack up against Claude 3.5 Sonnet in a real-world use case. Read it on our blog (https://arize.com/blog/exploring-openai-o1-preview-and-o1-mini). Learn more about AI observability and evaluation in our course, join the Arize AI Slack community or get the latest on LinkedIn and X.
