Deep Papers

Arize AI
Jul 8, 2025 • 31min

Self-Adapting Language Models: Paper Authors Discuss Implications

Discover how self-adapting language models could redefine AI. The hosts dive into innovative self-editing techniques and the role of reinforcement learning in improving model performance. They discuss the challenges of catastrophic forgetting and gradient interference, alongside methods like LoRA for efficient updates. They also explore the future of pre-training, revealing how models can forge their own learning paths. Get ready for a fascinating look at the evolution of language models!
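The LoRA updates mentioned above can be illustrated with a minimal sketch: instead of fine-tuning a full weight matrix, train two small low-rank factors and add their product as a correction. All names and dimensions here are illustrative, not taken from the paper.

```python
import numpy as np

# Hypothetical LoRA-style update: the d_out x d_in base weight W stays
# frozen; only the low-rank factors B (d_out x r) and A (r x d_in) are
# trained, and their product acts as a delta on the forward pass.
rng = np.random.default_rng(0)
d_out, d_in, rank = 512, 512, 8

W = rng.standard_normal((d_out, d_in))    # frozen pretrained weight
A = rng.standard_normal((rank, d_in)) * 0.01
B = np.zeros((d_out, rank))               # zero-init so the delta starts at 0

def adapted_forward(x):
    # Base path plus low-rank correction; only A and B receive gradients.
    return W @ x + B @ (A @ x)

full_params = d_out * d_in
lora_params = rank * (d_out + d_in)
print(f"trainable params: {lora_params} vs {full_params} "
      f"({lora_params / full_params:.1%} of full fine-tuning)")
```

With rank 8 on a 512x512 layer, the adapter trains about 3% of the parameters of full fine-tuning, which is why low-rank updates are attractive for frequent self-edits.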
Jun 20, 2025 • 31min

The Illusion of Thinking: What the Apple AI Paper Says About LLM Reasoning

The discussion revolves around a compelling new paper from Apple, challenging traditional evaluations of AI reasoning. It reveals how Large Reasoning Models (LRMs) surprisingly falter on complex tasks while Large Language Models (LLMs) shine in simpler scenarios. The conversation dives into the nuances of problem-solving, contrasting human creativity with algorithmic execution, especially with something as intricate as Rubik's cubes. A philosophical debate unfolds, questioning whether the reasoning showcased by AI is truly genuine or merely an illusion.
Jun 4, 2025 • 25min

Accurate KV Cache Quantization with Outlier Tokens Tracing

We discuss Accurate KV Cache Quantization with Outlier Tokens Tracing, a deep dive into improving the efficiency of LLM inference. The authors enhance KV cache quantization, a technique for reducing memory and compute costs during inference, by introducing a method to identify and exclude outlier tokens that hurt quantization accuracy, striking a better balance between efficiency and performance. Read the paper, access the slides, or read the blog. Join us for Arize Observe. Learn more about AI observability and evaluation, join the Arize AI Slack community, or get the latest on LinkedIn and X.
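The core idea — quantize most of the cache aggressively, but trace tokens whose values fall far outside the typical range and keep them in full precision — can be sketched roughly as follows. This is an illustrative simplification, not the paper's exact algorithm; the threshold rule and function names are assumptions.

```python
import numpy as np

# Illustrative sketch: int8-quantize the KV cache per token, but flag
# "outlier" tokens (per-token max-abs far above the population mean)
# and store those in full precision instead.
def quantize_kv(kv, outlier_z=3.0):
    # kv: (num_tokens, head_dim) slice of the cache
    scales = np.abs(kv).max(axis=1)                 # per-token max-abs
    outliers = scales > scales.mean() + outlier_z * scales.std()
    packed = {
        "int8": np.round(kv[~outliers] / (scales[~outliers, None] / 127)).astype(np.int8),
        "int8_scales": scales[~outliers] / 127,     # needed to dequantize
        "fp_outliers": kv[outliers],                # excluded from quantization
        "outlier_mask": outliers,
    }
    return packed

kv = np.random.default_rng(1).standard_normal((128, 64))
kv[5] *= 50                                          # inject one outlier token
packed = quantize_kv(kv)
print(packed["outlier_mask"].sum(), "outlier token(s) kept in full precision")
```

Without the exclusion step, the single extreme token would inflate a shared scale and crush the resolution of every other token — which is the accuracy loss the paper's tracing method is designed to avoid.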
May 16, 2025 • 29min

Scalable Chain of Thoughts via Elastic Reasoning

Explore the innovative concept of Elastic Reasoning, a framework that enhances reasoning models by separating the thinking process from solution generation. Delve into advancements that improve output quality while managing resource constraints. Learn how these strategies optimize performance in multi-tool agents and reduce AI hallucinations. Discover practical applications that enhance the user experience in critical tasks. Finally, hear about the push for sustainable, lightweight models to tackle AI's environmental challenges.
May 2, 2025 • 30min

Sleep-time Compute: Beyond Inference Scaling at Test-time

Imagine if your AI could anticipate your questions before you even ask! This intriguing discussion centers on sleep-time compute, a method allowing models to prepare answers during idle moments. By precomputing reasoning steps, it significantly cuts down on latency and costs while boosting accuracy. The talk dives into new benchmarks showing impressive reductions in compute use and cost. Additionally, the potential of leveraging idle GPUs for improved efficiency and the challenges of optimizing resources in AI systems make for a fascinating listen.
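The idea of preparing answers during idle moments can be captured in a toy sketch: spend inference compute on anticipated questions while the user is away, so that query time becomes a cheap cache lookup. The class and function names here are illustrative, not from the paper.

```python
import time

def expensive_reasoning(context, question):
    # Stand-in for a costly LLM inference pass over the stored context.
    time.sleep(0.01)
    return f"answer to {question!r} derived from {context!r}"

class SleepTimeAgent:
    """Toy agent that precomputes answers during idle ("sleep") time."""

    def __init__(self, context):
        self.context = context
        self.cache = {}

    def sleep_step(self, anticipated_question):
        # Runs while the user is idle: pay the inference cost up front.
        self.cache[anticipated_question] = expensive_reasoning(
            self.context, anticipated_question)

    def answer(self, question):
        # At query time, a cache hit avoids the expensive call entirely,
        # cutting both latency and per-query cost.
        if question in self.cache:
            return self.cache[question]
        return expensive_reasoning(self.context, question)

agent = SleepTimeAgent("quarterly sales report")
agent.sleep_step("what was total revenue?")      # done during idle time
print(agent.answer("what was total revenue?"))   # served from the cache
```

The real method precomputes intermediate reasoning over the context rather than literal question-answer pairs, but the latency/cost trade is the same: idle compute is exchanged for cheap, fast responses later.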
Apr 18, 2025 • 27min

LibreEval: The Largest Open Source Benchmark for RAG Hallucination Detection

For this week's paper read, we dive into our own research. We wanted to create a replicable, evolving dataset that can keep pace with model training, so that you always know you're testing with data your model has never seen before. We also saw the prohibitively high cost of running LLM evals at scale, and have used our data to fine-tune a series of SLMs that perform just as well as their base LLM counterparts, but at 1/10 the cost. So, over the past few weeks, the Arize team generated the largest public dataset of hallucinations, as well as a series of fine-tuned evaluation models. We talk about what we built, the process we took, and the bottom-line results. You can read the recap of LibreEval here. Dive into the research, or sign up to join us next time.
Apr 4, 2025 • 26min

AI Benchmark Deep Dive: Gemini 2.5 and Humanity's Last Exam

Dive into the advancements of Google's Gemini 2.5 as it tackles Humanity's Last Exam, showcasing its impressive reasoning and multimodal capabilities. Discover how this AI model outperforms rivals in key benchmarks and the complexities it faces in expert-level problem-solving. The discussion also highlights the significance of traditional benchmarks and the ongoing debate about model optimization versus overall performance. Finally, learn about the community's role in shaping the future of AI evaluation and collaboration.
Mar 25, 2025 • 15min

Model Context Protocol (MCP)

We cover Anthropic’s Model Context Protocol (MCP). Though it was released in November 2024, we've been seeing a lot of hype around it lately and thought it was well worth digging into. Learn how this open standard enables seamless integration between LLMs and external data sources, transforming them into capable, context-aware agents. We explore the key benefits of MCP, including enhanced context retention across interactions, improved interoperability for agentic workflows, and the development of more capable AI agents that can execute complex tasks in real-world environments. Read our analysis of MCP on the blog, or dive into the latest AI research.
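Concretely, MCP messages are JSON-RPC 2.0 under the hood; a client invokes a tool a server exposes via a `tools/call` request. Here is a rough sketch of that message shape based on the public spec — the tool name and arguments are hypothetical.

```python
import json

# Approximate shape of an MCP tool invocation (JSON-RPC 2.0).
# "search_documents" is a made-up tool name for illustration; real tools
# are discovered at runtime via the server's tools/list method.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_documents",
        "arguments": {"query": "Q3 revenue"},
    },
}
print(json.dumps(request, indent=2))
```

Because every server speaks this same message format, a host application can plug in new data sources and tools without bespoke integration code — the interoperability benefit discussed in the episode.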
Mar 1, 2025 • 30min

AI Roundup: DeepSeek’s Big Moves, Claude 3.7, and the Latest Breakthroughs

This podcast explores cutting-edge AI developments, including DeepSeek's launch of FlashMLA, an efficient decoding kernel for NVIDIA GPUs. It also dives into Claude 3.7, showcasing its hybrid reasoning capabilities and improvements in AI coding assistance. The discussion highlights DeepSeek's new DeepEP communication library and strategic optimizations for server efficiency. With a focus on benchmarking AI innovations and open-source advancements, listeners gain insights into the latest trends shaping the future of artificial intelligence.
Feb 21, 2025 • 30min

How DeepSeek is Pushing the Boundaries of AI Development

Discover the remarkable advancements in AI with DeepSeek, particularly its groundbreaking inference speed. The team discusses the evolution of AI reasoning and the innovative use of reinforcement learning techniques. Dive into the challenges and triumphs of local deployment, along with the playful nature of these models. A live demo showcases practical applications like sentiment analysis and topic modeling, revealing the fine-tuning capabilities of the DeepSeek model. Explore the exciting future of AI shaped by major tech investments.
