Deep Papers

Latest episodes

Feb 8, 2024 • 40min

RAG vs Fine-Tuning

This podcast explores the tradeoffs between RAG and fine-tuning for LLMs. It discusses implementing RAG in production, question and answer generation using JSON and LLMs, using GPT for test question generation in agriculture, evaluating relevance in email retrieval, and the use of RAG and fine-tuning for QA pair generation.
Feb 2, 2024 • 36min

HyDE: Precise Zero-Shot Dense Retrieval without Relevance Labels

We discuss HyDE: a thrilling zero-shot learning technique that combines GPT-3’s language understanding with contrastive text encoders. HyDE revolutionizes information retrieval and grounding in real-world data by generating hypothetical documents from queries and retrieving similar real-world documents. It outperforms traditional unsupervised retrievers, rivaling fine-tuned retrievers across diverse tasks and languages. This leap in zero-shot learning efficiently retrieves relevant real-world information without task-specific fine-tuning, broadening AI model applicability and effectiveness. Link to transcript and live recording: https://arize.com/blog/hyde-paper-reading-and-discussion/ Learn more about AI observability and evaluation, join the Arize AI Slack community or get the latest on LinkedIn and X.
Feb 2, 2024 • 44min

Phi-2 Model

The podcast delves into the Phi-2 model, showcasing its superior performance compared to larger models on various benchmarks, especially in coding and math tasks. Despite its smaller size, Phi-2 outperforms Google's Gemini Nano 2 model. The discussion also covers the benefits of small language models over large ones, including trainability with less data and easier fine-tuning for specific tasks.
Dec 27, 2023 • 48min

A Deep Dive Into Generative's Newest Models: Gemini vs Mistral (Mixtral-8x7B)–Part I

ML Solutions Architect Dat Ngo and Product Manager Aman Khan discuss the new models Gemini and Mixtral-8x7B. They cover the background and context of Mixtral, its performance compared to Llama and GPT-3.5, and its optimized fine-tuning. Part II will explore Gemini, developed by DeepMind and Google Research.
Dec 18, 2023 • 45min

How to Prompt LLMs for Text-to-SQL: A Study in Zero-shot, Single-domain, and Cross-domain Settings

We’re thrilled to be joined by Shuaichen Chang, LLM researcher and the author of this week’s paper, to discuss his findings. Shuaichen’s research investigates the impact of prompt construction on the performance of large language models (LLMs) in the text-to-SQL task, focusing on zero-shot, single-domain, and cross-domain settings. Shuaichen and his team explore various strategies for prompt construction, evaluating the influence of database schema, content representation, and prompt length on LLMs’ effectiveness. The findings highlight the crucial role of table relationships and content, the effectiveness of in-domain demonstration examples, and the significance of prompt length in cross-domain scenarios. Read the blog and watch the discussion: https://arize.com/blog/how-to-prompt-llms-for-text-to-sql-paper-reading/
Nov 30, 2023 • 41min

The Geometry of Truth: Emergent Linear Structure in LLM Representation of True/False Datasets

In this podcast, Samuel Marks, a Postdoctoral Research Associate at Northeastern University, discusses his paper on the linear structure of true/false datasets in LLM representations. They explore how language models can linearly represent truth or falsehood, introduce a new probing technique called mass mean probing, and analyze the process of embedding truth in LLM models. They also discuss the future research directions and limitations of the paper.
Nov 20, 2023 • 45min

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

In this paper read, we discuss “Towards Monosemanticity: Decomposing Language Models Into Understandable Components,” a paper from Anthropic that addresses the challenge of understanding the inner workings of neural networks, drawing parallels with the complexity of human brain function. It explores the concept of “features” (patterns of neuron activations), which provide a more interpretable way to dissect neural networks. By decomposing a layer of neurons into thousands of features, this approach uncovers hidden model properties that are not evident when examining individual neurons. These features are demonstrated to be more interpretable and consistent, offering the potential to steer model behavior and improve AI safety. Find the transcript and more here: https://arize.com/blog/decomposing-language-models-with-dictionary-learning-paper-reading/
Oct 18, 2023 • 44min

RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models

We discuss RankVicuna, the first fully open-source LLM capable of performing high-quality listwise reranking in a zero-shot setting. While researchers have successfully applied LLMs such as ChatGPT to reranking in an information retrieval context, such work has mostly been built on proprietary models hidden behind opaque API endpoints. This approach yields experimental results that are not reproducible and non-deterministic, threatening the veracity of outcomes that build on such shaky foundations. RankVicuna provides access to a fully open-source LLM and associated code infrastructure capable of performing high-quality reranking. Find the transcript and more here: https://arize.com/blog/rankvicuna-paper-reading/
Oct 17, 2023 • 36min

Explaining Grokking Through Circuit Efficiency

The podcast explores the concept of grokking and its relationship with network performance. It discusses the use of circuits as modules, modular addition and generalization, balancing cross-entropy loss and weight decay in deep learning models, circuit efficiency and its role in performance, grokking and the impact on model strength, and the relationship between circuit efficiency and generalization.
Sep 29, 2023 • 42min

Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior

Deep Papers is a podcast series featuring deep dives on today’s seminal AI papers and research. Each episode profiles the people and techniques behind cutting-edge breakthroughs in machine learning. In this episode, we discuss the paper “Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior.” This episode is led by SallyAnn Delucia (ML Solutions Engineer, Arize AI) and Amber Roberts (ML Solutions Engineer, Arize AI). The research they discuss highlights that while LLMs have great generalization capabilities, they struggle to effectively predict and optimize communication to get the desired receiver behavior. We’ll explore whether this might be because of a lack of “behavior tokens” in LLM training corpora and how Large Content and Behavior Models (LCBMs) might help to solve this issue. Find the transcript and more here: https://arize.com/blog/large-content-and-behavior-models-paper-reading/
