Deep Papers

Arize AI
undefined
Mar 15, 2024 • 45min

Reinforcement Learning in the Era of LLMs

Exploring reinforcement learning in the era of LLMs, the podcast discusses the significance of RLHF techniques in improving LLM responses. Topics include LM alignment, online vs offline RL, credit assignment, prompting strategies, data embeddings, and mapping RL principles to language models.
undefined
Mar 1, 2024 • 45min

Sora: OpenAI’s Text-to-Video Generation Model

This week, we discuss the implications of Text-to-Video Generation and speculate as to the possibilities (and limitations) of this incredible technology with some hot takes. Dat Ngo, ML Solutions Engineer at Arize, is joined by community member and AI Engineer Vibhu Sapra to review OpenAI’s technical report on their Text-To-Video Generation Model: Sora.According to OpenAI, “Sora can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt.” At the time of this recording, the model had not been widely released yet, but was becoming available to red teamers to assess risk, and also to artists to receive feedback on how Sora could be helpful for creatives.At the end of our discussion, we also explore EvalCrafter: Benchmarking and Evaluating Large Video Generation Models. This recent paper proposed a new framework and pipeline to exhaustively evaluate the performance of the generated videos, which we look at in light of Sora.Learn more about AI observability and evaluation, join the Arize AI Slack community or get the latest on LinkedIn and X.
undefined
Feb 8, 2024 • 40min

RAG vs Fine-Tuning

This podcast explores the tradeoffs between RAG and fine-tuning for LLMs. It discusses implementing RAG in production, question and answer generation using JSON and LOM models, using GPT for test question generation in agriculture, evaluating relevance in email retrieval, and the use of RAG and fine-tuning for QA pair generation.
undefined
Feb 2, 2024 • 44min

Phi-2 Model

The podcast delves into the Phi-2 model, showcasing its superior performance compared to larger models on various benchmarks, especially in coding and math tasks. Despite its smaller size, Phi-2 outperforms Google's Gemini Nano 2 model. The discussion also covers the benefits of small language models over large ones, including trainability with less data and easier fine-tuning for specific tasks.
undefined
Feb 2, 2024 • 36min

HyDE: Precise Zero-Shot Dense Retrieval without Relevance Labels

We discuss HyDE: a thrilling zero-shot learning technique that combines GPT-3’s language understanding with contrastive text encoders. HyDE revolutionizes information retrieval and grounding in real-world data by generating hypothetical documents from queries and retrieving similar real-world documents. It outperforms traditional unsupervised retrievers, rivaling fine-tuned retrievers across diverse tasks and languages. This leap in zero-shot learning efficiently retrieves relevant real-world information without task-specific fine-tuning, broadening AI model applicability and effectiveness. Link to transcript and live recording: https://arize.com/blog/hyde-paper-reading-and-discussion/Learn more about AI observability and evaluation, join the Arize AI Slack community or get the latest on LinkedIn and X.
undefined
Dec 27, 2023 • 48min

A Deep Dive Into Generative's Newest Models: Gemini vs Mistral (Mixtral-8x7B)–Part I

ML Solutions Architect Dat Ngo and Product Manager Aman Khan discuss the new models Gemini and Mixtral-8x7B. They cover the background and context of Mixtral, its performance compared to Llama and GPT3.5, and its optimized fine-tuning. Part II will explore Gemini, developed by DeepMind and Google Research.
undefined
Dec 18, 2023 • 45min

How to Prompt LLMs for Text-to-SQL: A Study in Zero-shot, Single-domain, and Cross-domain Settings

We’re thrilled to be joined by Shuaichen Chang, LLM researcher and the author of this week’s paper to discuss his findings. Shuaichen’s research investigates the impact of prompt constructions on the performance of large language models (LLMs) in the text-to-SQL task, particularly focusing on zero-shot, single-domain, and cross-domain settings. Shuaichen and his team explore various strategies for prompt construction, evaluating the influence of database schema, content representation, and prompt length on LLMs’ effectiveness. The findings emphasize the importance of careful consideration in constructing prompts, highlighting the crucial role of table relationships and content, the effectiveness of in-domain demonstration examples, and the significance of prompt length in cross-domain scenarios.Read the blog and watch the discussion: https://arize.com/blog/how-to-prompt-llms-for-text-to-sql-paper-reading/Learn more about AI observability and evaluation, join the Arize AI Slack community or get the latest on LinkedIn and X.
undefined
Nov 30, 2023 • 41min

The Geometry of Truth: Emergent Linear Structure in LLM Representation of True/False Datasets

In this podcast, Samuel Marks, a Postdoctoral Research Associate at Northeastern University, discusses his paper on the linear structure of true/false datasets in LLM representations. They explore how language models can linearly represent truth or falsehood, introduce a new probing technique called mass mean probing, and analyze the process of embedding truth in LLM models. They also discuss the future research directions and limitations of the paper.
undefined
Nov 20, 2023 • 45min

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

In this paper read, we discuss “Towards Monosemanticity: Decomposing Language Models Into Understandable Components,” a paper from Anthropic that addresses the challenge of understanding the inner workings of neural networks, drawing parallels with the complexity of human brain function. It explores the concept of “features,” (patterns of neuron activations) providing a more interpretable way to dissect neural networks. By decomposing a layer of neurons into thousands of features, this approach uncovers hidden model properties that are not evident when examining individual neurons. These features are demonstrated to be more interpretable and consistent, offering the potential to steer model behavior and improve AI safety.Find the transcript and more here: https://arize.com/blog/decomposing-language-models-with-dictionary-learning-paper-reading/Learn more about AI observability and evaluation, join the Arize AI Slack community or get the latest on LinkedIn and X.
undefined
Oct 18, 2023 • 44min

RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models

We discuss RankVicuna, the first fully open-source LLM capable of performing high-quality listwise reranking in a zero-shot setting. While researchers have successfully applied LLMs such as ChatGPT to reranking in an information retrieval context, such work has mostly been built on proprietary models hidden behind opaque API endpoints. This approach yields experimental results that are not reproducible and non-deterministic, threatening the veracity of outcomes that build on such shaky foundations. RankVicuna provides access to a fully open-source LLM and associated code infrastructure capable of performing high-quality reranking.Find the transcript and more here: https://arize.com/blog/rankvicuna-paper-reading/Learn more about AI observability and evaluation, join the Arize AI Slack community or get the latest on LinkedIn and X.

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app