Deep Papers

Arize AI

Deep Papers is a podcast series featuring deep dives on today’s most important AI papers and research. Hosted by Arize AI founders and engineers, each episode profiles the people and techniques behind cutting-edge breakthroughs in machine learning.

Episodes

Mentioned books

Mar 15, 2024 • 45min

A Deep Dive Into Generative's Newest Models: Gemini vs Mistral (Mixtral-8x7B)–Part I

ML Solutions Architect Dat Ngo and Product Manager Aman Khan discuss the new models Gemini and Mixtral-8x7B. They cover the background and context of Mixtral, its performance compared to Llama and GPT3.5, and its optimized fine-tuning. Part II will explore Gemini, developed by DeepMind and Google Research.

Dec 18, 2023 • 45min

How to Prompt LLMs for Text-to-SQL: A Study in Zero-shot, Single-domain, and Cross-domain Settings

We’re thrilled to be joined by Shuaichen Chang, LLM researcher and the author of this week’s paper to discuss his findings. Shuaichen’s research investigates the impact of prompt constructions on the performance of large language models (LLMs) in the text-to-SQL task, particularly focusing on zero-shot, single-domain, and cross-domain settings. Shuaichen and his team explore various strategies for prompt construction, evaluating the influence of database schema, content representation, and prompt length on LLMs’ effectiveness. The findings emphasize the importance of careful consideration in constructing prompts, highlighting the crucial role of table relationships and content, the effectiveness of in-domain demonstration examples, and the significance of prompt length in cross-domain scenarios.Read the blog and watch the discussion: https://arize.com/blog/how-to-prompt-llms-for-text-to-sql-paper-reading/Learn more about AI observability and evaluation, join the Arize AI Slack community or get the latest on LinkedIn and X.

Nov 30, 2023 • 41min

The Geometry of Truth: Emergent Linear Structure in LLM Representation of True/False Datasets

In this podcast, Samuel Marks, a Postdoctoral Research Associate at Northeastern University, discusses his paper on the linear structure of true/false datasets in LLM representations. They explore how language models can linearly represent truth or falsehood, introduce a new probing technique called mass mean probing, and analyze the process of embedding truth in LLM models. They also discuss the future research directions and limitations of the paper.

Nov 20, 2023 • 45min

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

In this paper read, we discuss “Towards Monosemanticity: Decomposing Language Models Into Understandable Components,” a paper from Anthropic that addresses the challenge of understanding the inner workings of neural networks, drawing parallels with the complexity of human brain function. It explores the concept of “features,” (patterns of neuron activations) providing a more interpretable way to dissect neural networks. By decomposing a layer of neurons into thousands of features, this approach uncovers hidden model properties that are not evident when examining individual neurons. These features are demonstrated to be more interpretable and consistent, offering the potential to steer model behavior and improve AI safety.Find the transcript and more here: https://arize.com/blog/decomposing-language-models-with-dictionary-learning-paper-reading/Learn more about AI observability and evaluation, join the Arize AI Slack community or get the latest on LinkedIn and X.

Oct 18, 2023 • 44min

RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models

We discuss RankVicuna, the first fully open-source LLM capable of performing high-quality listwise reranking in a zero-shot setting. While researchers have successfully applied LLMs such as ChatGPT to reranking in an information retrieval context, such work has mostly been built on proprietary models hidden behind opaque API endpoints. This approach yields experimental results that are not reproducible and non-deterministic, threatening the veracity of outcomes that build on such shaky foundations. RankVicuna provides access to a fully open-source LLM and associated code infrastructure capable of performing high-quality reranking.Find the transcript and more here: https://arize.com/blog/rankvicuna-paper-reading/Learn more about AI observability and evaluation, join the Arize AI Slack community or get the latest on LinkedIn and X.

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app

Deep Papers

Episodes

Mentioned books

Reinforcement Learning in the Era of LLMs

Sora: OpenAI’s Text-to-Video Generation Model

RAG vs Fine-Tuning