

Deep Papers
Arize AI
Deep Papers is a podcast series featuring deep dives on today’s most important AI papers and research. Hosted by Arize AI founders and engineers, each episode profiles the people and techniques behind cutting-edge breakthroughs in machine learning.
Episodes

Apr 26, 2024 • 45min
Keys To Understanding ReAct: Synergizing Reasoning and Acting in Language Models
Exploring the ReAct approach, in which language models interleave reasoning traces with task-specific actions. Topics include the challenges of interpretability in LMs and the importance of self-reflection, a comparison of reasoning-only and acting-only methods on QA tasks, reducing hallucinations through model fine-tuning, and implementing a chatbot class with OpenAI that is then enhanced with self-reflection and decision-making strategies.
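The Thought/Action/Observation loop at the heart of ReAct can be sketched in a few lines of Python. This is a minimal illustration, not the paper's or the episode's implementation: `scripted_model` is a stand-in for a real LLM call (for example, an OpenAI chat completion), and the `lookup` tool, the `finish[...]` action, and the regex-based parsing are hypothetical choices made for the demo.

```python
import re
from typing import Callable

# Hypothetical tool registry; the tool name and its behavior are illustrative only.
TOOLS: dict[str, Callable[[str], str]] = {
    "lookup": lambda q: {"capital of france": "Paris"}.get(q.lower(), "not found"),
}

def react_loop(model: Callable[[str], str], question: str, max_steps: int = 5) -> str:
    """Alternate model Thought/Action steps with tool Observations until 'finish'."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)  # the model emits a Thought followed by an Action
        transcript += step + "\n"
        finish = re.search(r"Action: finish\[(.*)\]", step)
        if finish:
            return finish.group(1)
        action = re.search(r"Action: (\w+)\[(.*)\]", step)
        if action is None:
            break  # no parsable action: stop rather than loop forever
        tool, arg = action.groups()
        observation = TOOLS.get(tool, lambda _: "unknown tool")(arg)
        transcript += f"Observation: {observation}\n"  # feed the result back in
    return "no answer"

def scripted_model(transcript: str) -> str:
    """Stand-in for an LLM: returns canned reasoning/acting steps."""
    if "Observation:" not in transcript:
        return "Thought: I should look this up.\nAction: lookup[capital of France]"
    return "Thought: The observation answers the question.\nAction: finish[Paris]"

print(react_loop(scripted_model, "What is the capital of France?"))
```

Swapping `scripted_model` for a real model call is the only change needed to experiment further; the loop itself stays the same.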

Apr 4, 2024 • 45min
Demystifying Chronos: Learning the Language of Time Series
This week, we’re covering Amazon’s time series model: Chronos. Developing accurate machine-learning-based forecasting models has traditionally required substantial dataset-specific tuning and model customization. Chronos, however, is built on a language model architecture and trained on billions of tokenized time series observations, which lets it produce zero-shot forecasts that match or exceed those of purpose-built models. We dive into time series forecasting, some recent research our team has done, and take a community pulse on what people think of Chronos. Learn more about AI observability and evaluation, join the Arize AI Slack community, or get the latest on LinkedIn and X.
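To make "tokenized time series" concrete: as we understand the Chronos paper, a series is mean-scaled (divided by the mean absolute value of its context) and the scaled values are quantized into a fixed vocabulary of uniform bins, so a language model can treat forecasting as next-token prediction. The sketch below assumes that scheme; the bin count and value range are illustrative defaults, not Chronos's exact configuration.

```python
import numpy as np

def tokenize_series(context: np.ndarray, n_bins: int = 4094,
                    low: float = -15.0, high: float = 15.0):
    """Mean-scale a series and map each value to the id of its nearest bin center."""
    scale = float(np.mean(np.abs(context)))
    if scale == 0.0:
        scale = 1.0  # avoid dividing by zero on an all-zero context
    centers = np.linspace(low, high, n_bins)
    edges = (centers[1:] + centers[:-1]) / 2  # boundaries halfway between centers
    tokens = np.digitize(context / scale, edges)  # ids in [0, n_bins - 1]
    return tokens, scale

def detokenize(tokens: np.ndarray, scale: float, n_bins: int = 4094,
               low: float = -15.0, high: float = 15.0) -> np.ndarray:
    """Map token ids back to bin centers and undo the mean scaling."""
    centers = np.linspace(low, high, n_bins)
    return centers[tokens] * scale

series = np.array([10.0, 12.0, 9.0, 14.0, 11.0])
tokens, scale = tokenize_series(series)
print(tokens)
print(detokenize(tokens, scale))  # close to the original values, up to bin width
```

Mean scaling is what lets one vocabulary cover series of very different magnitudes; the price is a small quantization error bounded by half a bin width times the scale.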

Mar 25, 2024 • 43min
Anthropic Claude 3
The podcast delves into the latest buzz in AI: the arrival of Claude 3, Anthropic’s challenger to GPT-4. It explores the three models in the Claude 3 family, Haiku, Sonnet, and Opus, which offer different balances of intelligence, speed, and cost. The discussion covers AI ethics, model transparency, prompting techniques, and advances in text and code generation, including creative visualizations. It also addresses improvements in AI models, language challenges, and the future of AI technology.

Mar 15, 2024 • 45min
Reinforcement Learning in the Era of LLMs
Exploring reinforcement learning in the era of LLMs, the podcast discusses why reinforcement learning from human feedback (RLHF) techniques matter for improving LLM responses. Topics include LLM alignment, online vs. offline RL, credit assignment, prompting strategies, data embeddings, and mapping RL principles to language models.
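Credit assignment asks which tokens deserve credit for a reward that, in RLHF, typically arrives only once the full response has been scored. One textbook way to spread such a terminal reward back over a sequence is the discounted return-to-go, G_t = r_t + γ·G_{t+1}. The sketch below shows only that idea; the reward values and γ are made up, and production RLHF pipelines (for example, PPO-based ones) add value baselines, advantage estimation, and KL penalties that are not modeled here.

```python
def discounted_returns(rewards: list[float], gamma: float = 1.0) -> list[float]:
    """Return-to-go G_t = r_t + gamma * G_{t+1}, computed from the last step backward."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Hypothetical per-token rewards: a single sequence-level score lands on the final token.
token_rewards = [0.0, 0.0, 0.0, 1.0]
print(discounted_returns(token_rewards, gamma=0.9))
```

With γ below 1, earlier tokens receive geometrically less credit for the final score; with γ = 1, every token gets the full sequence reward.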

Mar 1, 2024 • 45min
Sora: OpenAI’s Text-to-Video Generation Model
This week, we discuss the implications of text-to-video generation and speculate about the possibilities (and limitations) of this technology, with some hot takes. Dat Ngo, ML Solutions Engineer at Arize, is joined by community member and AI Engineer Vibhu Sapra to review OpenAI’s technical report on its text-to-video generation model, Sora. According to OpenAI, “Sora can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt.” At the time of this recording, the model had not yet been widely released; it was available to red teamers assessing risk and to artists giving feedback on how Sora could help creatives. At the end of our discussion, we also explore EvalCrafter: Benchmarking and Evaluating Large Video Generation Models. This recent paper proposes a new framework and pipeline for comprehensively evaluating the videos that generation models produce, which we look at in light of Sora.

Feb 8, 2024 • 40min
RAG vs Fine-Tuning
This podcast explores the tradeoffs between RAG and fine-tuning for LLMs. It discusses implementing RAG in production, generating questions and answers as JSON with LLMs, using GPT to generate test questions in the agriculture domain, evaluating relevance in email retrieval, and using RAG and fine-tuning for question-answer pair generation.
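Generating question-answer pairs as JSON only pays off if the model's output is validated, since LLMs sometimes emit malformed or incomplete JSON. A minimal sketch, assuming a hypothetical prompt that asks the model for a JSON list of objects with `question` and `answer` keys (this schema is our assumption, not the one used in the paper or the episode):

```python
import json

def parse_qa_pairs(raw: str) -> list[dict[str, str]]:
    """Parse a model's JSON reply, keeping only complete question/answer pairs."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []  # malformed output: the caller can retry the generation
    if not isinstance(data, list):
        return []  # expected a list of pair objects
    pairs = []
    for item in data:
        if (isinstance(item, dict)
                and isinstance(item.get("question"), str)
                and isinstance(item.get("answer"), str)
                and item["question"].strip() and item["answer"].strip()):
            pairs.append({"question": item["question"].strip(),
                          "answer": item["answer"].strip()})
    return pairs

raw_reply = '[{"question": "Q1?", "answer": "A1."}, {"question": "Q2?"}, "junk"]'
print(parse_qa_pairs(raw_reply))  # keeps only the complete first pair
```

Dropping incomplete items rather than failing the whole reply keeps partially good generations usable; a stricter pipeline might instead reject and regenerate.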

Feb 2, 2024 • 36min
HyDE: Precise Zero-Shot Dense Retrieval without Relevance Labels
We discuss HyDE, a zero-shot retrieval technique that combines GPT-3’s language understanding with contrastive text encoders. Given a query, HyDE generates a hypothetical document that answers it, then retrieves real documents whose embeddings are similar, grounding the result in real-world data. It outperforms traditional unsupervised retrievers and rivals fine-tuned retrievers across diverse tasks and languages, all without task-specific fine-tuning or relevance labels, which broadens where this kind of retrieval can be applied. Link to transcript and live recording: https://arize.com/blog/hyde-paper-reading-and-discussion/
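The HyDE retrieval flow can be sketched in plain Python. Everything here is a stand-in: `fake_generate` replaces the GPT-3 call, and a bag-of-words `embed` function replaces the contrastive encoder (Contriever in the paper). Averaging the hypothetical documents' embeddings together with the query's own embedding follows our reading of the paper; treat those details as assumptions.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a dense contrastive encoder."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def mean_vector(vectors: list[Counter]) -> Counter:
    total: Counter = Counter()
    for v in vectors:
        total.update(v)  # adds counts word by word
    return Counter({w: c / len(vectors) for w, c in total.items()})

def hyde_search(query, generate, corpus, k=1, n_docs=2):
    """Embed hypothetical answers (plus the query) and rank real documents by similarity."""
    hypotheticals = [generate(query) for _ in range(n_docs)]
    target = mean_vector([embed(d) for d in hypotheticals] + [embed(query)])
    ranked = sorted(corpus, key=lambda doc: cosine(target, embed(doc)), reverse=True)
    return ranked[:k]

def fake_generate(query: str) -> str:
    # Hypothetical stand-in for an LLM asked to write a passage answering the query.
    return "Tides are caused mainly by the gravitational pull of the moon on the oceans."

corpus = [
    "Volcanoes form where magma reaches the surface.",
    "The moon's gravitational pull on the oceans produces tides.",
]
print(hyde_search("what causes tides?", fake_generate, corpus))
```

The point of the detour through a generated passage is that it shares vocabulary (or, with a real encoder, semantics) with relevant documents even when the short query does not.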

Feb 2, 2024 • 44min
Phi-2 Model
The podcast delves into the Phi-2 model, showcasing its superior performance compared to larger models on various benchmarks, especially in coding and math tasks. Despite its smaller size, Phi-2 outperforms Google's Gemini Nano 2 model. The discussion also covers the benefits of small language models over large ones, including trainability with less data and easier fine-tuning for specific tasks.

Dec 27, 2023 • 48min
A Deep Dive Into Generative's Newest Models: Gemini vs Mistral (Mixtral-8x7B)–Part I
ML Solutions Architect Dat Ngo and Product Manager Aman Khan discuss the new models Gemini and Mixtral-8x7B. They cover the background and context of Mixtral, its performance compared to Llama and GPT-3.5, and its optimized fine-tuning. Part II will explore Gemini, developed by DeepMind and Google Research.

Dec 18, 2023 • 45min
How to Prompt LLMs for Text-to-SQL: A Study in Zero-shot, Single-domain, and Cross-domain Settings
We’re thrilled to be joined by Shuaichen Chang, an LLM researcher and the author of this week’s paper, to discuss his findings. Shuaichen’s research investigates how prompt construction affects the performance of large language models (LLMs) on the text-to-SQL task, focusing on zero-shot, single-domain, and cross-domain settings. Shuaichen and his team explore various prompt construction strategies, evaluating how the representation of the database schema and content, as well as prompt length, influence LLMs’ effectiveness. The findings show that prompts must be constructed carefully: table relationships and content play a crucial role, in-domain demonstration examples are effective, and prompt length matters in cross-domain scenarios. Read the blog and watch the discussion: https://arize.com/blog/how-to-prompt-llms-for-text-to-sql-paper-reading/
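The prompt components studied in the paper (schema representation, table content, and in-domain demonstrations) can be assembled programmatically. The sketch below uses Python's built-in `sqlite3` to pull `CREATE TABLE` statements and a few sample rows; the function name and the exact prompt layout (SQL comments, tab-separated rows, trailing `SELECT`) are hypothetical choices for illustration, not the templates from the paper.

```python
import sqlite3

def build_text_to_sql_prompt(conn: sqlite3.Connection, question: str,
                             n_rows: int = 2,
                             demos: tuple[tuple[str, str], ...] = ()) -> str:
    """Assemble schema, sample content, and demonstrations into one prompt string."""
    parts = []
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for name, ddl in tables:
        parts.append(ddl + ";")  # schema representation as CREATE TABLE text
        if n_rows > 0:  # content representation: a few sample rows
            cur = conn.execute(f'SELECT * FROM "{name}" LIMIT ?', (n_rows,))
            columns = [d[0] for d in cur.description]
            rows = cur.fetchall()
            lines = ["\t".join(columns)] + ["\t".join(map(str, r)) for r in rows]
            parts.append(f"/* {len(rows)} example rows from {name}:\n"
                         + "\n".join(lines) + "\n*/")
    for demo_question, demo_sql in demos:  # optional in-domain demonstrations
        parts.append(f"-- Question: {demo_question}\n{demo_sql}")
    parts.append(f"-- Question: {question}\nSELECT")  # the model completes the query
    return "\n\n".join(parts)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
conn.executemany("INSERT INTO singer (name, country) VALUES (?, ?)",
                 [("Ana", "Chile"), ("Ben", "Canada"), ("Chen", "China")])
demos = (("How many singers are there?", "SELECT COUNT(*) FROM singer;"),)
print(build_text_to_sql_prompt(conn, "Which singers are from Canada?", demos=demos))
```

Exposing `n_rows` and `demos` as parameters makes it easy to ablate content and demonstrations, and the overall prompt length, the way the paper's experiments vary them.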