AI Breakdown

agibreakdown
Dec 27, 2023 • 4min

arxiv preprint - MotionCtrl: A Unified and Flexible Motion Controller for Video Generation

In this episode we discuss MotionCtrl: A Unified and Flexible Motion Controller for Video Generation by Zhouxia Wang, Ziyang Yuan, Xintao Wang, Tianshui Chen, Menghan Xia, Ping Luo, Ying Shan. The study introduces MotionCtrl, a video generation approach that controls camera motion and object motion independently, addressing the lack of precise, disentangled control over these two motion types in earlier methods. Its architecture and training strategy reflect the distinct nature of camera and object movements and are largely insensitive to object appearance, enabling finer-grained manipulation of motion in generated videos. Experiments show that MotionCtrl outperforms existing models in producing diverse, controllable motion dynamics while adapting to a wide range of camera poses and trajectories.
Dec 26, 2023 • 5min

arxiv preprint - Model-tuning Via Prompts Makes NLP Models Adversarially Robust

In this episode we discuss Model-tuning Via Prompts Makes NLP Models Adversarially Robust by Mrigank Raman, Pratyush Maini, J. Zico Kolter, Zachary C. Lipton, Danish Pruthi. The paper presents Model-tuning Via Prompts (MVP), a method that substantially improves the adversarial robustness of pretrained language models compared with the standard approach of fine-tuning with an MLP head (MLP-FT). Instead of attaching an MLP classification head, MVP appends a prompt template to the input and predicts through the pretrained language-modeling head, yielding an average 8% improvement against adversarial attacks across datasets and models and surpassing state-of-the-art defenses by 3.5%. The authors attribute MVP's robustness gains to better alignment with the pre-training task and to avoiding the vulnerabilities introduced by randomly initialized MLP parameters.
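To make the contrast concrete, here is a minimal sketch (not the authors' code) of cloze-style prediction: a prompt template is appended to the input and the pretrained masked-LM head scores label words at the mask position, so no randomly initialized classification head is introduced. The model choice, template, and verbalizer below are illustrative assumptions.

```python
# Minimal sketch of cloze-style (MVP-like) prediction; model, template, and
# verbalizer are illustrative, not the paper's exact setup.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base").eval()

def prompt_predict(text: str) -> str:
    # Append a prompt containing a mask token; the pretrained MLM head fills
    # it in, so prediction reuses pretrained parameters instead of a new head.
    prompt = f"{text} It was {tokenizer.mask_token}."
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0]
    # Verbalizer: label words stand in for class labels (an assumption here).
    verbalizer = {" great": "positive", " terrible": "negative"}
    word_ids = [tokenizer.convert_tokens_to_ids(tokenizer.tokenize(w)[0]) for w in verbalizer]
    scores = logits[0, mask_pos, torch.tensor(word_ids)]
    return list(verbalizer.values())[int(scores.argmax())]

print(prompt_predict("The movie was a delight from start to finish."))
```

During MVP fine-tuning, the full model would then be updated through this prompting interface rather than through a freshly added head.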
Dec 22, 2023 • 5min

arxiv preprint - Training Chain-of-Thought via Latent-Variable Inference

In this episode we discuss Training Chain-of-Thought via Latent-Variable Inference by Du Phan, Matthew D. Hoffman, David Dohan, Sholto Douglas, Tuan Anh Le, Aaron Parisi, Pavel Sountsov, Charles Sutton, Sharad Vikram, Rif A. Saurous. The paper introduces a fine-tuning strategy for large language models that improves their problem-solving accuracy by focusing on maximizing the probability of correct answers using chain-of-thought (CoT) prompts without requiring detailed rationale supervision. It tackles the challenge of sampling from the posterior distribution of possible rationales with a novel Markov-chain Monte Carlo (MCMC) expectation-maximization (EM) algorithm, which also incorporates a control-variate technique to reduce variance in gradient estimates. The method outperforms existing fine-tuning methods, including the self-taught reasoner (STaR) and prompt-tuning with CoT, in generating more accurate answers on various complex reasoning tasks.
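As a rough intuition for the latent-variable view, here is a toy, self-contained sketch in which rationales are sampled and only those whose final answer matches the gold label are kept for the next fine-tuning step. This is a simplified rejection-sampling stand-in, not the paper's MCMC-EM procedure with control variates, and the toy "reasoner" below is an invented placeholder for an LLM.

```python
# Toy illustration of treating the rationale as a latent variable.
import random

random.seed(0)

def sample_rationale_and_answer(question: str) -> tuple[str, int]:
    """Stand-in for an LLM: samples a noisy chain of thought and an answer."""
    a, b = map(int, question.split("+"))
    # Simulate an imperfect reasoner: 60% chance the rationale is sound.
    if random.random() < 0.6:
        return f"{a} plus {b} equals {a + b}", a + b
    wrong = a + b + random.choice([-1, 1])
    return f"{a} plus {b} equals {wrong}", wrong

def collect_latent_rationales(question: str, gold: int, num_samples: int = 8):
    """E-step caricature: keep sampled rationales whose answer is correct.

    The kept (question, rationale) pairs would then be used to fine-tune the
    model in the M-step; the paper does this with an MCMC sampler and a
    control variate rather than plain rejection sampling.
    """
    kept = []
    for _ in range(num_samples):
        rationale, answer = sample_rationale_and_answer(question)
        if answer == gold:
            kept.append((question, rationale))
    return kept

print(collect_latent_rationales("3+4", gold=7))
```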
Dec 21, 2023 • 4min

arxiv preprint - Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation

In this episode we discuss Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation by Bingxin Ke, Anton Obukhov, Shengyu Huang, Nando Metzger, Rodrigo Caye Daudt, Konrad Schindler. The paper presents Marigold, a method for monocular depth estimation that exploits the learned priors of generative diffusion models, specifically a fine-tuned derivative of Stable Diffusion. Marigold produces affine-invariant depth, can be fine-tuned efficiently on synthetic data using a single GPU, and delivers significant performance improvements, including gains of over 20% on certain datasets. The work demonstrates the potential of leveraging generative image priors for depth estimation, with a focus on better generalization and state-of-the-art results.
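At a high level, the fine-tuning recipe can be pictured as teaching the diffusion U-Net to denoise a depth latent conditioned on the corresponding image latent. The sketch below is an assumption-laden simplification (not the released Marigold code): `encoder` and `unet` stand in for the pretrained Stable Diffusion components, and the noise schedule is deliberately naive.

```python
# Simplified sketch of one training step for image-conditioned depth denoising.
import torch

def depth_denoising_step(encoder, unet, rgb, depth, num_timesteps=1000):
    with torch.no_grad():
        z_img = encoder(rgb)      # image latent (frozen VAE encoder)
        z_depth = encoder(depth)  # depth map encoded like an image
    t = torch.randint(0, num_timesteps, (z_depth.shape[0],))
    noise = torch.randn_like(z_depth)
    # Naive linear noising schedule, for illustration only.
    alpha = (1.0 - t.float() / num_timesteps).view(-1, 1, 1, 1)
    z_noisy = alpha.sqrt() * z_depth + (1 - alpha).sqrt() * noise
    # Condition by channel-wise concatenation of image and noisy depth latents.
    pred = unet(torch.cat([z_img, z_noisy], dim=1), t)
    return torch.nn.functional.mse_loss(pred, noise)
```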
Dec 20, 2023 • 4min

arxiv preprint - Instruction-tuning Aligns LLMs to the Human Brain

In this episode we discuss Instruction-tuning Aligns LLMs to the Human Brain by Khai Loong Aw, Syrielle Montariol, Badr AlKhamissi, Martin Schrimpf, Antoine Bosselut. The paper examines whether instruction-tuning, a method for fine-tuning large language models (LLMs), makes their processing more human-like through two metrics: brain alignment and behavioral alignment. Results indicate instruction-tuning increases brain alignment with human neural activity by 6% on average but does not significantly impact behavioral alignment. A strong correlation is found between brain alignment and both the size of the model and its performance on tasks requiring world knowledge, suggesting that as LLMs better encode world knowledge, their internal representations align more closely with human brain activity.
Dec 19, 2023 • 6min

arxiv preprint - WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia

In this episode we discuss WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia by Sina J. Semnani, Violet Z. Yao, Heidi C. Zhang, Monica S. Lam. The paper introduces WikiChat, a chatbot built on a few-shot prompted Large Language Model (LLM) grounded in Wikipedia, designed to give accurate, engaging responses with minimal hallucination. The GPT-4-based system is further distilled into a smaller 7-billion-parameter model, improving latency and cost with little loss in quality. In an evaluation combining human and LLM-based conversations, WikiChat outperforms other chatbots in factual accuracy and knowledge coverage, achieving highly accurate responses and favorable user feedback.
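To illustrate the grounding idea (not the authors' full multi-stage pipeline), here is a schematic retrieve-then-respond function; `retrieve` and `generate` are hypothetical stand-ins for a Wikipedia retriever and a few-shot prompted LLM.

```python
# Schematic retrieve-then-respond loop in the spirit of Wikipedia grounding.
def grounded_reply(history, user_message, retrieve, generate, k=3):
    # 1) Fetch supporting passages for the latest user turn.
    passages = retrieve(user_message, top_k=k)
    # 2) Ask the LLM to answer using only the retrieved evidence.
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (
        "Answer using only the Wikipedia passages below; "
        "say you are unsure if they are insufficient.\n"
        f"{context}\n"
        f"Conversation so far: {history}\n"
        f"User: {user_message}\nAssistant:"
    )
    draft = generate(prompt)
    # The real system adds further stages (claim extraction, fact-checking
    # against the passages, and refinement) before returning the reply.
    return draft
```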
Dec 18, 2023 • 4min

arxiv preprint - DemoFusion: Democratising High-Resolution Image Generation With No $$$

In this episode we discuss DemoFusion: Democratising High-Resolution Image Generation With No $$$ by Ruoyi Du, Dongliang Chang, Timothy Hospedales, Yi-Zhe Song, Zhanyu Ma. The paper introduces DemoFusion, a framework that extends open-source Latent Diffusion Models (LDMs) to higher-resolution image generation without additional training. It combines Progressive Upscaling, Skip Residual, and Dilated Sampling mechanisms to improve image quality at higher resolutions while keeping the process affordable and accessible to a broad audience. The progressive design also yields intermediate "previews" that support quick iteration on image prompts.
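A rough sketch of the progressive loop appears below; it is a simplification under stated assumptions (skip residual and dilated sampling are omitted), and `denoise`, `upsample`, and `add_noise` are hypothetical stand-ins for the underlying LDM operations rather than DemoFusion's actual API.

```python
# Simplified "upsample, re-noise, denoise again" loop with intermediate previews.
def progressive_upscale(prompt, denoise, upsample, add_noise, scales=(1, 2, 3)):
    previews = []
    latent = denoise(prompt, latent=None, scale=scales[0])  # base-resolution pass
    previews.append(latent)
    for s in scales[1:]:
        enlarged = upsample(latent, factor=s)   # enlarge the previous result
        noisy = add_noise(enlarged)             # re-noise the enlarged latent
        latent = denoise(prompt, latent=noisy, scale=s)
        previews.append(latent)                 # intermediate "preview" at this scale
    return latent, previews
```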
Dec 15, 2023 • 5min

arxiv preprint - Recommender Systems with Generative Retrieval

In this episode we discuss Recommender Systems with Generative Retrieval by Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan H. Keshavan, Trung Vu, Lukasz Heldt, Lichan Hong, Yi Tay, Vinh Q. Tran, Jonah Samost, Maciej Kula, Ed H. Chi, Maheswaran Sathiamoorthy. The paper presents a generative approach to large-scale retrieval in recommender systems, in which a model autoregressively decodes the identifiers of target items. Items are represented by Semantic IDs, semantically meaningful tuples of discrete codes, and a Transformer-based sequence-to-sequence model predicts the Semantic ID of the next item a user will interact with from their session history. The approach outperforms state-of-the-art models on multiple datasets and generalizes better, retrieving items that have no prior interaction history.
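A toy sketch of the interface follows: items are mapped to Semantic IDs via residual quantization of their embeddings, and the next item's codes are decoded one at a time from the session history. The hand-rolled quantizer, the 3-level code length, and `decode_step` are illustrative assumptions; the paper builds Semantic IDs with a learned residual quantizer and decodes with a Transformer.

```python
# Toy illustration of Semantic IDs and autoregressive next-item decoding.
def to_semantic_id(item_embedding, codebooks):
    """Quantize an embedding into a tuple of codes, one per codebook level."""
    codes = []
    residual = list(item_embedding)
    for codebook in codebooks:
        # Pick the nearest code at this level, then subtract it (residual quantization).
        best = min(
            range(len(codebook)),
            key=lambda j: sum((r - c) ** 2 for r, c in zip(residual, codebook[j])),
        )
        residual = [r - c for r, c in zip(residual, codebook[best])]
        codes.append(best)
    return tuple(codes)

def recommend_next(session_semantic_ids, decode_step, levels=3):
    """Autoregressively decode the next item's Semantic ID, code by code."""
    prefix = [code for sid in session_semantic_ids for code in sid]
    next_id = []
    for _ in range(levels):
        code = decode_step(prefix + next_id)  # stand-in for the seq2seq decoder
        next_id.append(code)
    return tuple(next_id)
```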
Dec 14, 2023 • 4min

arxiv preprint - Mamba: Linear-Time Sequence Modeling with Selective State Spaces

In this episode we discuss Mamba: Linear-Time Sequence Modeling with Selective State Spaces by Albert Gu, Tri Dao. The paper presents Mamba, a neural network architecture that rivals Transformers while handling very long sequences far more efficiently. Mamba's design incorporates selective structured state space models (SSMs) whose parameters depend on the input tokens, enabling content-based reasoning by selectively propagating or forgetting information along the sequence. The result is a model with fast inference, linear scaling in sequence length, and strong performance across modalities including language, audio, and genomics, where it outperforms Transformers of the same size and matches Transformers twice its size.
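The selectivity can be illustrated with a minimal single-channel recurrence in which the step size and the B/C projections are computed from each token, so what the state keeps or forgets depends on content. This is an illustrative sketch only, not the paper's hardware-aware parallel scan, and the module layout is an assumption.

```python
# Minimal selective state-space recurrence with input-dependent parameters.
import torch
import torch.nn as nn

class SelectiveSSM(nn.Module):
    def __init__(self, dim: int, state_size: int = 16):
        super().__init__()
        self.A = nn.Parameter(-torch.rand(state_size))  # stable negative poles
        self.to_delta = nn.Linear(dim, 1)                # input-dependent step size
        self.to_B = nn.Linear(dim, state_size)           # input-dependent input matrix
        self.to_C = nn.Linear(dim, state_size)           # input-dependent output matrix
        self.proj_in = nn.Linear(dim, 1)
        self.proj_out = nn.Linear(1, dim)

    def forward(self, x):                                # x: (batch, length, dim)
        b, L, _ = x.shape
        h = torch.zeros(b, self.A.shape[0], device=x.device)
        outs = []
        for t in range(L):
            xt = x[:, t]
            delta = torch.nn.functional.softplus(self.to_delta(xt))  # (b, 1)
            A_bar = torch.exp(delta * self.A)                        # discretized decay
            B, C = self.to_B(xt), self.to_C(xt)
            u = self.proj_in(xt)                                     # (b, 1)
            h = A_bar * h + delta * B * u                            # content-dependent update
            y = (C * h).sum(-1, keepdim=True)
            outs.append(self.proj_out(y))
        return torch.stack(outs, dim=1)
```

The loop is sequential here for clarity; the efficiency claims in the paper come from computing this recurrence with a parallel, memory-aware scan.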
Dec 13, 2023 • 5min

arxiv preprint - Block-State Transformers

In this episode we discuss Block-State Transformers by Mahan Fathi, Jonathan Pilault, Orhan Firat, Christopher Pal, Pierre-Luc Bacon, Ross Goroshin. The paper introduces the Block-State Transformer (BST), an architecture that merges state space models with block-wise attention to capture long-range dependencies and improve language modeling performance. The BST combines an SSM sublayer for long-range contextualization with a Block Transformer sublayer for local sequence processing, improving parallelization while drawing on the strengths of both model types. Experiments show better perplexity than strong Transformer baselines, improved generalization to longer sequences, and a significant speedup in processing from model parallelization.
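The layering can be sketched roughly as follows: an SSM pass produces long-range context states, and a block-local Transformer consumes each block alongside the context states for that block. The `ssm` and `block_transformer` callables are hypothetical stand-ins, and a real implementation processes blocks in parallel rather than in a Python loop.

```python
# Schematic combination of an SSM sublayer with block-local attention.
import torch

def bst_layer(x, ssm, block_transformer, block_len: int):
    # x: (batch, length, dim); length assumed divisible by block_len for brevity.
    b, L, d = x.shape
    context = ssm(x)  # (batch, length, dim) long-range context states
    blocks = x.view(b, L // block_len, block_len, d)
    ctx_blocks = context.view(b, L // block_len, block_len, d)
    outs = []
    for i in range(blocks.shape[1]):
        # Each block attends locally while conditioning on its SSM context.
        outs.append(block_transformer(blocks[:, i], context=ctx_blocks[:, i]))
    return torch.cat(outs, dim=1)
```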
