#163 - Llama 3, Grok-1.5 Vision, new Atlas robot, RHO-1, Medium ban
Apr 24, 2024
Llama 3 and Grok-1.5 Vision claim the spotlight with their AI advancements. Boston Dynamics unveils a new Atlas robot for commercial use. US blacklists key partners, while Elon Musk hints at massive GPU requirements for the Grok 3 model. Dr. Andrew Ng joins Amazon's board, and Collaborative Robotics secures a $100M investment. Exciting advancements in AI and robotics technology unfold in this episode.
Llama 3 by Meta touted as top open model, Grok-1.5 Vision overshadows GPT-4V, Reka Core competes with GPT-4
Boston Dynamics introduces new commercial Atlas robot, TSMC's $65B investment doesn't complete the chip puzzle, China blacklists Intel's key partner
Next-gen Grok 3 model requires 100,000 Nvidia H100 GPUs, Dr. Andrew Ng joins Amazon's board, $100M funding for Collaborative Robotics
Deep dives
Selective Language Modeling: Efficient Training on Useful Tokens
This paper introduces selective language modeling: rather than predicting every token, training focuses only on the tokens that align with the desired distribution. By selectively training on challenging tokens, the model optimizes performance more efficiently. A reference model measures the loss on each token, enabling a more efficient training scheme and yielding a 16% performance boost on benchmarks for 1-billion-parameter models.
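A minimal PyTorch-style sketch of that idea, assuming a Hugging Face-style causal LM interface (`model(input_ids).logits`) and an already-trained, frozen reference model; the function name, keep ratio, and top-k selection below are illustrative, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def selective_lm_loss(model, ref_model, input_ids, keep_ratio=0.6):
    """Average the loss only over the tokens with the largest excess loss
    (model loss minus reference-model loss), instead of over every token.
    `keep_ratio` is an illustrative value, not the paper's setting."""
    labels = input_ids[:, 1:]                          # next-token targets
    logits = model(input_ids).logits[:, :-1]           # training model predictions
    with torch.no_grad():                              # reference model stays frozen
        ref_logits = ref_model(input_ids).logits[:, :-1]

    # Per-token cross-entropy for both models.
    tok_loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), labels.reshape(-1), reduction="none")
    ref_loss = F.cross_entropy(
        ref_logits.reshape(-1, ref_logits.size(-1)), labels.reshape(-1), reduction="none")

    # Excess loss: how much harder the token is for the current model
    # than for the reference model.
    excess = tok_loss - ref_loss

    # Keep only the hardest keep_ratio fraction of tokens for the update.
    k = max(1, int(keep_ratio * excess.numel()))
    top_idx = torch.topk(excess, k).indices
    return tok_loss[top_idx].mean()
```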
Optimal Granularity in Mixture of Experts
Investigating the ideal number of experts for models of various sizes, this paper finds that the optimal granularity in a mixture of experts increases as model size scales up. Mixture-of-experts models prove more efficient than dense transformers across the model sizes studied. This empirical work contributes to understanding efficient training strategies for large models.
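As a rough illustration of what "granularity" means here, the toy layer below (hypothetical names and sizes, not taken from the paper) splits a dense feed-forward block into more, proportionally smaller experts as the granularity knob increases, keeping the active parameter count per token roughly constant while routing becomes finer-grained.

```python
import torch
import torch.nn as nn

class FineGrainedMoE(nn.Module):
    """Toy MoE layer: higher granularity means more, smaller experts
    for the same total and active parameter budget."""

    def __init__(self, d_model=512, d_ff=2048, base_experts=8, granularity=2, top_k=2):
        super().__init__()
        n_experts = base_experts * granularity        # more experts...
        d_expert = d_ff // granularity                # ...each proportionally smaller
        self.top_k = top_k * granularity              # keep active params per token constant
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_expert), nn.GELU(),
                          nn.Linear(d_expert, d_model))
            for _ in range(n_experts))

    def forward(self, x):                             # x: (tokens, d_model)
        scores = self.router(x).softmax(dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for t in range(x.size(0)):                    # naive per-token dispatch
            for w, e in zip(weights[t], idx[t]):
                out[t] += w * self.experts[int(e)](x[t])
        return out

# Usage: same interface whether granularity is 1 (coarse) or higher (fine).
layer = FineGrainedMoE(granularity=4)
y = layer(torch.randn(16, 512))
```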
RHO-1: Training Dynamics in Language Models
This study delves into token-level training dynamics, identifying different types of tokens based on their difficulty in training. They propose a selective training approach, focusing on tokens that are harder to learn, resulting in faster convergence and improved performance. The method leads to a 16% performance boost relative to other models, showcasing the benefits of tailored training strategies.
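A toy sketch of that token-level analysis, assuming per-token losses captured at an early and a late checkpoint; the four buckets and the loss threshold are illustrative stand-ins for the kind of difficulty-based categorization described above, not the paper's exact procedure.

```python
import torch

def categorize_tokens(loss_early, loss_late, threshold=2.0):
    """Bucket tokens by how their loss evolves between two checkpoints.
    `threshold` separating 'high' from 'low' loss is an illustrative value."""
    high_early = loss_early > threshold
    high_late = loss_late > threshold
    return {
        "stays_hard":   high_early & high_late,    # never learned
        "gets_learned": high_early & ~high_late,   # hard -> easy
        "forgotten":    ~high_early & high_late,   # easy -> hard
        "stays_easy":   ~high_early & ~high_late,  # already learned
    }

# Example: count tokens per category for synthetic per-token losses.
early = torch.rand(1000) * 4
late = torch.rand(1000) * 4
cats = categorize_tokens(early, late)
print({name: int(mask.sum()) for name, mask in cats.items()})
```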
Scaling Laws for Mixture of Experts
Exploring the granularity parameter in mixture-of-experts models, this research finds that the optimal granularity changes as model size increases: coarser granularity suits smaller models, while scaling up calls for finer granularity. The study highlights the superiority of mixture of experts over dense transformers and provides insights into optimizing training strategies for efficient model performance.
Study Reveals Hardware Costs of Fine-Grained Communication
The podcast episode delves into the impact of fine-grained communication on hardware costs in AI systems. Sending data to numerous experts, loading experts on GPUs, and coordinating these processes can become prohibitively expensive. A key takeaway is the compute cost of the routing decisions themselves, which adds to the overall complexity and expense.
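To make the routing-cost point concrete, here is a back-of-the-envelope sketch (hypothetical sizes, ignoring softmax, top-k selection, and the communication and expert-loading costs discussed above) of how the router's per-token compute grows linearly with the number of experts.

```python
def router_flops_per_token(d_model: int, n_experts: int) -> int:
    """Approximate cost of one routing decision: a single
    d_model x n_experts projection per token (multiply-adds counted as 2 FLOPs)."""
    return 2 * d_model * n_experts

# Illustrative numbers only: each doubling of the expert count doubles
# router compute, and every extra expert also adds dispatch traffic and
# GPU-loading overhead on top of this.
for n_experts in (8, 16, 64, 256):
    print(n_experts, router_flops_per_token(d_model=4096, n_experts=n_experts))
```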
Medium Implements Policy Against Fully AI-Generated Content
Medium has introduced a new policy prohibiting fully AI-generated stories from appearing behind paywalls. Users may face expulsion from Medium for violating this rule, which aims to maintain authentic content creation on the platform. Additionally, the policy requires proper labeling and sourcing of AI-generated images to ensure transparency and authenticity.