#152 - live translation on phones, Meta aims at AGI, AlphaGeometry, political deepfakes
Jan 28, 2024
This episode covers topics such as live translation on Samsung phones, Meta's aim for AGI, AI-powered Premiere Pro features, Waymo's plan for robotaxis in LA, OpenAI CEO Sam Altman's pursuit of AI chips, ElevenLabs' $80M funding round and marketplace of cloned voices, and the DOJ and FTC investigating Microsoft's OpenAI partnership.
AlphaGeometry achieves impressive performance in solving complex geometry problems.
Stability AI introduces Stable LM 2 1.6B, a compact language model, alongside Stable Code 3B for code generation.
Google's Lumiere model generates near-photorealistic videos in one pass, improving text-to-video generation.
Deep dives
AlphaGeometry: Advancing AI in Geometry Problem Solving
Google DeepMind has developed AlphaGeometry, an Olympiad-level AI system for solving complex geometry problems. AlphaGeometry pairs a language model, which proposes auxiliary constructions, with a symbolic deduction engine that carries out rigorous proof steps. The neurosymbolic approach demonstrates impressive performance, approaching the level of an average Olympiad gold medalist on benchmark geometry problems.
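The loop described above can be sketched in miniature: a symbolic engine exhausts forward deductions, and when the goal is still unproven, a language model proposes an auxiliary construction that unlocks further deduction. All rules, facts, and the `propose_construction` stub below are invented for illustration; they are not DeepMind's actual API.

```python
def deduce(facts, rules):
    """Forward-chain: repeatedly apply rules until no new facts appear."""
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

def propose_construction(facts):
    """Stand-in for the language model: suggest one auxiliary construction."""
    # A real system samples constructions (e.g. "add midpoint M of BC").
    for suggestion in ["aux_point_M"]:
        if suggestion not in facts:
            return suggestion
    return None

def solve(facts, rules, goal, max_steps=5):
    """Alternate symbolic deduction with LM-proposed constructions."""
    facts = set(facts)
    for _ in range(max_steps):
        deduce(facts, rules)
        if goal in facts:
            return True
        aux = propose_construction(facts)
        if aux is None:
            return False
        facts.add(aux)
    return goal in facts

# Toy "theorem": the second rule only fires once the auxiliary point exists.
rules = [
    (frozenset({"AB=AC"}), "angle_B=angle_C"),
    (frozenset({"angle_B=angle_C", "aux_point_M"}), "goal"),
]
print(solve({"AB=AC"}, rules, "goal"))  # True: proof found after one construction
```

The key design point this mirrors is the division of labor: the deduction engine is exhaustive but cannot invent new objects, while the language model only has to supply creative constructions, not verify proofs.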
Stability AI Releases Smaller, More Efficient Language Model
Stability AI introduces Stable LM 2 1.6B, a 1.6-billion-parameter language model that delivers performance comparable to larger models such as Microsoft's Phi. The release also includes Stable Code 3B for code generation, underscoring Stability's commitment to developing compact, capable models and furthering its open-source contributions.
Lumiere: Google's Innovative Video Generation Model
Google's Lumiere is a new video generation model that adopts a one-pass approach, generating videos in their entirety rather than frame-by-frame. The space-time diffusion model showcases state-of-the-art text-to-video generation, producing near-photorealistic results with improved consistency through simultaneous generation.
ChatQA Achieves Conversational QA Accuracy Comparable to GPT-4
Researchers propose a two-stage approach to training conversational QA models, achieving GPT-4-level accuracy. The approach combines supervised fine-tuning with retrieval-augmented generation, letting the model draw on an external document store when answering questions and improving its conversational capabilities.
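The retrieval stage of such a pipeline can be sketched as follows (a minimal illustration, not the paper's method: the overlap scoring and answer template here are invented). Documents are ranked by word overlap with the question, and the top passage is handed to the generator as grounding context.

```python
def retrieve(question, documents, k=1):
    """Rank documents by simple word overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def answer(question, documents):
    """RAG data flow: retrieve context, then generate an answer from it."""
    context = retrieve(question, documents)[0]
    # A real system would feed `context` into a fine-tuned LLM prompt;
    # here we just return the grounding passage to show the flow.
    return f"Based on: {context}"

docs = [
    "Lumiere generates videos in a single pass.",
    "AlphaGeometry solves Olympiad geometry problems.",
]
print(answer("What does AlphaGeometry solve?", docs))
```

Production systems replace the overlap score with dense embeddings and a vector index, but the two-step shape — retrieve, then condition generation on the retrieved text — is the same.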
Retrieval-Augmented Generation Models Show Promise in Language Modeling
Retrieval-augmented generation models have shown performance comparable to GPT-4 at answering queries and generating responses grounded in relevant passages from documents. A team of NVIDIA researchers built a dataset of 7,000 conversational dialogues to train the models. Grounding responses in retrieved text reduces the risk of generating inaccurate or hallucinated answers.
Vision Mamba: A Promising State-Space Model for Visual Representation Learning
Researchers have introduced Vision Mamba, an adaptation of state-space models for visual tasks. Vision Mamba demonstrates promising results in various vision tasks such as classification, detection, and segmentation. This model, based on the previous success of Mamba in language modeling, offers a potential alternative to transformer-based models, addressing scalability and efficiency in sequence-based data processing.
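At the core of Mamba-style models is a discretized linear state-space recurrence, h_t = A·h_{t-1} + B·x_t with readout y_t = C·h_t, whose cost grows linearly in sequence length rather than quadratically as in self-attention. The sketch below shows only this recurrence with fixed toy matrices; real Mamba uses input-dependent, learned parameters and a hardware-aware parallel scan, none of which is implemented here.

```python
import numpy as np

def ssm_scan(A, B, C, xs):
    """Run the state-space recurrence over a sequence of input vectors."""
    h = np.zeros(A.shape[0])
    ys = []
    for x in xs:
        h = A @ h + B @ x          # state update: h_t = A h_{t-1} + B x_t
        ys.append(C @ h)           # readout:      y_t = C h_t
    return np.array(ys)

# Tiny example: 2-dim state, 1-dim inputs and outputs.
A = np.array([[0.9, 0.0], [0.0, 0.5]])   # decaying state dynamics
B = np.array([[1.0], [1.0]])             # input projection
C = np.array([[1.0, 1.0]])               # output projection
xs = [np.array([1.0]), np.array([0.0]), np.array([0.0])]
print(ssm_scan(A, B, C, xs))             # impulse response decays over time
```

Vision Mamba adapts this sequence recurrence to images by flattening patches into a sequence and scanning it bidirectionally, which is what the scalability claim above rests on.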