#149 - Reflecting on 2023, Midjourney v6, Anthropic Revenue, Unified-IO 2, NY Times sues OpenAI
Jan 7, 2024
In this episode, the hosts discuss reflections on the year 2023 and notable developments in the language model and chatbot space. They also cover on-device AI on Google phones and Samsung's AI-driven smartphone features, along with revenue projections and growth at AI companies including Microsoft, Anthropic, and OpenAI. The episode also explores true multi-modality and robot control, and xAI's registration as a benefit corporation.
Anthropic forecasts more than $850 million in annualized revenue by the end of 2024.
A new autoregressive multi-modal model has been developed that can understand and generate images, text, audio, and action.
Deep dives
Unified-IO 2: Scaling Autoregressive Multi-Modal Models for Image Understanding and Generation
In a collaborative effort between the Allen Institute for AI and academic partners, a new autoregressive multi-modal model has been developed. The model can understand and generate images and spans multiple modalities: vision, language, audio, and action. The Unified-IO 2 scaling approach enables the model to process and generate multi-modal data more efficiently and effectively. This advancement opens up possibilities for improved image recognition, captioning, and generation, with potential applications in computer vision, natural language processing, and multimedia analysis.
Multi-Modal AI System with Unified Embedding Space
A research organization has developed a multi-modal AI system that can take various types of inputs, such as text, images, videos, and audio, and generate outputs in any of these modalities. The system merges all these inputs into a shared embedding space, allowing the model to understand and generate different types of data consistently. This represents a paradigm shift towards true multi-modality, expanding beyond previous systems that could only generate text outputs. The model's capabilities were demonstrated through impressive outputs, including image and video generation, as well as controlling a robot.
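The core idea of a unified embedding space can be sketched in a few lines: each modality gets its own encoder that projects raw features into vectors of one shared dimension, so a single autoregressive model can attend over a mixed sequence. The sketch below is illustrative only; the dimensions, projection matrices, and function names are hypothetical and do not reflect the model's actual architecture or API.

```python
import numpy as np

EMBED_DIM = 64  # shared embedding dimension (illustrative)

rng = np.random.default_rng(0)

# Hypothetical per-modality encoders: each projects its own feature
# size into the same shared embedding space, so tokens from any
# modality become interchangeable vectors of length EMBED_DIM.
W_text = rng.normal(size=(300, EMBED_DIM))   # e.g. word-vector inputs
W_image = rng.normal(size=(512, EMBED_DIM))  # e.g. image-patch features
W_audio = rng.normal(size=(128, EMBED_DIM))  # e.g. spectrogram frames

def embed(tokens, projection):
    """Project modality-specific token features into the shared space."""
    return tokens @ projection

# One sequence mixing modalities: after projection, every token has
# the same shape, so the tokens can be concatenated and fed to a
# single model that generates output in any modality.
text_tokens = embed(rng.normal(size=(5, 300)), W_text)
image_tokens = embed(rng.normal(size=(9, 512)), W_image)
audio_tokens = embed(rng.normal(size=(4, 128)), W_audio)

sequence = np.concatenate([text_tokens, image_tokens, audio_tokens])
print(sequence.shape)  # every token now lives in the shared space
```

The key design point is that once all inputs share one token space, "generate an image" and "generate text" become the same operation: predicting the next token in the shared vocabulary.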
Task Contamination and Evaluation of Language Models
Researchers have highlighted the issue of task contamination and its impact on evaluating language models. Task contamination refers to benchmarks or tasks being influenced by examples that were already available on the internet when the model was trained. The researchers found that models performed better on datasets released before their training cutoff than on datasets released afterward, suggesting that the models had memorized previous examples rather than demonstrating true capabilities. This raises concerns about the reliability of language model evaluation and calls into question the validity of performance metrics on existing benchmarks.
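A rough way to probe for this effect is to split benchmark results by each dataset's release date relative to the model's training cutoff and compare accuracy on the two groups. The sketch below uses entirely made-up records and an assumed cutoff date, purely to show the comparison; it is not data from the study discussed above.

```python
from datetime import date

# Hypothetical evaluation records: (dataset release date, model correct?).
# A real analysis would aggregate many benchmarks and actual model outputs.
results = [
    (date(2020, 5, 1), True),
    (date(2020, 9, 1), True),
    (date(2021, 2, 1), True),
    (date(2022, 6, 1), False),
    (date(2022, 9, 1), True),
    (date(2023, 1, 1), False),
]

TRAINING_CUTOFF = date(2021, 9, 1)  # assumed training-data cutoff

def accuracy(records):
    return sum(ok for _, ok in records) / len(records)

before = [r for r in results if r[0] < TRAINING_CUTOFF]
after = [r for r in results if r[0] >= TRAINING_CUTOFF]

# A large gap favoring pre-cutoff datasets is a contamination signal:
# the model may have seen those benchmark examples during training.
gap = accuracy(before) - accuracy(after)
print(f"pre-cutoff accuracy:  {accuracy(before):.2f}")
print(f"post-cutoff accuracy: {accuracy(after):.2f}")
print(f"contamination gap:    {gap:.2f}")
```

On this toy data the model scores perfectly on pre-cutoff datasets but much worse afterward, which is the pattern the researchers flag as evidence of memorization rather than capability.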
Our 149th episode with a summary and discussion of last week's big AI news!
Check out our sponsor, the SuperDataScience podcast. You can listen to SDS across all major podcasting platforms (e.g., Spotify, Apple Podcasts, Google Podcasts) plus there’s a video version on YouTube.