Yoav, co-founder of AI21, discusses the groundbreaking Jamba model, a fusion of non-transformer and attention layers for efficient AI. The conversation covers the evolution of language models, Meta's Llama 3 release, efficient AI system architecture, and the future of AI, with a focus on trust and reliability.
Jamba combines Mamba's non-transformer goodness with attention layers for a highly performant model.
AI21 emphasizes the importance of large language models in unlocking the value of text data.
Deep dives
Overview of AI21's History and Mission
AI21 was founded with the vision of integrating deep learning and reasoning to advance AI. They believe that modern AI necessitates a combination of statistics and reasoning. Initially focused on vision, AI21 transitioned to language models, noting the complexity and nuances of language. With a focus on the enterprise, they aim to unlock the value of text data, highlighting the significance of large language models in understanding and processing textual information.
Evolution of AI21's Language Models
AI21 developed models like Jurassic-1 and, more recently, Jamba to address the limitations of existing models. Jamba, their latest model, combines elements of Mamba's structured state space model (SSM) architecture with transformer attention layers. The company emphasizes serving efficiency and scalability, targeting deployment on a single 80GB GPU. Releasing Jamba as an open model is meant to encourage community contributions and innovation in model optimization and training infrastructure.
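The episode stays at the conceptual level, but the hybrid idea is easy to picture. Below is a minimal, hypothetical sketch (not AI21's architecture or code) of interleaving simplified state-space blocks with standard attention blocks in PyTorch. The `SimpleSSMBlock`, the layer sizes, and the "attention every N layers" ratio are all illustrative assumptions; the toy gated recurrence stands in for Mamba's actual selective scan.

```python
# Hypothetical sketch of a hybrid SSM + attention stack (illustrative only,
# not AI21's Jamba implementation).
import torch
import torch.nn as nn


class SimpleSSMBlock(nn.Module):
    """Toy stand-in for a Mamba-style block: a gated linear recurrence.
    Real Mamba uses a selective state-space scan; this is only illustrative."""
    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.in_proj = nn.Linear(d_model, d_model)
        self.gate = nn.Linear(d_model, d_model)
        self.A = nn.Parameter(torch.rand(d_model) * 0.9)  # per-channel decay
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):  # x: (batch, seq_len, d_model)
        residual = x
        x = self.norm(x)
        u = self.in_proj(x)
        h = torch.zeros_like(u[:, 0])
        outs = []
        for t in range(u.size(1)):        # linear-time recurrence over the sequence
            h = self.A * h + u[:, t]
            outs.append(h)
        y = torch.stack(outs, dim=1) * torch.sigmoid(self.gate(x))
        return residual + self.out_proj(y)


class AttentionBlock(nn.Module):
    """Standard pre-norm multi-head self-attention block."""
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        residual = x
        x = self.norm(x)
        y, _ = self.attn(x, x, x, need_weights=False)
        return residual + y


class HybridStack(nn.Module):
    """Mostly SSM blocks, with an attention block every `attn_every` layers."""
    def __init__(self, d_model: int = 256, n_layers: int = 8, attn_every: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(
            AttentionBlock(d_model) if (i + 1) % attn_every == 0 else SimpleSSMBlock(d_model)
            for i in range(n_layers)
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


if __name__ == "__main__":
    model = HybridStack()
    tokens = torch.randn(2, 64, 256)   # (batch, seq_len, d_model)
    print(model(tokens).shape)         # torch.Size([2, 64, 256])
```

The appeal of this kind of layout, as discussed in the episode, is that the recurrent SSM layers keep per-token compute and memory growth modest on long sequences, while the occasional attention layers preserve the modeling quality transformers are known for.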
Jamba's Implications for Enterprise Innovation
The release of Jamba aims to enhance serving efficiency and extend the adaptability of models across various infrastructures. Task-specific models within the Jamba family enable tailored solutions for diverse enterprise needs. By providing a base model for experimentation and fine-tuning, AI21 encourages advancements in model performance and efficiency within the enterprise context.
Future Vision for AI21 and the Industry
AI21's future vision emphasizes trust, reliability, and the fundamental understanding of language models. They foresee a shift towards more sophisticated AI systems that truly comprehend tasks, ensuring practical applications align with understanding. The industry trajectory highlights the importance of specialized models, robust AI systems, and the philosophical pursuit of model understanding as key drivers of innovation.
First there was Mamba… now there is Jamba from AI21. This is a model that combines the best non-transformer goodness of Mamba with good ol’ attention layers. The result is a highly performant and efficient model that AI21 has open-sourced! We hear all about it (along with a variety of other LLM things) from AI21’s co-founder Yoav.
Changelog++ members save 3 minutes on this episode because they made the ads disappear. Join today!
Sponsors:
Fly.io – The home of Changelog.com — Deploy your apps and databases close to your users. In minutes you can run your Ruby, Go, Node, Deno, Python, or Elixir app (and databases!) all over the world. No ops required. Learn more at fly.io/changelog and check out the speedrun in their docs.