DisTrO and the Quest for Community-Trained AI Models
Sep 27, 2024
Bowen Peng and Jeffrey Quesnelle from Nous Research discuss their mission to revive open-source AI, emphasizing the DisTrO project, which enables rapid training of AI models over the internet. They explore the challenges faced by independent builders in AI and the critical role of community collaboration. The conversation dives into impressive innovations like the Hermes models, designed for neutral interactions and enhanced with synthetic data. They reflect on the tension between decentralization and centralization in AI protocols and advocate for community-driven solutions.
The DisTrO project exemplifies a paradigm shift in AI model training, utilizing decentralized networks to drastically reduce bandwidth requirements and enhance accessibility.
Nous Research emphasizes the importance of open-source innovation as a catalyst for growth, empowering communities to engage in transformative technology development.
A critical reevaluation of past assumptions in AI training suggests opportunities for novel strategies that leverage global participation and diverse perspectives.
Deep dives
The Current State of Open Source AI
The discussion centers on a hypothetical scenario in which companies cease to release open-source AI models, and the significant challenges that would create for the open-source AI community. Chief among these challenges is the requirement that the GPUs training a model be co-located, usually within a single data center, which contrasts sharply with the decentralized nature of the open-source movement. This paradigm rests on assumptions made early in the development of AI training methods, assumptions that have since constrained innovation. By reevaluating them, the conversation suggests, there are opportunities for growth and collaboration in developing AI technologies that are more open and accessible.
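To make the co-location constraint concrete, here is a minimal sketch of conventional data-parallel training, with plain NumPy standing in for GPUs and a toy dot-product model as an assumption (nothing here is from the episode). The point it illustrates: every optimizer step requires every worker to exchange a gradient the size of the full model, which is tolerable over datacenter interconnects but not over home internet links.

```python
# Minimal sketch of data-parallel SGD. Assumptions for illustration:
# a toy model, NumPy arrays standing in for GPU workers.
import numpy as np

N_WORKERS = 4
DIM = 1_000  # toy parameter count; real LLMs have billions

rng = np.random.default_rng(0)
params = rng.normal(size=DIM)

def local_gradient(params, rng):
    """Stand-in for one worker's backward pass on its own data shard."""
    x = rng.normal(size=DIM)           # this worker's data sample
    return 2 * (params @ x) * x        # gradient of (params . x)^2

for step in range(10):
    # Each worker computes a gradient on its shard.
    grads = [local_gradient(params, rng) for _ in range(N_WORKERS)]
    # All-reduce: every step moves a full-model-sized gradient (DIM floats)
    # per worker over the interconnect -- this is why slow links stall training.
    avg_grad = np.mean(grads, axis=0)
    params -= 0.01 * avg_grad
```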
Nous Research's Objectives and Innovations
Nous Research aims to democratize access to cutting-edge AI by ensuring that both the models themselves and the underlying code remain open source. This goal stems from the belief that open-source innovation has a multiplier effect across the technology stack, enabling individuals, particularly beginners, to engage with transformative technologies. Their efforts are exemplified by the Hermes models, designed to empower users by letting the model adopt whatever persona the user sees fit. As researchers, they focus on fundamental research questions, rapidly exploring diverse alternatives to push the boundaries of what can be achieved with minimal computational resources.
The DisTrO Algorithm: A Game Changer
The introduction of DisTrO marks a significant advance in training AI models over distributed infrastructure using ordinary consumer internet connections. The algorithm reduces bandwidth requirements dramatically, needing 857 times less bandwidth than traditional methods, which levels the playing field for smaller teams and projects. DisTrO's key innovation is supporting training without high-speed interconnects between GPUs, suggesting a shift from centralized training environments to more distributed networks. This could mirror the success of volunteer-computing initiatives like SETI@home, fostering a broader scope for innovation in AI.
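The episode cites the 857x figure without detailing the mechanism behind it. As back-of-the-envelope arithmetic only, with the 1B-parameter model size and fp32 gradients as hypothetical assumptions, here is what that reduction would mean per training step:

```python
# Back-of-the-envelope sketch of the claimed 857x bandwidth reduction.
# The model size and fp32 gradients are assumptions for illustration,
# not details from the episode.
PARAMS = 1_000_000_000          # hypothetical 1B-parameter model
BYTES_PER_GRAD = 4              # fp32 gradient value per parameter
REDUCTION = 857                 # figure cited for DisTrO

full_sync = PARAMS * BYTES_PER_GRAD      # bytes per step, naive gradient sync
distro_like = full_sync / REDUCTION      # bytes per step at 857x reduction

print(f"naive gradient sync : {full_sync / 1e9:.1f} GB per step")
print(f"at 857x reduction   : {distro_like / 1e6:.1f} MB per step")
# ~4.0 GB vs ~4.7 MB per step: the latter is plausible over a home
# connection, the former is not.
```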
Challenges and Assumptions in Training AI Models
A critical examination reveals that many standing assumptions in the AI field are relics of earlier methodologies, some dating to the '90s. Current training methodology typically assumes that all GPUs participating in a run must sit in close physical proximity, but emerging research indicates there may be more effective approaches. Reevaluating these practices lets researchers develop novel strategies that draw on diverse perspectives and a more individualized approach to training AI models. As the discussion evolves, the potential for breakthrough innovations becomes evident: the field is ripe for exploration and experimentation.
The Future of Collaborative AI Development
The discussion presents a vision of a future in which global participation contributes to the development of AI models, transcending traditional centralized infrastructure. If individuals can contribute their own computing resources, AI research and development becomes a far more inclusive endeavor. This model envisions everyone having the opportunity to help train a collective AI, moving toward a decentralized framework in which all can benefit from advances without gatekeeping. As the community mobilizes, such collaborative efforts may not only broaden access to AI but also set the stage for future innovations that respond to a wide range of needs and aspirations.
In this episode of AI + a16z, Bowen Peng and Jeffrey Quesnelle of Nous Research join a16z General Partner Anjney Midha to discuss their mission to keep open source AI research alive and to activate the community of independent builders. The focus is a recent project called DisTrO, which demonstrates that AI models can be trained across the public internet much faster than previously thought possible. Nous is also behind a number of other successful open source AI projects, including the popular Hermes family of "neutral" and guardrail-free language models.
Here's an excerpt of Jeffrey explaining how DisTrO was inspired by the possibility that major open source AI providers could turn their efforts back inward:
"What if we don't get Llama 4? That's like an actual existential threat because the closed providers will continue to get better and we would be dead in the water, in a sense.
"So we asked, 'Is there any real reason we can't make Llama 4 ourselves?' And there is a real reason, which is that we don't have 20,000 H100s. . . . God willing and the creek don't rise, maybe we will one day, but we don't have that right now.
"So we said, 'But what do we have?' We have a giant activated community who's passionate about wanting to do this and would be willing to contribute their GPUs, their power, to it, if only they could . . . but we don't have the ability to activate that willingness into actual action. . . . The only way people are connected is over the internet, and so anything that isn't sharing over the internet is not gonna work.
"And so that was the initial premise: What if we don't get Llama 4? And then, what do we have that we could use to create Llama 4? And, if we can't, what are the technical problems that, if only we slayed that one technical problem, the dam of our community can now flow and actually solve the problem?"