Nous Research started as a collaborative name shared by a few individuals passionate about open-source language models and has grown into an open-source research organization with over 40 contributors, known for innovative models such as Hermes, YaRN, and Capybara.
Nous Research pioneers data distillation and data synthesis techniques, such as compressing a large model's knowledge into a more concise form and generating synthetic data for training smaller models. These techniques have shown significant performance gains and continue to unlock new potential in the field of language models.
Deep dives
The Journey of Nous Research
Nous Research is an open-source research organization that has also become a company focused on language models. It started as a collective of individuals passionate about open-source language models, who began by tinkering with models like GPT-2 and LLaMA and experimenting with fine-tuning and data synthesis. As they gained experience and expertise, their models drew attention and they took on more concrete projects. What began as a collaborative name among a few people evolved into an open-source research organization with over 40 contributors. Their models, such as Hermes, YaRN, and Capybara, have gained popularity and become influential examples in the field. They continue to prioritize open-source research and development while also exploring opportunities as a company.
Exploring Synthetic Data and Distillation
Nous Research focuses on the creation and use of synthetic data for training language models. Synthetic data is data generated by other language models or AI systems that can be used to train smaller models. This approach is especially helpful when computing resources are limited, enabling the training of models that can compete with larger, more computationally expensive ones. Nous Research pioneers the use of data distillation, which compresses a large model's knowledge into a more concise form that can be used to train smaller models, a technique that has shown significant performance gains. They also emphasize the importance of exploring novel methods for distillation and data synthesis to unlock further potential in the field.
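The workflow described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not Nous Research's actual pipeline: `teacher_generate` stands in for a call to a large "teacher" model (in practice, an inference API or local model), and is stubbed with canned answers so the sketch runs without any model weights.

```python
def teacher_generate(prompt: str) -> str:
    """Stub for the teacher model's completion of a prompt.

    In a real pipeline this would call a large model; here it returns
    canned answers so the example is self-contained.
    """
    canned = {
        "Explain overfitting in one sentence.":
            "Overfitting is when a model memorizes its training data "
            "and fails to generalize to new inputs.",
        "What is a learning rate?":
            "The learning rate controls the step size of each "
            "gradient update during training.",
    }
    return canned.get(prompt, "I don't know.")


def build_synthetic_dataset(seed_prompts):
    """Pair seed prompts with teacher completions to form
    instruction-tuning records for a smaller student model."""
    dataset = []
    for prompt in seed_prompts:
        response = teacher_generate(prompt)
        dataset.append({"instruction": prompt, "response": response})
    return dataset


seeds = ["Explain overfitting in one sentence.", "What is a learning rate?"]
synthetic = build_synthetic_dataset(seeds)
print(len(synthetic))  # 2 records ready for a student fine-tuning run
```

In practice the interesting work is upstream of this loop: choosing diverse seed prompts, filtering low-quality completions, and deduplicating the resulting dataset.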
Advancements and Challenges in Fine-Tuning
Nous Research highlights various techniques and advancements in fine-tuning language models. They suggest paying attention to hyperparameters, which play a crucial role in model performance, and encourage training for more tokens and epochs when overfitting is not a concern. Model merging, soft prompting, and activation hacking are also effective approaches: combining models' weights, compressing prompts into learned embeddings, and manipulating a model's internal vectors to shape its outputs. Nous Research believes that further advances in sampling methods, allowing for better token selection, could bring significant changes to language models, and they encourage researchers and practitioners to keep exploring and experimenting with fine-tuning methods to unlock new possibilities.
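Of the techniques above, model merging is the easiest to sketch. The simplest variant is linear weight interpolation (as in "model soup" averaging); real merges operate on framework state dicts of tensors, but plain Python floats are used here, so this is an assumed minimal illustration rather than any specific merge recipe Nous uses.

```python
def merge_weights(model_a, model_b, alpha=0.5):
    """Interpolate two models' parameters: alpha*a + (1 - alpha)*b.

    Both models must have identical parameter names and shapes; the
    result is a new "state dict" blending the two.
    """
    if model_a.keys() != model_b.keys():
        raise ValueError("models must share the same parameter names")
    return {
        name: [alpha * wa + (1 - alpha) * wb
               for wa, wb in zip(model_a[name], model_b[name])]
        for name in model_a
    }


# Toy "state dicts": one layer with three weights each.
chat_model = {"layer.weight": [0.2, 0.4, 0.6]}
code_model = {"layer.weight": [1.0, 0.0, 1.0]}

merged = merge_weights(chat_model, code_model, alpha=0.5)
print(merged["layer.weight"])  # midpoint of the two weight vectors
```

Varying `alpha` trades off between the two parent models; more elaborate schemes (task vectors, spherical interpolation, per-layer weighting) build on this same idea.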
Future Directions and Values
Moving forward, Nous Research aims to empower individuals by prioritizing locality, offline capabilities, and the ability for users to run models on their own devices. They are committed to the open-source community and view their corporate branch as a means to enhance and support open-source work rather than restrict it. They plan to provide tools and services that complement the open-source models they develop, with the goal of benefiting the entire community. Nous Research values innovation, accessibility, and advancing the field of language models, while remaining rooted in the passion and ethos of the collective AI community.
Nous Research has been pumping out some of the best open access LLMs using SOTA data synthesis techniques. Their Hermes family of models is incredibly popular! In this episode, Karan from Nous talks about the origins of Nous as a distributed collective of LLM researchers. We also get into fine-tuning strategies and why data synthesis works so well.
Changelog++ members save 2 minutes on this episode because they made the ads disappear. Join today!
Sponsors:
Read Write Own – Read, Write, Own: Building the Next Era of the Internet—a new book from entrepreneur and investor Chris Dixon—explores one possible solution to the internet’s authenticity problem: Blockchains. From AI that tracks its source material to generative programs that compensate—rather than cannibalize—creators. It’s a call to action for a more open, transparent, and democratic internet. One that opens the black box of AI, tracks the origins we see online, and much more. Order your copy of Read, Write, Own today at readwriteown.com
Fly.io – The home of Changelog.com — Deploy your apps and databases close to your users. In minutes you can run your Ruby, Go, Node, Deno, Python, or Elixir app (and databases!) all over the world. No ops required. Learn more at fly.io/changelog and check out the speedrun in their docs.