Hyperparameters play a crucial role in model training, so it's important to research and determine the right hyperparameters for each model. Rather than sticking to a fixed training schedule, it's advisable to keep training for as long as overfitting is not occurring, if computational resources allow. Additionally, a versatile trainer like Axolotl is recommended for working across model architectures, despite potential bugs that may need to be addressed.
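The "keep training until overfitting appears" heuristic mentioned above is often implemented as early stopping on validation loss. Below is a minimal sketch of that idea; the function name, the patience threshold, and the example loss values are illustrative assumptions, not something discussed in the episode:

```python
def epochs_before_overfit(val_losses, patience=2):
    """Return how many epochs to keep, stopping once validation loss
    has failed to improve for `patience` consecutive epochs.

    This is a simple proxy for "train as long as overfitting is not
    occurring": rising validation loss signals the onset of overfitting.
    """
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            bad_epochs = 0  # improvement resets the counter
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                # Roll back to the last epoch that improved.
                return epoch - patience
    # Never overfit within the observed losses: keep everything.
    return len(val_losses)


# Hypothetical per-epoch validation losses: improvement, then a turn upward.
losses = [1.0, 0.8, 0.7, 0.75, 0.9, 1.1]
print(epochs_before_overfit(losses))
```

In practice, trainers such as Axolotl expose equivalent behavior through their evaluation and checkpointing settings, so you would configure this rather than hand-roll it.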
Nous Research has been pumping out some of the best open access LLMs using SOTA data synthesis techniques. Their Hermes family of models is incredibly popular! In this episode, Karan from Nous talks about the origins of Nous as a distributed collective of LLM researchers. We also get into fine-tuning strategies and why data synthesis works so well.
Leave us a comment
Changelog++ members save 2 minutes on this episode because they made the ads disappear. Join today!
Sponsors:
- Read Write Own – Read, Write, Own: Building the Next Era of the Internet—a new book from entrepreneur and investor Chris Dixon—explores one possible solution to the internet’s authenticity problem: Blockchains. From AI that tracks its source material to generative programs that compensate—rather than cannibalize—creators. It’s a call to action for a more open, transparent, and democratic internet. One that opens the black box of AI, tracks the origins we see online, and much more. Order your copy of Read, Write, Own today at readwriteown.com
- Fly.io – The home of Changelog.com — Deploy your apps and databases close to your users. In minutes you can run your Ruby, Go, Node, Deno, Python, or Elixir app (and databases!) all over the world. No ops required. Learn more at fly.io/changelog and check out the speedrun in their docs.
Featuring:
Show Notes:
Something missing or broken? PRs welcome!