Practical AI

Data synthesis for SOTA LLMs

Feb 6, 2024
Karan Malhotra, co-founder of Nous Research, dives into the exciting world of open-access language models. He shares the journey of Nous and their leading Hermes models, which leverage advanced data synthesis techniques. The conversation explores innovative fine-tuning strategies and the significance of collaboration in AI development. Karan emphasizes ethics and transparency in AI commercialization, while also discussing the role of open-source principles in empowering the broader AI community. This blend of insights makes for a fascinating listen!
ANECDOTE

Nous Research Origin

  • Karan Malhotra's journey into AI began with fine-tuning GPT-2 for creative writing.
  • This led to a deeper exploration of LLMs for learning and automation, and eventually to co-founding Nous Research.
INSIGHT

Synthetic Data and Distillation

  • Synthetic data, generated by other AIs, can be surprisingly effective for training smaller models.
  • This 'distillation' process compresses complex information, making it easier for smaller models to learn (a rough code sketch follows below).
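
The distillation idea can be sketched in a few lines: a larger "teacher" model generates synthetic completions for a set of prompts, and a smaller "student" model is then fine-tuned on that synthetic data. The sketch below is only an illustration using the Hugging Face transformers and datasets libraries; the model names and prompts are placeholders, not details from the episode or from the Hermes training pipeline.

```python
# Minimal sketch of synthetic-data distillation (illustrative only).
# Model names and prompts are placeholders, not from the episode.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import Dataset

teacher_name = "large-teacher-model"   # hypothetical teacher checkpoint
student_name = "small-student-model"   # hypothetical student checkpoint

# 1. Use the larger "teacher" model to generate synthetic completions.
teacher_tok = AutoTokenizer.from_pretrained(teacher_name)
teacher = AutoModelForCausalLM.from_pretrained(teacher_name)

prompts = ["Explain photosynthesis simply.", "Write a haiku about rain."]
synthetic = []
for p in prompts:
    ids = teacher_tok(p, return_tensors="pt").input_ids
    out = teacher.generate(ids, max_new_tokens=128, do_sample=True, top_p=0.9)
    completion = teacher_tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True)
    synthetic.append({"text": p + " " + completion})

# 2. Fine-tune the smaller "student" model on the synthetic pairs.
student_tok = AutoTokenizer.from_pretrained(student_name)
if student_tok.pad_token is None:
    student_tok.pad_token = student_tok.eos_token
student = AutoModelForCausalLM.from_pretrained(student_name)

def tokenize(batch):
    enc = student_tok(batch["text"], truncation=True,
                      max_length=512, padding="max_length")
    # Labels are the input IDs with padding positions ignored (-100).
    enc["labels"] = [
        [tok if tok != student_tok.pad_token_id else -100 for tok in seq]
        for seq in enc["input_ids"]
    ]
    return enc

ds = Dataset.from_list(synthetic).map(tokenize, batched=True,
                                      remove_columns=["text"])

trainer = Trainer(
    model=student,
    args=TrainingArguments(output_dir="distilled-student",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=ds,
)
trainer.train()
```

In this causal language modeling setup the labels are simply the input token IDs (with padding masked out), so the student learns to reproduce the teacher's completions token by token.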
ANECDOTE

Google vs. OpenAI

  • Google's Bard allegedly violated OpenAI's TOS by training on their outputs, yet OpenAI didn't pursue legal action.
  • This highlights potential hypocrisy surrounding licensing and large language model training data.