
801: Merged LLMs Are Smaller And More Capable, with Arcee AI's Mark McQuade and Charles Goddard
Super Data Science: ML & AI Podcast with Jon Krohn
00:00
Efficient Training of Language Models with Sparse Upcycling and MergeKit
The chapter explores mixture-of-experts (MoE) architectures and model merging, highlighting how sparse upcycling turns an existing dense model into a mixture-of-experts model and the benefits this brings. It discusses why sparse models such as mixtures of experts are efficient (only a subset of parameters is active for each token) and introduces MergeKit, a toolkit for merging large language models without costly GPU hardware, ultimately focusing on more efficient ways to train language models.
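Since the summary stays high-level, here is a minimal sketch of what a MergeKit merge can look like, based on the Python API documented in the mergekit repository (MergeConfiguration, run_merge, MergeOptions). The model names, interpolation factor, and output path are illustrative placeholders, not values from the episode.

```python
# Minimal sketch of a SLERP merge with MergeKit (pip install mergekit).
# Assumptions: mergekit's documented Python API; placeholder model names.
import yaml
import torch

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# MergeKit merges are declared in YAML: which models to combine,
# which merge method (slerp, ties, linear, ...), and its parameters.
CONFIG_YAML = """
slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1           # placeholder base model
        layer_range: [0, 32]
      - model: mistralai/Mistral-7B-Instruct-v0.1  # placeholder second model
        layer_range: [0, 32]
merge_method: slerp
base_model: mistralai/Mistral-7B-v0.1
parameters:
  t: 0.5  # interpolation factor between the two models
dtype: bfloat16
"""

merge_config = MergeConfiguration.model_validate(yaml.safe_load(CONFIG_YAML))

# Merging combines checkpoints tensor by tensor and runs on CPU by
# default, which is why no costly GPU hardware is required; cuda=True
# merely speeds things up when a GPU happens to be available.
run_merge(
    merge_config,
    out_path="./merged-model",
    options=MergeOptions(
        cuda=torch.cuda.is_available(),
        copy_tokenizer=True,
        lazy_unpickle=True,  # stream tensors rather than loading full checkpoints
        low_cpu_memory=False,
    ),
)
```

The same YAML can also be saved to a file and run with MergeKit's command-line entry point, mergekit-yaml, which is the workflow the episode's discussion of merging without GPUs refers to.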