801: Merged LLMs Are Smaller And More Capable, with Arcee AI's Mark McQuade and Charles Goddard

Super Data Science: ML & AI Podcast with Jon Krohn

Efficient Training of Language Models with Sparse Upcycling and MergeKit

This chapter explores mixture-of-experts models and the model-merging techniques used to build them, highlighting how a mixture-of-experts model can be assembled via sparse upcycling and the benefits of that approach. It discusses the inference efficiency of sparse architectures such as mixture of experts and introduces MergeKit, a toolkit for merging large language models without costly GPU hardware, ultimately focusing on more efficient ways to train language models.
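To give a sense of the kind of merge MergeKit performs, a minimal configuration might look like the sketch below. MergeKit merges are driven by a YAML file; the specific model names, layer ranges, and interpolation weight here are illustrative assumptions, not details from the episode:

```yaml
# Hypothetical MergeKit config: SLERP-merge two 7B checkpoints.
# Model names, layer ranges, and t are illustrative choices only.
slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1      # base checkpoint
        layer_range: [0, 32]
      - model: HuggingFaceH4/zephyr-7b-beta   # fine-tuned checkpoint
        layer_range: [0, 32]
merge_method: slerp
base_model: mistralai/Mistral-7B-v0.1
parameters:
  t: 0.5          # interpolation weight between the two models
dtype: bfloat16
```

Because a merge like this only combines existing weights rather than running backpropagation, it can complete on commodity CPU hardware, which is the "no costly GPUs" point made in the episode.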

