141 - Building an open source LM, with Iz Beltagy and Dirk Groeneveld

NLP Highlights

CHAPTER

How to Train a Model to Not Generate Toxic Content

The goal of the OLMo project is to keep academia and open source developers in the game. The approach we are taking right now is on the pre-training data side: we are filtering out a significant percentage of the toxic content. This is another reason that open source language models are important: they are where people can start trying simpler methods that would do the same job. We see a lot of formerly public research disappearing behind closed doors, and we're worried about it.
