
141 - Building an open source LM, with Iz Beltagy and Dirk Groeneveld
NLP Highlights
How to Train a Model to Not Generate Toxic Content
The goal of the OLMo project is to keep academia and open source developers in the game. Our current approach is on the pre-training data side: we are filtering out a significant percentage of the toxic content. This is another reason open source language models matter: they let people start trying simpler methods that would do the same job. We see a lot of formerly public research disappearing behind closed doors, and we're worried about it.