How to Optimize a Text Prediction Task

The answer here is actually complicated and, I think, pretty interesting. In practice, for these general text prediction tasks, whether it's translation or caption generation or something like that, you use edit distance, because it's actually quite easy to write down. You basically just decompose the standard string edit distance dynamic program. The worst-case loss that you get for any single decision is one, with one exception: if you accidentally produce end of sequence prematurely. That will obviously cost a lot if you've only produced one word and the actual thing should have 20 words. But string edit distance is very easy to compute, and it's pretty easy to optimize.
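For reference, here is a minimal sketch of the standard edit distance dynamic program mentioned above, applied to sequences of tokens. The function name `edit_distance` and the word-level tokenization are assumptions made for illustration; the episode does not specify an implementation, and this sketch shows only the plain dynamic program, not the per-decision decomposition the speaker describes.

```python
def edit_distance(hyp, ref):
    """Levenshtein distance between two sequences (insert, delete, substitute cost 1)."""
    # prev[j] holds the distance between the hypothesis prefix seen so far
    # and the first j tokens of the reference.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]  # matching i hypothesis tokens against an empty reference
        for j, r in enumerate(ref, start=1):
            substitution = prev[j - 1] + (0 if h == r else 1)
            deletion = prev[j] + 1      # drop the hypothesis token h
            insertion = curr[j - 1] + 1  # add the reference token r
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


# Hypothetical 20-word reference, used only to illustrate the two cases.
reference = [f"w{k}" for k in range(20)]

# Appending one more token changes the distance by at most one.
print(edit_distance(reference[:5], reference))          # 15
print(edit_distance(reference[:5] + ["x"], reference))  # 15

# Stopping after a single word leaves the other 19 words unproduced.
print(edit_distance(reference[:1], reference))          # 19
```

The last call illustrates the exception: ending the sequence after one word of a 20-word reference costs 19 at once, whereas any other single token choice moves the distance by at most one.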
