
122 - Statutory Reasoning in Tax Law, with Nils Holzenberger
NLP Highlights
BERT vs Legal BERT - What's the Difference?
We've used both BERT and Legal-BERT to measure perplexity on our dataset of cases, and Legal-BERT was doing a lot better, even though it has never seen anything from those cases. So it's much better attuned to the domain of our dataset. We're doing a small hyperparameter search around the recommended settings for BERT. For the word vectors, it's a bit more involved. We have a method of embedding each word in the statutes, the cases, and the question, and once we've embedded the words that way, we use a method from a 2017 paper by Sanjeev Arora to compute an embedding for the whole sequence.
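As a rough illustration of the perplexity comparison, here is a minimal sketch using pseudo-perplexity, one standard way to get a perplexity-like score out of a masked language model (mask each token in turn and score the original token). This is an assumption about the setup, not the guest's exact procedure; the model names are the public Hugging Face checkpoints for BERT and Legal-BERT, and `pseudo_perplexity` is a hypothetical helper name.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

def pseudo_perplexity(model, tokenizer, text):
    """Mask each token in turn and score the original token under the MLM."""
    input_ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
    nlls = []
    # Skip the [CLS] and [SEP] special tokens at the ends.
    for i in range(1, input_ids.size(0) - 1):
        masked = input_ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits
        log_probs = torch.log_softmax(logits[0, i], dim=-1)
        nlls.append(-log_probs[input_ids[i]].item())
    # Exponentiated mean negative log-likelihood, analogous to perplexity.
    return float(torch.exp(torch.tensor(nlls).mean()))

# Example sentence; a lower score means the model finds the text less surprising.
text = "The taxpayer may deduct ordinary business expenses under section 162."
for name in ["bert-base-uncased", "nlpaueb/legal-bert-base-uncased"]:
    tok = AutoTokenizer.from_pretrained(name)
    mdl = AutoModelForMaskedLM.from_pretrained(name).eval()
    print(name, pseudo_perplexity(mdl, tok, text))
```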
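The 2017 Arora paper referred to here is presumably the SIF sentence embedding of Arora, Liang, and Ma ("A Simple but Tough-to-Beat Baseline for Sentence Embeddings", ICLR 2017): a smooth-inverse-frequency weighted average of word vectors, followed by removing the projection onto the embeddings' first singular vector. A minimal numpy sketch follows; `word_vecs` and `word_prob` are placeholder inputs (any word-embedding table and unigram probabilities would do), and a = 1e-3 is the paper's suggested default, not a value from the episode.

```python
import numpy as np

def sif_embed(sentences, word_vecs, word_prob, a=1e-3):
    """sentences: list of token lists; word_vecs: dict token -> np.ndarray;
    word_prob: dict token -> unigram probability."""
    dim = len(next(iter(word_vecs.values())))
    embs = []
    for sent in sentences:
        # Weight each word vector by a / (a + p(w)), down-weighting frequent words.
        vecs = [a / (a + word_prob.get(w, 1e-5)) * word_vecs[w]
                for w in sent if w in word_vecs]
        embs.append(np.mean(vecs, axis=0) if vecs else np.zeros(dim))
    X = np.stack(embs)
    # Remove the common component: project out the first right singular vector.
    u = np.linalg.svd(X, full_matrices=False)[2][0]
    return X - np.outer(X @ u, u)
```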