63: Nihongo - Chris Vasselli

The Science of Tokenization

There's a whole area of natural language processing around that called tokenization. It's just a much harder problem in Japanese, and in Chinese too, because Chinese doesn't have spaces either. The route I probably should have gone was using one of the existing open source libraries for this, like MeCab, which was the popular one at the time. But I ended up building my own, because the trick is that I really wanted it to be tightly integrated with the dictionary, for example.
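The core idea behind dictionary-integrated tokenization can be sketched as a greedy longest-match segmenter: since Japanese has no spaces, at each position you take the longest string that appears in your dictionary. This is a simplified illustration, not the app's actual algorithm; real tools like MeCab use lattice search over a full morphological dictionary, and the tiny word list below is hypothetical.

```python
# Hypothetical mini-dictionary for illustration only; a real system
# would use a full morphological dictionary (e.g. what MeCab ships with).
DICTIONARY = {"私", "は", "日本語", "を", "勉強", "し", "します", "ます"}

def tokenize(text: str) -> list[str]:
    """Greedy longest-match segmentation: at each position, take the
    longest dictionary entry that matches; fall back to one character."""
    tokens = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback: emit a single character
        # Scan candidate end positions from longest to shortest.
        for j in range(len(text), i, -1):
            if text[i:j] in DICTIONARY:
                match = text[i:j]
                break
        tokens.append(match)
        i += len(match)
    return tokens

print(tokenize("私は日本語を勉強します"))
# → ['私', 'は', '日本語', 'を', '勉強', 'します']
```

Note how "します" wins over the shorter entry "し" at the same position; longest-match is what keeps the segmenter from shredding words into fragments, though real lattice-based tokenizers also weigh word costs and part-of-speech connection costs to resolve ambiguity.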
