63: Nihongo - Chris Vasselli

Launched

CHAPTER

The Science of Tokenization

There's a whole area of natural language processing around that called tokenization. It's just a much harder problem in Japanese, and in Chinese too, because Chinese doesn't have spaces either. The route I probably should have gone was using one of the existing open source libraries for this; MeCab was the popular one at the time. But the trick is that I really wanted it to be really tightly integrated with the dictionary, for example.
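To make the problem concrete, here is a minimal sketch of dictionary-driven segmentation for spaceless text, using greedy longest-match against a toy dictionary. The dictionary entries and function name are invented for illustration; real tools like MeCab use statistical models over a full lexicon rather than simple greedy matching, and the tight dictionary integration described above would go well beyond this.

```python
# Toy dictionary of known words (invented for this example).
TOY_DICT = {"日本語", "日本", "語", "を", "勉強", "する"}

def longest_match_tokenize(text, dictionary):
    """Greedily take the longest dictionary word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible substring starting at i first.
        for j in range(len(text), i, -1):
            if text[i:j] in dictionary:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Unknown character: emit it as a single-character token.
            tokens.append(text[i])
            i += 1
    return tokens

print(longest_match_tokenize("日本語を勉強する", TOY_DICT))
# → ['日本語', 'を', '勉強', 'する']
```

Note that greedy matching picks "日本語" over "日本" + "語" because it always prefers the longest hit, which is why ambiguity makes Japanese tokenization genuinely hard: the longest match is not always the right one.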
