
63: Nihongo - Chris Vasselli
Launched
The Science of Tokenization
There's a whole area of natural language processing around that called tokenization. It's just a much harder problem in Japanese, and in Chinese too, because Chinese doesn't have spaces between words either. The route I probably should have gone was using one of the existing open-source libraries for this, like MeCab, which was the popular one at the time. But I ended up building my own, and the trick is that I really wanted it to be tightly integrated with the dictionary, for example.
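For context, here's a minimal sketch of the "existing library" route Chris mentions, using MeCab through its Python bindings. The package names and the exact segmentation shown are assumptions for illustration; the episode only names the library, and this is not how Nihongo's own tokenizer works.

```python
# A minimal sketch of Japanese tokenization with MeCab, assuming the
# mecab-python3 bindings and a dictionary are installed:
#   pip install mecab-python3 unidic-lite
import MeCab

# "-Owakati" tells MeCab to output the sentence as space-separated tokens.
tagger = MeCab.Tagger("-Owakati")

# Japanese text has no spaces between words, so the segmenter must
# decide where one word ends and the next begins.
sentence = "日本語の文には単語の間にスペースがありません"
print(tagger.parse(sentence).strip())
# Roughly (exact segmentation depends on the dictionary used):
# 日本語 の 文 に は 単語 の 間 に スペース が あり ませ ん
```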