The Science of Tokenization
There's a whole area of natural language processing around that called tokenization. It's just a much harder problem in Japanese, right, and in Chinese too, because Chinese doesn't have spaces either. You know, the route I should have gone was using one of the existing open source libraries for this, like MeCab, which was the popular one at the time. But I ended up... the trick is that I really wanted it to be tightly integrated with the dictionary, for example.
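(For illustration only, here's a minimal sketch of what using an off-the-shelf tokenizer like MeCab looks like, assuming the mecab-python3 bindings and a bundled dictionary such as unidic-lite are installed. The speaker's own approach was a custom tokenizer tied to their dictionary, not this.)

```python
# Minimal sketch: segmenting Japanese text with MeCab via the
# mecab-python3 bindings (assumes `pip install mecab-python3 unidic-lite`).
# Japanese has no spaces between words, so a morphological analyzer
# like MeCab is used to find word boundaries.
import MeCab

# -Owakati asks MeCab for space-separated tokens instead of full
# part-of-speech annotations.
tagger = MeCab.Tagger("-Owakati")

text = "日本語の文章には単語の区切りがありません"
print(tagger.parse(text).strip())
# Typical output (exact segmentation depends on the dictionary used):
# 日本語 の 文章 に は 単語 の 区切り が あり ませ ん
```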