Hacker News

Failed for me. I use Chinese and Japanese, and there are no spaces to split on. I also search code, where this fails too. I know it was meant to be simple and to illustrate some points, but I think where it falls short is that this is a much harder problem in real life than a simple 150-line solution suggests.


If you use a library for Chinese/Japanese tokenization (which is harder because of the lack of spaces), it seems like the rest of the code would work?
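To make that concrete, here is a rough sketch of what swapping the tokenizer might look like. It is not the article's code, and it does not use a real segmentation library: as a stand-in it splits CJK runs into overlapping character bigrams, a common fallback when no dictionary-based segmenter is available. The names `tokenize`, `build_index`, and `search` are made up for illustration, and the character ranges are a rough approximation, not a complete list of CJK blocks.

```python
import re

# Rough, non-exhaustive ranges: Hiragana, Katakana, CJK Unified Ideographs.
CJK = r"\u3040-\u30ff\u4e00-\u9fff"
TOKEN_RE = re.compile(rf"[{CJK}]+|[A-Za-z0-9_]+")
CJK_RE = re.compile(rf"[{CJK}]")


def tokenize(text):
    """Split text into lowercase words and CJK character bigrams."""
    tokens = []
    for run in TOKEN_RE.findall(text):
        if CJK_RE.match(run):
            if len(run) == 1:
                tokens.append(run)
            else:
                # Overlapping bigrams stand in for real word segmentation.
                tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        else:
            tokens.append(run.lower())
    return tokens


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for tok in tokenize(text):
            index.setdefault(tok, set()).add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every token of the query."""
    toks = tokenize(query)
    if not toks:
        return set()
    result = set(index.get(toks[0], set()))
    for tok in toks[1:]:
        result &= index.get(tok, set())
    return result


docs = {1: "東京都に住む", 2: "大阪に住む"}
index = build_index(docs)
print(search(index, "東京"))
```

With plain `str.split()`, each of those documents would be a single token and a query for 東京 would match nothing. The bigram version does find it, but it also indexes 京都 (Kyoto) out of 東京都, which is the kind of false positive a proper segmenter avoids. It does nothing for the code-search case either: `get_user_name` stays one token, so a query for `user` still misses it.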



