The onigiri/jelly donut thing is not particularly uncommon if you try to shove complex text or dialogue through Google Translate and similar MTL systems, because they are trained heavily on existing text. If the Japanese string you plug in is uncommon enough, you may get back imageboard slang, text from video game wikis, or racial slurs.


I'm pretty sure the onigiri/jelly donut thing only happened because 4Kids didn't think American children would know what onigiri was - they were infamous for their terrible localization.

Google Translate translates it more accurately to "rice balls" today - something as simple as that wouldn't be a problem for machine translation.


That's assuming the training set is all of the media that mainstream Google Translate was trained on. The paper talks about more specialized training sets, so it's quite possible something like onigiri could end up under-represented (though I suspect it's common enough in manga that the terms actually affected would be more obscure ones). Naturally the "jelly donut" part would only slip in if they used untrustworthy data like forum posts or 4Kids localizations, but that's exactly the kind of data that Google Translate and similar algos scrape from the internet for training.


That means an overreliance on the word "bruh" if we go by current standards. Hopefully it does not come to that, but if current AI trends are anything to go by...



