Japanese segmentation improvement

Description

http://source.icu-project.org/repos/icu/icu/trunk/source/data/brkitr/dictionaries/cjdict.txt

The above file lists Japanese words (used, e.g., for segmentation). It needs some additional words. Please add the word:

クライアント (client – as in client-server interaction).

This problem was discovered while applying ICU Japanese segmentation to the text at:

http://www.unicode.org/standard/translations/japanese.html

The segmenter could not parse the Japanese phrase:

クライアントサーバー (Client-Server)

because the dictionary file does not contain the commonly used word クライアント. It does contain transliterations of names such as "Ryan" and "Bryant", which leads to a hilarious (but sad) segmentation of the above phrase. One quick fix is to add:

クライアント

to the above dictionary.
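
To illustrate how a missing entry can derail dictionary-based segmentation, here is a minimal sketch of a toy greedy longest-match segmenter. It is not ICU's actual algorithm, and the dictionary contents are assumptions made for illustration: ライアン stands in for the "Ryan" transliteration mentioned above, and サーバー is assumed to be present. The function name `segment` is hypothetical.

```python
def segment(text, dictionary):
    """Toy greedy longest-match segmenter (illustrative only, not ICU's algorithm).

    At each position, take the longest dictionary word that matches;
    if nothing matches, emit a single character and move on.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, falling back to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                pieces.append(candidate)
                i += size
                break
    return pieces


phrase = "クライアントサーバー"

# Hypothetical dictionary subset: a name transliteration is present,
# but the common word クライアント is missing.
without_client = {"ライアン", "サーバー"}
print(segment(phrase, without_client))

# Same subset with the requested word added.
with_client = without_client | {"クライアント"}
print(segment(phrase, with_client))
```

In this toy setup, the name transliteration is matched inside the phrase when クライアント is absent, splitting the word apart; once the entry is added, the phrase divides into its two intended words.
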
In the long run, it would be better to use an open-source Japanese dictionary source, such as Jim Breen's Internet Japanese data:

http://nihongo.monash.edu/japanese.html

Environment

Status

Assignee

Andy Heninger

Reporter

TracBot

Time Needed

Days

tracCc

andy,mark,markus

tracCreated

May 17, 2016, 10:19 PM

tracOwner

andy

tracProject

all

tracReporter

katmomoi@f74d39fa044aa309

tracStatus

accepted

tracWeeks

0.5

Components

Fix versions

Priority

medium