We have some translations of the UDHR (Universal Declaration of Human Rights) in our repo for performance test data:
https://github.com/unicode-org/icu/tree/main/icu4j/perf-tests/data/udhr
https://github.com/unicode-org/icu/tree/main/icu4j/perf-tests/data/collation
There are some questions about whether we have the rights to do so.
We should look at the (kinds of) texts that ICU4X has used for testing.
@Shane Carr can you please provide a link?
It’s a known issue (https://github.com/unicode-org/test-corpora/issues/2) which I haven’t been able to prioritize fixing. Just use text from before the translation stopped.
@Shane Carr I looked at a couple of examples from that repo, and it seems like the texts switch to English near the end. Here’s an example where Arabic switches to English: https://github.com/unicode-org/test-corpora/blob/main/gutenberg/Melville-2701/output/ar/7173506697041068518_2701-h-1.htm.html …and another example where German switches to English: https://github.com/unicode-org/test-corpora/blob/main/gutenberg/Melville-2701/output/de/7173506697041068518_2701-h-1.htm.html
https://github.com/unicode-org/test-corpora