Replace UDHR test data with other texts

Description

We have some translations of the UDHR (Universal Declaration of Human Rights) in our repo for performance test data:

  1. https://github.com/unicode-org/icu/tree/main/icu4j/perf-tests/data/udhr

  2. https://github.com/unicode-org/icu/tree/main/icu4j/perf-tests/data/collation

There are some questions about whether we have the rights to do so.

We should look at the (kinds of) texts that ICU4X has used for testing.

Can you please provide a link?

Activity

Shane Carr
December 27, 2023 at 6:21 PM

It’s a known issue (https://github.com/unicode-org/test-corpora/issues/2) that I haven’t been able to prioritize fixing. Just use the text from before the point where the translation stops.

Chris Chapman
December 21, 2023 at 6:58 PM

I looked at a couple of examples from that repo, and it seems like the texts switch to English near the end.

Here’s an example where Arabic switches to English:
https://github.com/unicode-org/test-corpora/blob/main/gutenberg/Melville-2701/output/ar/7173506697041068518_2701-h-1.htm.html

…and another example where German switches to English:
https://github.com/unicode-org/test-corpora/blob/main/gutenberg/Melville-2701/output/de/7173506697041068518_2701-h-1.htm.html

Shane Carr
December 4, 2023 at 6:03 PM

Details

Created December 4, 2023 at 5:23 PM
Updated May 1, 2025 at 3:59 PM