Provide coverage level info in cldrtoicu tool.

Description

In CLDR, excluding the comprehensive-level paths yields a substantial saving in size: /main/ drops from 37.7 MB to 31.4 MB in the CLDR source (about 17%). For example, here are just a few of the data values that would be omitted from fr.xml:

<language type="glk">gilaki</language>
<language type="gmh">moyen haut-allemand</language>
<language type="gn">guarani</language>
<language type="goh">ancien haut allemand</language>
<language type="gom">konkani de Goa</language>
<language type="gon">gondi</language>
<language type="gor">gorontalo</language>
<language type="got">gotique</language>
<language type="grb">grebo</language>
<language type="grc">grec ancien</language>

Any filtering, however, should be done on the ICU side, which requires access to the coverage level of each path. The internal CLDR API for that is:

Level coverage = SDI.getCoverageLevel(xpath, localeId);
where
static final SupplementalDataInfo SDI = CLDRConfig.getInstance().getSupplementalDataInfo();

If we expose the coverage level info in the CLDR API, the conversion tool could use it to build a version of ICU with reduced data size. Of course, a particular implementation might want to retain some types of paths, and it could apply additional tests to retain what it wants.
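
As a rough illustration of the filtering described above, here is a minimal Java sketch. It is not the cldrtoicu implementation. The `CoverageFilter` class, its `retainUpTo` method, and the simplified `Level` enum are hypothetical stand-ins so the example compiles without the CLDR jars. In the real tool the lookup would presumably delegate to `SDI.getCoverageLevel(xpath, localeId)`. The `alwaysKeep` predicate is likewise an assumed hook for implementations that want to retain certain paths regardless of level.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Predicate;

final class CoverageFilter {
    // Hypothetical stand-in for CLDR's Level enum, ordered from least to
    // most inclusive so that compareTo() reflects "is within this level".
    enum Level { BASIC, MODERATE, MODERN, COMPREHENSIVE }

    /**
     * Returns the entries whose coverage level is at or below maxLevel,
     * plus any entries that alwaysKeep explicitly asks to retain.
     *
     * coverage maps (xpath, localeId) to a Level; in the real tool this is
     * assumed to wrap SupplementalDataInfo.getCoverageLevel.
     */
    static Map<String, String> retainUpTo(
            Map<String, String> pathsToValues,
            String localeId,
            Level maxLevel,
            BiFunction<String, String, Level> coverage,
            Predicate<String> alwaysKeep) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : pathsToValues.entrySet()) {
            String xpath = entry.getKey();
            Level level = coverage.apply(xpath, localeId);
            // Keep paths within the requested level, or explicitly retained ones.
            if (level.compareTo(maxLevel) <= 0 || alwaysKeep.test(xpath)) {
                kept.put(xpath, entry.getValue());
            }
        }
        return kept;
    }
}
```

A caller would pass `Level.MODERN` as `maxLevel` to drop comprehensive-level paths, and supply an `alwaysKeep` predicate (or `path -> false`) depending on whether any comprehensive-level data should survive the cut.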

Status

Assignee

David Beaumont

Reporter

Mark Davis

Labels

None

Reviewer

None

Time Needed

Days

Start date

None

Components

Fix versions

Priority

major