For CLDR 1.8 submissions, SurveyTool has a new feature that tries to
encourage casing (uppercase/lowercase) consistency within certain sets
of locale data within each locale.
However, this approach has a number of problems, even if it could be
improved somewhat:
1) It works within each locale only. There is no comparison with other
locales that may have similar rules (and, more generally, similar
translations). The only other locale data it compares against is
English, and English has rather unique casing rules.
2) For some datasets (in particular currencies and eras), a locale's
data may quite appropriately contain casing variations.
3) Using only English as the model language gives the impression that
most items are supposed to start with a capital letter, especially
since the English data inappropriately uses an uppercase initial even
for words that are not normally written with one.
This is very likely the root cause of many of the casing problems
present in the CLDR data.
4) The current warnings for casing "problems" often flag data that is
in fact cased properly. This may lead to inappropriate changes.
I would therefore suggest abandoning the current approach to casing
tests, including any slightly improved version of it.
In its place I would suggest the following:
A) Allow SurveyTool users to set the model language as a user
preference. English is the default model language, but it can also be
chosen explicitly; explicitly chosen English and default English behave
differently (see C below).
B) Once a model language has been chosen, the "currently winning" data
items for that language are used instead of the English data as the
reference against which the target locale's data is compared.
C) Choosing a model language explicitly enables the per-item casing
test, provided that the scripts of both the model locale and the target
locale are cased. The casing test can be turned off completely as a
user preference.
The casing test compares the case of the first letter of the model
language data item (the currently winning one) with the case of the
first letter of the target language data item (also the currently
winning one). If they differ, and the test has not been disabled, a
mild warning about the casing mismatch is given. It should be made
clear that this is only a raw test and need not indicate an actual
error.
D) Rather than relying only on default English, users should be
encouraged to select a model language related to the target language,
not only for casing comparisons but also for translation comparisons,
in order to help consolidate translations.
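As an illustration of the raw per-item test proposed in C, here is a minimal Python sketch. The function names, the `enabled` flag standing in for the user preference, and the warning text are all hypothetical and not SurveyTool's actual code. It assumes the currently winning values are passed in as plain strings, and it treats an uncased first letter (e.g. a CJK character) as a stand-in for "the script is not cased", which is a simplification of a real per-script check.

```python
from typing import Optional


def first_letter_case(value: str) -> Optional[str]:
    """Return 'upper' or 'lower' for the first letter of value.

    Returns None if there is no letter, or if the first letter is
    uncased (hypothetical stand-in for "script is not cased").
    """
    for ch in value:
        if ch.isalpha():
            # Titlecase letters (e.g. the digraph 'ǅ') count as upper.
            if ch.isupper() or ch.istitle():
                return "upper"
            if ch.islower():
                return "lower"
            return None  # letter from an uncased script
    return None


def casing_mismatch_warning(model_value: str, target_value: str,
                            enabled: bool = True) -> Optional[str]:
    """Hypothetical raw casing check: compare first-letter case only."""
    if not enabled:  # the user has turned the test off
        return None
    model_case = first_letter_case(model_value)
    target_case = first_letter_case(target_value)
    # Skip the check unless both first letters are cased.
    if model_case is None or target_case is None:
        return None
    if model_case != target_case:
        return ("Casing differs from the model language "
                f"({model_case} vs. {target_case} first letter); "
                "this is only a raw check and need not indicate an error.")
    return None
```

For example, comparing a model value "Euro" with a target value "euro" yields a mild warning, while "Euro" against "Euro", or "January" against "1月", yields none. Keeping the check to the first letter matches the deliberately "raw" nature of the test described above.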
Note that this bug report supersedes bug report 1693.