UTS #35 unit identifier text is sloppy
much improved
Suggestions 1 through 4 look good to me, and if ambiguous productions are intentional then I’m willing to withhold judgment for now.
BTW, I realize this is short notice but we’re trying to get the spec to beta for v40 on Wednesday, so if you have a chance to look at this on Monday, I’d appreciate it.
Looking this over, here’s what I’m thinking of to address this:
https://unicode.org/reports/tr35/tr35-general.html#Unit_Identifiers
We need to define the core unit identifiers first, then define the long unit identifiers, and make it clearer what the relation is (the long identifiers just add a “grouping” prefix used in the survey tool).
Will add here (as well as in other places) that identifier is often abbreviated as ID.
Both need clear definitions. ‘complex unit’ is defined in P2, but only used in P6. It is a useful term for a “mixed unit that consists of two or more simple units linked by one or more ‘-and-’”. A compound unit needs to be clearly defined and added to the larger table at the top of https://unicode.org/reports/tr35/tr35-general.html#Unit_Identifiers (the format of both tables got a bit mangled in the translation to Markdown and also needs to be fixed).
The breakdown is:
mixed unit: encompasses simple units and complex units (simple units linked with -and-)
compound unit: a unit that is not a mixed unit, i.e. neither a simple unit nor a complex unit. It has a product, a per, a dimensionality, or a prefix.
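To check that the breakdown above is unambiguous, here is a minimal sketch of it in Python. The `SIMPLE_UNITS` set and the `classify` function are hypothetical names for illustration only, and the vocabulary is a tiny made-up subset, not the spec's actual list. The sketch does not validate its input: anything that is neither simple nor complex is reported as compound.

```python
# Hypothetical, deliberately tiny subset of simple units (not the spec's list).
SIMPLE_UNITS = {"foot", "inch", "meter", "second", "gram"}


def classify(unit_id: str) -> str:
    """Classify a unit ID as 'simple', 'complex', or 'compound'.

    Follows the proposed breakdown: mixed units are either simple units or
    complex units (simple units linked with "-and-"); everything else is
    treated as compound. No validation of the ID is attempted.
    """
    if unit_id in SIMPLE_UNITS:
        return "simple"
    parts = unit_id.split("-and-")
    if len(parts) > 1 and all(part in SIMPLE_UNITS for part in parts):
        return "complex"
    # Has a product, a per, a dimensionality, or a prefix (or is unknown).
    return "compound"


for example in ("foot", "foot-and-inch", "square-foot", "meter-per-second"):
    print(example, "->", classify(example))
```

Under this reading, every ID falls into exactly one of the three buckets, which is the property the new definitions should guarantee.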
Unresolved references. Here’s what I found
single-unit should be single_unit (we also need both single_unit and simple_unit examples up top).
type is undefined: it should probably be changed to something like grouping. The definition will be: Prefix for units currently used in the CLDR Survey Tool and in translations to group conceptually similar units. The long unit identifiers are unnecessary in CLDR data and are planned to be replaced by the short unit identifiers.
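As a rough illustration of the relation between long and short identifiers described here, the following sketch strips a grouping prefix from a long unit ID. `GROUPINGS` and `to_short_id` are hypothetical names, and the set of groupings is a small assumed subset rather than the full list used by the Survey Tool.

```python
# Hypothetical subset of grouping prefixes; the real list is longer.
GROUPINGS = {"length", "duration", "speed", "mass"}


def to_short_id(long_id: str) -> str:
    """Drop the leading grouping prefix from a long unit ID, if present."""
    grouping, separator, rest = long_id.partition("-")
    if separator and grouping in GROUPINGS:
        return rest
    # No recognized grouping prefix: assume the ID is already short.
    return long_id


for example in ("length-foot", "speed-meter-per-second", "foot"):
    print(example, "->", to_short_id(example))
```

Because the prefix carries no information beyond the grouping, the conversion is a plain string operation, which fits the plan to replace the long identifiers with the short ones.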
Ambiguous productions. A mixed unit can be a single_unit, and a core unit can be a single_unit, but I don’t see why this is an issue, so can you explain more? It is on purpose: mixed units can be used for some purposes (e.g. foot or foot-and-inch) and core units can be used in other places (e.g. foot or square-foot). Some units (e.g. foot) are both mixed and core, and can be used in both places.
In contrast with the generally high quality of Unicode Technical Reports, the text of UTS #35 Part 2 has some problems when it comes to units.
Section 6.2 introduces terms like "long unit identifier" and "core unit identifier" with examples, but never actually defines them.
"identifier" is then replaced with "ID" in subsequent use without explanation or warning.
"complex unit" is introduced, but never used... "compound unit" is used but lacks a satisfactory definition, and I cannot tell whether or not the terms are intended as synonyms.
The formal syntax of unit_identifier and subordinate tokens includes what appear to be unresolved references (most notably "type") and ambiguous productions (e.g.,
unit_identifier → mixed_unit_identifier → (single_unit | pu_single_unit) ("-and-" (single_unit | pu_single_unit))*
[note the optionality of "-and-…"] vs.
unit_identifier → core_unit_identifier → product_unit ("-per-" product_unit)*
with
product_unit → single_unit ("-" single_unit)* ("-" pu_single_unit)*
).
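The overlap can be demonstrated with a simplified transcription of the two productions into regular expressions. This is only a sketch under stated assumptions: `SINGLE` stands in for single_unit using a tiny hypothetical vocabulary, pu_single_unit and any prefixes are omitted, and `productions` is an illustrative helper, not part of any CLDR tooling.

```python
import re

# Simplified stand-in for single_unit: a tiny hypothetical vocabulary,
# with prefixes and pu_single_unit left out.
SINGLE = r"(?:foot|inch|meter|second)"

# mixed_unit_identifier -> single_unit ("-and-" single_unit)*
MIXED = re.compile(rf"{SINGLE}(?:-and-{SINGLE})*")

# product_unit -> single_unit ("-" single_unit)*
PRODUCT = rf"{SINGLE}(?:-{SINGLE})*"

# core_unit_identifier -> product_unit ("-per-" product_unit)*
CORE = re.compile(rf"{PRODUCT}(?:-per-{PRODUCT})*")


def productions(unit_id: str) -> list[str]:
    """Return the names of the top-level productions that derive unit_id."""
    matched = []
    if MIXED.fullmatch(unit_id):
        matched.append("mixed_unit_identifier")
    if CORE.fullmatch(unit_id):
        matched.append("core_unit_identifier")
    return matched


for example in ("foot", "foot-and-inch", "meter-per-second"):
    print(example, "->", productions(example))
```

In this simplified form, a bare single unit such as foot is derivable through both alternatives, while foot-and-inch and meter-per-second each have exactly one derivation.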