The collation element iterator currently returns primary+secondary+case+tertiary
level data. It also needs to return quaternary-level data (for shifted and
Hiragana). Add a function that returns all the bits. Reserve some bits for
further, future levels.
No need to return the identical level because that is just the NFD form of the
12/24/02 17:45:37 hshih changed notes2
04/14/03 19:09:12 ram changed notes2
05/30/03 11:34:49 hshih changed notes2
06/16/03 18:03:03 hshih changed notes2
02/05/04 19:57:23 weiv changed notes2
02/06/04 16:31:14 weiv changed notes2
02/17/04 03:32:57 weiv changed notes2
Tue Sep 27 10:12:31 2005 weiv changed notes2: target: "3.0" to "UNSCH", xref: "3536" to "3536 4782", comments: "\n" to ""
Setting priority=zero because in ten years no one seems to have found a need for this. The collation element iterator is used in string search (e.g., web browser ctrl-F in-page search), and the tendency there is towards ignoring lower-level differences.
The collation element iterator should also do at least some pre-processing according to the Collator attributes, e.g., alternate=shifted blanking levels 1..3, upperFirst inverting the case weights.
This would also be useful because an API that returns (partially processed) weights would make it easier to change the bit fields of a collation element integer. We should deprecate the API that returns 32-bit integers.
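A rough sketch of what such pre-processed, per-level output could look like, written in Python for brevity rather than as an ICU API proposal. Everything here is an assumption made for illustration: the `Weights` record, the `preprocess` function, the `variable_top` parameter, the case encoding (0 = none, 1 = lower, 2 = mixed, 3 = upper), and 0xFFFF as the high quaternary weight. The shifted handling follows the UCA "Shifted" rules: variable CEs move their primary to level 4, and primary ignorables that follow a variable CE are dropped.

```python
from dataclasses import dataclass

MAX_QUATERNARY = 0xFFFF  # assumed high quaternary weight for non-variable CEs


@dataclass(frozen=True)
class Weights:
    """Hypothetical per-level weights for one collation element (not an ICU type)."""
    primary: int
    secondary: int
    case: int        # assumed encoding: 0 = none, 1 = lower, 2 = mixed, 3 = upper
    tertiary: int
    quaternary: int = 0


def preprocess(ces, variable_top, shifted=False, upper_first=False):
    """Apply attribute-dependent processing to a forward sequence of Weights."""
    out = []
    after_variable = False
    for ce in ces:
        p, s, c, t, q = ce.primary, ce.secondary, ce.case, ce.tertiary, ce.quaternary
        if shifted:
            if p == 0 and s == 0 and t == 0:
                continue  # completely ignorable: nothing to return
            if p == 0:
                if after_variable:
                    continue  # primary ignorable after a shifted CE: ignored
                q = MAX_QUATERNARY
            elif p <= variable_top:
                # Variable CE: blank levels 1..3, move the primary to level 4.
                p, s, c, t, q = 0, 0, 0, 0, p
                after_variable = True
            else:
                q = MAX_QUATERNARY
                after_variable = False
        if upper_first and c != 0:
            c = 4 - c  # invert the assumed case weights: 1 <-> 3, 2 stays 2
        out.append(Weights(p, s, c, t, q))
    return out
```

Because callers would see named fields instead of a packed 32-bit integer, the internal bit layout could change without breaking them.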
Replying to (Comment 3 markus):
... some pre-processing according to the Collator attributes, e.g., alternate=shifted blanking levels 1..3, ...
Complication: With alternate=shifted, primary ignorables become completely ignorable after a shifted CE. When iterating backwards and we get a primary ignorable, we will have to iterate further until we get the next primary CE, buffer the intervening primary-ignorable CEs with their source indexes, and discard them if that primary CE is variable. We could leave this to the caller, but then every caller would have to repeat the same processing.
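The buffering could look roughly like the following Python sketch. It assumes, purely for illustration, that CEs are available as a list of `(primary, secondary, tertiary)` tuples indexed by source position, that `variable_top` marks the highest variable primary, and that 0xFFFF is the quaternary weight for non-variable CEs. The function name `iterate_backward_shifted` is hypothetical. The buffered CEs are the primary ignorables seen before the preceding primary CE is reached.

```python
def iterate_backward_shifted(ces, variable_top):
    """Yield (primary, secondary, tertiary, quaternary, index) from the end of ces.

    Primary ignorables are held back until the preceding CE with a non-zero
    primary is known, because they must be discarded if that CE is variable.
    """
    pending = []  # buffered primary ignorables, in backward order
    for index in range(len(ces) - 1, -1, -1):
        p, s, t = ces[index]
        if p == 0:
            if s != 0 or t != 0:
                pending.append((0, s, t, 0xFFFF, index))
            # Completely ignorable CEs are dropped and do not end the run.
            continue
        if p <= variable_top:
            # Variable CE: the ignorables that follow it in the text vanish.
            pending.clear()
            yield (0, 0, 0, p, index)
        else:
            # Regular CE: the buffered ignorables are real, emit them first.
            yield from pending
            pending.clear()
            yield (p, s, t, 0xFFFF, index)
    # Ignorables at the very start of the text follow no variable CE.
    yield from pending
```

The cost is unbounded lookahead in the worst case (a long run of primary ignorables), which is part of why pushing this onto every caller is unattractive.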
The Boyer-Moore String Search implementation might go away or might be redone. We might be able to limit a new collation iterator API to only iterating forward, or only returning primary weights when going backwards. It may be even better to encapsulate the key/pattern in a class that holds the internal CE representation and has a "matchesAt" function that checks for a match from a starting point in the text. If we can limit backward iteration to reading collation grapheme clusters and returning whether there is any primary weight, we might not need to expose CEs and weights on a new API at all.
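To make the "matchesAt" idea concrete, here is a minimal Python sketch under strong simplifying assumptions: the text is already reduced to one primary weight per position (0 meaning primary ignorable), matching is primary-level only, and the class name `CollationPattern` and method `matches_at` are hypothetical names, not existing ICU API. The point is only that the pattern's CE representation stays private to the class.

```python
class CollationPattern:
    """Hypothetical holder for a search key's internal CE representation."""

    def __init__(self, pattern_primaries):
        # Keep only the weights that matter at primary strength.
        self._primaries = [p for p in pattern_primaries if p != 0]

    def matches_at(self, text_primaries, start):
        """Return the exclusive end offset of a match beginning at start, or -1."""
        i = start
        for expected in self._primaries:
            # Skip primary ignorables in the text.
            while i < len(text_primaries) and text_primaries[i] == 0:
                i += 1
            if i == len(text_primaries) or text_primaries[i] != expected:
                return -1
            i += 1
        return i
```

A caller would only ever pass text and offsets in and get offsets back, so no CE or weight type needs to appear on the public API.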