Problems with SHIFTED mode in StringSearch.
1. As far as I understand
(http://www.unicode.org/unicode/reports/tr10/#Variable_Weighting),
SHIFTED mode should be the default, but it is not in ICU.
2. Setting SHIFTED mode explicitly causes an “infinite loop”
in StringSearch when the searched string ends with the pattern.
I tried both 2.6 and the latest 2.8 jars.
This problem stops me from adopting ICU for our projects.
Let me know if there is a standard way of bug reporting.
Test case:
import java.text.StringCharacterIterator;

import com.ibm.icu.text.Collator;
import com.ibm.icu.text.RuleBasedCollator;
import com.ibm.icu.text.StringSearch;

public class ICUCollator {
    public static void main(String[] argv) {
        // The text ends with the pattern, which is the failing case.
        indexesOf("d", "d");
    }

    static void indexesOf(final String text, final String pattern) {
        RuleBasedCollator m_collator = (RuleBasedCollator) Collator.getInstance();
        m_collator.setAlternateHandlingShifted(true); // OK without this line
        final StringCharacterIterator target = new StringCharacterIterator(text);
        StringSearch m_search = new StringSearch(pattern, target, m_collator);
        int startOffset = m_search.first(); // reported to loop forever here
    }
}
Fix the loop bug. Refile the support for quaternary level in search.
Investigate claims. String search cannot do shifted, since iterators don't deal
with quaternary level.
Why are we not using SHIFTED as a default?
02/02/04 20:55:44 weiv changed notes2
02/02/04 20:55:44 weiv moved from incoming to collation
02/04/04 22:47:35 weiv changed notes2
02/06/04 15:00:08 weiv changed notes2
02/06/04 16:31:09 weiv changed notes2
02/06/04 17:01:13 weiv changed notes2
07/12/04 01:29:11 weiv changed notes2
Tue Sep 27 10:12:13 2005 weiv changed notes2: target: "3.2" to "3.6",
Tue Sep 27 10:12:27 2005 weiv changed notes2: xref: "2485" to "2485 4782",
Fri Oct 13 18:03:43 2006 andy changed notes2: assign: "weiv" to "andy", target: "3.6" to "3.8 Candidate",
Fri Oct 20 22:19:00 2006 andy changed notes2: assign: "andy" to "weiv",
The infinite loop is caused by the comparison of signed integers in the getCE method. As a result, the IGNORABLE collation element is returned incorrectly. During the comparison, the integers should be unsigned as in the ICU4C code.
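The signed-versus-unsigned difference described above can be illustrated with a minimal sketch. This is not the actual ICU4J getCE code: the class name, the helper methods, and both constants are hypothetical, chosen only to show how a 32-bit value with its high bit set compares as negative under Java's signed int comparison, and how widening to long gives the unsigned result.

```java
// Illustrative only: not the actual ICU4J getCE implementation.
class UnsignedCompareSketch {
    // Signed comparison: any int with the high bit set is negative.
    static boolean signedLess(int a, int b) {
        return a < b;
    }

    // Unsigned comparison: widen to long and mask off the sign extension.
    static boolean unsignedLess(int a, int b) {
        return (a & 0xFFFFFFFFL) < (b & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        int ce = 0xE0000000;        // hypothetical collation element, high bit set
        int threshold = 0x10000000; // hypothetical boundary value
        System.out.println(signedLess(ce, threshold));   // prints "true"
        System.out.println(unsignedLess(ce, threshold)); // prints "false"
    }
}
```

With the signed comparison, the element is wrongly treated as falling below the boundary, which matches the kind of misclassification described in the note. The masking form is used here because it works on older Java versions.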