UnicodeSet contains(String) processes the string character by character. It shouldn't do this. Example (that should be in test):
The above won't work if the implementation is done character by character.
Instead, it should use the matches interface, recursively.
07/21/02 14:52:44 mark moved from incoming to transliterate
07/26/02 17:56:39 mark changed notes
07/30/02 17:09:40 alan sent reply 1
08/30/02 17:54:30 alan changed notes2
08/30/02 17:54:30 alan changed notes
05/30/03 10:15:02 hshih changed notes2
05/30/03 10:15:02 hshih changed notes
10/28/03 17:28:09 andy changed notes2
10/28/03 17:28:09 andy changed notes
10/28/03 17:28:09 andy moved from transliterate to properties
02/05/04 15:38:13 alan changed notes2
02/05/04 15:38:13 alan changed notes
02/10/04 20:16:07 alan sent reply 2
02/10/04 20:16:55 alan changed notes2
02/10/04 20:16:55 alan changed notes
02/10/04 20:17:24 alan changed notes2
02/10/04 20:17:24 alan changed notes
07/08/04 13:11:43 schererm changed notes2
07/08/04 13:11:43 schererm changed notes
Thu Dec 2 11:39:57 2004 weiv changed notes2: assign: "alan" to "andy",
Mon Nov 14 11:18:26 2005 weiv changed notes2: xref: "" to "4923",
Mon Nov 14 11:39:47 2005 weiv changed notes2: comments: "
Could name the method index() or indexOf() –
" to "Could name the method index() or indexOf() –
",
Wed Nov 8 18:12:37 2006 emmons changed notes2: assign: "andy" to "mark",
Wed Nov 8 18:12:37 2006 emmons changed notes
I think this is a question of semantics.
contains("abc")
currently means "Does the set contain the multi-character string 'abc', that is,
is it of the form [...{abc}...]?".
This is in line with several other APIs that take a single String argument. If
you look at the code, you see an identical structure. See add(String),
remove(String), and complement(String).
The proposed function is something different. It seems like more of a
sequential matching test, something like "spansSubstrings(String x)". The
semantics would be "Can string x be divided into one or more non-overlapping
contiguous substrings, each of length 1 or more, such that contains(a) is true
for each substring a?"
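The semantics quoted above can be sketched as a small segmentation check. This is only an illustration, not ICU code: the class SpanSketch and the method spansSubstrings are hypothetical names, and a plain java.util.Set of strings stands in for a UnicodeSet, with single characters and multi-character strings both modeled as String elements.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the proposed semantics; not part of the ICU API.
final class SpanSketch {
    /**
     * Returns true if x can be divided into one or more contiguous,
     * non-overlapping substrings that are each an element of the set.
     */
    static boolean spansSubstrings(Set<String> elements, String x) {
        int n = x.length();
        if (n == 0) {
            return false; // "one or more" substrings are required
        }
        // reachable[i] is true when x.substring(0, i) can be fully divided.
        boolean[] reachable = new boolean[n + 1];
        reachable[0] = true;
        for (int i = 0; i < n; i++) {
            if (!reachable[i]) {
                continue;
            }
            for (String e : elements) {
                if (!e.isEmpty() && x.startsWith(e, i)) {
                    reachable[i + e.length()] = true;
                }
            }
        }
        return reachable[n];
    }

    public static void main(String[] args) {
        // Models the pattern [a{bc}]: the character 'a' and the string "bc".
        Set<String> set = new HashSet<>(Arrays.asList("a", "bc"));
        System.out.println(spansSubstrings(set, "abc")); // true: "a" + "bc"
        System.out.println(spansSubstrings(set, "ab"));  // false: "b" alone is not an element
        System.out.println(set.contains("abc"));         // false: "abc" itself is not an element
    }
}
```

The last line of main shows the distinction drawn above: membership of the whole string and divisibility into members are different questions.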
I propose that this be recast as an RFE for new API:
/**
 * Return an index into str of the first offset x >= start at
 * which the character str.char32At(x) or the string
 * str.substr(x, y) is contained in this set. If
 * isContained is false, return the first offset
 * at which the character or string is not contained
 * in this set. If no index matches, return -1.
 */
int index(String str, int start, boolean isContained);
Then the desired operation is really:
if (set.index(str, 0, false) >= 0) ...
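One possible reading of the proposed index() can be sketched as follows. Everything here is an assumption made for illustration: IndexSketch and longestMatchAt are hypothetical names, a java.util.Set of strings stands in for a UnicodeSet, the scan prefers the longest element matching at each offset, and start is assumed to lie within 0..str.length().

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the proposed index() method; not an ICU API.
final class IndexSketch {
    // Length of the longest element that matches at offset x, or 0 if none does.
    static int longestMatchAt(Set<String> elements, String str, int x) {
        int best = 0;
        for (String e : elements) {
            if (e.length() > best && str.startsWith(e, x)) {
                best = e.length();
            }
        }
        return best;
    }

    /**
     * Returns the first offset x >= start at which an element of the set
     * matches (isContained == true) or fails to match (isContained == false),
     * or -1 if there is no such offset.
     */
    static int index(Set<String> elements, String str, int start, boolean isContained) {
        int x = start;
        while (x < str.length()) {
            int len = longestMatchAt(elements, str, x);
            if ((len > 0) == isContained) {
                return x;
            }
            // Skip the matched element, or one code point when nothing matched.
            x += (len > 0) ? len : Character.charCount(str.codePointAt(x));
        }
        return -1;
    }

    public static void main(String[] args) {
        // Models the pattern [a{bc}]: the character 'a' and the string "bc".
        Set<String> set = new HashSet<>(Arrays.asList("a", "bc"));
        System.out.println(index(set, "abc", 0, false)); // -1: every piece is contained
        System.out.println(index(set, "abd", 0, false)); // 1: "b" starts no element
        System.out.println(index(set, "xxbc", 0, true)); // 2: "bc" matches at offset 2
    }
}
```

Under this reading, a result of -1 from index(set, str, 0, false) means no uncontained piece was found, and a result >= 0 is the offset of the first piece that is not in the set. Longest-match scanning is a simplification and can miss a valid division that needs a shorter element first.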
(The next question is, of course, whether this API is necessary. The justification for it is that UnicodeSet can implement this more easily than client code. The counterargument is that no one is asking for this, so why add it?)
Part of UnicodeSet.span port.
What are the semantics of UnicodeSet contains(String)?
In UnicodeSet.java: 1781
/**
 * Returns <tt>true</tt> if this set contains the given
 * multicharacter string.
 * @param s string to be checked for containment
 * @return <tt>true</tt> if this set contains the specified string
 * @stable ICU 2.0
 */
public final boolean contains(String s) {...}
and there is a test in UnicodeSetTest.java: 1211
public void TestContainsString() {
    UnicodeSet x = new UnicodeSet("[a{bc}]");
    if (x.contains("abc")) errln("FAIL");
}
which asserts the opposite.
Are you talking about a different API?