Port UnicodeSet.span work from C to Java

Description

UnicodeSet.contains(String) processes the string character by character. It shouldn't do this. Example (that should be in the test):

The above won't work if the implementation is done character by character.
Instead, it should use the matches interface, recursively.

Activity

TracBot
July 1, 2018, 9:26 AM
Trac Comment by auditor—1970-01-01T01:43:55.000Z
  • 07/21/02 14:52:44 mark moved from incoming to transliterate

  • 07/26/02 17:56:39 mark changed notes

  • 07/30/02 17:09:40 alan sent reply 1

  • 08/30/02 17:54:30 alan changed notes2

  • 08/30/02 17:54:30 alan changed notes

  • 05/30/03 10:15:02 hshih changed notes2

  • 05/30/03 10:15:02 hshih changed notes

  • 10/28/03 17:28:09 andy changed notes2

  • 10/28/03 17:28:09 andy changed notes

  • 10/28/03 17:28:09 andy moved from transliterate to properties

  • 02/05/04 15:38:13 alan changed notes2

  • 02/05/04 15:38:13 alan changed notes

  • 02/10/04 20:16:07 alan sent reply 2

  • 02/10/04 20:16:55 alan changed notes2

  • 02/10/04 20:16:55 alan changed notes

  • 02/10/04 20:17:24 alan changed notes2

  • 02/10/04 20:17:24 alan changed notes

  • 07/08/04 13:11:43 schererm changed notes2

  • 07/08/04 13:11:43 schererm changed notes

  • Thu Dec 2 11:39:57 2004 weiv changed notes2: assign: "alan" to "andy",

  • Mon Nov 14 11:18:26 2005 weiv changed notes2: xref: "" to "4923",

  • Mon Nov 14 11:39:47 2005 weiv changed notes2: comments: "Could name the method index() or indexOf() –" to "Could name the method index() or indexOf() –",

  • Wed Nov 8 18:12:37 2006 emmons changed notes2: assign: "andy" to "mark",

  • Wed Nov 8 18:12:37 2006 emmons changed notes

TracBot
July 1, 2018, 9:26 AM
Trac Comment by alan.liu@a95c9666650cfc8d—2002-07-31T00:09:40.000Z

I think this is a question of semantics.

contains("abc")

currently means "Does the set contain the multi-character string 'abc', that is,
is it of the form [...{abc}...]?".

This is in line with several other APIs that take a single String argument. If
you look at the code, you will see an identical structure. See add(String),
remove(String), and complement(String).

The proposed function is something different. It seems like more of a
sequential matching test, something like "spansSubstrings(String x)". The
semantics would be "Can string x be divided into one or more non-overlapping
contiguous substrings, each of length 1 or more, such that contains(a) is true
for each substring a?"
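The difference between the two semantics can be sketched in a few lines. This is only an illustration: it models the set as a plain java.util.Set<String> rather than using UnicodeSet, and the class name SpanSketch and the method spansSubstrings are hypothetical (the latter borrows the name floated above). It does not reflect how ICU implements anything.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SpanSketch {
    // Stand-in for the current contains(String): exact membership of one element.
    static boolean contains(Set<String> set, String s) {
        return set.contains(s);
    }

    // Hypothetical spansSubstrings: true if x can be divided into one or more
    // contiguous, non-overlapping substrings, each of which is an element of the set.
    static boolean spansSubstrings(Set<String> set, String x) {
        int n = x.length();
        if (n == 0) {
            return false; // "one or more" substrings are required
        }
        // reachable[i] is true when x[0, i) can be fully covered by set elements.
        boolean[] reachable = new boolean[n + 1];
        reachable[0] = true;
        for (int i = 0; i < n; i++) {
            if (!reachable[i]) {
                continue;
            }
            for (String e : set) {
                if (!e.isEmpty() && x.startsWith(e, i)) {
                    reachable[i + e.length()] = true;
                }
            }
        }
        return reachable[n];
    }

    public static void main(String[] args) {
        // Models the pattern [a{bc}]: the character 'a' and the string "bc".
        Set<String> set = new HashSet<>(Arrays.asList("a", "bc"));
        System.out.println(contains(set, "abc"));        // false
        System.out.println(spansSubstrings(set, "abc")); // true
        System.out.println(spansSubstrings(set, "ab"));  // false
    }
}
```

With the set modeled on [a{bc}], "abc" is not an element, but it can be split into "a" followed by "bc", which is the behavior the ticket description appears to ask for.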

TracBot
July 1, 2018, 9:26 AM
Trac Comment by alanliu@63ab4e4d4e2312f9—2004-02-11T03:16:07.000Z

I propose that this be recast as an RFE for new API:

/**
 * Return an index into str of the first offset x >= start at
 * which the character str.char32At(x) or the string
 * str.substr(x, y) is contained in this set. If
 * isContained is false, return the first offset
 * at which the character or string is not contained
 * in this set. If no index matches, return -1.
 */
int index(String str, int start, boolean isContained);

Then the desired operation is really:

if (set.index(str, 0, false) >= 0) ...

(The next question is of course, is this API necessary? The justification for it is that UnicodeSet can implement this more easily than client code. The counter argument is that no one is asking for this, so why add it?)
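One possible reading of the proposed index() can be sketched as follows. The proposal does not say how to advance past a multi-character match, so this sketch assumes that a matched element is skipped as a whole, longest match first, when isContained is false. The set is again modeled as a plain java.util.Set<String>, and the class name IndexSketch and the helper longestMatchAt are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class IndexSketch {
    // Length of the longest set element that begins at offset x, or 0 if none does.
    static int longestMatchAt(Set<String> set, String str, int x) {
        int best = 0;
        for (String e : set) {
            if (e.length() > best && str.startsWith(e, x)) {
                best = e.length();
            }
        }
        return best;
    }

    // Sketch of the proposed index(str, start, isContained).
    // isContained == true:  first offset >= start at which some element begins.
    // isContained == false: first offset >= start at which no element begins,
    //                       skipping over each matched element (an assumption).
    static int index(Set<String> set, String str, int start, boolean isContained) {
        int x = Math.max(start, 0);
        while (x < str.length()) {
            int len = longestMatchAt(set, str, x);
            if (isContained) {
                if (len > 0) {
                    return x;
                }
                x = str.offsetByCodePoints(x, 1); // step one code point
            } else {
                if (len == 0) {
                    return x;
                }
                x += len; // skip the matched element
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Models the pattern [a{bc}].
        Set<String> set = new HashSet<>(Arrays.asList("a", "bc"));
        System.out.println(index(set, "abc", 0, false)); // -1
        System.out.println(index(set, "abd", 0, false)); // 1
        System.out.println(index(set, "xxbc", 0, true)); // 2
    }
}
```

Under this reading, index(str, 0, false) >= 0 signals that some part of str is not covered by the set. Greedy longest-match skipping is simpler than a full search over all possible splits and can give a different answer when elements overlap, so a real implementation would need to pin that down.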

TracBot
July 1, 2018, 9:26 AM
Trac Comment 6 by —2007-09-26T23:21:22.000Z

Part of UnicodeSet.span port.

TracBot
July 1, 2018, 9:26 AM
Trac Comment 10 by zhou—2009-10-13T18:49:49.000Z

What are the semantics of UnicodeSet.contains(String)?

In UnicodeSet.java: 1781

/**
 * Returns <tt>true</tt> if this set contains the given
 * multicharacter string.
 * @param s string to be checked for containment
 * @return <tt>true</tt> if this set contains the specified string
 * @stable ICU 2.0
 */
public final boolean contains(String s) {...}

and there is a test in UnicodeSetTest.java: 1211

public void TestContainsString() {
    UnicodeSet x = new UnicodeSet("[a{bc}]");
    if (x.contains("abc")) errln("FAIL");
}

which asserts the opposite.

Are you talking about a different API?

Fixed

Assignee

TracBot

Reporter

TracBot

Components

Labels

Reviewer

None

Priority

major

Time Needed

Days

Fix versions