64-bit regex APIs

Description

The regex implementation has been 64-bit index capable internally since ICU 4.4, when the UText-based regex implementation was introduced, but several regex/uregex APIs are still restricted to 32-bit indexes.
On Mac OS X, significant clients of ICU also use 64-bit indexes, so the 32-bit regex APIs act as a chokepoint for this data type.
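
To make the chokepoint concrete: a client that tracks text offsets as 64-bit values has to narrow them before it can call a 32-bit-index API. The following is a minimal sketch that does not depend on ICU; `narrowIndex` is a hypothetical helper invented for illustration, not an ICU function.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (NOT part of ICU): narrow a 64-bit text index to the
// 32-bit type a legacy API expects. Returns false, leaving `out` untouched,
// when the value cannot be represented in 32 bits.
bool narrowIndex(int64_t index, int32_t &out) {
    if (index < std::numeric_limits<int32_t>::min() ||
        index > std::numeric_limits<int32_t>::max()) {
        return false;  // offset would be truncated by a 32-bit API
    }
    out = static_cast<int32_t>(index);
    return true;
}
```

Every caller with 64-bit offsets needs a check of this kind for as long as the API only accepts `int32_t`; 64-bit entry points remove the need for it.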

Beyond data type pass-through, there's the more practical concern of whether it makes sense to deal with 64-bit indexable text. To that end, ICU 4.4 already introduced a FindProgress callback (see ticket #7666) and allows subsequent match/reset operations to succeed after an interrupted or cancelled match. More work is coming on this front, such as a mechanism to set both the region and the starting position independently, but it looks like 64bit is here

Activity

TracBot
July 1, 2018, 12:07 AM
Trac Comment 5 by mishonok—2010-07-24T00:21:06.647Z

Sent proposal for review to icu-core, neglected icu-design

TracBot
July 1, 2018, 12:07 AM
Trac Comment 6 by anonymous—2010-07-28T18:40:44.884Z

64bit Regex Index access

regex.h

Modified public methods:
UBool RegexMatcher::find(int32_t start... -> UBool RegexMatcher::find(int64_t start...
UBool RegexMatcher::lookingAt(int32_t start... -> UBool RegexMatcher::lookingAt(int64_t start...
UBool RegexMatcher::matches(int32_t start... -> UBool RegexMatcher::matches(int64_t start...
RegexMatcher &RegexMatcher::region(int32_t start, int32_t limit... -> RegexMatcher &RegexMatcher::region(int64_t start, int64_t limit...
RegexMatcher &RegexMatcher::reset(int32_t position... -> RegexMatcher &RegexMatcher::reset(int64_t position...

New public methods:
virtual int64_t start64(UErrorCode &status) const;
virtual int64_t start64(int32_t group, UErrorCode &status) const;
virtual int64_t end64(UErrorCode &status) const;
virtual int64_t end64(int32_t group, UErrorCode &status) const;
virtual int64_t regionStart64() const;
virtual int64_t regionEnd64() const;
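
One way to picture how a 64-bit accessor can sit alongside the existing 32-bit one is sketched below. This is not ICU code and uses no ICU headers: `SketchMatcher`, `SketchStatus`, and the out-of-range reporting are all assumptions made for illustration, and the ticket does not say how the real 32-bit accessors behave when an index does not fit.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical status codes for this sketch only (not UErrorCode).
enum SketchStatus { SKETCH_OK = 0, SKETCH_INDEX_OUT_OF_RANGE = 1 };

// Hypothetical stand-in for a matcher that stores a 64-bit match start.
class SketchMatcher {
public:
    explicit SketchMatcher(int64_t matchStart) : fMatchStart(matchStart) {}

    // 64-bit accessor: returns the native index unchanged.
    int64_t start64(SketchStatus &status) const {
        status = SKETCH_OK;
        return fMatchStart;
    }

    // 32-bit accessor kept for existing callers; delegates to start64().
    // Assumption of this sketch: an index that does not fit is reported
    // through the status argument and -1 is returned.
    int32_t start(SketchStatus &status) const {
        int64_t s = start64(status);
        if (s > std::numeric_limits<int32_t>::max()) {
            status = SKETCH_INDEX_OUT_OF_RANGE;
            return -1;
        }
        return static_cast<int32_t>(s);
    }

private:
    int64_t fMatchStart;
};
```

Keeping the 32-bit method as a thin wrapper means existing callers compile unchanged, while callers working with large text move to the `64` variant.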

uregex.h

APIs that will not be changed to take 64-bit indices; use the new UText APIs instead:

Existing API                For 64bit
uregex_setText              uregex_setUText
uregex_getText              uregex_getUText
uregex_group                uregex_groupUText
uregex_replaceAll           uregex_replaceAllUText
uregex_replaceFirst         uregex_replaceFirstUText
uregex_appendReplacement    uregex_appendReplacementUText
uregex_appendTail           uregex_appendTailUText

APIs that take int32_t parameter(s), or return such, will acquire 64-bit versions, added as new xxx64 APIs:

Existing API                For 64bit
uregex_matches              uregex_matches64
uregex_lookingAt            uregex_lookingAt64
uregex_find                 uregex_find64
uregex_reset                uregex_reset64
uregex_setRegion            uregex_setRegion64
uregex_start                uregex_start64
uregex_end                  uregex_end64
uregex_regionEnd            uregex_regionEnd64
uregex_regionStart          uregex_regionStart64

TracBot
July 1, 2018, 12:07 AM
Trac Comment 7 by mishonok—2010-09-17T18:41:46.291Z

Merged changes from trunk (r28320:28640) into branch at icu/branches/mishonok/regex-NativeIndex64bit
Committed revision 28641.

TracBot
July 1, 2018, 12:07 AM
Trac Comment 9 by mishonok—2010-09-18T03:36:17.649Z

Committed all regex changes together under this bug. For reference, here is the whole set of approved tickets:

7813: 64bit regex API

7675: UText-based Regex to use native indexes

7764: Improved UText-regex API error handling

7855: UText regex group API returns shallow clone

7851: Set region and start position

7763: Inline regex progress callback function.

TracBot
July 1, 2018, 12:07 AM
Trac Comment 11 by —2010-09-24T05:42:45.765Z

Review Comments:

Fixed

Assignee: TracBot
Reporter: TracBot
Components:
Labels: None
Reviewer: None
Priority: medium
Time Needed: None
Fix versions: