Regex: behavior of \cX (Control-X) differs from Java and Perl

Description

\cX in a regular expression pattern means Control-X, where X is the character that follows \c.

ICU ANDs X with 0x1F; this is done by the ICU Unescape() function.

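Java XORs X with 0x40. This gives the same result as ICU for A-Z (more precisely, for all of U+0040..U+005F: @, A-Z, [, \, ], ^ and _), but differs for everything else, including lowercase a-z.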
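A small Java sketch comparing the two mappings, modeling ICU's rule as x & 0x1F and Java's as x ^ 0x40 applied to the character after \c. The helper names icuControl and javaControl are hypothetical, not ICU or JDK APIs; the last line cross-checks the Java rule against java.util.regex.Pattern itself.

```java
import java.util.regex.Pattern;

// Sketch: compare the two \cX mappings described in this ticket.
// Assumption: each rule is modeled as a plain integer operation on the
// character X that follows "\c"; helper names are hypothetical.
public class ControlEscapeCompare {

    // ICU-style rule (via Unescape()): keep only the low five bits.
    public static int icuControl(int x) {
        return x & 0x1F;
    }

    // Java-style rule (java.util.regex.Pattern): flip bit 6 (0x40).
    public static int javaControl(int x) {
        return x ^ 0x40;
    }

    public static void main(String[] args) {
        StringBuilder agree = new StringBuilder();
        int differ = 0;
        // Walk printable ASCII, U+0020..U+007E.
        for (int x = 0x20; x <= 0x7E; x++) {
            if (icuControl(x) == javaControl(x)) {
                agree.append((char) x);
            } else {
                differ++;
            }
        }
        System.out.println("agree (" + agree.length() + "): " + agree);
        System.out.println("differ: " + differ);

        // Two examples outside U+0040..U+005F.
        System.out.printf("\\ca  ICU=0x%02X  Java=0x%02X%n", icuControl('a'), javaControl('a'));
        System.out.printf("\\c?  ICU=0x%02X  Java=0x%02X%n", icuControl('?'), javaControl('?'));

        // Cross-check against the real regex engine:
        // if Java XORs with 0x40, \ca should match '!' (0x21).
        System.out.println("Java \\ca matches '!': " + Pattern.matches("\\ca", "!"));
    }
}
```

Run as-is, it should list 32 agreeing characters (@, A-Z, [, \, ], ^, _) and count the other 63 printable ASCII characters as differing.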

PCRE first uppercases X, then XORs it with 0x40. I don't know whether the uppercasing works outside the ASCII range.

Perl also uppercases X, then XORs it with 0x40. Its behavior for non-ASCII X is unknown.
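A similar sketch for the Perl/PCRE rule, placed next to the ICU and Java rules for comparison. Assumption: only ASCII a-z are uppercased, since non-ASCII behavior is unknown; perlControl, icuControl, javaControl and perlJavaDifferences are hypothetical helper names, not real ICU, Perl or PCRE APIs.

```java
// Sketch of the Perl/PCRE rule described in this ticket: uppercase X,
// then XOR with 0x40. Assumption: only ASCII a-z are uppercased, because
// non-ASCII behavior is unknown. Helper names are hypothetical.
public class PerlControlEscape {

    public static int perlControl(int x) {
        if (x >= 'a' && x <= 'z') {
            x -= 0x20; // ASCII lowercase -> uppercase
        }
        return x ^ 0x40;
    }

    // ICU and Java rules repeated here so the sketch is self-contained.
    public static int icuControl(int x) {
        return x & 0x1F;
    }

    public static int javaControl(int x) {
        return x ^ 0x40;
    }

    // Printable ASCII characters where the Perl and Java rules disagree.
    public static String perlJavaDifferences() {
        StringBuilder sb = new StringBuilder();
        for (int x = 0x20; x <= 0x7E; x++) {
            if (perlControl(x) != javaControl(x)) {
                sb.append((char) x);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] samples = {'A', 'a', 'z', '[', '?', '1'};
        for (int x : samples) {
            System.out.printf("\\c%c  Perl=0x%02X  ICU=0x%02X  Java=0x%02X%n",
                    (char) x, perlControl(x), icuControl(x), javaControl(x));
        }
        System.out.println("Perl vs Java differ on: " + perlJavaDifferences());
    }
}
```

Under these models Perl agrees with ICU on letters of either case, and with Java everywhere except a-z. For other characters the rules still split: \c? gives 0x7F under the Perl and Java rules but 0x1F under ICU's.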

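The difference between Perl and Java may be a Java bug; ask Sun. For ASCII X the two differ only on lowercase a-z, since that is the only case where Perl's uppercasing changes X.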

Assignee

Andy Heninger

Reporter

Andy Heninger

Components

Labels

None

Reviewer

None

Priority

minor

Time Needed

Hours

Fix versions

None