ICU-6068

Regex, behavior of \cx (Control-X) different from Java and Perl

    Details

    • Type: Bug
    • Status: Accepted
    • Priority: minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: regexp
    • Labels:
      None
    • Time Needed:
      Hours
    • tracOwner:
      andy
    • tracProject:
      ICU4C
    • tracReporter:
      andy
    • tracStatus:
      accepted
    • tracWeeks:
      0.1

      Description

      \cX in a regular expression pattern means Control-X.

      ICU ANDs X with 0x1f, using the ICU Unescape() function.

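      Java XORs X with 0x40. This gives the same result as ICU for A-Z (and for the other characters in the range 0x40-0x5F, such as @ and [), but differs for everything else, including lowercase letters.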

      PCRE first uppercases X, then XORs it with 0x40. I don't know whether the uppercasing works outside the ASCII range.

      Perl uppercases X, then XORs it with 0x40. Non-ASCII behavior is unknown.

      The difference between Perl and Java may be a Java bug. Ask Sun.
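
      The three rules described above can be compared side by side with a small sketch. This is not code from ICU, Java, PCRE, or Perl; the function names are hypothetical, and the Perl/PCRE rule is modeled for ASCII letters only, since the non-ASCII behavior is unknown.

```python
def icu_control(ch: str) -> int:
    """Rule the report attributes to ICU: AND the code point with 0x1F."""
    return ord(ch) & 0x1F


def java_control(ch: str) -> int:
    """Rule the report attributes to Java: XOR the code point with 0x40."""
    return ord(ch) ^ 0x40


def perl_pcre_control(ch: str) -> int:
    """Rule the report attributes to Perl and PCRE: uppercase, then XOR with 0x40.

    Assumption: only ASCII a-z is uppercased; other characters pass through.
    """
    code = ord(ch)
    if ord("a") <= code <= ord("z"):
        code -= 0x20  # ASCII-only uppercasing
    return code ^ 0x40


if __name__ == "__main__":
    # Print the value each rule produces for a few sample characters.
    print("char  ICU   Java  Perl/PCRE")
    for ch in ("A", "Z", "a", "[", "?", "1"):
        print(
            f"{ch!r:5} "
            f"0x{icu_control(ch):02X}  "
            f"0x{java_control(ch):02X}  "
            f"0x{perl_pcre_control(ch):02X}"
        )
```

      Under these rules, \ca maps to 0x01 with the ICU and Perl/PCRE rules but to 0x21 with the Java rule, while \cA maps to 0x01 with all three.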

            People

            • Assignee:
              andy.heninger Andy Heninger
              Reporter:
              andy.heninger Andy Heninger
            • Votes:
              0
              Watchers:
              1
