Regex crash due to writing past end of fSmallData[]

Description

In RegexMatcher we have the instance variables fData and fSmallData.

There is also a method, void RegexMatcher::init2(UText *input, UErrorCode &status), which sets up fData to use fSmallData if that is large enough; otherwise it mallocs memory and points fData at that. The current computation for this is at lines 246-247 in i18n/rematch.cpp.

That computation still contains "sizeof(fSmallData)/sizeof(int32_t))", which is now wrong and makes fSmallData seem twice as big as it is. Not changing this to match the new int64_t[] type for fData was an unfortunate omission in my 64-bit alignment fixes for regex in ICU 4.4 (part of #4521). Even better than using sizeof(int64_t) would be the following, to avoid problems of this sort in the future:
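As a minimal sketch of the difference, assuming that dividing by the element size, sizeof(fSmallData[0]), is the kind of type-independent computation meant here. MatcherSketch is a hypothetical stand-in for RegexMatcher, the array length of 8 is assumed for illustration only, and the helper function names are not from ICU:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the relevant RegexMatcher members.
// The array length (8) is an assumption made for this sketch.
struct MatcherSketch {
    int64_t *fData;          // points at fSmallData or at malloc'ed memory
    int64_t  fSmallData[8];  // inline storage, 64-bit slots after the alignment fix
};

// Capacity as the pre-fix expression computes it: it divides by the old
// 32-bit element size, so it reports twice the real number of slots.
inline size_t buggyCapacity(const MatcherSketch &m) {
    return sizeof(m.fSmallData) / sizeof(int32_t);
}

// Capacity computed from the array's own element type, so it stays
// correct if the element type of fSmallData ever changes again.
inline size_t typeSafeCapacity(const MatcherSketch &m) {
    return sizeof(m.fSmallData) / sizeof(m.fSmallData[0]);
}

// True when dataSize slots fit in the inline buffer; otherwise the
// caller would need to allocate and point fData at the allocation.
inline bool fitsInSmallData(const MatcherSketch &m, size_t dataSize) {
    return dataSize <= typeSafeCapacity(m);
}
```

With these assumed sizes, the old expression reports 16 slots while only 8 exist, so any pattern needing 9 to 16 slots would be written past the end of the inline array.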

With the current code, compiled pattern data is written past the end of fData when it points at fSmallData, overwriting adjacent memory (for example, storing 1 where an address should go). This causes crashes, as Michael Grady discovered while debugging crashes with long expressions such as "((https?\\:\\/\\/|www\\.)\\S+(?<

\\),\\.:;\\]
u0080\\uFFFF])|(?<

[A-Za-z0-9])[\\##][A-Za-z0-9_][A-Za-z0-9_\\u00c0-\\u00d6\\u00c8-\\u00f6\\u00f8-\\u00ff]*|
$[A-Za-z]+)".

Fixed

Assignee

Peter Edberg

Reporter

Peter Edberg

Components

Labels

None

Reviewer

None

Priority

blocks-release

Time Needed

None

Fix versions