Invalid-free in icu::Locale::operator= under ASAN when the internal locale ID is 157 characters long
Description
Activity
Markus Scherer March 17, 2021 at 4:55 PM
We use the “Design” state when we need to discuss the shape of an API, or an algorithm, or some architectural issue.

Frank Yung-Fong Tang March 17, 2021 at 4:41 AM(edited)
OK, I think I found the problem, and it is not a one-off issue. The locale holds both a baseName and a fullName, and they can be the same string when there are no extensions. When there are extensions, they need to be two different strings. The logic for freeing baseName was incorrect: it assumed that if baseName differs from fullName, then baseName is a separately allocated string and not the buffer preallocated inside the object (fullNameBuffer). But if adding the first extension exceeds the size of fullNameBuffer, we allocate a new fullName and leave baseName still pointing at fullNameBuffer. In that case, we should not free baseName.
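The ownership problem described above can be sketched with a small stand-alone model. This is not ICU's code: `MiniLocale`, `kInlineCapacity`, `appendExtension`, and both `...WouldFreeBase` helpers are hypothetical names made up for illustration, and the "fixed" check is only one possible guard, assuming the three pointers behave as described in this comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical inline capacity, kept small so the overflow is easy to trigger.
constexpr std::size_t kInlineCapacity = 16;

// Simplified model of the three members discussed above (not the real icu::Locale).
struct MiniLocale {
    char  fullNameBuffer[kInlineCapacity];  // storage embedded in the object
    char* fullName;                         // inline buffer or heap block
    char* baseName;                         // equals fullName when there is no extension

    explicit MiniLocale(const char* base)
        : fullName(fullNameBuffer), baseName(fullNameBuffer) {
        assert(std::strlen(base) < kInlineCapacity);
        std::strcpy(fullNameBuffer, base);
    }
    MiniLocale(const MiniLocale&) = delete;
    MiniLocale& operator=(const MiniLocale&) = delete;

    // Append the first extension (call at most once in this model).
    void appendExtension(const char* ext) {
        assert(fullName == fullNameBuffer && baseName == fullNameBuffer);
        std::size_t need = std::strlen(fullName) + std::strlen(ext) + 1;
        if (need > kInlineCapacity) {
            // Full name overflows: it moves to the heap, while baseName
            // keeps pointing at the inline buffer. This is the problem case.
            char* heap = static_cast<char*>(std::malloc(need));
            assert(heap != nullptr);
            std::strcpy(heap, fullName);
            std::strcat(heap, ext);
            fullName = heap;
        } else {
            // Full name still fits inline: baseName gets its own heap copy.
            char* copy = static_cast<char*>(std::malloc(std::strlen(fullName) + 1));
            assert(copy != nullptr);
            std::strcpy(copy, fullName);
            baseName = copy;
            std::strcat(fullNameBuffer, ext);
        }
    }

    // The check described as incorrect: "different from fullName" is
    // treated as "heap allocated".
    bool buggyWouldFreeBase() const { return baseName != fullName; }

    // One possible guard (an assumption, not necessarily the actual patch):
    // never free the inline buffer.
    bool fixedWouldFreeBase() const {
        return baseName != fullName && baseName != fullNameBuffer;
    }

    ~MiniLocale() {
        if (fixedWouldFreeBase()) std::free(baseName);
        if (fullName != fullNameBuffer) std::free(fullName);
    }
};
```

In this model, a locale whose first extension overflows the inline buffer ends up with `baseName == fullNameBuffer` and a heap-allocated `fullName`. The buggy check then says "free baseName", which would hand a non-malloc()-ed address to `free`, the same class of error ASAN reports in the stack trace.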

Frank Yung-Fong Tang March 17, 2021 at 1:50 AM
Simple test case:
Patch the file above and run the following (you have to run exactly “./intltest collate/CollationTest/TestLongLocale” and nothing else)

Frank Yung-Fong Tang March 17, 2021 at 1:04 AM
Here is the stack trace:
INFO: Loaded 1 modules (131704 inline 8-bit counters): 131704 [0x55d50327eff0, 0x55d50329f268),
INFO: Loaded 1 PC tables (131704 PCs): 131704 [0x55d50329f268,0x55d5034a19e8),
storage/googlesql/testing/fuzzing/collator_fuzzer: Running 1 inputs 100 time(s) each.
Running: /tmp/testcase-5775807073615872
==629331==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x7f0ac3577290 in thread T0
#0 0x55d5021166e2 in free third_party/llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:127:3
#1 0x55d5026f0186 in setToBogus third_party/icu/source/common/locid.cpp:1827:9
#2 0x55d5026f0186 in icu::Locale::operator=(icu::Locale const&) third_party/icu/source/common/locid.cpp:440:5
#3 0x55d5021fd20b in icu::CollationLoader::loadFromLocale(UErrorCode&) third_party/icu/source/i18n/ucol_res.cpp:242:12
#4 0x55d5027a80ae in icu::UnifiedCache::_get(icu::CacheKeyBase const&, icu::SharedObject const*&, void const*, UErrorCode&) const third_party/icu/source/common/unifiedcache.cpp:394:17
#5 0x55d5021ffdc0 in void icu::UnifiedCache::get<icu::CollationCacheEntry>(icu::CacheKey<icu::CollationCacheEntry> const&, void const*, icu::CollationCacheEntry const*&, UErrorCode&) const third_party/icu/source/common/unifiedcache.h:234:8
#6 0x55d5021fc2f9 in getCacheEntry third_party/icu/source/i18n/ucol_res.cpp:467:12
#7 0x55d5021fc2f9 in icu::CollationLoader::loadTailoring(icu::Locale const&, UErrorCode&) third_party/icu/source/i18n/ucol_res.cpp:164:19
#8 0x55d5021a9a00 in icu::Collator::makeInstance(icu::Locale const&, UErrorCode&) third_party/icu/source/i18n/coll.cpp:467:40
#9 0x55d5021a9ed1 in icu::Collator::createInstance(icu::Locale const&, UErrorCode&) third_party/icu/source/i18n/coll.cpp:448:16

Frank Yung-Fong Tang March 17, 2021 at 12:29 AM
Internally, the input
sie-u-co-bcs-ukvsz-x-lvariant-e-uc-lm-x-mmcsie-u-gb-ucie-su6g-csue-lmcsue-lvariant-variant-cEzcx-steu-c-lmcsx-ucEzcsi-uZsiu-CEie-1g-su1gcEie-su6su6g-ubge-eieszni2
became
sie__1G_C_CEIE_CEZCX_CSUE_E_EIESZNI2_GB_LM_LMCSUE_LMCSX_LVARIANT_MMCSIE_STEU_SU1GCEIE_SU6G_SU6SU6G_U_UBGE_UC_UCEZCSI_UCIE_UZSIU_VARIANT_X@collation=bcs-ukvsz
which is 157 characters long.
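To make the arithmetic concrete, here is a small stand-alone check of the lengths involved. It assumes the inline buffer holds 157 bytes including the terminating NUL (I believe that matches ICU's ULOC_FULLNAME_CAPACITY, but treat it as an assumption). The helper names `fullLength`, `baseLength`, and `fitsInBuffer` are made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Internal locale ID copied from the comment above.
static const char kInternalId[] =
    "sie__1G_C_CEIE_CEZCX_CSUE_E_EIESZNI2_GB_LM_LMCSUE_LMCSX_LVARIANT_"
    "MMCSIE_STEU_SU1GCEIE_SU6G_SU6SU6G_U_UBGE_UC_UCEZCSI_UCIE_UZSIU_"
    "VARIANT_X@collation=bcs-ukvsz";

// Assumed inline buffer size in bytes, including the terminating NUL.
constexpr std::size_t kAssumedCapacity = 157;

// Length of the whole ID, excluding the terminating NUL.
std::size_t fullLength() { return std::strlen(kInternalId); }

// Length of the part before '@' (the base name).
std::size_t baseLength() {
    const char* at = std::strchr(kInternalId, '@');
    return at ? static_cast<std::size_t>(at - kInternalId) : fullLength();
}

// True if a NUL-terminated string of `len` characters fits in `capacity` bytes.
bool fitsInBuffer(std::size_t len, std::size_t capacity) {
    return len + 1 <= capacity;
}
```

Under that assumption the base name (the part before `@`) fits in the inline buffer, but the full name needs one byte more than the buffer holds once the NUL is counted, so it has to be heap-allocated.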
Details
This was discovered by a Google-internal fuzzer.
The test case is the input
sie-u-co-bcs-ukvsz-x-lvariant-e-uc-lm-x-mmcsie-u-gb-ucie-su6g-csue-lmcsue-lvariant-variant-cEzcx-steu-c-lmcsx-ucEzcsi-uZsiu-CEie-1g-su1gcEie-su6su6g-ubge-eieszni2
which causes the internal locale ID to be 157 characters long.
Project: //storage/googlesql/testing/fuzzing
Fuzzing Engine: libFuzzer
Fuzz Target: collator_fuzzer
Job Type: libfuzzer_asan_storage-googlesql-testing-fuzzing_opt
Platform Id: linux
Crash Type: Invalid-free
Crash Address: 0x7f67b2e09e90
Crash State:
icu::Locale::operator=
icu::CollationLoader::loadFromLocale
icu::UnifiedCache::_get
Sanitizer: address (ASAN)
I will try to reproduce it