https://tools.ietf.org/html/rfc5646#section-2.1.1 does not impose a limit on the length of a language tag.
However, the ICU locale ID handling code uses a fixed-size buffer and cannot handle language tags that are too long for it.
I think we should do the following:
Change class Locale to use growable buffers.
We should have a simple/fast/small implementation for most common IDs/tags, with an optional pointer to a larger structure.
Add class Locale getters that write to ByteSink rather than array+capacity.
Change ICU library code to use C++ Locale and write to a ByteSink which wraps CharString.
This will also eliminate the string-not-terminated problems.
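To make the proposal above concrete, here is a minimal sketch of the two ideas combined: a small inline buffer for common IDs with an optional pointer to a larger heap structure, and a getter that writes to a sink instead of an array+capacity pair. All names here (SketchSink, StringSink, SketchLocale, the 32-byte inline size) are hypothetical and chosen for illustration only; this is not ICU's actual Locale or ByteSink code, and std::string stands in for CharString.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical stand-in for a byte sink interface (not ICU's actual class).
class SketchSink {
public:
    virtual ~SketchSink() = default;
    virtual void Append(const char* bytes, std::size_t n) = 0;
};

// Sink that appends to a growable string; std::string plays the role
// that CharString would play inside the library.
class StringSink : public SketchSink {
public:
    explicit StringSink(std::string* dest) : dest_(dest) { assert(dest != nullptr); }
    void Append(const char* bytes, std::size_t n) override { dest_->append(bytes, n); }
private:
    std::string* dest_;
};

// Hypothetical locale ID holder: short IDs live in the inline buffer,
// longer ones spill into a heap-allocated string.
class SketchLocale {
public:
    explicit SketchLocale(const std::string& id) : length_(id.size()) {
        if (id.size() < sizeof(small_)) {
            // Fast path: copy into the inline buffer and terminate it.
            std::memcpy(small_, id.data(), id.size());
            small_[id.size()] = '\0';
        } else {
            // Slow path: the ID does not fit, so allocate the larger structure.
            large_.reset(new std::string(id));
        }
    }

    bool usesHeap() const { return large_ != nullptr; }

    // Getter that writes to a sink; the caller never supplies a capacity,
    // so there is no truncation and no unterminated result to handle.
    void getName(SketchSink& sink) const {
        const char* p = large_ ? large_->data() : small_;
        sink.Append(p, length_);
    }

private:
    char small_[32] = {0};                // assumed inline size, illustration only
    std::size_t length_;
    std::unique_ptr<std::string> large_;  // optional larger structure
};
```

A caller would construct a StringSink over its own std::string and pass it to getName(); the same call works unchanged whether the ID fit in the inline buffer or not.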
The problem that prompted this ticket has now been resolved: in ICU4C, icu::Locale::forLanguageTag() and icu::Locale::toLanguageTag() can now handle the long language tags with many long keywords that were problematic for the Chromium project:
https://cs.chromium.org/chromium/src/v8/test/intl/general/invalid-locale.js
Further work on improving the memory management in the ICU4C locale code will be tracked by ticket ICU-20158.