We have a command-line license acceptance tool that reads license files in UTF-16 format. With the 4.4 version of ICU, we are seeing different output than with previous versions of ICU for Japanese, Korean, Simplified Chinese, and Traditional Chinese. I'm attaching a screenshot of a Japanese license displayed using the 4.4 version of ICU and a previous version. I can provide the source for the tool, including our regression bucket, if that would help diagnose the issue.
What is the text supposed to be? Which ICU functions are called?
What platform is this on?
It looks like the problem is taking a (large) UnicodeString and outputting the contents to cout, as follows:
UnicodeString ustring = readUTF16file(filename);
cout << ustring << endl;
If, however, I iterate through the UnicodeString character by character, all the characters are fine.
With a large UnicodeString output to cout, the problem appears to occur every 200 characters.
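A periodic pattern like this is what one might expect from a conversion loop that works through the string in fixed-size pieces. The sketch below is not ICU code and does not use ICU. It is a plain standard-library illustration of chunked UTF-16 to UTF-8 conversion, where the names utf16ToUtf8Chunked and appendUtf8 and the default chunk size of 200 are all assumptions made for the example. It shows the kind of state (here, a pending lead surrogate) that has to be carried across chunk boundaries so that the chunked output matches a one-shot conversion.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: append one Unicode code point to `out` as UTF-8.
static void appendUtf8(std::string &out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical sketch: convert UTF-16 to UTF-8, reading the input in chunks
// of `chunkUnits` code units (200 is an assumed size, not ICU's documented one).
std::string utf16ToUtf8Chunked(const std::u16string &src,
                               std::size_t chunkUnits = 200) {
    assert(chunkUnits > 0);
    std::string out;
    char16_t pendingLead = 0;  // lead surrogate carried over a chunk boundary
    for (std::size_t start = 0; start < src.size(); start += chunkUnits) {
        const std::size_t end = std::min(start + chunkUnits, src.size());
        for (std::size_t i = start; i < end; ++i) {
            const char16_t u = src[i];
            if (pendingLead != 0) {
                const char16_t lead = pendingLead;
                pendingLead = 0;
                if (u >= 0xDC00 && u <= 0xDFFF) {
                    // Complete the pair, even if it straddles two chunks.
                    const char32_t cp = 0x10000 +
                        (static_cast<char32_t>(lead - 0xD800) << 10) +
                        static_cast<char32_t>(u - 0xDC00);
                    appendUtf8(out, cp);
                    continue;
                }
                appendUtf8(out, 0xFFFD);  // unpaired lead surrogate
            }
            if (u >= 0xD800 && u <= 0xDBFF) {
                pendingLead = u;          // wait for the trail surrogate
            } else if (u >= 0xDC00 && u <= 0xDFFF) {
                appendUtf8(out, 0xFFFD);  // unpaired trail surrogate
            } else {
                appendUtf8(out, u);
            }
        }
    }
    if (pendingLead != 0) {
        appendUtf8(out, 0xFFFD);          // input ended on a lead surrogate
    }
    return out;
}
```

Comparing the result for a chunk size of 200 against a chunk size larger than the whole input is one way to check that nothing is lost or duplicated at the boundaries.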
We may want to add a more interesting test case later. The current test case reproduces the reported issue, but in it the internal buffer is already completely full on every conversion iteration.