Getting extra characters when displaying Japanese Unicode characters

Description

We have a command-line license acceptance tool that reads license files in UTF-16 format. With ICU 4.4, we are seeing different output than with previous versions of ICU for Japanese, Korean, Simplified Chinese, and Traditional Chinese. I'm attaching screenshots of a Japanese license displayed with ICU 4.4 and with a previous version. If it would help diagnose the issue, I can provide the source for the tool, which includes our regression bucket.

Activity

TracBot
July 1, 2018, 9:53 AM
Trac Comment 1 by —2010-03-12T20:18:53.000Z

What is the text supposed to be? Which ICU functions are called?
What platform is this on?

TracBot
July 1, 2018, 9:53 AM
Trac Comment 2 by vassil@63ab4e4d4e2312f9—2010-03-13T12:33:55.000Z

It looks like the problem is taking a (large) UnicodeString and outputting the contents to cout, as follows:

UnicodeString ustring = readUTF16file(filename);
cout << ustring << endl;

If however I iterate through the UnicodeString character by character, all the characters are fine.
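For comparison, here is a minimal sketch of the character-by-character path, using only the standard library. It is not the tool's code and not ICU's implementation: `std::u16string` stands in for `UnicodeString`, and the helper names `appendUtf8` and `utf16ToUtf8` are hypothetical. It assumes the terminal expects UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: append one Unicode code point to a UTF-8 string.
static void appendUtf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: walk UTF-16 text one code point at a time.
// A valid surrogate pair is combined into one code point; an unpaired
// surrogate is replaced with U+FFFD.
std::string utf16ToUtf8(const std::u16string& text) {
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        char32_t cp = text[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < text.size()
                && text[i + 1] >= 0xDC00 && text[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (text[i + 1] - 0xDC00);
            ++i;  // the trail surrogate has been consumed
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;
        }
        appendUtf8(out, cp);
    }
    return out;
}
```

Because each code point is encoded on its own, no fixed-size intermediate buffer is involved, which may be why this path does not show the problem.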

TracBot
July 1, 2018, 9:53 AM
Trac Comment 3 by vassil@63ab4e4d4e2312f9—2010-03-13T13:05:25.000Z

When a large UnicodeString is written to cout, the problem appears to occur every 200 characters.
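One way to check that spacing is to compare the captured output against the expected bytes and record where extra bytes appear. The sketch below is a hypothetical diagnostic, not part of the tool or of ICU. It assumes the only damage is inserted bytes, and it can misalign if an inserted byte happens to equal the next expected byte.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical diagnostic: return the offsets in `actual` of bytes that
// do not line up with `expected`, treating each one as an inserted byte.
std::vector<std::size_t> extraByteOffsets(const std::string& expected,
                                          const std::string& actual) {
    std::vector<std::size_t> offsets;
    std::size_t e = 0;  // position in the expected bytes
    for (std::size_t a = 0; a < actual.size(); ++a) {
        if (e < expected.size() && actual[a] == expected[e]) {
            ++e;                   // bytes agree, advance both
        } else {
            offsets.push_back(a);  // treat as an inserted byte
        }
    }
    return offsets;
}
```

If the differences between consecutive offsets are roughly constant, that would be consistent with the corruption being tied to a fixed-size chunk of output.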

TracBot
July 1, 2018, 9:53 AM
Trac Comment 7 by —2010-04-26T18:56:08.000Z

We may want to add a more interesting test case later. The current test case reproduces the reported issue; however, the internal buffer will already be completely full on every conversion iteration.

Fixed

Assignee

mow@icu-project.org

Reporter

TracBot

Components

Labels

None

Reviewer

None

Priority

major

Time Needed

Days

Fix versions