Add Charset Detection for EBCDIC Codepages

Description

Filed on behalf of Katsuhiko Masuda:

ICU4C and ICU4J should be able to detect EBCDIC codepages for strings, just as they already do for PC codepages and Unicode strings. Currently, ICU (V4.0) has no data for detecting EBCDIC codepages, so EBCDIC-encoded strings yield no proper codepage candidates.
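To illustrate the gap, here is a minimal Python sketch of what ranking codepage candidates for an EBCDIC byte string could look like. It is not ICU's detection algorithm: the `score` and `rank_codepages` helpers, the candidate list, and the "plausible text" heuristic are all assumptions made for illustration, and only Python's standard codecs (`cp037`, `cp500`, `cp1252`, `utf-8`) are used.

```python
# Hypothetical sketch, not ICU's detector: rank candidate codepages by how
# "text-like" the decoded result looks. Candidate list is an assumption.
CANDIDATES = ["cp037", "cp500", "cp1252", "utf-8"]

# Characters (besides letters and digits) that we treat as plausible in prose.
PLAUSIBLE_PUNCTUATION = " .,;:!?'\"-\n"


def score(data: bytes, codec: str) -> float:
    """Return the fraction of decoded characters that look like ordinary text."""
    try:
        text = data.decode(codec)
    except UnicodeDecodeError:
        return 0.0  # bytes are not valid in this codepage at all
    if not text:
        return 0.0
    plausible = sum(ch.isalnum() or ch in PLAUSIBLE_PUNCTUATION for ch in text)
    return plausible / len(text)


def rank_codepages(data: bytes, candidates=CANDIDATES):
    """Return (score, codec) pairs, best candidate first."""
    return sorted(((score(data, c), c) for c in candidates), reverse=True)


# An EBCDIC-encoded sample, as a mainframe might produce it.
sample = "Hello, world. This is EBCDIC text.".encode("cp037")

for value, codec in rank_codepages(sample):
    print(f"{codec}: {value:.2f}")
```

With this sample, `cp037` and `cp500` both score 1.0 because they share the same byte values for letters, digits, spaces, commas, and periods. A naive heuristic like this cannot tell EBCDIC variants apart, which is why the request asks for per-codepage and per-language detection data.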

All EBCDIC codepages (including DBCS) should be supported:
ibm-37, ibm-273, ibm-277, ibm-278, ibm-280, ibm-284, ibm-285, ibm-290, ibm-297, ibm-420, ibm-424, ibm-500, ibm-803, ibm-838, ibm-870, ibm-871, ibm-875, ibm-918, ibm-930, ibm-931, ibm-932, ibm-933, ibm-939, ibm-1025, ibm-1097, ibm-1112, ibm-1122, ibm-1123, ibm-1130, ibm-1132, ibm-1137, ibm-1153, ibm-1154, ibm-1156, ibm-1157, ibm-1158, ibm-1160, ibm-1164, ibm-1364, ibm-1390, ibm-1399, ibm-4517, ibm-4899, ibm-4971, ibm-5026, ibm-5035, ibm-8482, ibm-9030, ibm-9066, ibm-9067, ibm-12712, ibm-16684, ibm-16804, ibm-20780

The associated languages, in support priority order, are:
Chinese (simplified), Chinese (traditional), Korean, Portuguese (Brazil), Japanese, German, French, Italian, Spanish, Thai, Arabic, Turkish, Danish, Finnish, Norwegian (Bokmal), Swedish, Catalan, Dutch, Greek, Hebrew, Portuguese (Portugal), Bulgarian, Croatian, Czech, Hungarian, Polish, Romanian, Russian, Slovak, Slovenian, Ukrainian, Icelandic, Assamese, Bengali, Gujarati, Hindi, Indonesian, Kannada, Konkani, Malay, Malayalam, Marathi, Nepali, Oriya, Punjabi, Sinhala, Tamil, Telugu, Vietnamese, Urdu, Afrikaans, Albanian, Armenian, Azerbaijani (Latin), Belarusian, Estonian, Georgian, Kazakh, Latvian, Lithuanian, Macedonian, Serbian (Cyrillic), Serbian (Latin), Swahili, Welsh, Maltese.

Status

Assignee

mow@icu-project.org

Reporter

TracBot

Labels

None

Reviewer

None

Time Needed

None

Start date

None

Components

Priority

assess