
Consider defaulting to UTF-8 on Linux

Description

I'm using Ubuntu Linux 8.04 (the latest distribution at this time). When I install every single available locale, they are all UTF-8 based locales, except for the C and POSIX locales. Some other recent Linux distributions are also removing the non-UTF-8 based locales.

Currently, int_getDefaultCodepage in putil.c can't call setlocale(LC_CTYPE, "") to force nl_langinfo to provide the actual codepage being used. That call is not thread safe, and we can't force users to call it themselves or expect them to call it in a thread-safe manner. This is why only setlocale(LC_CTYPE, NULL) is used instead. Unfortunately, this prevents the correct codepage from being detected. So if zh_CN is used instead of zh_CN.utf8, ICU defaults to US-ASCII instead of UTF-8, which usually isn't helpful.
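For reference, here is a minimal sketch of the two calls being contrasted. It is not the code in putil.c; the helper names are made up for illustration. It only shows that a NULL argument queries the current LC_CTYPE locale without changing it, while "" adopts the environment's locale and so changes what nl_langinfo(CODESET) reports.

```c
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper: query the current LC_CTYPE locale name.
 * Passing NULL never modifies the process locale, so this is the
 * only form a library can use without side effects. */
const char *query_ctype_locale(void) {
    return setlocale(LC_CTYPE, NULL);
}

/* Hypothetical helper: report the codeset for whatever locale is
 * currently active. If the application never called setlocale(),
 * this is the codeset of the "C" locale, not of the user's locale. */
const char *query_codeset(void) {
    return nl_langinfo(CODESET);
}

/* Hypothetical helper: adopt the locale from the environment, then
 * report its codeset. This modifies global process state, which is
 * why a library cannot safely do it on the application's behalf. */
const char *adopt_environment_and_query_codeset(void) {
    setlocale(LC_CTYPE, "");
    return nl_langinfo(CODESET);
}
```

An application that calls setlocale(LC_CTYPE, "") itself at startup, before any threads exist, would let the query-only path see the real codeset.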

This scenario is similar to what happens on Mac OS X, but instead of nl_langinfo returning US-ASCII, Mac OS X returns "" from nl_langinfo. There is no perfect solution to fix this problem, but as more Linux distributions default to UTF-8 locales, it might make more sense to have ICU default to UTF-8 on Linux too. When that happens, the U_LINUX section of remapPlatformDependentCodepage might want to remap US-ASCII to UTF-8 for the non-C/POSIX locales.
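The proposed remapping could look roughly like the sketch below. This is an assumption about the shape of the change, not the actual remapPlatformDependentCodepage code: the function name remap_linux_codepage and its two-argument signature are hypothetical, and how the caller obtains the locale name is left out.

```c
#include <string.h>

/* Hypothetical sketch of the proposed Linux rule: if the detected
 * codepage is US-ASCII but the locale is something other than C or
 * POSIX, assume the locale is really UTF-8 based. */
const char *remap_linux_codepage(const char *locale_name,
                                 const char *codepage) {
    if (locale_name == NULL || codepage == NULL) {
        return codepage;              /* nothing to decide on */
    }
    if (strcmp(codepage, "US-ASCII") != 0) {
        return codepage;              /* only US-ASCII is remapped */
    }
    if (strcmp(locale_name, "C") == 0 ||
        strcmp(locale_name, "POSIX") == 0) {
        return codepage;              /* C/POSIX keep US-ASCII */
    }
    return "UTF-8";                   /* e.g. zh_CN without .utf8 */
}
```

With this rule, remap_linux_codepage("zh_CN", "US-ASCII") yields "UTF-8", while the C and POSIX locales are left alone.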

Environment

Status

Assignee

mow@icu-project.org

Reporter

George Rhoten

Time Needed

Hours

tracCreated

Oct 03, 2008, 2:07 AM

tracOwner

michaelow

tracProject

ICU4C

tracReporter

grhoten

tracResolution

fixed

tracReviewer

emmons

tracStatus

closed

tracWeeks

0.2

Components

Fix versions

Priority

medium