Adjust Belarusian-speaking population for Belarus
Description
Activity
Native tongue or language at home? Since the goal of CLDR is to reflect Common usage (hence the C in CLDR), I chose to use the table reflecting how people use the language, regardless of which language they are ancestrally tied to.
Neither native tongue nor language at home allows multiple selections (pretty much by definition), yet data for other countries obviously does (Belgium sums to 189% according to the comment in the PR). So using data that only allows a single selection is inconsistent with other countries. I also wouldn’t equate “language at home” with “Common usage”: people speaking one language at home might very well commonly use other languages elsewhere (workplace, media, etc.).
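To illustrate the single-selection versus multiple-selection point, here is a minimal sketch. The respondents and the language labels "A" and "B" are entirely hypothetical and are not taken from any census or from CLDR; the helper `percent_by_language` is likewise made up for this illustration.

```python
from collections import Counter

# Hypothetical respondents (NOT real data): each lists every language they use,
# with the first entry being the one they use most at home.
respondents = [
    ["A"],
    ["A", "B"],
    ["B", "A"],
    ["A", "B"],
]


def percent_by_language(selections):
    """Return {language: percent of respondents who selected it}."""
    total = len(selections)
    # set() guards against one person listing the same language twice.
    counts = Counter(lang for langs in selections for lang in set(langs))
    return {lang: 100 * n / total for lang, n in counts.items()}


# Multiple selection: every language a person uses is counted.
multi = percent_by_language(respondents)

# Single selection: only each person's first choice is counted.
single = percent_by_language([langs[:1] for langs in respondents])

print(multi, sum(multi.values()))
print(single, sum(single.values()))
```

With multiple selection the percentages can sum to well over 100%, while a single-selection table sums to at most 100% and credits each bilingual person to only one language.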
Re-opening this because of discussion on the PR https://github.com/unicode-org/cldr/pull/4398
I know it’s disappointing but I’m not yet convinced that my methodology was in error.
It should add up to 100%: Nope. Usually there is missing data, and there are even people who don’t speak any language. It is rare for a country to add up to 100%; in fact, due to bilingualism, it should usually be more. In this case, the data source I had indicated only one language preference per person.
Which table was used? (census link) The GitHub comment referred to the table on p. 37 about people’s native tongue, showing >50% Belarusian. I used the table on p. 41 reflecting the language people most commonly use at home, “на котором обычно разговаривают дома” (“which is usually spoken at home”), which shows that the Belarusian ethnic majority prefers to use Russian, even at home.
Native tongue or language at home? Since the goal of CLDR is to reflect Common usage (hence the C in CLDR), I chose to use the table reflecting how people use the language, regardless of which language they are ancestrally tied to.
I can see a future where we list BOTH pieces of data in CLDR. I do like the idea of separating language counts by overall Common Use, Comprehension, Speech, and Writing. The database is too limited today.
Tone of Tickets: I don’t like it when people demand things. However, I have experience launching products in Belarus, and Belarusian adoption was indeed a fraction of Russian adoption. I checked various sources, many of which strongly disagree, but I examined the data objectively and saw it was right.
Why 2025? We have a significant backlog of tickets updating population numbers. Lots of missing languages, lots of stale data. I’m working on 1) clearing the backlog and 2) designing a better way to handle the data so it can scale better and, as said, expanding the kinds of data we can offer.
@Conrad Nied Closing again for lack of response. Please open a new ticket (feel free to link to this one) if you have a solid case for us to make a change.
Note: Ticket was reopened; there’s commentary on GitHub here: https://github.com/unicode-org/cldr/pull/4398#issuecomment-2761037612
Please continue discussion here on the ticket for 48.
🛬 Merged PR
@conradarcturus merged a PR to unicode-org/cldr:main
CLDR-14479 Fix Belarus locale demographics (#4398) https://github.com/unicode-org/cldr/pull/4398 https://github.com/unicode-org/cldr/commit/7167ed4a62f09d5ed2d76b0236ce1d46ba2f85c1
Adding to the Migration section of the 47 release notes.
Hi!
I talked to the creators of Calamares on GitHub in order to understand why the main language for Belarus is Belarusian. I was given a link: https://unicode-org.github.io/cldr-staging/charts/38.1/supplemental/territory_language_information.html
Then I was unpleasantly surprised. I do not know where you got the data from, but in the list, Russian and Belarusian need to be swapped: Belarusian has 1 million speakers at best. The vast majority of the population, 9 million, uses only Russian every day at home, at work, and in other areas of life.
I understand that the Belarusian-speaking radical minority is trying to impose Belarusian on the vast majority, but this does not mean that it is the native and most familiar language for the whole country.
I am not asking but demanding that this bug be dealt with and that urgent measures be taken to correct it. And correct the data shown in the table at the link above: it is not just incorrect, it is ridiculous. And, of course, it does not correspond to reality.