export ICU data files in dist?

Description

So I've been publishing ICU data to NPM to make the full-icu package work. But I wonder, now that we are publishing to GitHub, could we simply include the (say) icudt64l.dat / icudt64b.dat / icudt64e.dat files in the release (perhaps zipped up)? Even as a .zip file it would be about 33MB to include all 3 versions. The benefit is that builders (such as Node.js) would not need to build the byte swapper in order to pick up the data file.

Hmm, thinking about this more: the little-endian file is already inside the src.zip/src.tgz. So perhaps a zip file that contains only big endian + EBCDIC.

I'm willing to make the small change to 'make dist' to enable this.

Activity

Steven R. Loomis
March 11, 2020 at 6:45 PM

TC consensus: 2 files:

icu4c-67.1-data-bin-b.zip

icu4c-67.1-data-bin-l.zip


Skip EBCDIC (at least for now)

Stephen Gallagher
January 14, 2020 at 6:17 PM

I maintain the Node.js packages for Fedora and Red Hat Enterprise Linux. We recently switched over to a hybrid approach to offering ICU for Node.js; we ship the binary built with small-icu as well as the patch from https://github.com/nodejs/node/pull/30825 to enable a standard location from which to auto-load the ICU data file if it is present.

Because we need to be able to ship the data file on multiple processor architectures (including some big-endian architectures), right now I need to build the byte swapping application and run it during the package-building process. It would simplify my life (and reduce the chances of error) if the data files were shipped pre-built for the non-little-endian systems.
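To illustrate what a packager could do once pre-built files are available, here is a minimal Python sketch that maps a byte order to the archive and data file names discussed in this issue. The function names are hypothetical, and the `icudt<major><letter>.dat` pattern is assumed from the `icudt64l.dat` / `icudt64b.dat` examples in the description. EBCDIC is left out, following the TC consensus.

```python
import sys

# Suffix letters as used in this issue: l = little endian, b = big endian.
# EBCDIC ("e") is omitted because the TC consensus was to skip it for now.
_SUFFIX = {"little": "l", "big": "b"}


def _suffix_for(byteorder: str) -> str:
    """Translate a byte order name ("little" or "big") into its suffix letter."""
    try:
        return _SUFFIX[byteorder]
    except KeyError:
        raise ValueError(f"unsupported byte order: {byteorder!r}") from None


def data_archive_name(release: str, byteorder: str = sys.byteorder) -> str:
    """Hypothetical helper: archive name for a release such as "67.1"."""
    return f"icu4c-{release}-data-bin-{_suffix_for(byteorder)}.zip"


def data_file_name(release: str, byteorder: str = sys.byteorder) -> str:
    """Hypothetical helper: .dat name, assuming only the major version is used."""
    major = release.split(".")[0]
    return f"icudt{major}{_suffix_for(byteorder)}.dat"
```

When packaging for a target architecture that differs from the build host, the byte order would be passed explicitly instead of relying on the `sys.byteorder` default.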

Steven R. Loomis
December 6, 2019 at 7:50 PM
(edited)

 

name it icu4c-67.1-data-bin.zip ?

name it icu4c-67.1-data-bin.tar.bz2 ? (save some space)

.. split it into separate [b,l,e] zipfiles?


I think splitting into separate files might make the most sense even though it’s more overhead. A zip file of just one file will be ~11MB. The raw files are ~27MB. The most compressed possible will still be ~30MB for three, possibly more. Most users will only need one format.

 

In fact, can we make it really simple for naming, and just name the data files icudt64l.zip or icudt64b.zip? No, this doesn’t work, because there might be a 67.1, 67.2, etc.

Proposal

So how about this:

icu4c-67.1-data-bin-b.zip

icu4c-67.1-data-bin-e.zip

icu4c-67.1-data-bin-l.zip

 

The zipfile will consist of:

  • the .dat file

  • LICENSE

  • a generated README-icu4c-67.1-data-bin-b.txt containing a very short readme.
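The proposed layout can be sketched in Python as follows. This is only an illustration of the archive contents listed above, not the actual 'make dist' change; the function name, its parameters and the README wording are assumptions.

```python
import zipfile
from pathlib import Path


def build_data_zip(dat_file, license_file, release, suffix, out_dir="."):
    """Hypothetical helper: build icu4c-<release>-data-bin-<suffix>.zip."""
    base = f"icu4c-{release}-data-bin-{suffix}"
    dat_name = Path(dat_file).name
    # The README text is a placeholder; the real generated readme may differ.
    readme_text = (
        f"ICU4C {release} binary data file ({suffix})\n"
        f"Contents: {dat_name}, LICENSE\n"
    )
    out_path = Path(out_dir) / f"{base}.zip"
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(dat_file, arcname=dat_name)         # the .dat file
        zf.write(license_file, arcname="LICENSE")    # LICENSE
        zf.writestr(f"README-{base}.txt", readme_text)  # short generated readme
    return out_path
```

Calling `build_data_zip("icudt67b.dat", "LICENSE", "67.1", "b")` would write `icu4c-67.1-data-bin-b.zip` into the current directory, provided both input files exist there.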

 

Steven R. Loomis
May 22, 2019 at 11:27 PM
Fixed

Details

Created May 14, 2019 at 12:31 AM
Updated September 27, 2021 at 10:12 PM
Resolved March 31, 2020 at 2:32 AM