export ICU data files in dist?

Description

So I've been publishing ICU data to NPM to make the full-icu package work. But I wonder, now that we are publishing to GitHub, could we simply include the (say) icudt64l.dat / icudt64b.dat / icudt64e.dat files in the release, perhaps zipped up? Even as a .zip file, it would be about 33MB to include all 3 versions. The benefit is that builders (such as Node.js) would not need to build the byte swapper in order to pick up the data file.

Hmm, thinking about this more: the little-endian file is already inside src.zip/src.tgz. So perhaps a zip file that contains only the big-endian and EBCDIC files.

I'm willing to make the small change to 'make dist' to enable this.

Activity

Steven R. Loomis
March 11, 2020, 6:45 PM

TC consensus: 2 files:

icu4c-67.1-data-bin-b.zip

icu4c-67.1-data-bin-l.zip


Skip EBCDIC (at least for now)

Stephen Gallagher
January 14, 2020, 6:17 PM

I maintain the Node.js packages for Fedora and Red Hat Enterprise Linux. We recently switched over to a hybrid approach to offering ICU for Node.js; we ship the binary built with small-icu as well as the patch from to enable a standard location from which to auto-load the ICU data file if it is present.

Because we need to be able to ship the data file on multiple processor architectures (including some big-endian architectures), right now I need to build the byte swapping application and run it during the package-building process. It would simplify my life (and reduce the chances of error) if the data files were shipped pre-built for the non-little-endian systems.
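As a rough illustration of what a packaging script could do once pre-built files exist, the sketch below maps a byte order to the data file and release archive names used elsewhere in this ticket. The function names are hypothetical and not part of ICU or Node.js. It assumes the `icudt<major><suffix>.dat` and `icu4c-<version>-data-bin-<suffix>.zip` patterns discussed here. EBCDIC (`e`) cannot be detected from byte order and is left out.

```python
import sys

# Byte order -> ICU data format suffix ('l' = little endian, 'b' = big endian).
# The EBCDIC variant ('e') is a charset family, not a byte order, so it is
# deliberately not covered by this sketch.
_SUFFIX_BY_BYTEORDER = {"little": "l", "big": "b"}


def data_suffix(byteorder: str = sys.byteorder) -> str:
    """Return the data format suffix for a byte order ('little' or 'big')."""
    try:
        return _SUFFIX_BY_BYTEORDER[byteorder]
    except KeyError:
        raise ValueError(f"unsupported byte order: {byteorder!r}") from None


def data_file_name(major: int, byteorder: str = sys.byteorder) -> str:
    """Name of the .dat file, e.g. 'icudt64b.dat' (assumed naming pattern)."""
    return f"icudt{major}{data_suffix(byteorder)}.dat"


def release_zip_name(version: str, byteorder: str = sys.byteorder) -> str:
    """Name of the release archive, e.g. 'icu4c-67.1-data-bin-b.zip'."""
    return f"icu4c-{version}-data-bin-{data_suffix(byteorder)}.zip"


if __name__ == "__main__":
    # Pass the *target* byte order explicitly when cross-building;
    # the default is the byte order of the machine running the script.
    print(data_file_name(67), release_zip_name("67.1"))
```

With something like this, a package build for a big-endian target would fetch the `-b` archive instead of building and running the byte swapper.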

Steven R. Loomis
December 6, 2019, 7:50 PM
Edited

 

name it icu4c-67.1-data-bin.zip ?

name it icu4c-67.1-data-bin.tar.bz2 ? (save some space)

.. split it into separate [b,l,e] zipfiles?


I think splitting into separate files might make the most sense, even though it's more overhead. A zip file of just one data file will be ~11MB; the raw files are ~27MB. Even with the best possible compression, all three together will still be ~30MB, possibly more. Most users will only need one format.

In fact, can we make the naming really simple, and just name the data files icudt64l.zip or icudt64b.zip? No, this doesn't work, because that name carries only the major version and there might be a 67.1, 67.2, etc.

Proposal

So how about this:

icu4c-67.1-data-bin-b.zip

icu4c-67.1-data-bin-e.zip

icu4c-67.1-data-bin-l.zip

 

The zipfile will consist of:

  • the .dat file

  • LICENSE

  • a generated, very short readme, named e.g. README-icu4c-67.1-data-bin-b.txt
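
A minimal sketch of assembling one such zipfile, assuming the layout listed above. This is not the actual 'make dist' change: the function names and the readme wording are hypothetical, and only the Python standard library is used.

```python
import zipfile
from pathlib import Path


def data_zip_name(version: str, suffix: str) -> str:
    """Return e.g. 'icu4c-67.1-data-bin-b.zip' for version '67.1', suffix 'b'."""
    if suffix not in ("b", "e", "l"):
        raise ValueError(f"unknown data format suffix: {suffix!r}")
    return f"icu4c-{version}-data-bin-{suffix}.zip"


def build_data_zip(version, suffix, dat_path, license_path, out_dir):
    """Write a zip holding the .dat file, LICENSE, and a short generated readme.

    Hypothetical helper: the paths are supplied by the caller, and the readme
    text below is placeholder wording, not ICU's.
    """
    dat_path = Path(dat_path)
    zip_name = data_zip_name(version, suffix)
    stem = zip_name[: -len(".zip")]
    readme_name = f"README-{stem}.txt"
    readme_text = (
        f"ICU4C {version} binary data file: {dat_path.name}\n"
        "See LICENSE for terms of use.\n"
    )
    zip_path = Path(out_dir) / zip_name
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Store all three members at the top level of the archive.
        zf.write(dat_path, arcname=dat_path.name)
        zf.write(license_path, arcname="LICENSE")
        zf.writestr(readme_name, readme_text)
    return zip_path
```

Calling `build_data_zip("67.1", "b", "icudt67b.dat", "LICENSE", "dist")` would then produce `dist/icu4c-67.1-data-bin-b.zip`, provided the `dist` directory already exists.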

 

Steven R. Loomis
May 22, 2019, 11:27 PM
Fixed

Assignee

Steven R. Loomis

Reporter

Steven R. Loomis

Components

Priority

medium

Fix versions