export ICU data files in dist?

TC consensus: 2 files:
icu4c-67.1-data-bin-b.zip
icu4c-67.1-data-bin-l.zip
Skip EBCDIC (at least for now)

I maintain the Node.js packages for Fedora and Red Hat Enterprise Linux. We recently switched to a hybrid approach for offering ICU with Node.js. We ship the binary built with small-icu, together with the patch from https://github.com/nodejs/node/pull/30825, which enables a standard location from which the ICU data file is auto-loaded if it is present.
Because we need to ship the data file on multiple processor architectures (including some big-endian ones), I currently have to build the byte-swapping application and run it during the package build. It would simplify my life (and reduce the chance of error) if the data files were shipped prebuilt for the non-little-endian systems.
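For context, the byte-swapping step described above is normally done with ICU's `icupkg` tool, whose `-t` option selects the output type (`-tl` little-endian ASCII, `-tb` big-endian ASCII, `-te` big-endian EBCDIC). Below is a minimal Python sketch of how a packaging script might wrap it. The helper names (`swapped_name`, `icupkg_swap_argv`, `swap_data_file`) are hypothetical, and the sketch assumes `icupkg` has already been built and is on `PATH`.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional

# Output types accepted by icupkg's -t option:
# l = little-endian ASCII, b = big-endian ASCII, e = big-endian EBCDIC.
VALID_TYPES = {"l", "b", "e"}


def swapped_name(dat_name: str, target: str) -> str:
    """Map e.g. icudt67l.dat -> icudt67b.dat.

    Assumes the last character of the file stem is the type letter,
    as in ICU's icudt<major><type>.dat naming.
    """
    stem = Path(dat_name).stem  # e.g. "icudt67l"
    return f"{stem[:-1]}{target}.dat"


def icupkg_swap_argv(src: str, target: str, dst: Optional[str] = None) -> List[str]:
    """Build the icupkg command line that converts src to the target type."""
    if target not in VALID_TYPES:
        raise ValueError(f"unsupported type {target!r}; expected one of {sorted(VALID_TYPES)}")
    if dst is None:
        # Write next to the input file, changing only the type letter.
        src_path = Path(src)
        dst = str(src_path.with_name(swapped_name(src_path.name, target)))
    return ["icupkg", f"-t{target}", src, dst]


def swap_data_file(src: str, target: str, dst: Optional[str] = None) -> str:
    """Run icupkg (which must be on PATH) and return the output file path."""
    if shutil.which("icupkg") is None:
        raise FileNotFoundError("icupkg not found on PATH; build ICU's tools first")
    argv = icupkg_swap_argv(src, target, dst)
    subprocess.run(argv, check=True)  # raises CalledProcessError on failure
    return argv[-1]
```

For example, `swap_data_file("icudt67l.dat", "b")` runs `icupkg -tb icudt67l.dat icudt67b.dat`. Shipping prebuilt big-endian files would let packagers drop this step, and the need to build `icupkg`, entirely.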

Name it icu4c-67.1-data-bin.zip?
Name it icu4c-67.1-data-bin.tar.bz2? (saves some space)
Or split it into separate [b,l,e] zipfiles?
I think splitting into separate files might make the most sense, even though it's more overhead. A zip file of just one data file will be ~11 MB, while the raw files are ~27 MB. Even the best possible compression will still come to ~30 MB or more for all three, and most users will only need one format.
In fact, could we make the naming really simple and just name the data files icudt64l.zip or icudt64b.zip? No, this doesn't work, because there might be a 67.1, a 67.2, etc.
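The collision is easy to see: ICU's data file name encodes only the major version (icudt67l.dat serves both 67.1 and 67.2), whereas a name built from the full release version stays unique. A small Python sketch follows. The function names are illustrative and not part of any ICU tooling.

```python
def dat_file_name(version: str, suffix: str) -> str:
    """ICU-style data file name: only the major version is encoded."""
    major = version.split(".")[0]
    return f"icudt{major}{suffix}.dat"


def release_zip_name(version: str, suffix: str) -> str:
    """Release asset name that encodes the full version."""
    return f"icu4c-{version}-data-bin-{suffix}.zip"


for v in ("67.1", "67.2"):
    print(v, dat_file_name(v, "l"), release_zip_name(v, "l"))
# 67.1 icudt67l.dat icu4c-67.1-data-bin-l.zip
# 67.2 icudt67l.dat icu4c-67.2-data-bin-l.zip
```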
Proposal
So how about this:
icu4c-67.1-data-bin-b.zip
icu4c-67.1-data-bin-e.zip
icu4c-67.1-data-bin-l.zip
The zipfile will consist of:
the .dat file
LICENSE
a generated README-icu4c-67.1-data-bin-b.txt containing a very short readme
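A minimal Python sketch of assembling one such zipfile, assuming the .dat file and LICENSE already exist on disk. The function name `build_data_zip` and the README wording are hypothetical. The actual change would live in ICU's `make dist` rule, not in a Python script.

```python
import zipfile
from pathlib import Path


def build_data_zip(version: str, suffix: str, dat_path, license_path, out_dir) -> Path:
    """Write icu4c-<version>-data-bin-<suffix>.zip with the .dat, LICENSE and a README."""
    base = f"icu4c-{version}-data-bin-{suffix}"
    dat_path = Path(dat_path)
    readme_name = f"README-{base}.txt"
    # Hypothetical README text; the proposal only calls for "a very short readme".
    readme_text = (
        f"ICU {version} data file {dat_path.name} "
        f"(type '{suffix}': l = little-endian, b = big-endian, e = EBCDIC).\n"
        "See LICENSE for terms of use.\n"
    )
    out = Path(out_dir) / f"{base}.zip"
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(dat_path, arcname=dat_path.name)  # store without any directory prefix
        zf.write(license_path, arcname="LICENSE")
        zf.writestr(readme_name, readme_text)
    return out
```

For example, `build_data_zip("67.1", "b", "icudt67b.dat", "LICENSE", "dist")` would produce `dist/icu4c-67.1-data-bin-b.zip` with exactly the three entries listed above.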
So I've been publishing ICU data to NPM to make the full-icu package work. But I wonder: now that we are publishing to GitHub, could we simply include the (say) icudt64l.dat / icudt64b.dat / icudt64e.dat files in the release (perhaps zipped up)? Even as a .zip file, including all 3 versions would take about 33 MB. The benefit is that builders (such as Node.js) would not need to build the byte swapper in order to pick up the data file.
Hmm. Thinking about this more: the little-endian file is already inside src.zip/src.tgz. So perhaps a zip file containing only the big-endian and EBCDIC files.
I'm willing to make the small change to 'make dist' to enable this.