export ICU data files in dist?
So I've been publishing ICU data to NPM to make the full-icu package work. But I wonder, now that we are publishing to GitHub, could we simply include the (say) icudt64l.dat / icudt64b.dat / icudt64e.dat files in the release (perhaps zipped up)? Even as a .zip file it would be about 33MB to include all 3 versions. The benefit is that builders (such as Node.js) would not need to build the byte swapper in order to pick up the data file.
Hmm. Thinking about this more: the little-endian data is already inside the src.zip/src.tgz. So perhaps a zip file that contains only big endian + EBCDIC.
I'm willing to make the small change to 'make dist' to enable this.
TC consensus: 2 files:
Skip EBCDIC (at least for now)
I maintain the Node.js packages for Fedora and Red Hat Enterprise Linux. We recently switched over to a hybrid approach to offering ICU for Node.js; we ship the binary built with small-icu as well as the patch from to enable a standard location from which to auto-load the ICU data file if it is present.
Because we need to be able to ship the data file on multiple processor architectures (including some big-endian architectures), right now I need to build the byte swapping application and run it during the package-building process. It would simplify my life (and reduce the chances of error) if the data files were shipped pre-built for the non-little-endian systems.
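As a rough illustration of what a packager would do with pre-built files, here is a minimal Python sketch that picks the data file name matching the build host's byte order. The helper name `icu_data_filename` is hypothetical; the `icudt<major><l|b>.dat` pattern follows the file names mentioned earlier in this thread, and EBCDIC is left out.

```python
import sys


def icu_data_filename(major_version: int, byteorder: str = sys.byteorder) -> str:
    """Return the ICU data file name for a byte order, e.g. 'icudt67b.dat'.

    Hypothetical helper: 'l' = little endian, 'b' = big endian,
    following the icudt64l.dat / icudt64b.dat naming used above.
    """
    suffix = {"little": "l", "big": "b"}[byteorder]
    return f"icudt{major_version}{suffix}.dat"


if __name__ == "__main__":
    # Prints the file name a package build on this machine would pick up.
    print(icu_data_filename(67))
```

With something like this, the package build only has to copy the right pre-built file instead of compiling and running the byte swapper.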
name it icu4c-67.1-data-bin.zip ?
name it icu4c-67.1-data-bin.tar.bz2 ? (save some space)
.. split it into separate [b,l,e] zipfiles?
I think splitting into separate files might make the most sense even though it’s more overhead. A zip file of just one file will be ~11MB. The raw files are ~27MB each. The most compressed possible will still be ~30MB for three, possibly more. Most users will only need one format.
In fact, can we make it really simple for naming, and just name the data files icudt64l.zip or icudt64b.zip? No, this doesn’t work, because there might be a 67.1, 67.2, etc.
So how about this:
The zipfile will consist of:
the .dat file
some generated README-icu4c-67.1-data-bin-b.txt that has a very short readme.
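A minimal sketch of how such a per-format zip could be assembled, using only the Python standard library. This is not the actual 'make dist' change. The function name `build_data_zip`, the zip name `icu4c-<version>-data-bin-<kind>.zip` (inferred from the README name above), and the README wording are all assumptions for illustration.

```python
import zipfile
from pathlib import Path


def build_data_zip(dat_path, version, kind, out_dir):
    """Zip one .dat file together with a short generated README.

    kind is 'l', 'b' or 'e'. The naming scheme is an assumption
    inferred from README-icu4c-67.1-data-bin-b.txt.
    """
    if kind not in ("l", "b", "e"):
        raise ValueError(f"unknown data kind: {kind!r}")

    base = f"icu4c-{version}-data-bin-{kind}"
    zip_path = Path(out_dir) / f"{base}.zip"
    dat_name = Path(dat_path).name

    # Placeholder README text; the real wording is not specified here.
    readme_text = (
        f"ICU4C {version} binary data file ({dat_name}).\n"
        "Generated by the release build.\n"
    )

    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(dat_path, arcname=dat_name)             # the .dat file
        zf.writestr(f"README-{base}.txt", readme_text)   # the short readme
    return zip_path
```

For example, `build_data_zip("icudt67b.dat", "67.1", "b", "dist")` would write `dist/icu4c-67.1-data-bin-b.zip` (assuming `dist` already exists), containing the `.dat` file and `README-icu4c-67.1-data-bin-b.txt`.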