ICU4J: optionally read data from .dat and .res files

Description

Add the ability to read .dat and .res files.

  • Reduces heap memory usage: `open file, getChannel().map() -> MappedByteBuffer`

  • Allows sharing of data between ICU4C and ICU4J

  • Allows easy updates of zoneinfo64.res by copying a new version into a configured path (no merging into the .jar file)
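A minimal sketch of the memory-mapping idea from the first bullet, under stated assumptions: the class name `MappedIcuData` is hypothetical, and it assumes the standard ICU binary data header, where bytes 2 and 3 hold the magic 0xda 0x27 and byte 8 is the `UDataInfo.isBigEndian` flag. Honoring that flag is what would let ICU4J read little-endian ICU4C files directly.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: memory-maps an ICU data file instead of copying it onto the heap. */
final class MappedIcuData {
    private MappedIcuData() {}

    static ByteBuffer map(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // Assumed header layout: magic 0xda 0x27 at offsets 2..3, isBigEndian at offset 8.
            if (buffer.limit() < 9
                    || (buffer.get(2) & 0xff) != 0xda
                    || (buffer.get(3) & 0xff) != 0x27) {
                throw new IOException("not an ICU data file: " + file);
            }
            // Use the byte order recorded in the file (ICU4C .dat/.res files may be little-endian).
            buffer.order(buffer.get(8) != 0 ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN);
            return buffer;
        }
    }
}
```

The mapped pages live outside the Java heap and are loaded by the OS on demand, which is where the heap savings would come from.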

All data reading should be via ByteBuffer; this is in progress via other tickets.

Have a cache that maps from a data path to a SoftReference<ByteBuffer>. Pass a clone of the ByteBuffer to the services' "Reader" classes.
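The cache described above could look roughly like this sketch. `DataBufferCache` and its `loader` function are hypothetical names. One detail worth noting: `ByteBuffer.duplicate()` gives each reader its own position and limit over the shared bytes, but it resets the byte order to big-endian, so the sketch copies the original order back.

```java
import java.lang.ref.SoftReference;
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical cache from a data path to a SoftReference<ByteBuffer>. */
final class DataBufferCache {
    private final ConcurrentHashMap<String, SoftReference<ByteBuffer>> cache =
            new ConcurrentHashMap<>();
    private final Function<String, ByteBuffer> loader; // returns null if the item is not found

    DataBufferCache(Function<String, ByteBuffer> loader) {
        this.loader = loader;
    }

    /** Returns an independent view of the cached buffer, or null if the loader finds nothing. */
    ByteBuffer get(String path) {
        SoftReference<ByteBuffer> ref = cache.get(path);
        ByteBuffer shared = (ref != null) ? ref.get() : null;
        if (shared == null) {
            // Not cached yet, or the GC cleared the soft reference: (re)load.
            // Two threads may race and both load; the last put wins, which is harmless here.
            shared = loader.apply(path);
            if (shared == null) {
                return null;
            }
            cache.put(path, new SoftReference<>(shared));
        }
        // duplicate() shares the content but not position/limit; restore the byte order.
        return shared.duplicate().order(shared.order());
    }
}
```

Because the references are soft, the GC may reclaim rarely used buffers under memory pressure, and the next request simply reloads them.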

The path would include either a ClassLoader path or something (anything?) for reading from files. Look up data in both places, probably files first.

There would be a runtime property for a list of file system paths.
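A sketch of the two-step lookup (configured file system paths first, then the ClassLoader). The property name `icu.data.path` and the `resourcePrefix` parameter are hypothetical placeholders; the ticket does not name them. The property value is assumed to be a list separated by the platform's `File.pathSeparator`.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical lookup: configured directories first, then the ClassLoader. */
final class DataLookup {
    /** Hypothetical runtime property holding a list of file system paths. */
    static final String PATH_PROPERTY = "icu.data.path";

    static List<Path> configuredDirectories() {
        List<Path> dirs = new ArrayList<>();
        String value = System.getProperty(PATH_PROPERTY);
        if (value != null) {
            for (String entry : value.split(Pattern.quote(File.pathSeparator))) {
                if (!entry.isEmpty()) {
                    dirs.add(Paths.get(entry));
                }
            }
        }
        return dirs;
    }

    /** Returns the item's bytes, or null if neither the files nor the ClassLoader have it. */
    static ByteBuffer read(String itemName, ClassLoader loader, String resourcePrefix)
            throws IOException {
        for (Path dir : configuredDirectories()) {
            Path file = dir.resolve(itemName);
            if (Files.isRegularFile(file)) {
                return ByteBuffer.wrap(Files.readAllBytes(file));
            }
        }
        // Fall back to data bundled in the .jar (try-with-resources skips a null resource).
        try (InputStream in = loader.getResourceAsStream(resourcePrefix + itemName)) {
            if (in == null) {
                return null;
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return ByteBuffer.wrap(out.toByteArray());
        }
    }
}
```

Checking files first is what makes a copied-in zoneinfo64.res override the version in the .jar.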

We could enumerate all ICU files there early, populate the cache with entries that indicate where data is available, but without opening them (null ByteBuffer); may need to handle non-ICU files there (.txt?). We could open .dat files early, enumerate them too and populate the cache with entries for the .dat items.
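The early-enumeration option might be sketched as an index from item name to file location that is built without opening any file. `DataFileIndex` is a hypothetical name; non-ICU files such as .txt are indexed too in this sketch and would have to be rejected later, for example by checking the data header when the file is first opened.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical index of the files available on the configured path; no file is opened. */
final class DataFileIndex {
    private final Map<String, Path> files = new HashMap<>();

    DataFileIndex(List<Path> directories) throws IOException {
        for (Path dir : directories) {
            if (!Files.isDirectory(dir)) {
                continue; // tolerate missing or misconfigured entries
            }
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
                for (Path file : stream) {
                    if (Files.isRegularFile(file)) {
                        // Earlier directories on the path take precedence over later ones.
                        files.putIfAbsent(file.getFileName().toString(), file);
                    }
                }
            }
        }
    }

    /** Returns where the item lives, or null if it is not on the configured path. */
    Path locate(String itemName) {
        return files.get(itemName);
    }

    boolean isAvailable(String itemName) {
        return files.containsKey(itemName);
    }
}
```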

Or this might be too expensive, and we could lazy-init cache entries, and the cache entry may not need to carry anything but the ByteBuffer.

Activity

TracBot
July 1, 2018, 12:12 AM
Trac Comment 3 by —2014-07-29T21:00:55.662Z

http://codereview.appspot.com/121870043

I am caching the list of files on the configured path.
I am not opening individual files and caching their ByteBuffers because
1. Startup overhead: I want to avoid reading files that won't be needed.
2. Almost all of the ICU binary data are cached after deserialization and will never be loaded again. (The exception seems to be .cnv charset conversion table files.)
3. There may be non-ICU-binary-data files on the path.

I am opening .dat package files right away and caching their ByteBuffers because we will need to search through them many times. I am not caching their tables of contents.
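Without a cached table of contents, each lookup would search the package's ByteBuffer directly. The sketch below assumes the ICU4C common-data TOC layout: a uint32 item count followed by sorted (nameOffset, dataOffset) uint32 pairs, with both offsets relative to the start of the TOC and names stored as NUL-terminated ASCII. The `toc` buffer is assumed to start at the TOC (just after the data header), extend to the end of the file, and already carry the file's byte order. `DatPackage.find` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical binary search over a .dat package's table of contents, read in place. */
final class DatPackage {
    private DatPackage() {}

    /** Compares the NUL-terminated name at offset with key; negative if the name sorts first. */
    private static int compareName(ByteBuffer toc, int offset, byte[] key) {
        for (int i = 0; ; i++) {
            int c = toc.get(offset + i) & 0xff;
            int k = (i < key.length) ? (key[i] & 0xff) : 0;
            if (c != k) {
                return c - k;
            }
            if (c == 0) {
                return 0; // both strings ended together
            }
        }
    }

    /** Returns a view of the item's bytes, or null if the package does not contain it. */
    static ByteBuffer find(ByteBuffer toc, String itemName) {
        byte[] key = itemName.getBytes(StandardCharsets.US_ASCII);
        int count = toc.getInt(0);
        int lo = 0;
        int hi = count;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = compareName(toc, toc.getInt(4 + 8 * mid), key);
            if (cmp == 0) {
                int start = toc.getInt(8 + 8 * mid);
                // Assume each item ends where the next one starts; the last one runs to the end.
                int end = (mid + 1 < count) ? toc.getInt(8 + 8 * (mid + 1)) : toc.limit();
                ByteBuffer item = toc.duplicate();
                item.limit(end);
                item.position(start);
                // slice() resets the byte order, so copy it from the package buffer.
                return item.slice().order(toc.order());
            } else if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return null;
    }
}
```

Each lookup costs O(log n) name comparisons against the mapped bytes, so not caching the TOC mainly trades a little CPU for not holding a second copy of it on the heap.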

TracBot
July 1, 2018, 12:12 AM
Trac Comment 4 by —2014-07-31T18:55:05.071Z

Merged into trunk.

I measured the run time of the ICU4J tests ("ant check"):

  • Before merging, at 36105: 208s

  • After merging, with data files from the jar: 194s

  • After merging, with data files from little-endian ICU4C files: 188s
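Relative to the 208 s baseline, these timings work out to roughly the following reductions:

```latex
\frac{208 - 194}{208} = \frac{14}{208} \approx 6.7\,\% \quad\text{(data from the jar)}
\qquad
\frac{208 - 188}{208} = \frac{20}{208} \approx 9.6\,\% \quad\text{(little-endian ICU4C files)}
```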

Fixed

Assignee

Markus Scherer

Reporter

Markus Scherer

Reviewer

None

Priority

major

Time Needed

Days