ICU4C build fails on Windows when building from a long folder path

Description

Reported by Jamie Dale on the icu-support mailing list:

I downloaded the src and data zips for the ICU 64 release, extracted both and replaced the src version of source/data with the data folder from the data zip.

I then opened up allinone.sln and built makedata (VS 2017) and it fails when genrb.exe is invoked for the locales build with the usePoolBundle flag, as that pushes the command line over its limit.

My root directory for ICU is D:\Downloads\icu4c-64_1-src\icu and I'm running Windows 10 (1709).

I could probably try running it from a shorter root directory, as the call with the writePoolBundle flag works, but adding the path for usePoolBundle makes it too long.

I can confirm that the data builds fine if built from D:\icu

Jeff:
I've been mostly building ICU from very short folder paths, which might explain why I haven’t seen this before.
Also, the CI builds also use very short paths as well, which is why it didn't show up there.

 

Pasting in the error message from the logs for search-ability:

Activity

Show:
Jeff Genovy
April 17, 2019, 5:39 PM

Thanks for the comment Shane.

I'm able to reproduce this locally, and I'll attach a snippet of the log with the failing line.

The issue is with the command line length being passed to genrb

The max length for a command on Windows (on XP and above) is 8,191 characters.
Whereas, it looks like the max command length is 131,072 characters for Linux (generally speaking).

The number of locales being passed uses up 7,837 characters, or ~95.7% of the max length.
(So the path length is just enough to cause it to be pushed over the limit.)

If ICU adds more locales in the future (which it likely will) then we will eventually run up against this limit regardless of the path length...

I think we'll need to do what Shane suggested and have genrb read the list of locales from a temporary file that is passed on the command line instead. (We'll need to modify the BUILDRULES.py script to create this file, and modify genrb to read from it instead.)

Shane Carr
April 17, 2019, 9:05 PM
Edited

In terms of creating the locale list file, you can follow a model in BUILDRULES.py similar to what we do for the res_index, which first creates a text file full of locales and then processes it into a resource bundle.

Markus suggests adding new features to genrb as “verbose” flags (--feature instead of -f). In terms of semantics for genrb, I’d say that if the new flag is present (--locale-list=foo.lst), then the command line argument locale files are ignored.

I’m happy to review the change.

Jeff Genovy
July 23, 2019, 12:51 AM

FWIW, I did reach out the owners of CMD in Windows and unfortunately there is no plan to ever remove the 8,191 character restriction from CMD. Apparently there is an unfortunately large amount of legacy software written which takes a hard dependency on this limit. Their recommendation was to use PowerShell – however, Visual Studio falls over if you change the default shell on Windows to be PowerShell instead of CMD.

Jeff Genovy
August 29, 2019, 6:44 PM

Okay after some testing I think I can work around this by changing to execute the command with PowerShell at at the last second in the “common_exec.py” Python file, rather than changing the prompt for all of the entire build to PowerShell (which cause VS to fail).

Another possible method of working around this that I tried was to modify/hack the Python data build scripts.
However, that has the the unfortunate result of losing some of the parallelism in some cases, as the
RepeatedOrSingleExecutionRequest needs to be changed to a SingleExecutionRequest in order to move the decision about whether or not to pass the the locale listing as an argument list or as a file to genrb.exe earlier in the data build process. I ended up not liking this approach.

Instead, we can temporarily use PowerShell to execute the command when the length exceeds CMD's limit. (Note that PowerShell is available on Windows 7 and above, which is the minimum version of Windows that ICU currently supports).

I only temporarily switch over to use PowerShell for the command if it exceeds the CMD limit, and then switch back to CMD for other commands. From testing, I found that using PowerShell was actually much slower, so we'd only want to use it when needed.

Jeff Genovy
August 29, 2019, 7:23 PM
Fixed

Assignee

Jeff Genovy

Reporter

Jeff Genovy

Components

Labels

Reviewer

Shane Carr

Priority

major

Time Needed

None

Fix versions

Configure