cldr-staging size questions
Description
Activity
Bulk moving all issues that aren't in components brs, charts, docs, or docs-spec to the next version
Bulk moving all tickets with status Investigate that are not in a component (BRS, charts, docs, docs-spec, keyboards) to v45
investigating alternative hosting for charts
@Steven R. Loomis The zips you can get from the GitHub Actions runs, or from that temp location you uploaded to, are good.
We’ve tested against those and haven’t found any issues.
But you can’t see it because you’re not authorized.
FWIW, any logged-in GitHub user can see the artifacts from the GitHub Actions runs, but GitHub doesn’t let anonymous users see them. So if you want a CI system to automatically pull from GitHub Actions, you need to supply it with GitHub credentials.
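A minimal sketch of what "supply it with GitHub credentials" could look like for a CI job, assuming the artifacts come from workflow runs in the unicode-org/cldr repository and that the token arrives via a CI secret (the `EXAMPLE_TOKEN` value is a placeholder). It only builds the request against GitHub's REST endpoint for listing a repository's workflow artifacts; it does not send anything.

```python
import urllib.request

API_ROOT = "https://api.github.com"


def artifacts_request(owner: str, repo: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request that lists a repository's workflow artifacts."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/actions/artifacts"
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            # The token is what anonymous users lack; it is assumed to come from a CI secret.
            "Authorization": f"Bearer {token}",
        },
    )


# Placeholder token and assumed repository, for illustration only.
req = artifacts_request("unicode-org", "cldr", "EXAMPLE_TOKEN")
print(req.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the call; each artifact entry in the JSON response carries a download URL that needs the same credentials.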
but it could be attached to all tags in cldr-staging by some action
I think this would be the ideal solution. Either:
uploading the pre-release ZIPs into some directory on http://unicode.org/Public/cldr/
uploading the pre-release ZIPs as artifacts on the cldr-staging GitHub releases
It would allow for anonymous, automated downloading of the pre-releases for automated testing by third parties.
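If the ZIPs were attached to GitHub releases, a third-party test job could compute the download URL without any credentials. The sketch below assumes GitHub's usual `releases/download/<tag>/<asset>` URL layout; the tag and asset names are hypothetical, since no naming scheme has been decided here.

```python
from urllib.parse import quote


def release_asset_url(owner: str, repo: str, tag: str, asset: str) -> str:
    """Return the public download URL of an asset attached to a GitHub release."""
    # Tag and asset names are percent-encoded so unusual characters stay safe in the path.
    return (
        f"https://github.com/{owner}/{repo}/releases/download/"
        f"{quote(tag, safe='')}/{quote(asset, safe='')}"
    )


# Hypothetical tag and asset names, for illustration only.
url = release_asset_url("unicode-org", "cldr-staging", "example-tag", "cldr-common.zip")
print(url)
```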
Open questions:
which should be in the main branch (production data or docs)?
will this break links into the data?
will this disrupt external links?
ISSUE:
While working on the automated zipfile generation https://unicode-org.atlassian.net/browse/CLDR-15134 I noticed that cldr-staging’s buildbot takes 6 minutes just to fetch cldr-staging. This repository is growing at quite a pace. Current sizes:
7.5G ( all of cldr-staging )
793M .git
229M production/
6.5G docs/charts/
626M docs/charts/36
180K docs/charts/36.1
906M docs/charts/37
1014M docs/charts/38
978M docs/charts/38.1
991M docs/charts/39
1.0G docs/charts/40
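As a rough check on these figures, the sketch below converts the `du`-style sizes to bytes and works out how much of the checkout is charts. It assumes the K/M/G suffixes are powers of 1024 and uses only the numbers listed above.

```python
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def to_bytes(size: str) -> float:
    """Convert a du-style size such as '793M' or '7.5G' to bytes."""
    return float(size[:-1]) * UNITS[size[-1]]


total = to_bytes("7.5G")   # all of cldr-staging
charts = to_bytes("6.5G")  # docs/charts/

charts_share = charts / total
remaining_mib = (total - charts) / UNITS["M"]

print(f"charts share of checkout: {charts_share:.1%}")
print(f"left without charts: {remaining_mib:.0f}M")
```

The remainder is close to the sum of `.git` (793M) and `production/` (229M), so the charts account for nearly everything else.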
PROPOSAL:
I wonder if we could consider just moving docs to a ‘gh_pages’ branch, to be only used for github pages, and then delete docs from cldr-staging.
This would allow people to make a shallow clone of cldr-staging with just the roughly 300M of production data, rather than pulling nearly 8GB in total including charts.
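A sketch of the shallow clone described here, written as a small Python helper that assembles the `git clone` command line. The branch name `main` and the destination directory are assumptions; the command is printed rather than executed.

```python
import shlex


def shallow_clone_cmd(url: str, dest: str, branch: str = "main") -> list[str]:
    """Return the argv for a depth-1 clone that fetches only one branch."""
    return [
        "git", "clone",
        "--depth", "1",      # no history, only the latest commit
        "--single-branch",   # skip every other branch, e.g. one holding the docs
        "--branch", branch,
        url, dest,
    ]


cmd = shallow_clone_cmd("https://github.com/unicode-org/cldr-staging.git", "cldr-staging")
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would perform the clone.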
NOTE:
It’s easy with worktrees to check out that branch in a different directory from the rest of cldr-staging. So you could have a “cldr-staging” work area with the data, and a “cldr-staging-pages” directory at the same time with just the docs.
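The two-directory layout could be set up roughly as follows. This sketch assumes the docs branch is named `gh_pages` as proposed, that it is already available in the existing clone, and that the command is run from inside the “cldr-staging” work area; it only assembles the `git worktree add` command.

```python
import shlex


def add_worktree_cmd(path: str, branch: str) -> list[str]:
    """Return the argv that checks out `branch` into a separate directory at `path`."""
    return ["git", "worktree", "add", path, branch]


# Sibling directory next to the main "cldr-staging" work area (names taken from the note above).
worktree_cmd = add_worktree_cmd("../cldr-staging-pages", "gh_pages")
print(shlex.join(worktree_cmd))
```

Both directories share one repository, so objects are not fetched twice.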
Also note that the “build ICU from CLDR” action <https://github.com/unicode-org/cldr/actions/workflows/build-icu.yml> already creates production data from the cldr repo as a downloadable byproduct.