CentOS 6 is pinned to glibc 2.12, but newer Android build-tools (like
aapt) require glibc 2.14. It's not possible to safely upgrade CentOS
6 distributions to glibc 2.14.
CentOS 7 is pinned to glibc 2.17, which is new enough for newer
Android build-tools. However, I had great difficulty bringing forward
our existing centos:6 Docker image to centos:7. In particular,
installing recent enough Mercurial, git, Python, and pip versions was
difficult enough that I elected to not pursue this approach.
Instead, I've elected to follow glandium's suggestion from
https://bugzilla.mozilla.org/show_bug.cgi?id=1370119#c5: base on
Debian with snapshots.debian.org for reproducibility.
The most significant changes here:
- using Debian's snapshots repository
- using Python and related tools provided by Debian and baked into the
build image
- using the JDK and JRE provided by Debian and baked into the build
image, rather than versions from tooltool (or eventually a toolchain
build)
Moving the builds over to use this image will follow in the patches
ahead.
This includes adding TASKCLUSTER_VOLUMES to docker image builds directly. The
env variable is not added as part of the task transform because `run-task` is
not in payload.command. In fact, build-image.sh calls run-task after doing
some other housekeeping.
Ideally image builds would be turned into jobs and all of this would occur
automatically, but that turns out to be quite a bit too complex for this
incidental fix -- perhaps best solved in another bug.
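For illustration, this can be done when the image task definition is
constructed; a rough Python sketch (the paths and the ';' separator are
illustrative, not the actual in-tree code):

    def add_volumes_env(taskdesc, volumes):
        """Sketch: attach the declared volumes to the image task directly,
        since the image-build task never goes through the run-task
        transform that would normally set this."""
        env = taskdesc.setdefault('worker', {}).setdefault('env', {})
        # build-image.sh later forwards this to run-task after its own
        # housekeeping.
        env['TASKCLUSTER_VOLUMES'] = ';'.join(volumes)

    image_task = {'worker': {'env': {}}}
    add_volumes_env(image_task, ['/home/worker/checkouts',
                                 '/home/worker/workspace'])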
MozReview-Commit-ID: FYHvafJras7
Using /home/worker as the build directory causes a 30% Talos performance
loss, because test machines have a /home mount directory.
MozReview-Commit-ID: 554IPMRWgzK
The updated Docker image contains robustcheckout and run-task support
for sparse checkouts, which are obvious prerequisites.
We change the cache name so sparse and non-sparse checkouts don't
use the same working directory. If we didn't do this, tasks running
from images with old Mercurial clients or without a sparse-aware
robustcheckout would fail.
The effect of using a sparse checkout is that we reduce the number
of files in the checkout from ~234,000 to ~3,600. This reduces time
for a fresh checkout from several dozen seconds to under 2s.
MozReview-Commit-ID: IJz794g8ZKH
`run-task` is taught a --sparse-profile argument to be passed down
to `hg robustcheckout` for the main source checkout. It does what
you expect: performs a sparse checkout using the named profile.
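For illustration, the plumbing inside run-task looks roughly like this
sketch (the --sparseprofile flag spelling on the robustcheckout side is
an assumption, not a verified interface):

    import subprocess

    def vcs_checkout(repo, destination, revision, share_base,
                     sparse_profile=None):
        """Sketch: perform the main source checkout, optionally sparse."""
        cmd = ['hg', 'robustcheckout',
               '--sharebase', share_base,
               '--revision', revision]
        if sparse_profile:
            # Assumed flag name for the robustcheckout extension.
            cmd.extend(['--sparseprofile', sparse_profile])
        cmd.extend([repo, destination])
        subprocess.check_call(cmd)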
The Taskgraph YAML for run-task is taught a "sparse-profile"
property to define the sparse profile. When defined, --sparse-profile
will be passed down to `run-task` and the cache name will be updated
to reflect the use of sparse checkout.
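In the transform, that amounts to roughly the following sketch (the
cache name, run-task path, and sparse profile location are illustrative):

    def apply_sparse_profile(run, command, cache_name):
        """Sketch: forward a requested sparse profile to run-task and vary
        the checkout cache name so sparse and non-sparse checkouts never
        share a working directory."""
        sparse_profile = run.get('sparse-profile')
        if sparse_profile:
            command.append('--sparse-profile=build/sparse-profiles/%s'
                           % sparse_profile)
            cache_name += '-sparse'
        return command, cache_name

    command, cache_name = apply_sparse_profile(
        {'sparse-profile': 'taskgraph'},
        ['/home/worker/bin/run-task'],
        'level-{level}-checkouts')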
Our cache checking transform is updated to audit for the use of
--sparse-profile without the corresponding "-sparse" cache name
variation.
The reason we need a distinct cache name for sparse checkouts is that
clients that aren't sparse-aware will be unable to read checkouts
that are sparse. By forcing sparse and non-sparse into different
cache pools, we avoid compatibility issues.
In the ideal world, we probably support sparse profiles on all the
VCS checkouts that `run-task` supports (e.g. --tools-checkout).
Perfect is the enemy of done. All of this is defined in-tree and
it is easy enough to change atomically.
MozReview-Commit-ID: 79k7Vul0hHO
The UID and GID that a task executes under are dynamic. As a result,
caches need to be aware of the UID and GID that own files; otherwise,
subsequent tasks could run into permission denied errors. This is
why `run-task --chown-recursive` exists. By recursively changing
ownership of persisted files, we ensure the current task is able
to read and write all existing files.
When you take a step back, you realize that chowning cached files is
an expensive workaround. Yes, it results in cache hits. But the cost
is that you potentially have to perform hundreds of thousands of I/O
system calls to mass chown. The ideal situation is that the UID/GID is
consistent across tasks for any given cache, so the potentially
expensive permissions setting can be avoided. So, that's what this
commit does.
We add the task's UID and GID to run-task's requirements. When we
first see a cache, we record a UID and GID with it and chown the
empty cache directory to that UID and GID. Subsequent tasks using
this cache *must* use the same UID and GID or else run-task will
fail.
Since run-task now guarantees that all cache consumers use the same
UID and GID, we can avoid a potentially expensive recursive chown.
But there is an exception. In untrusted environments (namely Try),
we recursively chown existing caches if there is a UID/GID mismatch.
We do this because Try is a sandbox and any random task could
experiment with a non-standard UID/GID. A cache populated that way
would be "poisoned" for the next caller, or vice versa. It would be
annoying if caches were randomly poisoned by Try pushes that didn't
realize there was a UID/GID mismatch. We could outlaw "bad" UIDs and
GIDs, but that would raise the barrier to testing things on Try. So,
we go with the flow and recursively chown caches in this scenario.
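Putting those behaviors together, the logic can be sketched as follows
(the marker file name, its format, and the exact error handling are
assumptions for illustration):

    import os

    def configure_cache_posix(cache_path, uid, gid, untrusted=False):
        """Sketch: pin a cache to the UID/GID of the first task to use it."""
        requires_path = os.path.join(cache_path, '.cacherequires')
        wanted = 'uid=%d;gid=%d\n' % (uid, gid)
        if not os.path.exists(requires_path):
            # First task to see this cache: claim it for this UID/GID and
            # chown only the (empty) top-level directory.
            with open(requires_path, 'w') as fh:
                fh.write(wanted)
            os.chown(cache_path, uid, gid)
            return
        with open(requires_path) as fh:
            existing = fh.read()
        if existing == wanted:
            # Same UID/GID as before: no recursive chown needed.
            return
        if untrusted:
            # e.g. Try: tolerate the mismatch by recursively re-chowning
            # the cache and re-recording the new UID/GID.
            for root, dirs, files in os.walk(cache_path):
                for name in dirs + files:
                    os.chown(os.path.join(root, name), uid, gid)
            os.chown(cache_path, uid, gid)
            with open(requires_path, 'w') as fh:
                fh.write(wanted)
            return
        raise Exception('cache %s was populated with a different UID/GID; '
                        'refusing to use it' % cache_path)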
This change will shine light on all tasks using inconsistent UID
and GID values on the same cache. Bustage is anticipated.
Unfortunately, we can't easily know what will break. So it will be
one of those things where we will have to fix problems as they arise.
Fortunately, because cache names are now tied to the content of
run-task, we only need to back out this change: tasks should then revert
to caches without UID and GID pinning requirements and everything will
work again.
MozReview-Commit-ID: 2ka4rOnnXIp
run-task's --chown and --chown-recursive are only used on volumes and
caches -- the only locations that aren't controlled by the Docker image
itself and thus whose permissions could be "undefined."
Previous commits have taught run-task about the locations of all caches
and volumes. Therefore, we no longer need to manually define paths to
chown. Instead, we can chown as a side-effect of the path being a
cache or a volume.
So, this commit changes run-task to chown caches and volumes
automatically. Since we no longer have a use for --chown and
--chown-recursive, those arguments are removed.
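A minimal sketch of where the paths now come from (TASKCLUSTER_CACHES
is assumed as a counterpart to the TASKCLUSTER_VOLUMES variable
introduced earlier in this series; the real spelling may differ):

    import os

    def declared_paths(environ):
        """Sketch: derive the paths run-task should take ownership of from
        the declared caches and volumes, instead of from explicit --chown
        or --chown-recursive arguments."""
        caches = [p for p in environ.get('TASKCLUSTER_CACHES', '').split(';')
                  if p]
        volumes = [p for p in environ.get('TASKCLUSTER_VOLUMES', '').split(';')
                   if p]
        return caches, volumes

    caches, volumes = declared_paths(os.environ)
    # Ownership fixups then happen as a side effect of each path being a
    # cache or a volume; no per-task chown arguments are needed.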
There /could/ be some paths that are caches or volumes but aren't
getting defined as such in Taskgraph. I consider this a bug in
Taskgraph and the recourse is to properly define a path as a cache or
a volume there.
MozReview-Commit-ID: 1yqrhjil6gy
We recently introduced support for telling run-task about caches so
it could sanitize them automatically. We also recently taught
docker-worker and docker-engine how to declare volumes.
Building on that work, we now pass a list of paths corresponding
to Docker volumes to run-task.
run-task now verifies volumes behave as expected. Unless the volume
paths correspond to caches, run-task verifies they are empty and chowns
them to an appropriate owner.
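In run-task terms, the volume handling amounts to roughly this sketch:

    import os

    def configure_volume_posix(volume, uid, gid, is_cache):
        """Sketch: verify and take ownership of a Docker VOLUME path."""
        if is_cache:
            # Volumes that back a cache are handled by the cache logic.
            return
        # Non-cache volumes must start out empty, just like caches do.
        if os.listdir(volume):
            raise Exception('volume %s is not empty' % volume)
        os.chown(volume, uid, gid)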
Requiring empty volumes is an arbitrary decision. But as the inline
comment says, it keeps things simpler and makes caches and volumes
behave more like each other.
MozReview-Commit-ID: 5lm2uIitrS3
We're about to ban files in Docker volumes so they behave almost
identically to caches (which start empty).
We move the install of nexus.xml from Docker image time to
task time. This also means that changes to nexus.xml don't result
in having to rebuild the Docker image.
MozReview-Commit-ID: JIjeJN4mt2
See the inline comment for the rationale here.
This check may not catch all volumes and caches. But after subsequent
commits refactor how permissions for caches and volumes are handled,
this edge case will likely result in permissions errors in the task,
so it isn't worth worrying about.
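The shape of the check is roughly as follows (a sketch, not the actual
transform code; it only handles the plain, space-separated VOLUME form):

    def check_docker_volumes(dockerfile_path, task_paths):
        """Sketch: ensure every cache/volume path a task uses is declared
        as a VOLUME in the Dockerfile of the image it runs in."""
        declared = set()
        with open(dockerfile_path) as fh:
            for line in fh:
                parts = line.split()
                if parts and parts[0].upper() == 'VOLUME':
                    declared.update(parts[1:])
        missing = sorted(set(task_paths) - declared)
        if missing:
            raise Exception('%s is missing VOLUME declarations for: %s'
                            % (dockerfile_path, ', '.join(missing)))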
Several Dockerfiles have been updated to add missing VOLUME declarations
so the check passes.
In the case of desktop1604-test, we stopped removing
/home/worker/.cache because you can't remove a mount point, which is
what volumes are inside Docker containers.
MozReview-Commit-ID: GEyNkkX00kN