So far, the best we've been able to do is upload an image to the
docker hub, and point a dependent image's Dockerfile FROM at the
version uploaded there.
That is a cumbersome process, and makes the use of "layered" docker
images painful.
This change allows declaring a parent docker image in the
taskcluster/ci/docker-image/kind.yml definitions, which will be
automatically loaded before building the image. The Dockerfile can then
reference that parent through the DOCKER_IMAGE_PARENT argument, which
contains the full image name:tag.
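To illustrate the intent, a minimal sketch of the plumbing; the
`parent`/`args` keys and the task structure here are assumptions for
illustration, not the actual taskgraph code:

    def add_parent_argument(image_task, image_tasks):
        # Resolve the parent declared in kind.yml into the full
        # "name:tag" string the Dockerfile consumes through the
        # DOCKER_IMAGE_PARENT argument.
        parent = image_task.get('parent')
        if parent:
            tag = image_tasks[parent]['tag']
            image_task.setdefault('args', {})[
                'DOCKER_IMAGE_PARENT'] = '{}:{}'.format(parent, tag)
        return image_task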
Some details are left off, for now, such as VOLUMEs. At this point,
VOLUMEs should all be defined in leaf docker images.
In many cases, building docker images starts on machines that don't have
a cached checkout, and it often takes forever to get a full clone. This
used to be even worse when three such jobs could run at the same time on
a worker that started up clean: all three would do a mercurial clone
simultaneously, thrashing I/O. That part is fortunately fixed.
It is still worthwhile, however, not to waste time on the mercurial
clone part of image creation.
The image builder image we use to build docker images is updated
manually, and not necessarily when changes land in the tree that should
be reflected in a new image builder image. For instance, its run-task is
currently outdated; not enough to be an actual problem yet, but it could
rapidly become one.
There is also a lot of friction when trying to change how docker images
are built. Last time I tried, I ended up not being able to make the
changes I wanted because the docker version on the host is too old, and
this is already the second time I've tried to improve things and hit a
wall because the image builder is essentially fixed in stone on the
docker hub.
So with this change, we make all the docker images use the in-tree image
builder image, except the image builder image itself, obviously. That
one uses the last version that was uploaded. We may want to update it at
some point, but not doing
so will only impact building the image builder image itself, not the
other ones.
taskcluster/taskgraph/transforms/task.py sets an expiry of 28 days for
tasks on try, vs. 1 year on other projects, but only does so when an
expiry is not already set, which docker image tasks always do: they set
it to 1 year.
But it doesn't make sense to keep large docker images from try for a
year. So use the same retention policy as the default one. We /could/
just remove the expiry from docker images and get the task.py default,
but it seems like whatever future change might happen to that default
shouldn't affect docker images, so it's better to duplicate the setting.
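The intended policy is roughly the following (a sketch of the logic;
field and constant names are illustrative, not the literal task.py
code):

    def default_expiry(task, project):
        # Only fill in a default when the kind did not set one explicitly.
        if 'expires-after' not in task:
            task['expires-after'] = '28 days' if project == 'try' else '1 year'
        return task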
In bug 1427326, we added package tasks that can be depended upon by
docker image tasks. In that case, we add the routes containing a digest
for those package tasks when computing the docker image digests.
The problem is that those routes start with 'index.gecko.cache.level-n'
where n varies between try and e.g. mozilla-central. That means the
digest for those docker images varies between try and e.g.
mozilla-central, which then prevents try from using the cached versions
for mozilla-central when there is one, like for other docker images
without package dependencies.
What we really need from those routes is the digest part, which is
independent of the level, and we don't actually care about anything else
in the route string, so just use the digest.
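A sketch of the idea, assuming the digest is the final dot-separated
component of the route (the exact route layout shown is illustrative):

    def digest_from_route(route):
        # e.g. 'index.gecko.cache.level-3.<...>.<digest>' -> '<digest>';
        # only the level-independent digest feeds the image digest.
        return route.rsplit('.', 1)[-1]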
Instead of duplicating Dockerfiles between taskcluster/docker/*
directories, which can be error prone for very similar images, it can be
desirable to use the same file. This change allows setting the
`definition` keyword on a docker image definition in kind.yml that
will make the task use the files from taskcluster/docker/<definition>
instead of taskcluster/docker/<image_name>.
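A minimal sketch of the resulting lookup (the helper name is
hypothetical):

    import os

    def image_context_dir(image_name, image_def):
        # `definition`, when present in kind.yml, overrides which
        # taskcluster/docker/<dir> the image files are taken from.
        return os.path.join('taskcluster', 'docker',
                            image_def.get('definition', image_name))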
Ideally, we'd simply use the --build-arg docker argument along with ARG
in the Dockerfile, but that's only supported from Docker API 1.21, and
we're stuck on 1.18 for the moment.
So we add another hack to how we handle the Dockerfile: a commented
syntax that allows declaring arguments to the Dockerfile.
The arguments can be defined in the docker images kind.yml file through
the `args` keyword. Under the hood, they are passed down to the docker
image task through the environment. The mach taskcluster-build-image
command then uses the corresponding values from the environment to
generate a "preprocessed" Dockerfile for its context.
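Roughly, the preprocessing amounts to something like this (a sketch;
the exact comment syntax and substitution rules are whatever mach
taskcluster-build-image implements):

    import os
    import re

    def preprocess_dockerfile(source, env=os.environ):
        # A line like '# %ARG NAME' declares an argument; later
        # occurrences of $NAME are replaced with the value taken from the
        # environment. (Illustrative syntax, not necessarily the exact
        # one in-tree.)
        args = {}
        output = []
        for line in source.splitlines():
            m = re.match(r'#\s*%ARG\s+(\w+)\s*$', line)
            if m:
                args[m.group(1)] = env[m.group(1)]
                continue
            for name, value in args.items():
                line = line.replace('$' + name, value)
            output.append(line)
        return '\n'.join(output)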
It is not at *all* clear how multiple optimizations for a single task should
interact. No simple logical operation is right in all cases, and in fact in
most imaginable cases the desired behavior turns out to be independent of all
but one of the optimizations. For example, given both `seta` and
`skip-unless-files-changed` optimizations, if SETA says to skip a test, it is
low value and should be skipped regardless of what files have changed. But if
SETA says to run a test, then it has likely been skipped in previous pushes, so
it should be run regardless of what has changed in this push.
This also adds a bit more output about optimization, that may be useful for
anyone wondering why a particular job didn't run.
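To make the example concrete, neither 'and' nor 'or' over the two
optimizations produces the desired outcome; the answer follows SETA
alone. Not real taskgraph code, just the truth table for the example
above:

    cases = [
        # (seta_skips, files_unchanged, desired_skip)
        (True,  True,  True),   # low value: skip regardless of file changes
        (True,  False, True),   # low value: skip even though files changed
        (False, True,  False),  # SETA wants it run: run despite no changes
        (False, False, False),  # run
    ]
    for seta_skips, files_unchanged, desired_skip in cases:
        assert desired_skip == seta_skips
        # 'and' would wrongly run the second case; 'or' would wrongly
        # skip the third one.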
MozReview-Commit-ID: 3OsvRnWjai4
This includes adding TASKCLUSTER_VOLUMES to docker image builds directly. The
env variable is not added as part of the task transform because `run-task` is
not in payload.command. In fact, build-image.sh calls run-task after doing
some other housekeeping.
Ideally image builds would be turned into jobs and all of this would occur
automatically, but that turns out to be quite a bit too complex for this
incidental fix -- perhaps best solved in another bug.
MozReview-Commit-ID: FYHvafJras7
See the inline comment for the rationale here.
This check may not catch all volumes and caches. But after subsequent
commits refactor how permissions for caches and volumes are handled,
this edge case will likely result in permissions errors in the task,
so it isn't worth worrying about.
Several Dockerfiles have been updated to add missing VOLUMEs so the
check passes.
In the case of desktop1604-test, we stopped removing
/home/worker/.cache because you can't remove a mount point, which is
what volumes are inside Docker containers.
MozReview-Commit-ID: GEyNkkX00kN
The valgrind test will try to load debug information for the modules
present in a stack trace. If it fails to do so, we end up with a stack
trace containing only memory addresses.
We install debug information for all installed packages, look for all
libs in common system locations, and try to install the corresponding
debug information packages.
This is accomplished with the debuginfo-install yum utility script.
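For illustration only (shown in Python; the actual image setup does this
from a shell script), the approach looks roughly like:

    import glob
    import subprocess

    def install_debuginfo_for_libs(lib_dirs=('/lib64', '/usr/lib64')):
        # Map shared libraries found in common system locations back to
        # the packages that own them, then hand those to debuginfo-install.
        packages = set()
        for lib_dir in lib_dirs:
            for lib in glob.glob(lib_dir + '/*.so*'):
                query = subprocess.run(
                    ['rpm', '-qf', '--qf', '%{NAME}\n', lib],
                    capture_output=True, text=True)
                if query.returncode == 0:
                    packages.update(query.stdout.split())
        if packages:
            subprocess.run(['debuginfo-install', '-y'] + sorted(packages),
                           check=True)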
MozReview-Commit-ID: 76mHOUKKJud
To date we have variously specified both worker-type and worker-implementation,
often manually coordinated. We also embedded a few awkward assumptions such as
that the native engine only runs on OS X.
But a worker type has one and only one implementation, and that implementation
is stable over time (as changing it would require simultaneous landings on all
trees).
Instead, this change makes worker-type the primary configuration, and derives
both a worker implementation (defining the payload format) and worker OS
(determining what to include in the payload) from that value. The derivation
occurs when deciding how to implement a particular job, where the run_using
functions are distinguished by worker implementation.
The two-part logic to determine how and where to run a test task based on its
platform is combined into a single transform, `set_worker_type`.
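Conceptually, the derivation is a straight lookup keyed on worker-type
(a sketch; the worker-type names and table name here are illustrative):

    # Each worker-type has exactly one implementation and OS, so both
    # can be derived from the worker-type alone.
    WORKER_IMPLEMENTATIONS = {
        'aws-provisioner-v1/gecko-t-linux-large': ('docker-worker', 'linux'),
        'releng-hardware/gecko-t-osx-1010': ('native-engine', 'macosx'),
        'invalid/invalid': ('invalid', None),
    }

    def worker_info(worker_type):
        implementation, worker_os = WORKER_IMPLEMENTATIONS[worker_type]
        return implementation, worker_os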
This contains some other related changes:
- MOZ_AUTOMATION is set in specific jobs, rather than everywhere docker-worker
is used
- the URL to test packages is factored out into a shared function
- docker-worker test defaults are applied in `mozharness_test.py`
- the WORKER_TYPE array in `task.py`, formerly mixing two types of keys, is
split
- the 'invalid' workerType is assigned an 'invalid' implementation
- all tasks that do not use job descriptions but use docker-worker, etc. have
`worker.os` added
Tested that this does not produce a substantially different taskgraph
for a regular push, a try push, or a nightly cron.
MozReview-Commit-ID: LDHrmrpBo7I
A few commits ago, we bumped up the default zstandard compression level
from 3 to 10 when we switched to multi-threaded compression. Even with
multiple threads, this was a bit slower.
For images that will be built once and read multiple times, it is
worthwhile to burn extra CPU once and produce a small image. However,
for other tasks where the number of reads is limited, it isn't
worth it to use this extra CPU. This commit uses the SCM level as
a proxy for "optimize for speed." If the task is associated with level
1 (a try push), we lower the compression level and optimize for
speed. Otherwise, we keep the higher compression level and
optimize for image size.
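The policy boils down to something like this (a sketch; the constant
names and the exact "fast" level are assumptions, with 3 being the
previous default mentioned above):

    FAST_ZSTD_LEVEL = 3    # previous default; assumed as the fast choice
    BEST_ZSTD_LEVEL = 10   # the new multi-threaded default

    def zstd_level_for(scm_level):
        # Level 1 == try pushes: optimize for speed, not image size.
        return FAST_ZSTD_LEVEL if int(scm_level) == 1 else BEST_ZSTD_LEVEL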
Credit goes to Jonas for this terrific idea.
MozReview-Commit-ID: Hui97KsZpgw
Note that the to_json method prefers the taskgraph's dependencies information
(edges) to that from the task.dependencies entries. At a few points in
task-graph generation, these values differ, although that is expected (for
example, the full task set contains no edges, but that information is still in
task.dependencies). Unifying that representation leads to some difficulty with
task transforms that reach into the dependency tree (beetmover), so the
different representations are left as-is.
MozReview-Commit-ID: GeW8HNwFA9Z