This avoids writing out a tar file only to extract it and feed it to
`docker build`. This is essentially what the image-builder docker
image does, except that it uses a temporary file for the tar.
While spawning `docker load` is likely to work on developer machines,
on automation, it requires a docker client that is the exact same
version as the server running on the taskcluster worker for
docker-in-docker, which is not convenient. The API required for `docker
load` is rather simple, though, and can be mimicked quite easily.
While this change in itself is not necessary for developer machines,
it will allow re-using the same command for the image-builder to
load a parent docker image when deriving one from another. We could
keep a code branch using `docker load` but it seems wasteful to maintain
two branches when one can work for both use cases.
Now that we don't need to read the contents of a file to hash the
contents of a docker image context, we can avoid creating a file
in generate_context_hash.
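As a rough illustration of hashing a context without an intermediate
file, the sketch below streams a tar archive into a hash object. It is
not the in-tree generate_context_hash; `HashingWriter` and
`hash_context` are hypothetical names, and the input is assumed to be
an in-memory mapping of archive paths to bytes.

```python
import hashlib
import io
import tarfile


class HashingWriter:
    """Minimal write-only file-like object that hashes what it receives."""

    def __init__(self):
        self._hash = hashlib.sha256()

    def write(self, data):
        self._hash.update(data)
        return len(data)

    def hexdigest(self):
        return self._hash.hexdigest()


def hash_context(files):
    """Hash a tar stream built from {archive_path: bytes} without a temp file."""
    writer = HashingWriter()
    # "w|" is tarfile's non-seekable stream mode; it only needs write().
    with tarfile.open(fileobj=writer, mode="w|",
                      format=tarfile.USTAR_FORMAT) as tf:
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = 0  # keep the stream independent of the clock
            tf.addfile(info, io.BytesIO(data))
    return writer.hexdigest()
```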
Instead of duplicating Dockerfiles between taskcluster/docker/*
directories, which can be error prone for very close images, it can be
desirable to use the same file. This change allows setting the
`definition` keyword on a docker image definition in kind.yml, which
makes the task use the files from taskcluster/docker/<definition>
instead of taskcluster/docker/<image_name>.
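The lookup could be sketched as follows. The function name and the
shape of `image_config` (the image's kind.yml entry as a dict) are
assumptions for illustration, not the actual transform code.

```python
import posixpath


def image_context_dir(image_name, image_config,
                      docker_root="taskcluster/docker"):
    """Return the directory holding the Dockerfile for an image.

    Falls back to the image's own name when `definition` is not set.
    """
    definition = image_config.get("definition", image_name)
    return posixpath.join(docker_root, definition)
```

With this, two closely related images can point at one directory while
an image without `definition` keeps its current behavior.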
Ideally, we'd simply use the --build-arg docker argument along with ARG
in the Dockerfile, but that's only supported from Docker API 1.21, and
we're stuck on 1.18 for the moment.
So we add another hack to how we handle the Dockerfile, by adding a
commented syntax that allows declaring arguments to the Dockerfile.
The arguments can be defined in the docker images kind.yml file through
the `args` keyword. Under the hood, they are passed down to the docker
image task through the environment. The mach taskcluster-build-image
command then uses the corresponding values from the environment to
generate a "preprocessed" Dockerfile for its context.
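A minimal sketch of such preprocessing follows. The exact comment
syntax is not spelled out here, so the `# %ARG NAME` form, the `$NAME`
substitution, and the function name are all assumptions made for the
example.

```python
import re

# Assumed declaration syntax: a comment line of the form "# %ARG NAME".
ARG_LINE = re.compile(r"^# %ARG\s+(\w+)\s*$")


def preprocess_dockerfile(text, env):
    """Replace $NAME with env[NAME] for every declared argument."""
    declared = {}
    out = []
    for line in text.splitlines():
        match = ARG_LINE.match(line)
        if match:
            name = match.group(1)
            if name not in env:
                raise KeyError("missing Dockerfile argument: %s" % name)
            declared[name] = env[name]
            continue  # the declaration itself is dropped from the output
        for name, value in declared.items():
            # \b avoids touching longer names that share a prefix.
            pattern = r"\$" + re.escape(name) + r"\b"
            line = re.sub(pattern, lambda _m, v=value: v, line)
        out.append(line)
    return "\n".join(out) + "\n"
```

In the real command `env` would be `os.environ`; taking it as a
parameter keeps the sketch easy to test.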
Giving a directory to %include would copy all of its leaf files into one
single directory in the image context. The only image affected is
valgrind-build, which ended up having a dot-config/pip.conf file instead
of dot-config/pip/pip.conf, meaning valgrind jobs weren't using the
pip config.
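The intended behavior, keeping each file's path relative to the top
source directory, might look like the sketch below. The function name
is hypothetical and the "topsrcdir/" archive prefix is assumed to be
the location included files are placed under.

```python
import os
import posixpath


def included_archive_paths(topsrcdir, include):
    """Map archive paths to filesystem paths for an included file or directory."""
    fs_root = os.path.join(topsrcdir, include)
    if os.path.isfile(fs_root):
        return {posixpath.join("topsrcdir", include): fs_root}
    result = {}
    for dirpath, _dirnames, filenames in os.walk(fs_root):
        for name in filenames:
            fs_path = os.path.join(dirpath, name)
            # Keep intermediate directories instead of using the basename.
            rel = os.path.relpath(fs_path, topsrcdir).replace(os.sep, "/")
            result[posixpath.join("topsrcdir", rel)] = fs_path
    return result
```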
Docker volumes are host-mounted filesystems. We typically mount
caches at their location, but not always. The reason we define
VOLUME in Dockerfiles is that we're guaranteed to get a fast host
filesystem instead of AUFS when a cache isn't mounted.
In this commit, we teach the docker-worker payload builder about
the existence of Docker volumes. Docker volumes can be declared
inline in the YAML. More conveniently, we automatically parse out
VOLUME lines from the corresponding in-tree Dockerfile.
We'll do useful things with this data in subsequent commits.
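Extracting the volumes can be sketched roughly as below. This is an
illustrative parser with a hypothetical name, not the payload builder's
code; it accepts both the plain and the JSON array forms of VOLUME.

```python
import json


def parse_volumes(dockerfile_text):
    """Return the set of paths declared by VOLUME lines in a Dockerfile."""
    volumes = set()
    for line in dockerfile_text.splitlines():
        line = line.strip()
        if not line.startswith("VOLUME "):
            continue
        value = line[len("VOLUME "):].strip()
        if value.startswith("["):
            # JSON array form: VOLUME ["/a", "/b"]
            volumes.update(json.loads(value))
        else:
            # Plain form: VOLUME /a /b
            volumes.update(value.split())
    return volumes
```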
MozReview-Commit-ID: BNxp8EDEYw
Various modules under taskcluster are doing ad-hoc url formatting or
requests to taskcluster services. While we could use the taskcluster
client python module, it's kind of overkill for the simple requests done
here. So instead of vendoring that module, create a smaller one with
a limited set of functions we need.
This changes the behavior of the get_artifact function to return a
file-like object when the file is neither a json nor a yaml, but that
branch was never used (and was actually returning an unassigned
variable, so it was broken anyways).
At the same time, make the function that does HTTP requests more
error-resistant, using urllib3's Retry with a backoff factor.
Also add a function that retrieves the list of artifacts, which, while
currently unused, will be used by `mach artifact` shortly.
Instead of every file that needs the top source directory having an
ad-hoc definition that breaks if the file gets moved around for
some reason, define it in a more central location.
This adds a HASH file next to the VERSION file in the image
context folders for prebuilt docker images, and uses the
HASH for referencing the image in the tasks created by
the decision task.
This way docker will validate the image hash when pulling it
in production. Thus, attackers won't be able to inject code
by compromising the remote docker registries we use to store
prebuilt images. Furthermore, this makes validation of the
Chain-Of-Trust artifacts easier as this eliminates the need
for whitelists and hash validation.
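As a sketch, building the image reference from these files could look
like this. The function name is hypothetical, and the HASH file is
assumed to hold a digest of the form `sha256:<hex>`, which docker
accepts in `name@digest` references.

```python
import os


def docker_image_reference(context_dir, registry, name):
    """Return name@digest when a HASH file exists, else name:version."""

    def read(filename):
        with open(os.path.join(context_dir, filename)) as fh:
            return fh.read().strip()

    if os.path.exists(os.path.join(context_dir, "HASH")):
        # Pinned by content digest, so the pull is verified by docker.
        return "%s/%s@%s" % (registry, name, read("HASH"))
    # Fallback: mutable tag taken from the VERSION file.
    return "%s/%s:%s" % (registry, name, read("VERSION"))
```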
MozReview-Commit-ID: FD3B9MyeU9Q
* Compress docker images with zstd
* Remove the need for context.tar from the decision task
* Index images by level rather than project
MozReview-Commit-ID: 4RL4QXNWmpd
Now that Docker image building is called from Python, we can start to
do advanced stuff with it.
With this commit, we switch from building Docker images directly from
the source directory ("the Docker way") to using our custom Docker image
build contexts.
The main advantage of this is that locally-built Docker images can now
use our custom Dockerfile syntax to include extra files in the build
context!
The code for building a Docker image from a context has been extracted
to its own standalone function. I have nefarious plans for this in the
future, such as the ability to override the FROM syntax to specify
URLs of images. This would allow us to host base images on our own
server, which removes a dependency on Docker Hub and improves
determinism, since images on Docker Hub change all the time.
MozReview-Commit-ID: 5lTdV8yEHkc
We already had code for resolving the image registry and tag. We
refactored it slightly to be more useful, then changed build.sh to
accept the tag as an argument.
At this point, build.sh is basically a wrapper around `docker`. But
there's a special case for executing custom "build.sh" files we
need to eliminate first...
MozReview-Commit-ID: A9HVvxgCdG2
A limitation of traditional docker build context generation is it
only includes files from the same directory as the Dockerfile. When
repositories have multiple, related Dockerfiles, this limitation
results in file duplication or putting all Dockerfiles in the same
directory (which isn't feasible for mozilla-central since they would
need to be in the root directory).
This commit enhances Dockerfiles to allow *any* file from the
repository checkout to be ADDed to the docker build context.
Using the syntax "# %include <path>" you are able to include files
or directories (relative to the top source directory) in the
generated context archive. Files added this way are available under the
"topsrcdir/" path and can be ADDed to Docker images.
Since context archive generation is deterministic and the hash of
the resulting archive is used to determine when images need to be
rebuilt, any extra included file that changes will change the hash
of the context archive and force image regeneration.
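Recognizing the directive can be sketched as below; `parse_includes`
is a hypothetical helper, not the function added by this commit.

```python
def parse_includes(dockerfile_text):
    """Return the paths named by "# %include <path>" lines, in order."""
    includes = []
    for line in dockerfile_text.splitlines():
        parts = line.split(None, 2)
        # Expect exactly: "#", "%include", "<path>"
        if len(parts) == 3 and parts[:2] == ["#", "%include"]:
            includes.append(parts[2].strip())
    return includes
```

Each returned path would then be added to the archive under
"topsrcdir/", so its contents feed into the context hash.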
Basic tests for the new feature have been added.
MozReview-Commit-ID: 4hPZesJuGQV
This restores order to only having a single hash for a context
directory.
Using a tempfile here is a bit unfortunate. It can be optimized later,
if needed.
MozReview-Commit-ID: LMNsvt3fDYx
Relying on global variables like GECKO is a bit dangerous. To facilitate
testing of archive generation in subsequent commits, let's pass a
path into this function.
The argument is currently unused.
MozReview-Commit-ID: Et1UYraflDP
We recently implemented code in mozpack for performing deterministic
tar file creation. It normalizes things like uids, gids, and mtimes
that creep into archives.
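The idea can be illustrated with the standard library alone. This is
a stand-in sketch, not the mozpack API; the function name and the
in-memory `{archive_path: bytes}` input are assumptions.

```python
import io
import tarfile


def create_deterministic_tar(files):
    """Build tar bytes from {archive_path: bytes} with normalized metadata."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w",
                      format=tarfile.USTAR_FORMAT) as tf:
        # Sorted order so dict ordering cannot change the output.
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mode = 0o644
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            info.mtime = 0
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Two calls with the same contents produce identical bytes regardless of
who runs them or when, which is what makes hashing the archive useful.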
MozReview-Commit-ID: 1tn5eXkqACQ
Upcoming commits will refactor how context tarballs are created. In
preparation for this, we establish a standalone function for creating
context tarballs and refactor docker_image.py to use it.
MozReview-Commit-ID: KEW6ppO1vCl