python-zstandard 0.9 has an API that exposes a file object interface
for compression and decompression. This means we can remove the stream
wrapper we used to consume a zstandard-compressed tar file.
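For illustration, a minimal sketch of what this enables, assuming the
0.9 stream_reader() API (the file name is illustrative):

    import tarfile

    import zstandard

    dctx = zstandard.ZstdDecompressor()
    with open('image.tar.zst', 'rb') as fh:
        with dctx.stream_reader(fh) as reader:
            # 'r|' reads the tar as a non-seekable stream, so tarfile can
            # consume the decompressor's file object directly, no wrapper.
            with tarfile.open(fileobj=reader, mode='r|') as tar:
                for member in tar:
                    print(member.name)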
MozReview-Commit-ID: DeWWKnigJVa
The latest python-zstandard uses a newer zstandard that is faster.
It also has wheels available, which means installation doesn't require
Python development headers, etc.
MozReview-Commit-ID: 5gRq81KYmX4
Now that `mach taskcluster-build-image` can handle this itself, we can
avoid all the manual curl- and jq-based handling in the image builder.
An additional advantage of relying on `mach taskcluster-build-image`
doing more is that fewer changes to the build-image.sh script will be
necessary, and thus fewer updates of the image builder docker image.
This avoids writing out a tar file only to extract it again and feed it
to `docker build`. This is essentially what the image-builder docker
image does, except it uses a temporary file for the tar.
While spawning `docker load` is likely to work on developer machines,
in automation it requires a docker client of the exact same version as
the server running on the taskcluster worker for docker-in-docker,
which is not convenient. The API required for `docker load` is rather
simple, though, and can be mimicked quite easily.
While this change in itself is not necessary for developer machines, it
will allow the image-builder to reuse the same command to load a parent
docker image when deriving one image from another. We could keep a code
branch using `docker load`, but it seems wasteful to maintain two
branches when one can work for both use cases.
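As a rough sketch of what mimicking it involves: `docker load` boils
down to a single POST of the image tar to the daemon's /images/load
endpoint (the class and function names here are illustrative, not the
actual implementation):

    import http.client
    import socket

    class UnixHTTPConnection(http.client.HTTPConnection):
        """HTTP over the docker daemon's unix socket."""
        def __init__(self, path):
            http.client.HTTPConnection.__init__(self, 'localhost')
            self.unix_path = path

        def connect(self):
            sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            sock.connect(self.unix_path)
            self.sock = sock

    def post_image(tar_fh, socket_path='/var/run/docker.sock'):
        # POST the image tar to /images/load, which is all `docker load`
        # does behind the scenes.
        conn = UnixHTTPConnection(socket_path)
        conn.request('POST', '/images/load', body=tar_fh,
                     headers={'Content-Type': 'application/x-tar'})
        resp = conn.getresponse()
        payload = resp.read()
        if resp.status != 200:
            raise Exception('docker load failed: %s' % payload)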
The zstd command we spawn, if it is available at all, might be the
wrong version: zstd changed its stream format in an incompatible way at
some point, and the version shipped in e.g. Ubuntu 16.04 uses the old
format, while the version taskcluster relies on uses the new format.
Relying on gps's zstandard library ensures we use the right version.
Another advantage is that we can trivially pip install it in a
virtualenv if it isn't available on the system running the command.
Since we're ridding ourselves of the subprocess spawning for zstd, we
might as well cover curl too, especially considering that error
handling around subprocesses is not trivial. In fact, the current error
handling code is broken and leads to deadlock conditions when, for
example, curl is still waiting for the python side to read data, but
the python side is no longer reading because an exception was thrown in
the tar reading loop.
The pattern used:

    except Exception:
        error = sys.exc_info()[0]
    finally:
        ...
        if error:
            raise error
actually loses everything that is interesting about the original
exception: sys.exc_info()[0] is only the exception *type*. Not catching
the exception at all would simply let it propagate up the stack, except
when a different exception is thrown from the finally block, which is
what that `if error: raise error` is attempting to guard against...
except it doesn't re-raise the original exception, only a new, empty
instance of its type.
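To make the failure mode concrete, a small sketch contrasting the
broken pattern with one that preserves the original exception:

    import sys

    def broken():
        error = None
        try:
            raise ValueError('the interesting details')
        except Exception:
            error = sys.exc_info()[0]  # captures only the type: <class 'ValueError'>
        finally:
            # ... cleanup ...
            if error:
                raise error  # raises a new, empty ValueError; message and traceback are lost

    def fixed():
        try:
            raise ValueError('the interesting details')
        finally:
            pass  # cleanup still runs; the original exception propagates intact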
Instead of duplicating Dockerfiles between taskcluster/docker/*
directories, which can be error-prone for images that are very similar,
it can be desirable to use the same file. This change allows setting
the `definition` keyword on a docker image definition in kind.yml,
which makes the task use the files from taskcluster/docker/<definition>
instead of taskcluster/docker/<image_name>.
Ideally, we'd simply use the --build-arg docker argument along with ARG
in the Dockerfile, but that's only supported from Docker API 1.21, and
we're stuck on 1.18 for the moment.
So we add another hack to how we handle the Dockerfile, with a
commented syntax that allows declaring arguments to the Dockerfile. The
arguments can be defined in the docker images kind.yml file through the
`args` keyword. Under the hood, they are passed down to the docker
image task through the environment. The `mach taskcluster-build-image`
command then uses the corresponding values from the environment to
generate a "preprocessed" Dockerfile for its context.
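A hypothetical sketch of the preprocessing step, assuming a commented
`# %ARG NAME` declaration and %NAME% references; the actual directive
syntax and substitution rules may differ:

    import os
    import re

    def preprocess_dockerfile(path, environ=os.environ):
        args = {}
        lines = []
        with open(path) as fh:
            for line in fh:
                m = re.match(r'#\s*%ARG\s+(\w+)', line)
                if m:
                    # Value comes from the task environment (set via the
                    # kind.yml `args` keyword).
                    args[m.group(1)] = environ[m.group(1)]
                    continue
                for name, value in args.items():
                    line = line.replace('%%%s%%' % name, value)
                lines.append(line)
        return ''.join(lines)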
Various modules under taskcluster are doing ad-hoc url formatting or
requests to taskcluster services. While we could use the taskcluster
client python module, it's kind of overkill for the simple requests
done here. So instead of vendoring that module, create a smaller one
with the limited set of functions we need.
This changes the behavior of the get_artifact function to return a
file-like object when the file is neither json nor yaml, but that
branch was never used (and was actually returning an unassigned
variable, so it was broken anyway).
At the same time, make the function that does HTTP requests more
error-resistant, using urllib3's Retry with a backoff factor.
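A minimal sketch of that retry setup; the helper and parameter names
here are illustrative, not the module's actual API:

    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry

    def make_session(retries=5, backoff_factor=0.1):
        # Retry transient server errors with exponential backoff.
        session = requests.Session()
        retry = Retry(total=retries, backoff_factor=backoff_factor,
                      status_forcelist=(500, 502, 503, 504))
        session.mount('https://', HTTPAdapter(max_retries=retry))
        session.mount('http://', HTTPAdapter(max_retries=retry))
        return session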
Also add a function that retrieves the list of artifacts, which, while
currently unused, will be used by `mach artifact` shortly.
Instead of every file that needs the top source directory having an
ad-hoc definition that goes wrong if the file gets moved around for
some reason, define it in a more central location.
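Roughly, the central definition amounts to something like this (the
constant name and module location are assumptions):

    import os

    # One definition relative to this file, instead of each module
    # recomputing the top source directory on its own.
    GECKO = os.path.normpath(
        os.path.join(os.path.dirname(__file__), '..', '..'))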
This adds a HASH file next to the VERSION file in the image context
folders for prebuilt docker images, and uses the HASH for referencing
the image in the tasks created by the decision task.
This way docker will validate the image hash when pulling it in
production. Thus, attackers won't be able to inject code by
compromising the remote docker registries we use to store prebuilt
images. Furthermore, this makes validation of the Chain of Trust
artifacts easier, as it eliminates the need for whitelists and hash
validation.
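For illustration, resolving a pinned reference from the context folder
might look like this (the file layout and registry naming here are
assumptions):

    import os

    def image_reference(context_dir, registry):
        # Pin by digest so the daemon verifies content on pull.
        with open(os.path.join(context_dir, 'HASH')) as fh:
            digest = fh.read().strip()  # e.g. 'sha256:...'
        return '%s/%s@%s' % (registry, os.path.basename(context_dir),
                             digest)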
MozReview-Commit-ID: FD3B9MyeU9Q
* Compress docker images with zstd
* Remove the need for context.tar from the decision task
* Index images by level rather than project
MozReview-Commit-ID: 4RL4QXNWmpd
Now that Docker image building is called from Python, we can start to
do advanced stuff with it.
With this commit, we switch from building Docker images directly from
the source directory ("the Docker way") to using our custom Docker image
build contexts.
The main advantage of this is that locally-built Docker images can now
use our custom Dockerfile syntax to include extra files in the build
context!
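A hedged sketch of context creation, assuming extra files are declared
with a commented `# %include <topsrcdir-relative-path>` directive; the
real syntax and archive layout may differ:

    import os
    import tarfile

    def create_context_tar(topsrcdir, context_dir, out_path):
        with tarfile.open(out_path, 'w:gz') as tar:
            # Start with the image directory itself.
            tar.add(context_dir, arcname='.')
            # Pull in any extra files the Dockerfile declares.
            with open(os.path.join(context_dir, 'Dockerfile')) as fh:
                for line in fh:
                    if line.startswith('# %include '):
                        rel = line.split(' ', 2)[2].strip()
                        tar.add(os.path.join(topsrcdir, rel),
                                arcname=os.path.join('topsrcdir', rel))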
The code for building a Docker image from a context has been extracted
to its own standalone function. I have nefarious plans for this in the
future, such as the ability to override the FROM syntax to specify
URLs of images. This would allow us to host base images on our own
server, which removes a dependency on Docker Hub and improves
determinism, since images on Docker Hub change all the time.
MozReview-Commit-ID: 5lTdV8yEHkc
build.sh had been reduced to invoking `docker`. We move that invocation
to Python and remove build.sh. Long live build.sh!
MozReview-Commit-ID: FQBDJv4HSaU
We already had code for resolving the image registry and tag. We
refactored it slightly to be more useful, then changed build.sh to
accept the tag as an argument.
At this point, build.sh is basically a wrapper around `docker`. But
there's a special case for executing custom "build.sh" files we
need to eliminate first...
MozReview-Commit-ID: A9HVvxgCdG2
Now that we have a mach command and Python code for doing Docker image
building, we can start moving code from build.sh to Python.
We start with searching for and validating the `docker` binary works.
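Roughly, the checks amount to something like this (the exact checks
performed are assumptions):

    import subprocess
    from shutil import which

    def assert_docker_is_usable():
        # The binary must exist on PATH and the daemon must answer.
        docker = which('docker')
        if not docker:
            raise Exception('docker was not found on PATH')
        try:
            subprocess.check_output([docker, 'info'],
                                    stderr=subprocess.STDOUT)
        except subprocess.CalledProcessError:
            raise Exception('docker is installed but the daemon is not '
                            'reachable')
        return docker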
MozReview-Commit-ID: 2DCc3b8UyZ3
Docker image building will soon need to use Python in order to produce
the image build context archive.
As the first step towards this, we introduce a Python function that
calls out to build.sh. We also implement a mach command that calls it
so we can test the functionality.
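The Python side is deliberately thin at this stage; a minimal sketch of
its shape (the script location and argument convention are assumptions,
not the actual implementation):

    import os
    import subprocess

    def build_image(topsrcdir, image_name):
        # Shell out to the existing build.sh for now.
        script = os.path.join(topsrcdir, 'testing', 'docker', 'build.sh')
        subprocess.check_call([script, image_name], cwd=topsrcdir)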
I'm not keen on introducing a new mach command. I'd prefer to have
a sub-command instead. I'm not sure what all uses
`mach taskcluster-load-image`. Perhaps we could make a `taskcluster`
top-level command. Or perhaps we could fold these image commands into
`mach taskgraph`? Either way, the mach side of this isn't terribly
important to the commit series: most of the code will live inside a
Python module outside of mach.
MozReview-Commit-ID: AI8p6H4psNh
This restores order by going back to a single hash per context
directory.
Using a tempfile here is a bit unfortunate. It can be optimized later,
if needed.
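A sketch of the single-hash computation, assuming a create_context_tar()
helper like the one sketched earlier; the tar is written to a tempfile
and hashed, so one hash covers exactly what docker will see:

    import hashlib
    import tempfile

    def hash_context(topsrcdir, context_dir):
        with tempfile.NamedTemporaryFile() as tmp:
            create_context_tar(topsrcdir, context_dir, tmp.name)
            h = hashlib.sha256()
            with open(tmp.name, 'rb') as fh:
                for chunk in iter(lambda: fh.read(32768), b''):
                    h.update(chunk)
            return h.hexdigest()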
MozReview-Commit-ID: LMNsvt3fDYx