This creates a new "job-from" field that contains the relative filename the job was defined
in. The filename is relative to 'config.path'. If the task came from the 'jobs' key defined
in kind.yml, this field will be set to 'kind.yml'.
MozReview-Commit-ID: 9e1tEb6XuZT
Clean up and standardize Treeherder symbols for Talos and AWSY tasks:
* Stylo disabled groups include `sd`
* Stylo sequential groups include `ss`
MozReview-Commit-ID: 7cl6e0XvXNO
Convert all jobs that were exercising Stylo enabled to Stylo disabled instead.
Stylo enabled is now handled by the default jobs.
In Perfherder, Stylo enabled jobs will be untagged and take over the existing
Gecko series. Stylo disabled jobs will have a new `stylo-disabled` tag and
create a new series.
MozReview-Commit-ID: BMXBRg3A95j
This adds some new optimization strategies. For tests, we use Either(SETA,
SkipUnlessSchedules), thereby giving both mechanisms a chance to skip tasks. On
try, SETA is omitted.
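As a rough sketch of the composition described above: the class names follow the text, but the `should_remove_task` method, the `Fake*` stand-ins and `strategy_for_tests` are hypothetical names used for illustration, not the in-tree implementation.

```python
class Strategy:
    """Base class: by default a strategy never removes a task."""

    def should_remove_task(self, task, params):
        return False


class Either(Strategy):
    """Remove the task if any one of the wrapped strategies says so."""

    def __init__(self, *strategies):
        self.strategies = strategies

    def should_remove_task(self, task, params):
        return any(s.should_remove_task(task, params) for s in self.strategies)


class FakeSeta(Strategy):
    # Hypothetical stand-in: tasks listed in params["low_value"] are skippable.
    def should_remove_task(self, task, params):
        return task["label"] in params.get("low_value", set())


class FakeSkipUnlessSchedules(Strategy):
    # Hypothetical stand-in: skip unless the push schedules one of the
    # components the task is associated with.
    def should_remove_task(self, task, params):
        return not (set(task["schedules"]) & set(params["scheduled"]))


def strategy_for_tests(project):
    """On try, SETA is omitted; elsewhere both mechanisms may skip a task."""
    if project == "try":
        return FakeSkipUnlessSchedules()
    return Either(FakeSeta(), FakeSkipUnlessSchedules())
```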
MozReview-Commit-ID: GL4tlwyeBa6
It is not at *all* clear how multiple optimizations for a single task should
interact. No simple logical operation is right in all cases, and in fact in
most imaginable cases the desired behavior turns out to be independent of all
but one of the optimizations. For example, given both `seta` and
`skip-unless-files-changed` optimizations, if SETA says to skip a test, it is
low value and should be skipped regardless of what files have changed. But if
SETA says to run a test, then it has likely been skipped in previous pushes, so
it should be run regardless of what has changed in this push.
This also adds a bit more output about optimization, which may be useful for
anyone wondering why a particular job didn't run.
MozReview-Commit-ID: 3OsvRnWjai4
`run-task` is taught a --sparse-profile argument to be passed down
to `hg robustcheckout` for the main source checkout. It does what
you expect: performs a sparse checkout using the named profile.
The Taskgraph YAML for run-task is taught a "sparse-profile"
property to define the sparse profile. When defined, --sparse-profile
will be passed down to `run-task` and the cache name will be updated
to reflect the use of sparse checkout.
Our cache checking transform is updated to audit for the use of
--sparse-profile without the corresponding "-sparse" cache name
variation.
The reason we need a distinct cache name for sparse is because
clients that aren't sparse aware will be unable to read checkouts
that are sparse. By forcing sparse and non-sparse into different
cache pools, we avoid compatibility issues.
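A minimal sketch of the cache naming and the audit, assuming a `level-N-checkouts` base name and a `--sparse-profile=NAME` argument form; the function names are hypothetical and the real transform may differ.

```python
def checkout_cache_name(level, sparse_profile=None):
    """Return a cache name that keeps sparse and non-sparse checkouts apart."""
    name = "level-{}-checkouts".format(level)  # assumed base name
    if sparse_profile:
        name += "-sparse"
    return name


def run_task_args(sparse_profile=None):
    """Build the extra run-task arguments for the main source checkout."""
    args = []
    if sparse_profile:
        # Assumed argument form; run-task forwards it to robustcheckout.
        args.append("--sparse-profile={}".format(sparse_profile))
    return args


def audit_sparse_cache(command, cache_names):
    """Raise if --sparse-profile is used without a "-sparse" cache name."""
    uses_sparse = any(arg.startswith("--sparse-profile") for arg in command)
    if uses_sparse and not any("-sparse" in name for name in cache_names):
        raise ValueError("sparse checkout requires a '-sparse' cache name")
```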
In an ideal world, we would probably support sparse profiles on all the
VCS checkouts that `run-task` supports (e.g. --tools-checkout).
But perfect is the enemy of done. All of this is defined in-tree, so
it is easy enough to change atomically later.
MozReview-Commit-ID: 79k7Vul0hHO
The UID and GID that a task executes under are dynamic. As a result,
caches need to be aware of the UID and GID that own their files;
otherwise, subsequent tasks could run into permission denied errors. This is
why `run-task --chown-recursive` exists. By recursively changing
ownership of persisted files, we ensure the current task is able
to read and write all existing files.
When you take a step back, you realize that chowning of cached
files is an expensive workaround. Yes, this results in cache hits.
But the cost is you potentially have to perform hundreds of thousands
of I/O system calls to mass chown. The ideal situation is that
UID/GID is consistent across tasks on any given cache and
potentially expensive permissions setting can be avoided. So, that's
what this commit does.
We add the task's UID and GID to run-task's requirements. When we
first see a cache, we record a UID and GID with it and chown the
empty cache directory to that UID and GID. Subsequent tasks using
this cache *must* use the same UID and GID or else run-task will
fail.
Since run-task now guarantees that all cache consumers use the same
UID and GID, we can avoid a potentially expensive recursive chown.
But there is an exception. In untrusted environments (namely Try),
we recursively chown existing caches if there is a uid/gid mismatch.
We do this because Try is a sandbox and any random task could
experiment with a non-standard uid/gid. That populated cache would
"poison" the cache for the next caller. Or vice-versa. It would be
annoying if caches were randomly poisoned due to Try pushes that
didn't realize there was a UID/GID mismatch. We could outlaw "bad"
UIDs and GIDs. But that would raise the barrier to testing things on
Try. So, we go with the flow and recursively chown caches in
this scenario.
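The decision logic above can be summarized as a small pure function. This is a sketch only: `cache_action` and the returned action strings are hypothetical names, and the real run-task performs the chown and failure inline.

```python
def cache_action(recorded, current, untrusted):
    """Decide how to treat a cache given its recorded and current (uid, gid).

    `recorded` is None when the cache is seen for the first time.
    """
    if recorded is None:
        # New cache: record the owner and chown only the empty directory.
        return "record-and-chown-root"
    if recorded == current:
        # All consumers agree on the owner, so no recursive chown is needed.
        return "use"
    if untrusted:
        # e.g. Try: repair a cache populated with a non-standard uid/gid.
        return "chown-recursive"
    # Trusted environments must never mix owners on one cache.
    return "fail"
```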
This change will shine light on all tasks using inconsistent UID
and GID values on the same cache. Bustage is anticipated.
Unfortunately, we can't easily know what will break. So it will be
one of those things where we will have to fix problems as they arise.
Fortunately, because caches are now tied to the content of run-task,
we only need to back out this change and tasks should revert to caches
without UID and GID pinning requirements and everything will work
again.
MozReview-Commit-ID: 2ka4rOnnXIp
We recently introduced support for telling run-task about caches so
it could sanitize them automatically. We also recently taught
docker-worker and docker-engine how to declare volumes.
Building on that work, we now pass a list of paths corresponding
to Docker volumes to run-task.
run-task now verifies volumes behave as expected. Unless the volume
paths correspond to caches, run-task verifies they are empty and chowns
them to an appropriate owner.
Requiring empty volumes is an arbitrary decision. But as the inline
comment says, it keeps things simpler and makes caches and volumes
behave more like each other.
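A sketch of the volume check, assuming run-task receives both lists as plain paths; `volumes_to_prepare` is a hypothetical helper name and the chown itself is left to the caller.

```python
import os


def volumes_to_prepare(volume_paths, cache_paths):
    """Return the non-cache volumes, after verifying that they are empty."""
    caches = {os.path.normpath(p) for p in cache_paths}
    todo = []
    for path in volume_paths:
        if os.path.normpath(path) in caches:
            continue  # volumes backed by caches are validated separately
        if os.listdir(path):
            raise RuntimeError("volume {} is not empty".format(path))
        todo.append(path)  # the caller would chown this to the task user
    return todo
```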
MozReview-Commit-ID: 5lm2uIitrS3
See the inline comment for the rationale here.
This check may not catch all volumes and caches. But after subsequent
commits refactor how permissions for caches and volumes are handled,
this edge case will likely result in permissions errors in the task,
so it isn't worth worrying about.
Several Dockerfiles have been updated to add missing VOLUME declarations so the check
passes.
In the case of desktop1604-test, we stopped removing
/home/worker/.cache because you can't remove a mount point, which is
what volumes are inside Docker containers.
MozReview-Commit-ID: GEyNkkX00kN
Docker volumes are host-mounted filesystems. We typically mount
caches at their location. But not always. The reason we define
VOLUME in Dockerfiles is we're guaranteed to get a fast host
filesystem instead of AUFS when a cache isn't mounted.
In this commit, we teach the docker-worker payload builder about
the existence of Docker volumes. Docker volumes can be declared
inline in the YAML. More conveniently, we automatically parse out
VOLUME lines from the corresponding in-tree Dockerfile.
We'll do useful things with this data in subsequent commits.
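Parsing VOLUME lines might look roughly like the following. `parse_volumes` is a hypothetical name; this sketch accepts both the space-separated and the JSON array form of the instruction, which the in-tree parser may or may not do.

```python
import json


def parse_volumes(dockerfile_text):
    """Collect the paths declared by VOLUME instructions in a Dockerfile."""
    volumes = set()
    for line in dockerfile_text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or parts[0].upper() != "VOLUME":
            continue
        value = parts[1].strip()
        if value.startswith("["):
            volumes.update(json.loads(value))  # VOLUME ["/a", "/b"]
        else:
            volumes.update(value.split())      # VOLUME /a /b
    return volumes
```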
MozReview-Commit-ID: BNxp8EDEYw
Previously, we conditionally added caches to a task if the current
parameters warranted it.
In order to audit that all caches fulfill basic requirements, we need
to have unconditional knowledge of all caches.
This commit introduces an optional key on each cache entry stating
whether it should be skipped in "untrusted" environments. When we
convert a task definition to a worker payload, we filter out these
caches if necessary.
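A sketch of the filtering step, assuming the optional key is spelled `skip-untrusted` and that SCM level 1 is the "untrusted" environment; both are assumptions made for illustration.

```python
def filter_caches(caches, level):
    """Drop caches marked skip-untrusted when building an untrusted payload."""
    untrusted = int(level) == 1  # assumption: level 1 (e.g. Try) is untrusted
    return [
        cache for cache in caches
        if not (untrusted and cache.get("skip-untrusted", False))
    ]
```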
This change uncovered an inconsistency with filtering caches. In
one location we filtered on the source repo name. In others, we
filtered on the SCM level.
Setting the caches in the spidermonkey kind also changed slightly
to ensure we're not overwriting existing caches. I don't think this
has any behavior changes. But the new method is more correct.
MozReview-Commit-ID: 1crpdWHqQ68
run-task just grew features to aid with cache validation.
Attempts by run-task to use caches not under its control will fail.
So, we add a transform that audits for and ensures that certain
caches are only being used with run-task. This will help catch
stragglers attempting to use e.g. the legacy VCS checkouts or
tooltool caches without run-task. Fortunately, there are no
violations for this policy. Yay!
MozReview-Commit-ID: LBCmDUdgcuM
Today, cache names are mostly static and are brittle as a result.
In theory, when a backwards incompatible change is performed on
something that touches a cache, the cache name needs to be changed
to ensure tasks running the old code don't see cached data from the
new task. (Alternatively, all code is forward compatible, but that is
hard to implement in practice.)
For many things, the process works as planned. However, not everyone
knows that cache names need to be changed. And, it isn't always obvious
that some things require fresh caches. When mistakes are made, tasks
break intermittently due to cache wonkiness.
One area where we get into trouble is with UID and GID mismatch.
Task A will use a Docker image where our standard "worker" user/group
is UID/GID 1000:1000. Then Task B will use UID/GID 500:500. (This is
common when mixing Debian and Red Hat based distros.) If they use the
same cache, then Task B needs to chown/chmod all files in the cache
or there could be a permissions problem. This is exactly why
run-task recursively chowns certain paths before dropping root
privileges.
Permissions setting in run-task solves permissions problems. But
it doesn't solve content incompatibility problems. For that, you
need to change cache names, not use caches, or blow away content
when incompatibilities are detected.
This commit starts the process of adding a little bit more coherence
to our caching story.
There are two main features in this commit:
1) Cache names tied to run-task content
2) Cache validation in run-task
Taskgraph now detects when a task is using caches with run-task. When
caches and run-task are both being used, the cache name is adjusted to
contain a hash of run-task's content. When run-task changes, the cache
name changes. So, changing run-task ensures that all caches from that point
forward are "clean." This frees run-task and any functionality related
to run-task (such as maintaining version control checkouts) from
having to maintain backwards or forwards compatibility with any other
version of run-task. This does mean that any changes to run-task
effectively wipe out caches. But changes to run-task tend to be
seldom, so this should be acceptable.
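Tying a cache name to run-task's content could be sketched as below. The choice of SHA-256, the 20-character truncation and the `run_task_cache_name` helper are assumptions, not details taken from this commit.

```python
import hashlib


def run_task_cache_name(base_name, run_task_source):
    """Append a digest of run-task's content (bytes) to a cache name."""
    digest = hashlib.sha256(run_task_source).hexdigest()
    # Any edit to run-task changes the digest and therefore the cache name,
    # so tasks start from a fresh cache.
    return "{}-{}".format(base_name, digest[:20])
```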
The second part of this change is code in run-task to record per-cache
properties and validate whether a populated cache is appropriate for
use. To enable this, taskgraph passes a list of cache paths via an
environment variable. For each cache path, run-task looks for a
well-defined file containing a list of "requirements." Right now,
that list is simply a version string. But other features will be
worked into it. If the cache is empty, we simply write out a new
requirements file and are done. If the file exists, we compare
requirements and fail fast if there is a mismatch. If the cache
has content but not this special file, then we abort (because this
should never happen).
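The three cases above (empty cache, matching file, missing or mismatched file) can be sketched as follows. The file name, the one-requirement-per-line format and the `validate_cache` helper are assumptions for illustration.

```python
import os

REQUIREMENTS_FILE = ".cacherequires"  # assumed name of the well-defined file


def validate_cache(path, requirements):
    """Record requirements in an empty cache, or verify a populated one."""
    wanted = sorted(requirements)
    req_path = os.path.join(path, REQUIREMENTS_FILE)
    entries = os.listdir(path)
    if not entries:
        # Empty cache: write out a new requirements file and we are done.
        with open(req_path, "w") as fh:
            fh.write("\n".join(wanted))
        return "initialized"
    if REQUIREMENTS_FILE not in entries:
        # Content without the special file should never happen.
        raise RuntimeError("populated cache has no requirements file")
    with open(req_path) as fh:
        found = fh.read().splitlines()
    if found != wanted:
        raise RuntimeError("cache requirements mismatch")
    return "ok"
```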
The "requirements" validation isn't very useful now because the only
entry comes from run-task's source code and modifying run-task will
change the hash and cause a new cache to be used. The implementation
at this point is more demonstrating the concept than doing anything
terribly useful with it.
MozReview-Commit-ID: HtpXIc7OD1k
The upload now uses MOZ_SCM_LEVEL to determine which secret and bucket to
upload to, so it can potentially run at any level.
This also modifies task descriptions to allow {level} in scopes, and updates
try syntax to allow `-j doc-upload` even though run-on-projects says it doesn't
run on try by default.
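A sketch of the level-based selection and the `{level}` substitution. The secret and bucket names here are made-up placeholders, and `upload_target` and `resolve_scopes` are hypothetical helpers.

```python
def upload_target(environ):
    """Pick the secret and bucket for the current SCM level."""
    level = environ.get("MOZ_SCM_LEVEL", "1")
    return {
        # Placeholder names; the real secret and bucket are not given here.
        "secret": "example/level-{}/doc-upload".format(level),
        "bucket": "example-docs-l{}".format(level),
    }


def resolve_scopes(scopes, level):
    """Substitute {level} in each scope of a task description."""
    return [scope.format(level=level) for scope in scopes]
```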
MozReview-Commit-ID: Dm27TGPa7IM
The toolchain jobs produce artifacts that are going to be used by other
jobs, but there is no reliable way for the decision task to know the
name of those artifacts. So we make their definition required in the
toolchain job definitions.
MozReview-Commit-ID: 9tPAMBAkvCs
Added config via tests.yml, test-sets.yml
Added remove_installer to config for linux.
Added blank for windows as that will come later.
MozReview-Commit-ID: 9tPAMBAkvCs
There are a few places where we walk commit ancestry looking for things
attached to a specific revision. Because the repository name is attached
to the index path and because a revision can exist in multiple
repositories, we often have to perform N index lookups to find a result
for a specific revision. This is inefficient.
To facilitate faster index lookups by revision, we introduce a new route
that doesn't contain the repository name. In theory, we should be able
to do this globally - for all repos. However, the configuration of
tasks can vary significantly by repo. So e.g. a linux64 build on
"central" is sufficiently different from a linux64 build on "beta" or
"release." For that reason, this commit takes the conservative
approach and only defines a shared route for repositories with a similar
configuration: the "trunk" repositories.
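To illustrate the shared route, here is a sketch that emits the per-repository route plus a repository-independent one for trunk repositories. The route templates and the set of trunk projects are assumptions, not the exact in-tree values.

```python
# Assumed membership of the "trunk" group, for illustration only.
TRUNK_PROJECTS = {"mozilla-central", "mozilla-inbound", "autoland"}


def revision_routes(project, revision, job_name):
    """Return index routes for a task built from `revision` on `project`."""
    routes = [
        "index.gecko.v2.{}.revision.{}.{}".format(project, revision, job_name),
    ]
    if project in TRUNK_PROJECTS:
        # One lookup by revision works no matter which trunk repo built it.
        routes.append(
            "index.gecko.v2.trunk.revision.{}.{}".format(revision, job_name)
        )
    return routes
```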
MozReview-Commit-ID: 8rIgUbzW4eL
365731510976 (bug 1380391) added index routes for decision tasks by
pushlog id. This is a good idea. The pushlog id is guaranteed to always
be incrementing (except for repos that are periodically reset, which
we don't care about). It is useful to provide strict ordering for
pushes and is simpler for machines to consume and sort than dates.
So let's index all tasks by pushlog id.
MozReview-Commit-ID: BPqx4ARza1c
This is needed before we can upgrade to flake8 3.3.0, as that version starts flagging these errors.
These files were modified by running:
autopep8 --select E305 --in-place -r <dir>
on the affected directories. I did it one dir at a time and verified the result after each.
MozReview-Commit-ID: FmlsfiKIbtr
To date we have variously specified both worker-type and worker-implementation,
often manually coordinated. We also embedded a few awkward assumptions, such as
the native engine only running on OS X.
But a worker type has one and only one implementation, and that implementation
is stable over time (as changing it would require simultaneous landings on all
trees).
Instead, this change makes worker-type the primary configuration, and derives
both a worker implementation (defining the payload format) and worker OS
(determining what to include in the payload) from that value. The derivation
occurs when deciding how to implement a particular job, where the run_using
functions are distinguished by worker implementation.
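The derivation can be pictured as a single lookup keyed by worker type. The table entries and helper names below are hypothetical; only the idea of deriving implementation and OS from worker-type comes from this change.

```python
# Hypothetical worker types mapped to (implementation, os).
WORKER_TYPES = {
    "example-provisioner/linux-builder": ("docker-worker", "linux"),
    "example-provisioner/win2012": ("generic-worker", "windows"),
    "example-provisioner/macosx": ("native-engine", "macosx"),
    "invalid/invalid": ("invalid", None),
}


def worker_type_implementation(worker_type):
    """Return the (implementation, os) pair for a worker type."""
    return WORKER_TYPES[worker_type]


def set_worker_fields(task):
    """Derive worker.implementation and worker.os from task['worker-type']."""
    implementation, os_name = worker_type_implementation(task["worker-type"])
    worker = task.setdefault("worker", {})
    worker["implementation"] = implementation
    if os_name is not None:
        worker["os"] = os_name
    return task
```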
The two-part logic to determine how and where to run a test task based on its
platform is combined into a single transform, `set_worker_type`.
This contains some other related changes:
- MOZ_AUTOMATION is set in specific jobs, rather than everywhere docker-worker
is used
- the URL to test packages is factored out into a shared function
- docker-worker test defaults are applied in `mozharness_test.py`
- the WORKER_TYPE array in `task.py`, formerly mixing two types of keys, is
split
- the 'invalid' workerType is assigned an 'invalid' implementation
- all tasks that do not use job descriptions but use docker-worker, etc. have
`worker.os` added
Tested to not produce a substantially different taskgraph for a regular push, a
try push, or a nightly cron.
MozReview-Commit-ID: LDHrmrpBo7I
Tasks should be assigned a priority based on the branch they originated from. It
is important that certain branches receive preferential treatment, such as a release
branch task being executed before a task from Try. Branch priority mirrors
the priorities defined within buildbot.
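A sketch of a branch-to-priority mapping. The priority names are the Taskcluster queue's levels, but the specific assignments and the default are illustrative assumptions, since the buildbot values are not listed here.

```python
# Taskcluster priority levels, from least to most urgent.
PRIORITY_ORDER = [
    "lowest", "very-low", "low", "medium", "high", "very-high", "highest",
]

# Illustrative mapping only; the real values mirror buildbot's configuration.
BRANCH_PRIORITIES = {
    "mozilla-release": "highest",
    "mozilla-beta": "high",
    "mozilla-central": "medium",
    "try": "very-low",
}
DEFAULT_PRIORITY = "low"


def task_priority(project):
    """Return the priority for tasks originating from `project`."""
    return BRANCH_PRIORITIES.get(project, DEFAULT_PRIORITY)


def runs_before(project_a, project_b):
    """True if tasks from project_a outrank tasks from project_b."""
    rank = PRIORITY_ORDER.index
    return rank(task_priority(project_a)) > rank(task_priority(project_b))
```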
MozReview-Commit-ID: 8qR9F34lzzc