tubestation/taskcluster/gecko_taskgraph/util/perfile.py
Andrew Halberstadt 2f338261f4 Bug 1884364 - Create a new 'files_changed' parameter, r=taskgraph-reviewers,releng-reviewers,jcristau
We use hg.m.o's `json-automationrelevance` endpoint for several purposes,
such as getting the set of changed files for optimization, or finding the
base revision for diffing. But this endpoint is slow and puts undue load
on hg.mozilla.org if queried too often.

The helper function that fetches this is memoized, so in theory we should only
ever make this request once per graph generation. However, there are still cases
where we request this unnecessarily:

1. When running `./mach taskgraph` locally, we first fetch
`json-automationrelevance` and then fall back to fetching it locally if the
revision wasn't found. I believe the reason for this is to be able to generate
graphs identical to those produced by CI.

2. When specifying multiple parameter sets (so graphs are generated in
parallel), the memoized result isn't shared across processes, so we make the
request once per parameter set.

3. Any other time we generate tasks outside the context of a Decision task
(e.g. `./mach try`), as there are transforms that call this function.
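Point 2 hinges on memoization being per-process. A minimal sketch, using `functools.lru_cache` as a stand-in for `mozbuild.util.memoize` (the fetch itself is faked here):

```python
from functools import lru_cache

calls = 0

@lru_cache(maxsize=None)  # stand-in for mozbuild.util.memoize
def fetch_automationrelevance(rev):
    # Pretend this hits hg.mozilla.org's json-automationrelevance endpoint.
    global calls
    calls += 1
    return {"changesets": [], "rev": rev}

fetch_automationrelevance("abc123")
fetch_automationrelevance("abc123")
print(calls)  # 1: the second call is served from the in-process cache
```

A separate process starts with an empty cache, so parallel graph generations each pay the cost of the request once.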

By turning `files_changed` into a parameter, we can ensure that this value gets
"frozen" by the Decision task and never needs to be recomputed. E.g., you
could use `-p task-id=<decision id>` and you'd still get the `files_changed`
value that Decision task computed. This means that, for all non-Decision use
cases, we can rely on the local VCS to give us the changed files.

This should greatly cut back on the number of queries being made to `hg.m.o`.
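A hypothetical sketch of what a consumer can then do: prefer the frozen parameter, and only fall back to the local VCS when it is absent. The function and parameter names here are illustrative, not the actual taskgraph API:

```python
def get_files_changed(parameters, local_vcs_diff):
    # `files_changed` is frozen into the parameters by the Decision task;
    # reusing it (e.g. via `-p task-id=<decision id>`) avoids hg.m.o entirely.
    if parameters.get("files_changed") is not None:
        return list(parameters["files_changed"])
    # Outside a Decision task (e.g. `./mach try`), ask the local VCS instead.
    return local_vcs_diff()

params = {"files_changed": ["taskcluster/gecko_taskgraph/util/perfile.py"]}
print(get_files_changed(params, lambda: []))
print(get_files_changed({}, lambda: ["local/change.py"]))
```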

Differential Revision: https://phabricator.services.mozilla.com/D204127
2024-03-19 14:13:54 +00:00


# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.
import itertools
import json
import logging
import math
import taskgraph
from mozbuild.util import memoize
from mozpack.path import match as mozpackmatch
logger = logging.getLogger(__name__)
@memoize
def perfile_number_of_chunks(is_try, try_task_config, files_changed, type):
    changed_files = set(files_changed)

    if taskgraph.fast and not is_try:
        # When iterating on taskgraph changes, the exact number of chunks that
        # test-verify runs usually isn't important, so skip it when going fast.
        return 3

    tests_per_chunk = 10.0
    if type.startswith("test-coverage"):
        tests_per_chunk = 30.0

    if type.startswith("test-verify-wpt") or type.startswith("test-coverage-wpt"):
        file_patterns = [
            "testing/web-platform/tests/**",
            "testing/web-platform/mozilla/tests/**",
        ]
    elif type.startswith("test-verify-gpu") or type.startswith("test-coverage-gpu"):
        file_patterns = [
            "**/*webgl*/**/test_*",
            "**/dom/canvas/**/test_*",
            "**/gfx/tests/**/test_*",
            "**/devtools/canvasdebugger/**/browser_*",
            "**/reftest*/**",
        ]
    elif type.startswith("test-verify") or type.startswith("test-coverage"):
        file_patterns = [
            "**/test_*",
            "**/browser_*",
            "**/crashtest*/**",
            "js/src/tests/test/**",
            "js/src/tests/non262/**",
            "js/src/tests/test262/**",
        ]
    else:
        # Returning 0 means no tests to run; this captures non-test-verify tasks.
        return 0

    if try_task_config:
        suite_to_paths = json.loads(try_task_config)
        specified_files = itertools.chain.from_iterable(suite_to_paths.values())
        changed_files.update(specified_files)

    test_count = 0
    for pattern in file_patterns:
        for path in changed_files:
            # TODO: consider running tests if a manifest changes
            if path.endswith(".list") or path.endswith(".ini"):
                continue

            if path.endswith("^headers^"):
                continue

            if mozpackmatch(path, pattern):
                gpu = False
                if type == "test-verify-e10s" or type == "test-coverage-e10s":
                    # file_patterns for test-verify will pick up some gpu tests;
                    # let's ignore them. In the case of reftest, we will not
                    # have any in the regular case.
                    gpu_dirs = [
                        "dom/canvas",
                        "gfx/tests",
                        "devtools/canvasdebugger",
                        "webgl",
                    ]
                    for gdir in gpu_dirs:
                        if len(path.split(gdir)) > 1:
                            gpu = True

                if not gpu:
                    test_count += 1

    chunks = test_count / tests_per_chunk
    chunks = int(math.ceil(chunks))

    # Never return 0 chunks on try, so that per-file tests can be pushed to try with
    # an explicit path, and also so "empty" runs can be checked on try.
    if is_try and chunks == 0:
        chunks = 1

    return chunks
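The tail of the function can be illustrated standalone (no taskgraph or mozpack imports): merge the files specified in `try_task_config` into the changed-files set, then divide by `tests_per_chunk` and round up. The suite names and paths below are made up for the example, and we pretend every file matched a pattern:

```python
import itertools
import json
import math

# Hypothetical try_task_config payload: a JSON mapping of suite -> paths.
try_task_config = json.dumps({
    "mochitest": ["dom/base/test/test_a.html"],
    "xpcshell": ["netwerk/test/unit/test_b.js"],
})
changed_files = {"dom/base/test/test_c.html"}

# Same flattening as the real function.
suite_to_paths = json.loads(try_task_config)
changed_files.update(itertools.chain.from_iterable(suite_to_paths.values()))

tests_per_chunk = 10.0
test_count = len(changed_files)  # assume every file matched a pattern
chunks = int(math.ceil(test_count / tests_per_chunk))
if chunks == 0:  # on try, never schedule zero chunks
    chunks = 1
print(chunks)  # 3 matched tests at 10 per chunk -> 1 chunk
```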