diff options
Diffstat (limited to 'taskcluster/docs/transforms.rst')
-rw-r--r-- | taskcluster/docs/transforms.rst | 215 |
1 files changed, 215 insertions, 0 deletions
diff --git a/taskcluster/docs/transforms.rst b/taskcluster/docs/transforms.rst new file mode 100644 index 0000000000..8509dbd071 --- /dev/null +++ b/taskcluster/docs/transforms.rst @@ -0,0 +1,215 @@ +Transforms +========== + +Many task kinds generate tasks by a process of transforming job descriptions +into task definitions. The basic operation is simple, although the sequence of +transforms applied for a particular kind may not be! + +Overview +-------- + +To begin, a kind implementation generates a collection of items; see +:doc:`loading`. The items are simply Python dictionaries, and describe +"semantically" what the resulting task or tasks should do. + +The kind also defines a sequence of transformations. These are applied, in +order, to each item. Early transforms might apply default values or break +items up into smaller items (for example, chunking a test suite). Later +transforms rewrite the items entirely, with the final result being a task +definition. + +Transform Functions +................... + +Each transformation looks like this: + +.. code-block:: python + + @transforms.add + def transform_an_item(config, items): + """This transform ...""" # always a docstring! + for item in items: + # .. + yield item + +The ``config`` argument is a Python object containing useful configuration for +the kind, and is a subclass of +:class:`taskgraph.transforms.base.TransformConfig`, which specifies a few of +its attributes. Kinds may subclass and add additional attributes if necessary. + +While most transforms yield one item for each item consumed, this is not always +true: items that are not yielded are effectively filtered out. Yielding +multiple items for each consumed item implements item duplication; this is how +test chunking is accomplished, for example. + +The ``transforms`` object is an instance of +:class:`taskgraph.transforms.base.TransformSequence`, which serves as a simple +mechanism to combine a sequence of transforms into one. + +Schemas +....... + +The items used in transforms are validated against some simple schemas at +various points in the transformation process. These schemas accomplish two +things: they provide a place to add comments about the meaning of each field, +and they enforce that the fields are actually used in the documented fashion. + +Keyed By +........ + +Several fields in the input items can be "keyed by" another value in the item. +For example, a test description's chunks may be keyed by ``test-platform``. +In the item, this looks like: + +.. code-block:: yaml + + chunks: + by-test-platform: + linux64/debug: 12 + linux64/opt: 8 + android.*: 14 + default: 10 + +This is a simple but powerful way to encode business rules in the items +provided as input to the transforms, rather than expressing those rules in the +transforms themselves. If you are implementing a new business rule, prefer +this mode where possible. The structure is easily resolved to a single value +using :func:`taskgraph.transform.base.resolve_keyed_by`. + +Exact matches are used immediately. If no exact matches are found, each +alternative is treated as a regular expression, matched against the whole +value. Thus ``android.*`` would match ``android-api-16/debug``. If nothing +matches as a regular expression, but there is a ``default`` alternative, it is +used. Otherwise, an exception is raised and graph generation stops. + +Organization +------------- + +Task creation operates broadly in a few phases, with the interfaces of those +stages defined by schemas. The process begins with the raw data structures +parsed from the YAML files in the kind configuration. This data can processed +by kind-specific transforms resulting, for test jobs, in a "test description". +For non-test jobs, the next step is a "job description". These transformations +may also "duplicate" tasks, for example to implement chunking or several +variations of the same task. + +In any case, shared transforms then convert this into a "task description", +which the task-generation transforms then convert into a task definition +suitable for ``queue.createTask``. + +Test Descriptions +----------------- + +Test descriptions specify how to run a unittest or talos run. They aim to +describe this abstractly, although in many cases the unique nature of +invocation on different platforms leaves a lot of specific behavior in the test +description, divided by ``by-test-platform``. + +Test descriptions are validated to conform to the schema in +``taskcluster/taskgraph/transforms/tests.py``. This schema is extensively +documented and is a the primary reference for anyone modifying tests. + +The output of ``tests.py`` is a task description. Test dependencies are +produced in the form of a dictionary mapping dependency name to task label. + +Job Descriptions +---------------- + +A job description says what to run in the task. It is a combination of a +``run`` section and all of the fields from a task description. The run section +has a ``using`` property that defines how this task should be run; for example, +``mozharness`` to run a mozharness script, or ``mach`` to run a mach command. +The remainder of the run section is specific to the run-using implementation. + +The effect of a job description is to say "run this thing on this worker". The +job description must contain enough information about the worker to identify +the workerType and the implementation (docker-worker, generic-worker, etc.). +Alternatively, job descriptions can specify the ``platforms`` field in +conjunction with the ``by-platform`` key to specify multiple workerTypes and +implementations. Any other task-description information is passed along +verbatim, although it is augmented by the run-using implementation. + +The run-using implementations are all located in +``taskcluster/taskgraph/transforms/job``, along with the schemas for their +implementations. Those well-commented source files are the canonical +documentation for what constitutes a job description, and should be considered +part of the documentation. + +following ``run-using`` are available + + * ``hazard`` + * ``mach`` + * ``mozharness`` + * ``mozharness-test`` + * ``run-task`` + * ``spidermonkey`` or ``spidermonkey-package`` or ``spidermonkey-mozjs-crate`` or ``spidermonkey-rust-bindings`` + * ``debian-package`` + * ``toolchain-script`` + * ``always-optimized`` + * ``fetch-url`` + * ``python-test`` + + +Task Descriptions +----------------- + +Every kind needs to create tasks, and all of those tasks have some things in +common. They all run on one of a small set of worker implementations, each +with their own idiosyncrasies. And they all report to TreeHerder in a similar +way. + +The transforms in ``taskcluster/taskgraph/transforms/task.py`` implement +this common functionality. They expect a "task description", and produce a +task definition. The schema for a task description is defined at the top of +``task.py``, with copious comments. Go forth and read it now! + +In general, the task-description transforms handle functionality that is common +to all Gecko tasks. While the schema is the definitive reference, the +functionality includes: + +* TreeHerder metadata + +* Build index routes + +* Information about the projects on which this task should run + +* Optimizations + +* Defaults for ``expires-after`` and and ``deadline-after``, based on project + +* Worker configuration + +The parts of the task description that are specific to a worker implementation +are isolated in a ``task_description['worker']`` object which has an +``implementation`` property naming the worker implementation. Each worker +implementation has its own section of the schema describing the fields it +expects. Thus the transforms that produce a task description must be aware of +the worker implementation to be used, but need not be aware of the details of +its payload format. + +The ``task.py`` file also contains a dictionary mapping treeherder groups to +group names using an internal list of group names. Feel free to add additional +groups to this list as necessary. + +Signing Descriptions +-------------------- + +Signing kinds are passed a single dependent job (from its kind dependency) to act +on. + +The transforms in ``taskcluster/taskgraph/transforms/signing.py`` implement +this common functionality. They expect a "signing description", and produce a +task definition. The schema for a signing description is defined at the top of +``signing.py``, with copious comments. + +In particular you define a set of upstream artifact urls (that point at the +dependent task) and can optionally provide a dependent name (defaults to build) +for use in ``task-reference``/``artifact-reference``. You also need to provide +the signing formats to use. + +More Detail +----------- + +The source files provide lots of additional detail, both in the code itself and +in the comments and docstrings. For the next level of detail beyond this file, +consult the transform source under ``taskcluster/taskgraph/transforms``. |