diff --git a/build/docs/sparse.rst b/build/docs/sparse.rst
new file mode 100644
index 0000000000..6dcf548334
--- /dev/null
+++ b/build/docs/sparse.rst
@@ -0,0 +1,157 @@
+.. _build_sparse:
+
+================
+Sparse Checkouts
+================
+
+The Firefox repository is large: over 230,000 files. That many files
+can put a lot of strain on machines, tools, and processes.
+
+Some version control tools can populate a working directory (checkout)
+with only a subset of the files in the repository. This is called a
+*sparse checkout*.
+
+Various tools in the Firefox repository are configured to work
+when a sparse checkout is being used.
+
+Sparse Checkouts in Mercurial
+=============================
+
+Mercurial 4.3 introduced **experimental** support for sparse checkouts
+in the official distribution. (A Facebook-authored third-party extension
+had provided the feature for years before that.)
+
+To enable sparse checkout support in Mercurial, enable the ``sparse``
+extension::
+
+ [extensions]
+ sparse =
+
+The *sparseness* of the working directory is managed using
+``hg debugsparse``. Run ``hg help debugsparse`` and ``hg help -e sparse``
+for more info on the feature.
+
+When a *sparse config* is enabled, the working directory only contains
+files matching that config. You cannot ``hg add`` or ``hg remove`` files
+outside the *sparse config*.
+
+.. warning::
+
+ Sparse support in Mercurial 4.3 does not have any backwards
+ compatibility guarantees. Expect things to change. Scripting against
+ commands or relying on behavior is strongly discouraged.
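+
+With the warning above in mind, the following invocations sketch
+typical interactive use. The option names are the ones listed by
+``hg help debugsparse``; the ``path:build/`` pattern is only an
+illustration, and the exact behavior should be checked against the
+installed Mercurial version::
+
+  # Print the current sparse config.
+  hg debugsparse
+  # Limit the working directory to files under build/.
+  hg debugsparse --include path:build/
+  # Return to a full (non-sparse) working directory.
+  hg debugsparse --reset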
+
+In-Tree Sparse Profiles
+=======================
+
+Mercurial supports defining the sparse config using files under version
+control. These are called *sparse profiles*.
+
+Essentially, the sparse profiles are managed just like any other file in
+the repository. When you ``hg update``, the sparse configuration is
+evaluated against the sparse profile at the revision being updated to.
+From an end-user perspective, you just need to *activate* a profile once
+and files will be added or removed as appropriate whenever the versioned
+profile file updates.
+
+In the Firefox repository, the ``build/sparse-profiles`` directory
+contains Mercurial *sparse profile* files.
+
+Each *sparse profile* essentially defines a list of file patterns
+(see ``hg help patterns``) to include or exclude. See
+``hg help -e sparse`` for more.
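+
+As a hedged illustration, a profile file might look like the sketch
+below. The section markers, comments, and ``%include`` directive follow
+the format described by ``hg help -e sparse``; the profile names and
+the patterns are hypothetical and do not reproduce any real file in
+``build/sparse-profiles``::
+
+  # Hypothetical profile: reuse another profile, then add patterns.
+  %include build/sparse-profiles/example-base
+
+  [include]
+  path:build/
+  path:python/
+
+  [exclude]
+  path:build/docs/
+
+Assuming this file were committed as ``build/sparse-profiles/example``,
+it could be activated once with
+``hg debugsparse --enable-profile build/sparse-profiles/example``.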
+
+Mach Support for Sparse Checkouts
+=================================
+
+``mach`` detects when a sparse checkout is being used and its
+behavior may vary to accommodate this.
+
+By default, it is a fatal error if ``mach`` can't load one of the
+``mach_commands.py`` files it was told to load. But if a sparse checkout
+is being used, ``mach`` assumes the file isn't part of the sparse
+checkout and ignores the missing-file error. This means that ``mach``
+running inside a sparse checkout only has access to the commands
+defined in files that are part of the sparse checkout.
+
+Sparse Checkouts in Automation
+==============================
+
+``hg robustcheckout`` (the extension/command used to perform clones
+and working directory operations in automation) supports sparse checkout.
+However, it has a number of limitations compared to Mercurial's default
+sparse checkout implementation:
+
+* Only supports 1 profile at a time
+* Does not support non-profile sparse configs
+* Does not allow transitioning from a non-sparse to sparse checkout or
+  vice-versa
+
+These restrictions ensure that any sparse working directory populated by
+``hg robustcheckout`` is as consistent and robust as possible.
+
+``run-task`` (the low-level script for *bootstrapping* tasks in
+automation) has support for sparse checkouts.
+
+TaskGraph tasks using ``run-task`` can specify a ``sparse-profile``
+attribute in YAML (or in code) to denote the sparse profile file to
+use. For example::
+
+    run:
+        using: run-command
+        command: <command>
+        sparse-profile: taskgraph
+
+This automagically results in ``run-task`` and ``hg robustcheckout``
+using the sparse profile defined in ``build/sparse-profiles/<value>``.
+
+Pros and Cons of Sparse Checkouts
+=================================
+
+The benefit of a sparse checkout is that it makes the repository
+appear smaller. This means:
+
+* Less time performing working directory operations -> faster version
+  control operations
+* Fewer files to consult -> faster operations
+* Working directories only contain what is needed -> easier to understand
+  what everything does
+
+Fewer files in the working directory also bring disadvantages:
+
+* Searching may not yield hits because a file isn't in the sparse
+  checkout. e.g. a *global* search and replace may not actually be
+  *global* after all.
+* Tools performing filesystem walking or path globbing (e.g.
+  ``**/*.js``) may fail to find files because they don't exist.
+* Various tools and processes make assumptions that all files in the
+  repository are always available.
+
+There can also be problems caused by mixing sparse and non-sparse
+checkouts. For example, if a process in automation is using sparse
+and a local developer is not using sparse, things may work for the
+local developer but fail in automation (because a file isn't included
+in the sparse configuration and so isn't available to automation).
+Furthermore, if environments aren't using exactly the same sparse
+configuration, differences can contribute to varying behavior.
+
+When Should Sparse Checkouts Be Used?
+=====================================
+
+Developers are discouraged from using sparse checkouts for local work
+until tools for handling sparse checkouts have improved. In particular,
+Mercurial's support for sparse is still experimental and various Firefox
+tools make assumptions that all files are available. Developers should
+use sparse checkout at their own risk.
+
+The use of sparse checkouts in automation is a performance versus
+robustness trade-off. Use of sparse checkouts will make automation
+faster because machines will only have to manage a few thousand files
+in a checkout instead of a few hundred thousand. This can potentially
+translate to minutes saved per machine day. At the scale of thousands
+of machines, the savings can be significant. But adopting sparse
+checkouts will open up new avenues for failures. (See section above.)
+If a process is isolated (in terms of file access) and well-understood,
+sparse checkout can likely be leveraged with little risk. But if a
+process is doing things like walking the filesystem and performing
+lots of wildcard matching, the dangers are higher.