summaryrefslogtreecommitdiffstats
path: root/docs/IMPLEMENTATION-DECISIONS.md
diff options
context:
space:
mode:
Diffstat (limited to 'docs/IMPLEMENTATION-DECISIONS.md')
-rw-r--r--docs/IMPLEMENTATION-DECISIONS.md206
1 files changed, 206 insertions, 0 deletions
diff --git a/docs/IMPLEMENTATION-DECISIONS.md b/docs/IMPLEMENTATION-DECISIONS.md
new file mode 100644
index 0000000..a5fad33
--- /dev/null
+++ b/docs/IMPLEMENTATION-DECISIONS.md
@@ -0,0 +1,206 @@
+# Implementation decisions for debputy
+
+This document logs important decisions taken during the design of `debputy` along with the
+rationale and alternatives considered at the time. This tech note collects decisions, analysis,
+and trade-offs made in the implementation of the system that may be of future interest. It also
+collects a list of intended future work. The historical background here may be useful for
+understanding design and implementation decisions, but would clutter other documents and distract
+from the details of the system as implemented.
+
+## Border between "installation" and "transformation"
+
+In `debputy`, a contributor can request certain actions to be performed such as `install foo into pkg`
+or `ensure bar in pkg is a symlink to baz`. While the former is clearly an installation rule, is the
+latter an installation rule or a transformation rule?
+
+Answering this was important to ensure that actions were placed where people would expect them or would
+find it logical to look for them. This is complicated by the fact that `install` (the command line tool)
+can perform mode and ownership transformation, create directories, whereas `dh_install` deals only with
+installing (copying) paths into packages and mode/ownership changes is related to a separate helper.
+
+The considered options were:
+
+### Install upstream bits and then apply packaging modification (chosen)
+
+In this line of thinking, the logic conceptually boils down to the following rule of thumb:
+
+ > If a path does not come from upstream, then it is a transform.
+
+Expanding a bit, anything that would install paths from upstream (usually `debian/tmp/...`) into a
+package is considered part of `installation`. If further mutations are needed (such as, `create an
+empty dir at X as integration point`), they are transformations.
+
+All path metadata modifications (owner, group or mode) are considered transformations. Even in the
+case, where the transformation is "disabling" a built-in normalization. The logic here is that the
+packager's transform rule is undoing a built-in transformation rule.
+
+This option was chosen because it fit the perceived idea of how a packager views their own work
+per the following 4-step list:
+
+ 1. Do any upstream build required.
+ 2. Install files to build the initial trees for each Debian package.
+ 3. Transform those trees for any additional fixes required.
+ 4. Turn those trees into debs.
+
+Note: The `debhelper` stack has all transformations (according to this definition) under its
+installation phase as defined by `dh`'s `install` target. Concretely, the `dh install` target covers
+`dh_installdirs`, `dh_link` and `dh_fixperms`. However, it is less important what `debhelper` is
+doing as long as the definition is simple and not counter-intuitive to packagers.
+
+### Define the structural and then apply non-structural modifications
+
+Another proposal was to see the `file layout` phase as anything that did structural changes to the
+content of the package. By the end of the `file layout` phase, all paths would be present where
+they were expected. So any mutation by the packager that changed the deb structurally would be a
+part of the `file layout` phase.
+
+Note file compression (and therefore renaming of files) could occur after `file layout` when this
+model was discussed.
+
+The primary advantage was that it works without having an upstream build system. However, even
+native packages tend to have an "upstream-like" build system, so it is not as much of an advantage
+in practice.
+
+Note this definition is not a 1:1 match with debhelper either. As an example, file mode
+modification would be a transformation in this definition, whereas `debhelper` has it under
+`dh install`.
+
+## Stateless vs. Stateful installation rules
+
+A key concept in packaging is to "install" paths provided by upstream's build system into one or
+more packages. In source packages producing multiple binary packages, the packager will want to
+divide the content across multiple packages and `debputy` should facilitate this in the best
+possible fashion.
+
+There were two "schools of thought" considered here, which is easiest to illustrate with the
+following example:
+
+ Assume that the upstream build system provides 4 programs in `debian/tmp/usr/bin`. One of
+ these (`foo`) would have to be installed into the package `pkg` and the other would be more
+ special purpose and go into `foo-utils`.
+
+
+For a "stateless" ruleset, the packager would have to specify the request as:
+
+ * install `usr/bin/foo` into `pkg`
+ * install `usr/bin/*` except `usr/bin/foo` into `pkg-utils`
+
+Whereas with a "stateful" ruleset, the packager would have to specify the request as:
+
+ 1. install `usr/bin/foo` into `pkg`
+ 2. install `usr/bin/*` into `pkg-utils`
+ - Could be read as "install everything remaining in `usr/bin` into `pkg-utils`".
+
+
+### Stateful installation rules (chosen)
+
+The chosen model ended up being "stateful" patterns.
+
+Pros:
+
+ 1. Stateful rules provides a "natural" way to say "install FILE1 in DIR into A,
+ FILE2 from DIR into B, and the rest of DIR into C" without having to accumulating
+ "excludes" or replacing a glob with "subglobs" to avoid double matching.
+
+ 2. There is a "natural" way to define that something should *not* be installed
+ via the `discard` rule, which interfaces nicely with the `dh_missing`-like
+ behaviour (detecting things that might have been overlooked).
+
+ 3. Avoids the complexity of having a glob expansion with a "per-rule" `exclude`,
+ where the `exclude` itself contains globs (`usr/lib/*` except `usr/bin/*.la`).
+
+Cons:
+ 1. Stateful parsing requires `debputy` to track what has already been matched.
+ 2. Rules cannot be interpreted in isolation nor out of order.
+ 3. Naming does not (always) imply the "destructiveness" or state of the action.
+ - The `install` term is commonly understood to have `copy` semantics rather
+ than `move` semantics.
+ 4. It is a step away from default `debhelper` mechanics and might cause
+ surprises for people assuming `debhelper` semantics.
+
+
+The 1st con would have applied anyway, as to avoid accidental RC bugs the
+contributor is required to explicitly list multiple packages for any install
+rule that would install the same path into two distinct packages or to provide
+`dh_missing` functionality. Therefore, the tracking would have existed in
+some form regardless.
+
+The 2nd con can be mitigated by leveraging the tracking to report if the
+rules appear to run in opposite order.
+
+The 3rd con is partly mitigated by using `discard` rather than `exclude` (which
+was the original name). Additionally, the mitigation for the 2nd con generally
+covers the most common cases as well. The only "surprising" case if you have
+one tool path you want installed into two packages at the same time, where you
+use two matches and the second one is a glob. However, the use-case is rare and
+was considered an acceptable risk given its probability.
+
+The 4th con is less of a problem when migrating from `debhelper` to `debputy`.
+Any `debhelper` based package will not have (unintentional) overlapping matches
+causing file conflicts. There might be some benign double matching that the
+packager will have to clean up post migration, because `debhelper` is more
+forgiving. Migration from `debputy` to `debhelper` might be more difficult but
+not a goal for `debputy`, so it was not considered relevant.
+
+Prior art: `dh-exec` supports a similar feature via `=> usr/bin/foo`.
+
+
+### Stateless installation rules
+
+Pros:
+
+ 1. It matches the default helper, so it requires less cognitive effort for
+ people migrating.
+
+ 2. The `install` term would effectively have `copy` semantics.
+
+ 3. In theory, `debputy` could do with simpler tracking mechanics.
+ - In practice, the tracked used for the error reporting required was 80%
+ of the complexity. This severely limits any practical benefit.
+
+
+Cons:
+
+ 1. No obvious way to deliberately ignore content that are not of a glob + exclude.
+ - While the `usr/bin/* except <matches>` could work, the default is "new appearances"
+ gets installed rather than aborting the built with a "there is a new tool for you
+ to consider". Alternatives such as including a stand-alone `exclude` or `discard`
+ rule would imply stateful parsing, but would not actually be stateful for `install`
+ and therefore being a potential source of confusion. Therefore, such a feature
+ would have to require a separate configuration next to installations.
+
+ 2. Install rules with globs would have to accumulate excludes or degenerate to the "magic
+ sub-matching globs" to avoid overlaps. The latter is the pattern supported by debhelper.
+
+# Plugin integration
+
+Looking at `debhelper`, one of its major sources of success is that "anyone" could extend it
+to solve their specific need and that it was easy to do so. When looking at the debhelper
+extensions, it seems a vast majority of Debian packages do the debhelper extension "on the
+side" (such as a bundle it inside an existing `-dev` package). Having package dedicated
+to the debhelper tooling does happen but seems to be very rare.
+
+With this in mind, using python's `entry_points` API was ruled out. It would require packagers
+to do a Python project inside their existing package with double build-systems, which basically
+no existing package helper does well (CDBS a possible exception but CDBS is generally frowned
+upon by the general Debian contributor population).
+
+Instead, a "drop a .json file here" approach was chosen instead to get a more "light-weight"
+integration up and running. When designing it, the following things were important:
+
+ * It should be possible to extract the metadata of the plugin *without* running any code from it
+ as running code could "taint" the process and break "list all plugins" features.
+ (This ruled out loading python code directly)
+
+ * Simple features would ideally not require code at all. Packager provided files as an example
+ can basically be done as a configuration rather than code. This means that `debputy` can provide
+ automated plugin upgrades from the current format to a future one if needed be.
+
+ * Being able to re-use the declarative parser to handle the error messages and data normalization
+ (this implies `JSON`, `YAML` or similar formats that is easily parsed in to mappings and lists).
+
+ * It is important that there is a plugin API compat level that enables us to change the format or
+ API between `debputy` and the plugins if we learn that the current API is inadequate.
+
+At the time of writing, the plugin integration is still in development. What is important can change
+as we get actual users.