docs/IMPLEMENTATION-DECISIONS.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230

# Implementation decisions for debputy

This document logs important decisions taken during the design of `debputy` along with the
rationale and alternatives considered at the time.  This tech note collects decisions, analysis,
and trade-offs made in the implementation of the system that may be of future interest. It also
collects a list of intended future work. The historical background here may be useful for
understanding design and implementation decisions, but would clutter other documents and distract
from the details of the system as implemented.

## Border between "installation" and "transformation"

In `debputy`, a contributor can request certain actions to be performed such as `install foo into pkg`
or `ensure bar in pkg is a symlink to baz`.  While the former is clearly an installation rule, is the
latter an installation rule or a transformation rule?

Answering this was important to ensure that actions were placed where people would expect them or would
find it logical to look for them.  This is complicated by the fact that `install` (the command line tool)
can perform mode and ownership transformation, create directories, whereas `dh_install` deals only with
installing (copying) paths into packages and mode/ownership changes is related to a separate helper.

The considered options were:

### Install upstream bits and then apply packaging modification (chosen)

In this line of thinking, the logic conceptually boils down to the following rule of thumb:

  > If a path does not come from upstream, then it is a transform.

Expanding a bit, anything that would install paths from upstream (usually `debian/tmp/...`) into a
package is considered part of `installation`.  If further mutations are needed (such as, `create an
empty dir at X as integration point`), they are transformations.

All path metadata modifications (owner, group or mode) are considered transformations.  Even in the
case, where the transformation is "disabling" a built-in normalization.  The logic here is that the
packager's transform rule is undoing a built-in transformation rule.

This option was chosen because it fit the perceived idea of how a packager views their own work
per the following 4-step list:

   1. Do any upstream build required.
   2. Install files to build the initial trees for each Debian package.
   3. Transform those trees for any additional fixes required.
   4. Turn those trees into debs.

Note: The `debhelper` stack has all transformations (according to this definition) under its
installation phase as defined by `dh`'s `install` target. Concretely, the `dh install` target covers
`dh_installdirs`, `dh_link` and `dh_fixperms`. However, it is less important what `debhelper` is
doing as long as the definition is simple and not counter-intuitive to packagers.

### Define the structural and then apply non-structural modifications

Another proposal was to see the `file layout` phase as anything that did structural changes to the
content of the package.  By the end of the `file layout` phase, all paths would be present where
they were expected. So any mutation by the packager that changed the deb structurally would be a
part of the `file layout` phase.

Note file compression (and therefore renaming of files) could occur after `file layout` when this
model was discussed.

The primary advantage was that it works without having an upstream build system. However, even
native packages tend to have an "upstream-like" build system, so it is not as much of an advantage
in practice.

Note this definition is not a 1:1 match with debhelper either.  As an example, file mode
modification would be a transformation in this definition, whereas `debhelper` has it under
`dh install`.

## Stateless vs. Stateful installation rules

A key concept in packaging is to "install" paths provided by upstream's build system into one or
more packages.  In source packages producing multiple binary packages, the packager will want to
divide the content across multiple packages and `debputy` should facilitate this in the best
possible fashion.

There were two "schools of thought" considered here, which is easiest to illustrate with the
following example:

  Assume that the upstream build system provides 4 programs in `debian/tmp/usr/bin`.  One of
  these (`foo`) would have to be installed into the package `pkg` and the other would be more
  special purpose and go into `foo-utils`.


For a "stateless" ruleset, the packager would have to specify the request as:

   * install `usr/bin/foo` into `pkg`
   * install `usr/bin/*` except `usr/bin/foo` into `pkg-utils`

Whereas with a "stateful" ruleset, the packager would have to specify the request as:

   1. install `usr/bin/foo` into `pkg`
   2. install `usr/bin/*` into `pkg-utils`
      - Could be read as "install everything remaining in `usr/bin` into `pkg-utils`".


### Stateful installation rules (chosen)

The chosen model ended up being "stateful" patterns.

Pros:

 1. Stateful rules provides a "natural" way to say "install FILE1 in DIR into A,
    FILE2 from DIR into B, and the rest of DIR into C" without having to accumulating
    "excludes" or replacing a glob with "subglobs" to avoid double matching.

 2. There is a "natural" way to define that something should *not* be installed
    via the `discard` rule, which interfaces nicely with the `dh_missing`-like
    behaviour (detecting things that might have been overlooked).

 3. Avoids the complexity of having a glob expansion with a "per-rule" `exclude`,
    where the `exclude` itself contains globs (`usr/lib/*` except `usr/bin/*.la`).

Cons:
 1. Stateful parsing requires `debputy` to track what has already been matched.
 2. Rules cannot be interpreted in isolation nor out of order.
 3. Naming does not (always) imply the "destructiveness" or state of the action.
    - The `install` term is commonly understood to have `copy` semantics rather
      than `move` semantics.
 4. It is a step away from default `debhelper` mechanics and might cause
    surprises for people assuming `debhelper` semantics.


The 1st con would have applied anyway, as to avoid accidental RC bugs the
contributor is required to explicitly list multiple packages for any install
rule that would install the same path into two distinct packages or to provide
`dh_missing` functionality.  Therefore, the tracking would have existed in
some form regardless.

The 2nd con can be mitigated by leveraging the tracking to report if the
rules appear to run in opposite order.

The 3rd con is partly mitigated by using `discard` rather than `exclude` (which
was the original name). Additionally, the mitigation for the 2nd con generally
covers the most common cases as well. The only "surprising" case if you have
one tool path you want installed into two packages at the same time, where you
use two matches and the second one is a glob. However, the use-case is rare and
was considered an acceptable risk given its probability.

The 4th con is less of a problem when migrating from `debhelper` to `debputy`.
Any `debhelper` based package will not have (unintentional) overlapping matches
causing file conflicts.  There might be some benign double matching that the
packager will have to clean up post migration, because `debhelper` is more
forgiving. Migration from `debputy` to `debhelper` might be more difficult but
not a goal for `debputy`, so it was not considered relevant.

Prior art: `dh-exec` supports a similar feature via `=> usr/bin/foo`.


### Stateless installation rules

Pros:

  1. It matches the default helper, so it requires less cognitive effort for
     people migrating.

  2. The `install` term would effectively have `copy` semantics.

  3. In theory, `debputy` could do with simpler tracking mechanics.
     - In practice, the tracked used for the error reporting required was 80%
       of the complexity. This severely limits any practical benefit.


Cons:

 1. No obvious way to deliberately ignore content that are not of a glob + exclude.
    - While the `usr/bin/* except <matches>` could work, the default is "new appearances"
      gets installed rather than aborting the built with a "there is a new tool for you
      to consider".  Alternatives such as including a stand-alone `exclude` or `discard`
      rule would imply stateful parsing, but would not actually be stateful for `install`
      and therefore being a potential source of confusion.  Therefore, such a feature
      would have to require a separate configuration next to installations.

 2. Install rules with globs would have to accumulate excludes or degenerate to the "magic
    sub-matching globs" to avoid overlaps. The latter is the pattern supported by debhelper.

# Plugin integration

Looking at `debhelper`, one of its major sources of success is that "anyone" could extend it
to solve their specific need and that it was easy to do so.  When looking at the debhelper
extensions, it seems a vast majority of Debian packages do the debhelper extension "on the
side" (such as a bundle it inside an existing `-dev` package).  Having package dedicated
to the debhelper tooling does happen but seems to be very rare.

With this in mind, using python's `entry_points` API was ruled out. It would require packagers
to do a Python project inside their existing package with double build-systems, which basically
no existing package helper does well (CDBS a possible exception but CDBS is generally frowned
upon by the general Debian contributor population).

Instead, a "drop a .json file here" approach was chosen instead to get a more "light-weight"
integration up and running.  When designing it, the following things were important:

 * It should be possible to extract the metadata of the plugin *without* running any code from it
   as running code could "taint" the process and break "list all plugins" features.
   (This ruled out loading python code directly)

 * Simple features would ideally not require code at all.  Packager provided files as an example
   can basically be done as a configuration rather than code.  This means that `debputy` can provide
   automated plugin upgrades from the current format to a future one if needed be.

 * Being able to reuse the declarative parser to handle the error messages and data normalization
   (this implies `JSON`, `YAML` or similar formats that is easily parsed in to mappings and lists).

 * It is important that there is a plugin API compat level that enables us to change the format or
   API between `debputy` and the plugins if we learn that the current API is inadequate.

At the time of writing, the plugin integration is still in development.  What is important can change
as we get actual users.

# Plugin module import vs. initialization design

When loading the python code for plugins, there were multiple ways to initialize the plugin:

 1. The plugin could have an initialization function that is called by `debputy`

 2. The plugin could use `@some_feature` annotations on types to be registered. This is similar
    in spirit to the `add-on` system where module load is considered the initialization event.

The 1. option was chosen because it is more reliable at directing the control flow and enables
the plugin module to be importable by non-`debputy` code. The latter might not seem directly
useful, but code like `py.test` attempts to import everything it can (even in `debian/` by
default) which could cause unwanted breakage for plugin providers by simply adding a
Python-based  plugin.

With the module autoloads mechanism, `debputy` would have to assume everything imported is
part of the plugin. This can cause issues (misattribution of errors) when a plugin loads
code or definitions from another plugin in addition to the `py.test` problem above. Safeguards
could be added, but the solutions found would either break the "py.test can import the module"-
property (see above) or be "safety is opt-in rather than always on/opt-out" by having an explicit
"This is plugin X code coming now" - the absence of a marker then causing misattribution).

If these problems could be solved, the annotation based loading could be considered.