Diffstat (limited to 'doc/tutorial/Lintian/Tutorial/WritingChecks.pod')
-rw-r--r-- | doc/tutorial/Lintian/Tutorial/WritingChecks.pod | 442 |
1 files changed, 442 insertions, 0 deletions
diff --git a/doc/tutorial/Lintian/Tutorial/WritingChecks.pod b/doc/tutorial/Lintian/Tutorial/WritingChecks.pod new file mode 100644 index 0000000..b6642e1 --- /dev/null +++ b/doc/tutorial/Lintian/Tutorial/WritingChecks.pod @@ -0,0 +1,442 @@ +=encoding utf-8 + +=head1 NAME + +Lintian::Tutorial::WritingChecks -- Writing checks for Lintian + +=head1 SYNOPSIS + +Warning: This tutorial may be outdated. + +This guide will quickly guide you through the basics of writing a +Lintian check. Most of the work is in writing the two files: + + checks/<my-check>.pm + checks/<my-check>.desc + +And then either adding a Lintian profile or extending an existing +one. + +=head1 DESCRIPTION + +The basics of writing a check are outlined in the Lintian User Manual +(§3.3). This tutorial will focus on the act of writing the actual +check. In this tutorial, we will assume the name of the check to be +written is "deb/pkg-check". + +The tutorial will work with a "binary" and "udeb" check. Checking +source packages works in a similar fashion. + +=head2 Create a check I<.desc> file + +As mentioned, this tutorial will focus on the writing of a check. +Please see the Lintian User Manual (§3.3) for how to do this part. + +=head2 Create the Perl check module + +Start with the template: + + # deb/pkg-check is loaded as Lintian::deb::pkg_check + # - See Lintian User Manual §3.3 for more info + package Lintian::deb::pkg_check; + + use strict; + use warnings; + + sub run { + my ($pkg, $type, $info, $proc, $group) = @_; + return; + } + +The snippet above is a simple valid check that does "nothing at all". +We will extend it in just a moment, but first let us have a look at +the arguments at the setup. + +The I<run> sub is the entry point of our "deb/pkg-check" check; it +will be invoked once per package it should process. In our case, that +will be once per "binary" (.deb) and once per udeb package processed. 
+ +It is given 5 arguments (in the future, possibly more), which are: + +=over 4 + +=item $pkg - The name of the package being processed. + +(Same as $proc->pkg_name) + +=item $type - The type of the package being processed. + +At the moment, $type is one of "binary" (.deb), "udeb", "source" +(.dsc) or "changes". This argument is mostly useful if certain checks +do not apply equally to all package types being processed. + +Generally it is advisable to check only binaries ("binary" and +"udeb"), sources or changes in a given check. But in rare cases, it +makes sense to lump multiple types together in the same check and this +argument helps you do that. + +(Currently it is always identical to $proc->pkg_type) + +=item $info - Accessor to the data Lintian has extracted + +Basically all information you want about a given package comes from +the $info object. It is sometimes referred to as either the "info object" or +(an instance of) L<Lintian::Collect>. + +This object (together with a properly set Needs-Info in the I<.desc> +file) will grant you access to all of the data Lintian has extracted +about this package. + +Based on the value of the $type argument, it will be one of +L<Lintian::Collect::Binary>, L<Lintian::Collect::Changes> or +L<Lintian::Collect::Source>. + +(Currently it is the same as $proc->info) + +=item $proc - Basic metadata about the package + +This is an instance of L<Lintian::Processable> and is useful for +trivially obtaining very basic package metadata. In particular, the +name and version of the source package are readily +available through this object. + +=item $group - Group of processables from the same source + +If you want to do a cross-check between different packages built from +the same source, $group helps you access those other packages +(if they are available). + +This is an instance of L<Lintian::ProcessableGroup>. + +=back + +Now back to the coding. + +=head2 Accessing fields + +Let's do a slightly harder example. 
Assume we wanted to emit a tag for +all packages without a (valid) Multi-Arch field. This requires us to +A) identify if the package has a Multi-Arch field and B) identify if +the content of the field was valid. + +Starting from the top. All $info objects have a method called field, +which gives you access to a (raw) field from the control file of the +package. It returns C<undef> if said field is not present or the +content of said field otherwise. Note that field names must be given +in all lowercase letters (i.e. use "multi-arch", not "Multi-Arch"). + +This was the first half. Let's look at checking the value. Multi-arch +fields can (currently) be one of "no", "same", "foreign" or "allowed". +One way of checking this would be using the regex: + +Notice that Lintian automatically strips leading and trailing spaces +on the I<first> line in a field. It also strips trailing spaces from +all other lines, but leading spaces and the " ."-continuation markers +are kept as is. + +=head2 Checking dependencies + +Lintian can do some checking of dependencies. For most cases it works +similar to a normal dependency check, but keep in mind that Lintian +uses I<pure> logic to determine if dependencies are satisfied (i.e. it +will not look up relations like Provides for you). + +Suppose you wanted all packages with a multi-arch "same" field to +pre-depend on the package "multiarch-support". Well, we could use the +L<< $info->relation|Lintian::Collect::Binary/relation (FIELD) >> method for +this. + +$info->relation returns an instance of L<Lintian::Relation>. This +object has an "implies" method that can be used to check if a package +has an explicit dependency. Note that "implies" actually checks if +one relation "implies" another (i.e. if you satisfied relationA then +you definitely also satisfied relationB). + +As with the "field"-method, field names have to be given in all +lowercase. However "relation" will never return C<undef> (not even if the +field is missing). 
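The Multi-Arch validation described under "Accessing fields" can be sketched in plain Perl, without any Lintian modules. This is only an illustration under stated assumptions: the helper names C<multi_arch_problem> and C<is_valid_multi_arch_re> are hypothetical, the argument stands in for whatever C<< $info->field('multi-arch') >> returned, and emitting the actual tag is left out because the tutorial text does not show it.

```perl
use strict;
use warnings;

# Hard-coded set of accepted Multi-Arch values, as listed in the text.
my %VALID_MULTI_ARCH_VALUES = map { $_ => 1 } qw(no same foreign allowed);

# Hypothetical helper: $multiarch is the raw field value, or undef
# when the package has no Multi-Arch field at all.
# Returns 'missing', 'invalid', or nothing when the value is fine.
sub multi_arch_problem {
    my ($multiarch) = @_;
    return 'missing' unless defined $multiarch;
    return 'invalid' unless exists $VALID_MULTI_ARCH_VALUES{$multiarch};
    return;
}

# Hypothetical regex variant of the same test. \A and \z anchor the
# whole string, so a trailing newline or extra text is rejected.
sub is_valid_multi_arch_re {
    my ($multiarch) = @_;
    return 0 unless defined $multiarch;
    return $multiarch =~ /\A(?:no|same|foreign|allowed)\z/ ? 1 : 0;
}
```

The hash lookup keeps the accepted values in one place, which makes it straightforward to move them out of the code later. The regex variant groups the alternatives, since an ungrouped C<^no|same|foreign|allowed$> would only anchor the first and last alternative.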
+ +=head2 Using static data files + +Currently our check mixes data and code. Namely all the valid values +for the Multi-Arch field are currently hard-coded in our check. We can +move those out of the check by using a data file. + +Lintian natively supports data files that are either "sets" or +"tables" via L<Lintian::Data> (i.e. "unordered" collections). As an +added bonus, L<Lintian::Data> transparently supports vendor-specific +data files for us. + +First we need to make a data file containing the values, which could be: + + # A table of all the valid values for the multi-arch field. + no + same + foreign + allowed + +This can then be stored in the data directory as +I<data/deb/pkg-check/multiarch-values>. + +Now we can load it by using: + + use Lintian::Data; + + my $VALID_MULTI_ARCH_VALUES = + Lintian::Data->new('deb/pkg-check/multiarch-values'); + +Actually, this is not quite true. L<Lintian::Data> is lazy, so it +will not load anything before we force it to do so. Most of the time +this is just an added bonus. However, if you ever have to force it to +load something immediately, you can do so by invoking its "known" +method (with an arbitrary defined string, ignoring the result). + +Data files work with 3 access methods, "all", "known" and "value". + +=over 4 + +=item all + +"all" (i.e. $data->all) returns a list of all the entries in the data +file (for key/value tables, all returns the keys). The list is not +sorted in any order (not even input order). + +=item known + +"known" (i.e. $data->known('item')) returns a truth value if a given +item or key is known (present) in the data set or table. For key/value +tables, the value associated with the key can be retrieved with +"value" (see below). + +=item value + +"value" (i.e. $data->value('key')) returns a value associated with a +key for key/value tables. For unknown keys, it returns C<undef>. If +the data file is not a key/value table but just a set, value returns +a truth value for known keys. 
+ +=back + +While we could use both "value" and "known", we will use the latter +for readability (and to remind ourselves that this is a data set and +not a data table). + +Basically we will be replacing: + + unless exists $VALID_MULTI_ARCH_VALUES{$multiarch}; + +with + + unless $VALID_MULTI_ARCH_VALUES->known($multiarch); + +=head2 Accessing contents of the package + +Another heavily used mechanism is to check for the presence (or absence) +of a given file. Generally this is what the +L<< $info->index|Lintian::Collect::Package/index (FILE) >> and +L<< $info->sorted_index|Lintian::Collect::Package/sorted_index >> methods +are for. The "index" method returns instances of L<Lintian::Path>, +which has a number of utility methods. + +If you want to loop over all files in a package, the sorted_index will +do this for you. If you are looking for a specific file (or directory), a +call to "index" will be much faster. For the contents of a specific directory, +you can use something like: + + if (my $dir = $info->index('path/to/dir/')) { + foreach my $elem ($dir->children) { + print $elem->name . " is a file" if $elem->is_file; + # ... + } + } + +Keep in mind that using the "index" or "sorted_index" method will +require that you put "unpacked" in Needs-Info. See L</Keeping Needs-Info +up to date>. + +There are also a pair of methods for accessing the control files of a +binary package. These are +L<< $info->control_index|Lintian::Collect::Package/control_index (FILE) >> and +L<< $info->sorted_control_index|Lintian::Collect::Package/sorted_control_index >>. + +=head3 Accessing contents of a file in a package + +When you actually want to see the contents of a file, you can use +L<open|Lintian::Path/open> (or L<open_gz|Lintian::Path/open_gz>) on +an object returned by e.g. +L<< $info->index|Lintian::Collect::Package/index (FILE) >>. These +methods will open the underlying file for reading (the latter +applying a gzip decompression). 
+ +However, please do assert that the file is safe to read by calling +L<is_open_ok|Lintian::Path/is_open_ok> first. Generally, it will +only be true for files or safely resolvable symlinks pointing to +files. Should you attempt to open a path that does not satisfy +those criteria, L<Lintian::Path> will raise a trappable error at +runtime. + +Alternatively, if you access the underlying file object, you can +use the L<fs_path|Lintian::Path/fs_path> method. Usually, you will +want to test either L<is_open_ok|Lintian::Path/is_open_ok> or +L<is_valid_path|Lintian::Path/is_valid_path> first to ensure you do +not follow unsafe symlinks. The "is_open_ok" check will also assert +that it is not (e.g.) a named pipe or such. + +Should you call L<fs_path|Lintian::Path/fs_path> on a symlink that +escapes the package root, the method will throw a trappable error at +runtime. Once the path is returned, there are no more built-in +fail-safes. When you use the returned path, keep things like +"../../../../../etc/passwd"-symlink and "fifo" pipes in mind. + + +In some cases, you may even need to access the file system objects +I<without> using L<Lintian::Path>. This is, of course, discouraged +and suffers from the same issues above (all checking must be done +manually by you). Here you have to use the "unpacked", "debfiles" or +"control" methods from L<Lintian::Collect> or its subclasses. + + + +The following snippet may be useful for testing that a given path does +not escape the root. + + use Lintian::Util qw(is_ancestor_of); + + my $path = ...; + # The snippet applies equally well to $info->debfiles and + # $info->control (just remember to subst all occurrences of + # $info->unpacked). + my $unpacked_file = $info->unpacked($path); + if ( -f $unpacked_file && is_ancestor_of($info->unpacked, $unpacked_file)) { + # a file and contained within the package root. 
+ } else { + # not a file or an unsafe path + } + +=head2 Keeping Needs-Info up to date + +Keeping the "Needs-Info" field of your I<.desc> file up to date is a bit of +manual work. In the API description for the method there will +generally be a line looking something like: + + Needs-Info requirements for using methodx: Y + +This means that methodx requires Y to work. Here Y is a +comma-separated list and each element of Y falls into one of 3 cases. + +=over 4 + +=item * The element is the word I<none> + +In this case, the method has no "external" requirements and can be +used without any changes to your Needs-Info. The "field" method +is an example of this. + +This only makes sense if it is the only element in the list. + +=item * The element is a link to a method + +In this case, the method uses another method to do its job. An example +is the +L<sorted_control_index|Lintian::Collect::Binary/sorted_control_index> +method, which uses the +L<control_index|Lintian::Collect::Binary/control_index (FILE)> +method. So using I<sorted_control_index> has the same requirements as +using I<control_index>. + +=item * The element is the name of a collection (e.g. "control_index"). + +In this case, the method needs the given collection to be run. So to +use (e.g.) L<control_index|Lintian::Collect::Binary/control_index (FILE)>, +you have to put "bin-pkg-control" in your Needs-Info. + +=back + +CAVEAT: Methods can have different requirements based on the type of +package! An example of this is "changelog", which requires "changelog-file" +in binary packages and "Same as debfiles" in source packages. + +=head2 Avoiding security issues + +Over the years a couple of security issues have been discovered in +Lintian. The problem is that people can in theory create some really nasty +packages. Please keep the following in mind when writing a check: + +=over 4 + +=item * Avoid 2-arg open, system/exec($shellcmd), `$shellcmd` like the +plague. 
+ +When you get any one of those wrong you introduce "arbitrary code +execution" vulnerabilities (we learned this the hard way via +CVE-2009-4014). + +Usually 3-arg open and the non-shell variant of system/exec are +enough. When you actually need a shell pipeline, consider using +L<Lintian::Command>. It also provides a I<safe_qx> command to assist +with capturing stdout as an alternative to `$cmd` (or qx/$cmd/). + +=item * Do not trust field values. + +This is especially true if you intend to use the value as part of a +file name. Verify that the field contains what you expect before you use +it. + +=item * Use L<Lintian::Path> (or, failing that, is_ancestor_of) + +You might be tempted to think that the following code is safe: + + use autodie; + + my $filename = 'some/file'; + my $ufile = $info->unpacked($filename); + if ( ! -l $ufile) { + # Looks safe, but isn't in general + open(my $fd, '<', $ufile); + ...; + } + +This is definitely unsafe if "$filename" contains at least one +directory segment. So, if in doubt, use +L<is_ancestor_of|Lintian::Util/is_ancestor_of(PARENTDIR, PATH)> to +verify that the requested file is indeed the file you think it is. A +better version of the above would be: + + use autodie; + use Lintian::Util qw(is_ancestor_of); + [...] + my $filename = 'some/file'; + my $ufile = $info->unpacked($filename); + if ( ! -l $ufile && -f $ufile && is_ancestor_of($info->unpacked, $ufile)) { + # $ufile is a file and it is contained within the package root. + open(my $fd, '<', $ufile); + ...; + } + +In some cases you can even drop the "! -l $ufile" part. + +Of course, it is much easier to use the L<Lintian::Path> object +(whenever possible). + + my $filename = 'some/file'; + my $ufile = $info->index($filename); + if ( $ufile && $ufile->is_file && $ufile->is_open_ok) { + my $fd = $ufile->open; + ...; + } + +Here you can drop the " && $ufile->is_file" if you want to permit +safe symlinks. 
+ + +For more information on the is_ancestor_of check, see +L<is_ancestor_of|Lintian::Util/is_ancestor_of(PARENTDIR, PATH)>. + + +=back + +=head1 SEE ALSO + +L<Lintian::Tutorial::WritingTests>, L<Lintian::Tutorial::TestSuite> + +=cut