summaryrefslogtreecommitdiffstats
path: root/doc/tutorial/Lintian/Tutorial/WritingChecks.pod
diff options
context:
space:
mode:
Diffstat (limited to 'doc/tutorial/Lintian/Tutorial/WritingChecks.pod')
-rw-r--r--doc/tutorial/Lintian/Tutorial/WritingChecks.pod442
1 files changed, 442 insertions, 0 deletions
diff --git a/doc/tutorial/Lintian/Tutorial/WritingChecks.pod b/doc/tutorial/Lintian/Tutorial/WritingChecks.pod
new file mode 100644
index 0000000..b6642e1
--- /dev/null
+++ b/doc/tutorial/Lintian/Tutorial/WritingChecks.pod
@@ -0,0 +1,442 @@
+=encoding utf-8
+
+=head1 NAME
+
+Lintian::Tutorial::WritingChecks -- Writing checks for Lintian
+
+=head1 SYNOPSIS
+
+Warning: This tutorial may be outdated.
+
+This guide will quickly guide you through the basics of writing a
+Lintian check. Most of the work is in writing the two files:
+
+ checks/<my-check>.pm
+ checks/<my-check>.desc
+
+And then either adding a Lintian profile or extending an existing
+one.
+
+=head1 DESCRIPTION
+
+The basics of writing a check are outlined in the Lintian User Manual
+(§3.3). This tutorial will focus on the act of writing the actual
+check. In this tutorial, we will assume the name of the check to be
+written is "deb/pkg-check".
+
+The tutorial will work with a "binary" and "udeb" check. Checking
+source packages works in a similar fashion.
+
+=head2 Create a check I<.desc> file
+
+As mentioned, this tutorial will focus on the writing of a check.
+Please see the Lintian User Manual (§3.3) for how to do this part.
+
+=head2 Create the Perl check module
+
+Start with the template:
+
+ # deb/pkg-check is loaded as Lintian::deb::pkg_check
+ # - See Lintian User Manual §3.3 for more info
+ package Lintian::deb::pkg_check;
+
+ use strict;
+ use warnings;
+
+ sub run {
+ my ($pkg, $type, $info, $proc, $group) = @_;
+ return;
+ }
+
+The snippet above is a simple valid check that does "nothing at all".
+We will extend it in just a moment, but first let us have a look at
+the arguments at the setup.
+
+The I<run> sub is the entry point of our "deb/pkg-check" check; it
+will be invoked once per package it should process. In our case, that
+will be once per "binary" (.deb) and once per udeb package processed.
+
+It is given 5 arguments (in the future, possibly more), which are:
+
+=over 4
+
+=item $pkg - The name of the package being processed.
+
+(Same as $proc->pkg_name)
+
+=item $type - The type of the package being processed.
+
+At the moment, $type is one of "binary" (.deb), "udeb", "source"
+(.dsc) or "changes". This argument is mostly useful if certain checks
+do not apply equally to all package types being processed.
+
+Generally it is advisable to check only binaries ("binary" and
+"udeb"), sources or changes in a given check. But in rare cases, it
+makes sense to lump multiple types together in the same check and this
+argument helps you do that.
+
+(Current it is always identical to $proc->pkg_type)
+
+=item $info - Accessor to the data Lintian has extracted
+
+Basically all information you want about a given package comes from
+the $info object. Sometimes referred to as either the "info object" or
+(an instance of) L<Lintian::Collect>.
+
+This object (together with a properly set Needs-Info in the I<.desc>
+file) will grant you access to all of the data Lintian has extracted
+about this package.
+
+Based on the value of the $type argument, it will be one of
+L<Lintian::Collect::Binary>, L<Lintian::Collect::Changes> or
+L<Lintian::Collect::Source>.
+
+(Currently it is the same as $proc->info)
+
+=item $proc - Basic metadata about the package
+
+This is an instance of L<Lintian::Processable> and is useful for
+trivially obtaining very basic package metadata. Particularly, the
+name of source package and version of source package are readily
+available through this object.
+
+=item $group - Group of processables from the same source
+
+If you want to do a cross-check between different packages built from
+the same source, $group helps you access those other packages
+(if they are available).
+
+This is an instance of L<Lintian::ProcessableGroup>.
+
+=back
+
+Now back to the coding.
+
+=head2 Accessing fields
+
+Let's do a slightly harder example. Assume we wanted to emit a tag for
+all packages without a (valid) Multi-Arch field. This requires us to
+A) identify if the package has a Multi-Arch field and B) identify if
+the content of the field was valid.
+
+Starting from the top. All $info objects have a method called field,
+which gives you access to a (raw) field from the control file of the
+package. It returns C<undef> if said field is not present or the
+content of said field otherwise. Note that field names must be given
+in all lowercase letters (i.e. use "multi-arch", not "Multi-Arch").
+
+This was the first half. Let's look at checking the value. Multi-arch
+fields can (currently) be one of "no", "same", "foreign" or "allowed".
+One way of checking this would be using the regex:
+
+Notice that Lintian automatically strips leading and trailing spaces
+on the I<first> line in a field. It also strips trailing spaces from
+all other lines, but leading spaces and the " ."-continuation markers
+are kept as is.
+
+=head2 Checking dependencies
+
+Lintian can do some checking of dependencies. For most cases it works
+similar to a normal dependency check, but keep in mind that Lintian
+uses I<pure> logic to determine if dependencies are satisfied (i.e. it
+will not look up relations like Provides for you).
+
+Suppose you wanted all packages with a multi-arch "same" field to
+pre-depend on the package "multiarch-support". Well, we could use the
+L<< $info->relation|Lintian::Collect::Binary/relation (FIELD) >> method for
+this.
+
+$info->relation returns an instance of L<Lintian::Relation>. This
+object has an "implies" method that can be used to check if a package
+has an explicit dependency. Note that "implies" actually checks if
+one relation "implies" another (i.e. if you satisfied relationA then
+you definitely also satisfied relationB).
+
+As with the "field"-method, field names have to be given in all
+lowercase. However "relation" will never return C<undef> (not even if the
+field is missing).
+
+=head2 Using static data files
+
+Currently our check mixes data and code. Namely all the valid values
+for the Multi-Arch field are currently hard-coded in our check. We can
+move those out of the check by using a data file.
+
+Lintian natively supports data files that are either "sets" or
+"tables" via L<Lintian::Data> (i.e. "unordered" collections). As an
+added bonus, L<Lintian::Data> transparently supports vendor specific
+data files for us.
+
+First we need to make a data file containing the values. Which could be:
+
+ # A table of all the valid values for the multi-arch field.
+ no
+ same
+ foreign
+ allowed
+
+This can then be stored in the data directory as
+I<data/deb/pkg-check/multiarch-values>.
+
+Now we can load it by using:
+
+ use Lintian::Data;
+
+ my $VALID_MULTI_ARCH_VALUES =
+ Lintian::Data->new('deb/pkg-check/multiarch-values');
+
+Actually, this is not quite true. L<Lintian::Data> is lazy, so it
+will not load anything before we force it to do so. Most of the time
+this is just an added bonus. However, if you ever have to force it to
+load something immediately, you can do so by invoking its "known"
+method (with an arbitrary defined string and ignore the result).
+
+Data files work with 3 access methods, "all", "known" and "value".
+
+=over 4
+
+=item all
+
+"all" (i.e. $data->all) returns a list of all the entries in the data
+file (for key/value tables, all returns the keys). The list is not
+sorted in any order (not even input order).
+
+=item known
+
+"known" (i.e. $data->known('item')) returns a truth value if a given
+item or key is known (present) in the data set or table. For key/pair
+tables, the value associated with the key can be retrieved with
+"value" (see below).
+
+=item value
+
+"value" (i.e. $data->value('key')) returns a value associated with a
+key for key/value tables. For unknown keys, it returns C<undef>. If
+the data file is not a key/value table but just a set, value returns
+a truth value for known keys.
+
+=back
+
+While we could use both "value" and "known", we will use the latter
+for readability (and to remind ourselves that this is a data set and
+not a data table).
+
+Basically we will be replacing:
+
+ unless exists $VALID_MULTI_ARCH_VALUES{$multiarch};
+
+with
+
+ unless $VALID_MULTI_ARCH_VALUES->known($multiarch);
+
+=head2 Accessing contents of the package
+
+Another heavily used mechanism is to check for the presence (or absence)
+of a given file. Generally this is what the
+L<< $info->index|Lintian::Collect::Package/index (FILE) >> and
+L<< $info->sorted_index|Lintian::Collect::Package/sorted_index >> methods
+are for. The "index" method returns instances of L<Lintian::Path>,
+which has a number of utility methods.
+
+If you want to loop over all files in a package, the sorted_index will
+do this for you. If you are looking for a specific file (or directory), a
+call to "index" will be much faster. For the contents of a specific directory,
+you can use something like:
+
+ if (my $dir = $info->index('path/to/dir/')) {
+ foreach my $elem ($dir->children) {
+ print $elem->name . " is a file" if $elem->is_file;
+ # ...
+ }
+ }
+
+Keep in mind that using the "index" or "sorted_index" method will
+require that you put "unpacked" in Needs-Info. See L</Keeping Needs-Info
+up to date>.
+
+There are also a pair of methods for accessing the control files of a
+binary package. These are
+L<< $info->control_index|Lintian::Collect::Package/control_index (FILE) >> and
+L<< $info->sorted_control_index|Lintian::Collect::Package/sorted_control_index >>.
+
+=head3 Accessing contents of a file in a package
+
+When you actually want to see the contents of a file, you can use
+L<open|Lintian::Path/open> (or L<open_gz|Lintian::Path/open_gz>) on
+an object returned by e.g.
+L<< $info->index|Lintian::Collect::Package/index (FILE) >>. These
+methods will open the underlying file for reading (the latter
+applying a gzip decompression).
+
+However, please do assert that the file is safe to read by calling
+L<is_open_ok|Lintian::Path/is_open_ok> first. Generally, it will
+only be true for files or safely resolvable symlinks pointing to
+files. Should you attempt to open a path that does not satisfy
+those criteria, L<Lintian::Path> will raise a trappable error at
+runtime.
+
+Alternatively, if you access the underlying file object, you can
+use the L<fs_path|Lintian::Path/fs_path> method. Usually, you will
+want to test either L<is_open_ok|Lintian::Path/is_open_ok> or
+L<is_valid_path|Lintian::Path/is_valid_path> first to ensure you do
+not follow unsafe symlinks. The "is_open_ok" check will also assert
+that it is not (e.g.) a named pipe or such.
+
+Should you call L<fs_path|Lintian::Path/fs_path> on a symlink that
+escapes the package root, the method will throw a trappable error at
+runtime. Once the path is returned, there are no more built-in
+fail-safes. When you use the returned path, keep things like
+"../../../../../etc/passwd"-symlink and "fifo" pipes in mind.
+
+
+In some cases, you may even need to access the file system objects
+I<without> using L<Lintian::Path>. This is, of course, discouraged
+and suffers from the same issues above (all checking must be done
+manually by you). Here you have to use the "unpacked", "debfiles" or
+"control" methods from L<Lintian::Collect> or its subclasses.
+
+
+
+The following snippet may be useful for testing that a given path does
+not escape the root.
+
+ use Lintian::Util qw(is_ancestor_of);
+
+ my $path = ...;
+ # The snippet applies equally well to $info->debfiles and
+ # $info->control (just remember to subst all occurrences of
+ # $info->unpacked).
+ my $unpacked_file = $info->unpacked($path);
+ if ( -f $unpacked_file && is_ancestor_of($info->unpacked, $unpacked_file)) {
+ # a file and contained within the package root.
+ } else {
+ # not a file or an unsafe path
+ }
+
+=head2 Keeping Needs-Info up to date
+
+Keeping the "Needs-Info" field of your I<.desc> file is a bit of
+manual work. In the API description for the method there will
+generally be a line looking something like:
+
+ Needs-Info requirements for using methodx: Y
+
+Which means that the methodx requires Y to work. Here Y is a comma
+separated list and each element of Y basically falls into 3 cases.
+
+=over 4
+
+=item * The element is the word I<none>
+
+In this case, the method has no "external" requirements and can be
+used without any changes to your Needs-Info. The "field" method
+is an example of this.
+
+This only makes sense if it is the only element in the list.
+
+=item * The element is a link to a method
+
+In this case, the method uses another method to do its job. An example
+is the
+L<sorted_control_index|Lintian::Collect::Binary/sorted_control_index>
+method, which uses the
+L<control_index|Lintian::Collect::Binary/control_index (FILE)>
+method. So using I<sorted_control_index> has the same requirements as
+using I<control_index>.
+
+=item * The element is the name of a collection (e.g. "control_index").
+
+In this case, the method needs the given collection to be run. So to
+use (e.g.) L<control_index|Lintian::Collect::Binary/control_index (FILE)>,
+you have to put "bin-pkg-control" in your Needs-Info.
+
+=back
+
+CAVEAT: Methods can have different requirements based on the type of
+package! An example of this "changelog", which requires "changelog-file"
+in binary packages and "Same as debfiles" in source packages.
+
+=head2 Avoiding security issues
+
+Over the years a couple of security issues have been discovered in
+Lintian. The problem is that people can in theory create some really nasty
+packages. Please keep the following in mind when writing a check:
+
+=over 4
+
+=item * Avoid 2-arg open, system/exec($shellcmd), `$shellcmd` like the
+plague.
+
+When you get any one of those wrong you introduce "arbitrary code
+execution" vulnerabilities (we learned this the hard way via
+CVE-2009-4014).
+
+Usually 3-arg open and the non-shell variant of system/exec are
+enough. When you actually need a shell pipeline, consider using
+L<Lintian::Command>. It also provides a I<safe_qx> command to assist
+with capturing stdout as an alternative to `$cmd` (or qx/$cmd/).
+
+=item * Do not trust field values.
+
+This is especially true if you intend to use the value as part of a
+file name. Verify that the field contains what you expect before you use
+it.
+
+=item * Use L<Lintian::Path> (or, failing that, is_ancestor_of)
+
+You might be tempted to think that the following code is safe:
+
+ use autodie;
+
+ my $filename = 'some/file';
+ my $ufile = $info->unpacked($filename);
+ if ( ! -l $ufile) {
+ # Looks safe, but isn't in general
+ open(my $fd, '<', $ufile);
+ ...;
+ }
+
+This is definitely unsafe if "$filename" contains at least one
+directory segment. So, if in doubt, use
+L<is_ancestor_of|Lintian::Util/is_ancestor_of(PARENTDIR, PATH)> to
+verify that the requested file is indeed the file you think it is. A
+better version of the above would be:
+
+ use autodie,
+ use Lintian::Util qw(is_ancestor_of);
+ [...]
+ my $filename = 'some/file';
+ my $ufile = $info->unpacked($filename);
+ if ( ! -l $ufile && -f $ufile && is_ancestor_of($info->unpacked, $ufile)) {
+ # $ufile is a file and it is contained within the package root.
+ open(m $fd, '<', $ufile);
+ ...;
+ }
+
+In some cases you can even drop the "! -l $ufile" part.
+
+Of course, it is much easier to use the L<Lintian::Path> object
+(whenever possible).
+
+ my $filename = 'some/file';
+ my $ufile = $info->index($filename);
+ if ( $ufile && $ufile->is_file && $ufile->is_open_ok) {
+ my $fd = $ufile->open;
+ ...;
+ }
+
+Here you can drop the " && $ufile->is_file" if you want to permit
+safe symlinks.
+
+
+For more information on the is_ancestor_of check, see
+L<is_ancestor_of|Lintian::Util/is_ancestor_of(PARENTDIR, PATH)>
+
+
+=back
+
+=head1 SEE ALSO
+
+L<Lintian::Tutorial::WritingTests>, L<Lintian::Tutorial::TestSuite>
+
+=cut