Postfix Debugging Howto


Purpose of this document

This document describes how to debug parts of the Postfix mail system when things do not work according to expectation. The methods vary from making Postfix log a lot of detail, to running some daemon processes under control of a call tracer or debugger.

The text assumes that the Postfix main.cf and master.cf configuration files are stored in directory /etc/postfix. You can use the command "postconf config_directory" to find out the actual location of this directory on your machine.

Listed in order of increasing invasiveness, the debugging techniques are as follows:

Look for obvious signs of trouble

Postfix logs all failed and successful deliveries to a logfile.

When Postfix does not receive or deliver mail, the first order of business is to look for errors that prevent Postfix from working properly:

% egrep '(warning|error|fatal|panic):' /some/log/file | more

Note: the most important message is near the BEGINNING of the output. Error messages that come later are less useful.

The nature of each problem is indicated as follows:

Debugging Postfix from inside

Postfix version 2.1 and later can produce mail delivery reports for debugging purposes. These reports not only show sender/recipient addresses after address rewriting and alias expansion or forwarding, they also show information about delivery to mailbox, delivery to non-Postfix command, responses from remote SMTP servers, and so on.

Postfix can produce two types of mail delivery reports for debugging:

These reports contain information that is generated by Postfix delivery agents. Since these run as daemon processes that cannot interact with users directly, the result is sent as mail to the sender of the test message. The format of these reports is practically identical to that of ordinary non-delivery notifications.

For a detailed example of a mail delivery status report, see the debugging section at the end of the ADDRESS_REWRITING_README document.

Try turning off chroot operation in master.cf

A common mistake is to turn on chroot operation in the master.cf file without going through all the necessary steps to set up a chroot environment. This causes Postfix daemon processes to fail due to all kinds of missing files.

The example below shows an SMTP server that is configured with chroot turned off:

/etc/postfix/master.cf:
    # =============================================================
    # service type  private unpriv  chroot  wakeup  maxproc command
    #               (yes)   (yes)   (yes)   (never) (100)
    # =============================================================
    smtp      inet  n       -       n       -       -       smtpd

Inspect master.cf for any processes that have chroot operation not turned off. If you find any, save a copy of the master.cf file, and edit the entries in question. After executing the command "postfix reload", see if the problem has gone away.

If turning off chrooted operation made the problem go away, then congratulations. Leaving Postfix running in this way is adequate for most sites. If you prefer chrooted operation, see the Postfix BASIC_CONFIGURATION_README file for information about how to prepare Postfix for chrooted operation.

Verbose logging for specific SMTP connections

In /etc/postfix/main.cf, list the remote site name or address in the debug_peer_list parameter. For example, in order to make the software log a lot of information to the syslog daemon for connections from or to the loopback interface:

/etc/postfix/main.cf:
    debug_peer_list = 127.0.0.1

You can specify one or more hosts, domains, addresses or net/masks. To make the change effective immediately, execute the command "postfix reload".

Record the SMTP session with a network sniffer

This example uses tcpdump. In order to record a conversation you need to specify a large enough buffer with the "-s" option or else you will miss some or all of the packet payload.

# tcpdump -w /file/name -s 0 host example.com and port 25

Older tcpdump versions don't support "-s 0"; in that case, use "-s 2000" instead.

Run this for a while, stop with Ctrl-C when done. To view the data use a binary viewer, ethereal, or good old less.

Making Postfix daemon programs more verbose

Append one or more "-v" options to selected daemon definitions in /etc/postfix/master.cf and type "postfix reload". This will cause a lot of activity to be logged to the syslog daemon. For example, to make the Postfix SMTP server process more verbose:

/etc/postfix/master.cf:
    smtp      inet  n       -       n       -       -       smtpd -v

To diagnose problems with address rewriting specify a "-v" option for the cleanup(8) and/or trivial-rewrite(8) daemon, and to diagnose problems with mail delivery specify a "-v" option for the qmgr(8) or oqmgr(8) queue manager, or for the lmtp(8), local(8), pipe(8), smtp(8), or virtual(8) delivery agent.

Manually tracing a Postfix daemon process

Many systems allow you to inspect a running process with a system call tracer. For example:

# trace -p process-id (SunOS 4)
# strace -p process-id (Linux and many others)
# truss -p process-id (Solaris, FreeBSD)
# ktrace -p process-id (generic 4.4BSD)

Even more informative are traces of system library calls. Examples:

# ltrace -p process-id (Linux, also ported to FreeBSD and BSD/OS)
# sotruss -p process-id (Solaris)

See your system documentation for details.

Tracing a running process can give valuable information about what a process is attempting to do. This is as much information as you can get without running an interactive debugger program, as described in a later section.

Automatically tracing a Postfix daemon process

Postfix can attach a call tracer whenever a daemon process starts. Call tracers come in several kinds.

  1. System call tracers such as trace, truss, strace, or ktrace. These show the communication between the process and the kernel.

  2. Library call tracers such as sotruss and ltrace. These show calls of library routines, and give a better idea of what is going on within the process.

Append a -D option to the suspect command in /etc/postfix/master.cf, for example:

/etc/postfix/master.cf:
    smtp      inet  n       -       n       -       -       smtpd -D

Edit the debugger_command definition in /etc/postfix/main.cf so that it invokes the call tracer of your choice, for example:

/etc/postfix/main.cf:
    debugger_command =
         PATH=/bin:/usr/bin:/usr/local/bin;
         (truss -p $process_id 2>&1 | logger -p mail.info) & sleep 5

Type "postfix reload" and watch the logfile.

Running daemon programs with the interactive ddd debugger

If you have X Windows installed on the Postfix machine, then an interactive debugger such as ddd can be convenient.

Edit the debugger_command definition in /etc/postfix/main.cf so that it invokes ddd:

/etc/postfix/main.cf:
    debugger_command =
         PATH=/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin
         ddd $daemon_directory/$process_name $process_id & sleep 5

Be sure that gdb is in the command search path, and export XAUTHORITY so that X access control works, for example:

% setenv XAUTHORITY ~/.Xauthority (csh syntax)
$ export XAUTHORITY=$HOME/.Xauthority (sh syntax)

Append a -D option to the suspect daemon definition in /etc/postfix/master.cf, for example:

/etc/postfix/master.cf:
    smtp      inet  n       -       n       -       -       smtpd -D

Stop and start the Postfix system. This is necessary so that Postfix runs with the proper XAUTHORITY and DISPLAY settings.

Whenever the suspect daemon process is started, a debugger window pops up and you can watch in detail what happens.

Running daemon programs with the interactive gdb debugger

If you have the screen command installed on the Postfix machine, then you can run an interactive debugger such as gdb as follows.

Edit the debugger_command definition in /etc/postfix/main.cf so that it runs gdb inside a detached screen session:

/etc/postfix/main.cf:
    debugger_command =
        PATH=/bin:/usr/bin:/sbin:/usr/sbin; export PATH; HOME=/root;
        export HOME; screen -e^tt -dmS $process_name gdb
        $daemon_directory/$process_name $process_id & sleep 2

Be sure that gdb is in the command search path.

Append a -D option to the suspect daemon definition in /etc/postfix/master.cf, for example:

/etc/postfix/master.cf:
    smtp      inet  n       -       n       -       -       smtpd -D

Execute the command "postfix reload" and wait until a daemon process is started (you can see this in the maillog file).

Then attach to the screen, and debug away:

# HOME=/root screen -r
gdb) continue
gdb) where

Running daemon programs under a non-interactive debugger

If you do not have X Windows installed on the Postfix machine, or if you are not familiar with interactive debuggers, then you can try to run gdb in non-interactive mode, and have it print a stack trace when the process crashes.

Edit the debugger_command definition in /etc/postfix/main.cf so that it invokes the gdb debugger:

/etc/postfix/main.cf:
    debugger_command =
        PATH=/bin:/usr/bin:/usr/local/bin; export PATH; (echo cont; echo
        where; sleep 8640000) | gdb $daemon_directory/$process_name 
        $process_id 2>&1
        >$config_directory/$process_name.$process_id.log & sleep 5

Append a -D option to the suspect daemon in /etc/postfix/master.cf, for example:

/etc/postfix/master.cf:
    smtp      inet  n       -       n       -       -       smtpd -D

Type "postfix reload" to make the configuration changes effective.

Whenever a suspect daemon process is started, an output file is created, named after the daemon and process ID (for example, smtpd.12345.log). When the process crashes, a stack trace (with output from the "where" command) is written to its logfile.

Unreasonable behavior

Sometimes the behavior exhibited by Postfix just does not match the source code. Why can a program deviate from the instructions given by its author? There are two possibilities.

In both cases, the program being executed is not the program that was supposed to be executed, so anything could happen.

There is a third possibility:

Hardware-related failures usually do not reproduce in exactly the same way after power cycling and rebooting the system. There's little Postfix can do about bad hardware. Be sure to use hardware that at the very least can detect memory errors. Otherwise, Postfix will just be waiting to be hit by a bit error. Critical systems deserve real hardware.

When a compiler makes an error, the problem can be reproduced whenever the resulting program is run. Compiler errors are most likely to happen in the code optimizer. If a problem is reproducible across power cycles and system reboots, it can be worthwhile to rebuild Postfix with optimization disabled, and to see if optimization makes a difference.

In order to compile Postfix with optimizations turned off:

% make tidy
% make makefiles OPT=

This produces a set of Makefiles that do not request compiler optimization.

Once the makefiles are set up, build the software:

% make
% su
Password:
# make install

If the problem goes away, then it is time to ask your vendor for help.

Reporting problems to postfix-users@postfix.org

The people who participate on postfix-users@postfix.org are very helpful, especially if YOU provide them with sufficient information. Remember, these volunteers are willing to help, but their time is limited.

When reporting a problem, be sure to include the following information.