diff options
Diffstat (limited to '')
-rw-r--r-- | html/DEBUG_README.html | 597 |
1 files changed, 597 insertions, 0 deletions
diff --git a/html/DEBUG_README.html b/html/DEBUG_README.html new file mode 100644 index 0000000..f0dea38 --- /dev/null +++ b/html/DEBUG_README.html @@ -0,0 +1,597 @@ +<!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN" + "http://www.w3.org/TR/html4/loose.dtd"> + +<html> + +<head> + +<title> Postfix Debugging Howto </title> + +<meta http-equiv="Content-Type" content="text/html; charset=us-ascii"> + +</head> + +<body> + +<h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix Debugging Howto</h1> + +<hr> + +<h2>Purpose of this document</h2> + +<p> This document describes how to debug parts of the Postfix mail +system when things do not work according to expectation. The methods +vary from making Postfix log a lot of detail, to running some daemon +processes under control of a call tracer or debugger. </p> + +<p> The text assumes that the Postfix <a href="postconf.5.html">main.cf</a> and <a href="master.5.html">master.cf</a> +configuration files are stored in directory /etc/postfix. You can +use the command "<b>postconf <a href="postconf.5.html#config_directory">config_directory</a></b>" to find out the +actual location of this directory on your machine. </p> + +<p> Listed in order of increasing invasiveness, the debugging +techniques are as follows: </p> + +<ul> + +<li><a href="#logging">Look for obvious signs of trouble</a> + +<li><a href="#trace_mail">Debugging Postfix from inside</a> + +<li><a href="#no_chroot">Try turning off chroot operation in +master.cf</a> + +<li><a href="#debug_peer">Verbose logging for specific SMTP +connections</a> + +<li><a href="#sniffer">Record the SMTP session with a network +sniffer</a> + +<li><a href="#verbose">Making Postfix daemon programs more verbose</a> + +<li><a href="#man_trace">Manually tracing a Postfix daemon process</a> + +<li><a href="#auto_trace">Automatically tracing a Postfix daemon +process</a> + +<li><a href="#ddd">Running daemon programs with the interactive +ddd debugger</a> + +<li><a href="#screen">Running daemon programs with the interactive +gdb debugger</a> + +<li><a href="#gdb">Running daemon programs under a non-interactive +debugger</a> + +<li><a href="#unreasonable">Unreasonable behavior</a> + +<li><a href="#mail">Reporting problems to postfix-users@postfix.org</a> + +</ul> + +<h2><a name="logging">Look for obvious signs of trouble</a></h2> + +<p> Postfix logs all failed and successful deliveries to a logfile. </p> + +<ul> + +<li> <p> When Postfix uses syslog logging (the default), the file +is usually called /var/log/maillog, /var/log/mail, or something +similar; the exact pathname is configured in a file called +/etc/syslog.conf, /etc/rsyslog.conf, or something similar. </p> + +<li> <p> When Postfix uses its own logging system (see <a href="MAILLOG_README.html">MAILLOG_README</a>), +the location of the logfile is configured with the Postfix <a href="postconf.5.html#maillog_file">maillog_file</a> +parameter. </p> + +</ul> + +<p> When Postfix does not receive or deliver mail, the first order +of business is to look for errors that prevent Postfix from working +properly: </p> + +<blockquote> +<pre> +% <b>egrep '(warning|error|fatal|panic):' /some/log/file | more</b> +</pre> +</blockquote> + +<p> Note: the most important message is near the BEGINNING of the +output. Error messages that come later are less useful. </p> + +<p> The nature of each problem is indicated as follows: </p> + +<ul> + +<li> <p> "<b>panic</b>" indicates a problem in the software itself +that only a programmer can fix. Postfix cannot proceed until this +is fixed. </p> + +<li> <p> "<b>fatal</b>" is the result of missing files, incorrect +permissions, incorrect configuration file settings that you can +fix. Postfix cannot proceed until this is fixed. </p> + +<li> <p> "<b>error</b>" reports an error condition. For safety +reasons, a Postfix process will terminate when more than 13 of these +happen. </p> + +<li> <p> "<b>warning</b>" indicates a non-fatal error. These are +problems that you may not be able to fix (such as a broken DNS +server elsewhere on the network) but may also indicate local +configuration errors that could become a problem later. </p> + +</ul> + +<h2><a name="trace_mail">Debugging Postfix from inside</a> </h2> + +<p> Postfix version 2.1 and later can +produce mail delivery reports for debugging purposes. These reports +not only show sender/recipient addresses after address rewriting +and alias expansion or forwarding, they also show information about +delivery to mailbox, delivery to non-Postfix command, responses +from remote SMTP servers, and so on. +</p> + +<p> Postfix can produce two types of mail delivery reports for +debugging: </p> + +<ul> + +<li> <p> What-if: report what would happen, but do not actually +deliver mail. This mode of operation is requested with: </p> + +<pre> +% <b>/usr/sbin/sendmail -bv address...</b> +Mail Delivery Status Report will be mailed to <your login name>. +</pre> + +<li> <p> What happened: deliver mail and report successes and/or +failures, including replies from remote SMTP servers. This mode +of operation is requested with: </p> + +<pre> +% <b>/usr/sbin/sendmail -v address...</b> +Mail Delivery Status Report will be mailed to <your login name>. +</pre> + +</ul> + +<p> These reports contain information that is generated by Postfix +delivery agents. Since these run as daemon processes that cannot +interact with users directly, the result is sent as mail to the +sender of the test message. The format of these reports is practically +identical to that of ordinary non-delivery notifications. </p> + +<p> For a detailed example of a mail delivery status report, see +the <a href="ADDRESS_REWRITING_README.html#debugging"> debugging</a> +section at the end of the <a href="ADDRESS_REWRITING_README.html">ADDRESS_REWRITING_README</a> document. </p> + +<h2><a name="no_chroot">Try turning off chroot operation in master.cf</a></h2> + +<p> A common mistake is to turn on chroot operation in the <a href="master.5.html">master.cf</a> +file without going through all the necessary steps to set up a +chroot environment. This causes Postfix daemon processes to fail +due to all kinds of missing files. </p> + +<p> The example below shows an SMTP server that is configured with +chroot turned off: </p> + +<blockquote> +<pre> +/etc/postfix/<a href="master.5.html">master.cf</a>: + # ============================================================= + # service type private unpriv <b>chroot</b> wakeup maxproc command + # (yes) (yes) <b>(yes)</b> (never) (100) + # ============================================================= + smtp inet n - <b>n</b> - - smtpd +</pre> +</blockquote> + +<p> Inspect <a href="master.5.html">master.cf</a> for any processes that have chroot operation +not turned off. If you find any, save a copy of the <a href="master.5.html">master.cf</a> file, +and edit the entries in question. After executing the command +"<b>postfix reload</b>", see if the problem has gone away. </p> + +<p> If turning off chrooted operation made the problem go away, +then congratulations. Leaving Postfix running in this way is +adequate for most sites. If you prefer chrooted operation, see +the Postfix <a href="BASIC_CONFIGURATION_README.html#chroot_setup"> +BASIC_CONFIGURATION_README</a> file for information about how to +prepare Postfix for chrooted operation. </p> + +<h2><a name="debug_peer">Verbose logging for specific SMTP +connections</a></h2> + +<p> In /etc/postfix/<a href="postconf.5.html">main.cf</a>, list the remote site name or address +in the <a href="postconf.5.html#debug_peer_list">debug_peer_list</a> parameter. For example, in order to make +the software log a lot of information to the syslog daemon for +connections from or to the loopback interface: </p> + +<blockquote> +<pre> +/etc/postfix/<a href="postconf.5.html">main.cf</a>: + <a href="postconf.5.html#debug_peer_list">debug_peer_list</a> = 127.0.0.1 +</pre> +</blockquote> + +<p> You can specify one or more hosts, domains, addresses or +net/masks. To make the change effective immediately, execute the +command "<b>postfix reload</b>". </p> + +<h2><a name="sniffer">Record the SMTP session with a network sniffer</a></h2> + +<p> This example uses <b>tcpdump</b>. In order to record a conversation +you need to specify a large enough buffer with the "<b>-s</b>" +option or else you will miss some or all of the packet payload. +</p> + +<blockquote> +<pre> +# <b>tcpdump -w /file/name -s 0 host example.com and port 25</b> +</pre> +</blockquote> + +<p> Older tcpdump versions don't support "<b>-s 0</b>"; in that case, +use "<b>-s 2000</b>" instead. </p> + +<p> Run this for a while, stop with Ctrl-C when done. To view the +data use a binary viewer, <b>ethereal</b>, or good old <b>less</b>. +</p> + +<h2><a name="verbose">Making Postfix daemon programs more verbose</a></h2> + +<p> Append one or more "<b>-v</b>" options to selected daemon +definitions in /etc/postfix/<a href="master.5.html">master.cf</a> and type "<b>postfix reload</b>". +This will cause a lot of activity to be logged to the syslog daemon. +For example, to make the Postfix SMTP server process more verbose: </p> + +<blockquote> +<pre> +/etc/postfix/<a href="master.5.html">master.cf</a>: + smtp inet n - n - - smtpd -v +</pre> +</blockquote> + +<p> To diagnose problems with address rewriting specify a "<b>-v</b>" +option for the <a href="cleanup.8.html">cleanup(8)</a> and/or <a href="trivial-rewrite.8.html">trivial-rewrite(8)</a> daemon, and to +diagnose problems with mail delivery specify a "<b>-v</b>" +option for the <a href="qmgr.8.html">qmgr(8)</a> or <a href="qmgr.8.html">oqmgr(8)</a> queue manager, or for the <a href="lmtp.8.html">lmtp(8)</a>, +<a href="local.8.html">local(8)</a>, <a href="pipe.8.html">pipe(8)</a>, <a href="smtp.8.html">smtp(8)</a>, or <a href="virtual.8.html">virtual(8)</a> delivery agent. </p> + +<h2><a name="man_trace">Manually tracing a Postfix daemon process</a></h2> + +<p> Many systems allow you to inspect a running process with a +system call tracer. For example: </p> + +<blockquote> +<pre> +# <b>trace -p process-id</b> (SunOS 4) +# <b>strace -p process-id</b> (Linux and many others) +# <b>truss -p process-id</b> (Solaris, FreeBSD) +# <b>ktrace -p process-id</b> (generic 4.4BSD) +</pre> +</blockquote> + +<p> Even more informative are traces of system library calls. +Examples: </p> + +<blockquote> +<pre> +# <b>ltrace -p process-id</b> (Linux, also ported to FreeBSD and BSD/OS) +# <b>sotruss -p process-id</b> (Solaris) +</pre> +</blockquote> + +<p> See your system documentation for details. </p> + +<p> Tracing a running process can give valuable information about +what a process is attempting to do. This is as much information as +you can get without running an interactive debugger program, as +described in a later section. </p> + +<h2><a name="auto_trace">Automatically tracing a Postfix daemon +process</a></h2> + +<p> Postfix can attach a call tracer whenever a daemon process +starts. Call tracers come in several kinds. </p> + +<ol> + +<li> <p> System call tracers such as <b>trace</b>, <b>truss</b>, +<b>strace</b>, or <b>ktrace</b>. These show the communication +between the process and the kernel. </p> + +<li> <p> Library call tracers such as <b>sotruss</b> and <b>ltrace</b>. +These show calls of library routines, and give a better idea of +what is going on within the process. </p> + +</ol> + +<p> Append a <b>-D</b> option to the suspect command in +/etc/postfix/<a href="master.5.html">master.cf</a>, for example: </p> + +<blockquote> +<pre> +/etc/postfix/<a href="master.5.html">master.cf</a>: + smtp inet n - n - - smtpd -D +</pre> +</blockquote> + +<p> Edit the <a href="postconf.5.html#debugger_command">debugger_command</a> definition in /etc/postfix/<a href="postconf.5.html">main.cf</a> +so that it invokes the call tracer of your choice, for example: +</p> + +<blockquote> +<pre> +/etc/postfix/<a href="postconf.5.html">main.cf</a>: + <a href="postconf.5.html#debugger_command">debugger_command</a> = + PATH=/bin:/usr/bin:/usr/local/bin; + (truss -p $<a href="postconf.5.html#process_id">process_id</a> 2>&1 | logger -p mail.info) & sleep 5 +</pre> +</blockquote> + +<p> Type "<b>postfix reload</b>" and watch the logfile. </p> + +<h2><a name="ddd">Running daemon programs with the interactive +ddd debugger</a></h2> + +<p> If you have X Windows installed on the Postfix machine, then +an interactive debugger such as <b>ddd</b> can be convenient. +</p> + +<p> Edit the <a href="postconf.5.html#debugger_command">debugger_command</a> definition in /etc/postfix/<a href="postconf.5.html">main.cf</a> +so that it invokes <b>ddd</b>: </p> + +<blockquote> +<pre> +/etc/postfix/<a href="postconf.5.html">main.cf</a>: + <a href="postconf.5.html#debugger_command">debugger_command</a> = + PATH=/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin + ddd $<a href="postconf.5.html#daemon_directory">daemon_directory</a>/$<a href="postconf.5.html#process_name">process_name</a> $<a href="postconf.5.html#process_id">process_id</a> & sleep 5 +</pre> +</blockquote> + +<p> Be sure that <b>gdb</b> is in the command search path, and +export <b>XAUTHORITY</b> so that X access control works, for example: +</p> + +<blockquote> +<pre> +% <b>setenv XAUTHORITY ~/.Xauthority</b> (csh syntax) +$ <b>export XAUTHORITY=$HOME/.Xauthority</b> (sh syntax) +</pre> +</blockquote> + +<p> Append a <b>-D</b> option to the suspect daemon definition in +/etc/postfix/<a href="master.5.html">master.cf</a>, for example: </p> + +<blockquote> +<pre> +/etc/postfix/<a href="master.5.html">master.cf</a>: + smtp inet n - n - - smtpd -D +</pre> +</blockquote> + +<p> Stop and start the Postfix system. This is necessary so that +Postfix runs with the proper <b>XAUTHORITY</b> and <b>DISPLAY</b> +settings. </p> + +<p> Whenever the suspect daemon process is started, a debugger +window pops up and you can watch in detail what happens. </p> + +<h2><a name="screen">Running daemon programs with the interactive +gdb debugger</a></h2> + +<p> If you have the screen command installed on the Postfix machine, then +you can run an interactive debugger such as <b>gdb</b> as follows. </p> + +<p> Edit the <a href="postconf.5.html#debugger_command">debugger_command</a> definition in /etc/postfix/<a href="postconf.5.html">main.cf</a> +so that it runs <b>gdb</b> inside a detached <b>screen</b> session: +</p> + +<blockquote> +<pre> +/etc/postfix/<a href="postconf.5.html">main.cf</a>: + <a href="postconf.5.html#debugger_command">debugger_command</a> = + PATH=/bin:/usr/bin:/sbin:/usr/sbin; export PATH; HOME=/root; + export HOME; screen -e^tt -dmS $<a href="postconf.5.html#process_name">process_name</a> gdb + $<a href="postconf.5.html#daemon_directory">daemon_directory</a>/$<a href="postconf.5.html#process_name">process_name</a> $<a href="postconf.5.html#process_id">process_id</a> & sleep 2 +</pre> +</blockquote> + +<p> Be sure that <b>gdb</b> is in the command search path. </p> + +<p> Append a <b>-D</b> option to the suspect daemon definition in +/etc/postfix/<a href="master.5.html">master.cf</a>, for example: </p> + +<blockquote> +<pre> +/etc/postfix/<a href="master.5.html">master.cf</a>: + smtp inet n - n - - smtpd -D +</pre> +</blockquote> + +<p> Execute the command "<b>postfix reload</b>" and wait until a +daemon process is started (you can see this in the maillog file). +</p> + +<p> Then attach to the screen, and debug away: </p> + +<blockquote> +<pre> +# HOME=/root screen -r +gdb) continue +gdb) where +</pre> +</blockquote> + +<h2><a name="gdb">Running daemon programs under a non-interactive +debugger</a></h2> + +<p> If you do not have X Windows installed on the Postfix machine, +or if you are not familiar with interactive debuggers, then you +can try to run <b>gdb</b> in non-interactive mode, and have it +print a stack trace when the process crashes. </p> + +<p> Edit the <a href="postconf.5.html#debugger_command">debugger_command</a> definition in /etc/postfix/<a href="postconf.5.html">main.cf</a> +so that it invokes the <b>gdb</b> debugger: </p> + +<blockquote> +<pre> +/etc/postfix/<a href="postconf.5.html">main.cf</a>: + <a href="postconf.5.html#debugger_command">debugger_command</a> = + PATH=/bin:/usr/bin:/usr/local/bin; export PATH; (echo cont; echo + where; sleep 8640000) | gdb $<a href="postconf.5.html#daemon_directory">daemon_directory</a>/$<a href="postconf.5.html#process_name">process_name</a> + $<a href="postconf.5.html#process_id">process_id</a> 2>&1 + >$<a href="postconf.5.html#config_directory">config_directory</a>/$<a href="postconf.5.html#process_name">process_name</a>.$<a href="postconf.5.html#process_id">process_id</a>.log & sleep 5 +</pre> +</blockquote> + +<p> Append a <b>-D</b> option to the suspect daemon in +/etc/postfix/<a href="master.5.html">master.cf</a>, for example: </p> + +<blockquote> +<pre> +/etc/postfix/<a href="master.5.html">master.cf</a>: + smtp inet n - n - - smtpd -D +</pre> +</blockquote> + +<p> Type "<b>postfix reload</b>" to make the configuration changes +effective. </p> + +<p> Whenever a suspect daemon process is started, an output file +is created, named after the daemon and process ID (for example, +smtpd.12345.log). When the process crashes, a stack trace (with +output from the "<b>where</b>" command) is written to its logfile. +</p> + +<h2><a name="unreasonable">Unreasonable behavior</a></h2> + +<p> Sometimes the behavior exhibited by Postfix just does not match the +source code. Why can a program deviate from the instructions given +by its author? There are two possibilities. </p> + +<ul> + +<li> <p> The compiler has erred. This rarely happens. </p> + +<li> <p> The hardware has erred. Does the machine have ECC memory? </p> + +</ul> + +<p> In both cases, the program being executed is not the program +that was supposed to be executed, so anything could happen. </p> + +<p> There is a third possibility: </p> + +<ul> + +<li> <p> Bugs in system software (kernel or libraries). </p> + +</ul> + +<p> Hardware-related failures usually do not reproduce in exactly +the same way after power cycling and rebooting the system. There's +little Postfix can do about bad hardware. Be sure to use hardware +that at the very least can detect memory errors. Otherwise, Postfix +will just be waiting to be hit by a bit error. Critical systems +deserve real hardware. </p> + +<p> When a compiler makes an error, the problem can be reproduced +whenever the resulting program is run. Compiler errors are most +likely to happen in the code optimizer. If a problem is reproducible +across power cycles and system reboots, it can be worthwhile to +rebuild Postfix with optimization disabled, and to see if optimization +makes a difference. </p> + +<p> In order to compile Postfix with optimizations turned off: </p> + +<blockquote> +<pre> +% <b>make tidy</b> +% <b>make makefiles OPT=</b> +</pre> +</blockquote> + +<p> This produces a set of Makefiles that do not request compiler +optimization. </p> + +<p> Once the makefiles are set up, build the software: </p> + +<blockquote> +<pre> +% <b>make</b> +% <b>su</b> +Password: +# <b>make install</b> +</pre> +</blockquote> + +<p> If the problem goes away, then it is time to ask your vendor +for help. </p> + +<h2><a name="mail">Reporting problems to postfix-users@postfix.org</a></h2> + +<p> The people who participate on postfix-users@postfix.org +are very helpful, especially if YOU provide them with sufficient +information. Remember, these volunteers are willing to help, but +their time is limited. </p> + +<p> When reporting a problem, be sure to include the following +information. </p> + +<ul> + +<li> <p> A summary of the problem. Please do not just send some +logging without explanation of what YOU believe is wrong. </p> + +<li> <p> Complete error messages. Please use cut-and-paste, or use +attachments, instead of reciting information from memory. +</p> + +<li> <p> Postfix logging. See the text at the top of the <a href="DEBUG_README.html">DEBUG_README</a> +document to find out where logging is stored. Please do not frustrate +the helpers by word wrapping the logging. If the logging is more +than a few kbytes of text, consider posting an URL on a web or ftp +site. </p> + +<li> <p> Consider using a test email address so that you don't have +to reveal email addresses or passwords of innocent people. </p> + +<li> <p> If you can't use a test email address, please anonymize +email addresses and host names consistently. Replace each letter +by "A", each digit +by "D" so that the helpers can still recognize syntactical errors. +</p> + +<li> <p> Command output from:</p> + +<ul> + +<li> <p> "<b>postconf -n</b>". Please do not send your <a href="postconf.5.html">main.cf</a> file, +or 1000+ lines of <b>postconf</b> command output. </p> + +<li> <p> "<b>postconf -Mf</b>" (Postfix 2.9 or later). </p> + +</ul> + +<li> <p> Better, provide output from the <b>postfinger</b> tool. +This can be found at <a href="http://ftp.wl0.org/SOURCES/postfinger">http://ftp.wl0.org/SOURCES/postfinger</a>. </p> + +<li> <p> If the problem is SASL related, consider including the +output from the <b>saslfinger</b> tool. This can be found at +<a href="http://postfix.state-of-mind.de/patrick.koetter/saslfinger/">http://postfix.state-of-mind.de/patrick.koetter/saslfinger/</a>. </p> + +<li> <p> If the problem is about too much mail in the queue, consider +including output from the <b>qshape</b> tool, as described in the +<a href="QSHAPE_README.html">QSHAPE_README</a> file. </p> + +<li> <p> If the problem is protocol related (connections time out, +or an SMTP server complains about syntax errors etc.) consider +recording a session with <b>tcpdump</b>, as described in the <a +href="#sniffer">DEBUG_README</a> document. </ul> + +</body> + +</html> |