summaryrefslogtreecommitdiffstats
path: root/proto/DEBUG_README.html
blob: ea62cdc48121217bf74e0a840a43b85b688f5a0c (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
<!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN"
        "http://www.w3.org/TR/html4/loose.dtd">

<html>

<head>

<title> Postfix Debugging Howto </title>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

</head>

<body>

<h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix Debugging Howto</h1>

<hr>

<h2>Purpose of this document</h2>

<p> This document describes how to debug parts of the Postfix mail
system when things do not work according to expectation. The methods
vary from making Postfix log a lot of detail, to running some daemon
processes under control of a call tracer or debugger. </p>

<p> The text assumes that the Postfix main.cf and master.cf
configuration files are stored in directory /etc/postfix. You can
use the command "<b>postconf config_directory</b>" to find out the
actual location of this directory on your machine. </p>

<p> Listed in order of increasing invasiveness, the debugging
techniques are as follows: </p>

<ul>

<li><a href="#logging">Look for obvious signs of trouble</a>

<li><a href="#trace_mail">Debugging Postfix from inside</a>

<li><a href="#no_chroot">Try turning off chroot operation in
master.cf</a>

<li><a href="#debug_peer">Verbose logging for specific SMTP
connections</a>

<li><a href="#sniffer">Record the SMTP session with a network
sniffer</a>

<li><a href="#verbose">Making Postfix daemon programs more verbose</a>

<li><a href="#man_trace">Manually tracing a Postfix daemon process</a>

<li><a href="#auto_trace">Automatically tracing a Postfix daemon
process</a>

<li><a href="#ddd">Running daemon programs with the interactive
ddd debugger</a>

<li><a href="#screen">Running daemon programs with the interactive
gdb debugger</a>

<li><a href="#gdb">Running daemon programs under a non-interactive
debugger</a>

<li><a href="#unreasonable">Unreasonable behavior</a>

<li><a href="#mail">Reporting problems to postfix-users@postfix.org</a>

</ul>

<h2><a name="logging">Look for obvious signs of trouble</a></h2>

<p> Postfix logs all failed and successful deliveries to a logfile. </p>

<ul>

<li> <p> When Postfix uses syslog logging (the default), the file
is usually called /var/log/maillog, /var/log/mail, or something
similar; the exact pathname is configured in a file called
/etc/syslog.conf, /etc/rsyslog.conf, or something similar. </p>

<li> <p> When Postfix uses its own logging system (see MAILLOG_README),
the location of the logfile is configured with the Postfix maillog_file
parameter. </p>

</ul>

<p> When Postfix does not receive or deliver mail, the first order
of business is to look for errors that prevent Postfix from working
properly:  </p>

<blockquote>
<pre>
% <b>egrep '(warning|error|fatal|panic):' /some/log/file | more</b>
</pre>
</blockquote>

<p> Note: the most important message is near the BEGINNING of the
output.  Error messages that come later are less useful. </p>

<p> The nature of each problem is indicated as follows: </p>

<ul>

<li> <p> "<b>panic</b>" indicates a problem in the software itself
that only a programmer can fix. Postfix cannot proceed until this
is fixed. </p>

<li> <p> "<b>fatal</b>" is the result of missing files, incorrect
permissions, incorrect configuration file settings that you can
fix.  Postfix cannot proceed until this is fixed. </p>

<li> <p> "<b>error</b>" reports an error condition. For safety
reasons, a Postfix process will terminate when more than 13 of these
happen. </p>

<li> <p> "<b>warning</b>" indicates a non-fatal error. These are
problems that you may not be able to fix (such as a broken DNS
server elsewhere on the network) but may also indicate local
configuration errors that could become a problem later. </p>

</ul>

<h2><a name="trace_mail">Debugging Postfix from inside</a> </h2>

<p> Postfix version 2.1 and later can
produce mail delivery reports for debugging purposes. These reports
not only show sender/recipient addresses after address rewriting
and alias expansion or forwarding, they also show information about
delivery to mailbox, delivery to non-Postfix command, responses
from remote SMTP servers, and so on.
</p>

<p> Postfix can produce two types of mail delivery reports for
debugging: </p>

<ul>

<li> <p> What-if: report what would happen, but do not actually
deliver mail. This mode of operation is requested with: </p>

<pre>
% <b>/usr/sbin/sendmail -bv address...</b>
Mail Delivery Status Report will be mailed to &lt;your login name&gt;.
</pre>

<li> <p> What happened: deliver mail and report successes and/or
failures, including replies from remote SMTP servers.  This mode
of operation is requested with: </p>

<pre>
% <b>/usr/sbin/sendmail -v address...</b>
Mail Delivery Status Report will be mailed to &lt;your login name&gt;.
</pre>

</ul>

<p> These reports contain information that is generated by Postfix
delivery agents. Since these run as daemon processes that cannot
interact with users directly, the result is sent as mail to the
sender of the test message. The format of these reports is practically
identical to that of ordinary non-delivery notifications. </p>

<p> For a detailed example of a mail delivery status report, see
the <a href="ADDRESS_REWRITING_README.html#debugging"> debugging</a>
section at the end of the ADDRESS_REWRITING_README document.  </p>

<h2><a name="no_chroot">Try turning off chroot operation in master.cf</a></h2>

<p> A common mistake is to turn on chroot operation in the master.cf
file without going through all the necessary steps to set up a
chroot environment. This causes Postfix daemon processes to fail
due to all kinds of missing files. </p>

<p> The example below shows an SMTP server that is configured with
chroot turned off: </p>

<blockquote>
<pre>
/etc/postfix/master.cf:
    # =============================================================
    # service type  private unpriv  <b>chroot</b>  wakeup  maxproc command
    #               (yes)   (yes)   <b>(yes)</b>   (never) (100)
    # =============================================================
    smtp      inet  n       -       <b>n</b>       -       -       smtpd
</pre>
</blockquote>

<p> Inspect master.cf for any processes that have chroot operation
not turned off. If you find any, save a copy of the master.cf file,
and edit the entries in question.  After executing the command
"<b>postfix reload</b>", see if the problem has gone away. </p>

<p> If turning off chrooted operation made the problem go away,
then congratulations.  Leaving Postfix running in this way is
adequate for most sites.  If you prefer chrooted operation, see
the Postfix <a href="BASIC_CONFIGURATION_README.html#chroot_setup">
BASIC_CONFIGURATION_README</a> file for information about how to
prepare Postfix for chrooted operation. </p>

<h2><a name="debug_peer">Verbose logging for specific SMTP
connections</a></h2>

<p> In /etc/postfix/main.cf, list the remote site name or address
in the debug_peer_list parameter. For example, in order to make
the software log a lot of information to the syslog daemon for
connections from or to the loopback interface: </p>

<blockquote>
<pre>
/etc/postfix/main.cf:
    debug_peer_list = 127.0.0.1
</pre>
</blockquote>

<p> You can specify one or more hosts, domains, addresses or
net/masks.  To make the change effective immediately, execute the
command "<b>postfix reload</b>". </p>

<h2><a name="sniffer">Record the SMTP session with a network sniffer</a></h2>

<p> This example uses <b>tcpdump</b>. In order to record a conversation
you need to specify a large enough buffer with the "<b>-s</b>"
option or else you will miss some or all of the packet payload.
</p>

<blockquote>
<pre>
# <b>tcpdump -w /file/name -s 0 host example.com and port 25</b>
</pre>
</blockquote>

<p> Older tcpdump versions don't support "<b>-s 0</b>"; in that case,
use "<b>-s 2000</b>" instead.  </p>

<p> Run this for a while, stop with Ctrl-C when done. To view the
data use a binary viewer, <b>ethereal</b>, or good old <b>less</b>.
</p>

<h2><a name="verbose">Making Postfix daemon programs more verbose</a></h2>

<p> Append one or more "<b>-v</b>" options to selected daemon
definitions in /etc/postfix/master.cf and type "<b>postfix reload</b>".
This will cause a lot of activity to be logged to the syslog daemon.
For example, to make the Postfix SMTP server process more verbose: </p>

<blockquote>
<pre>
/etc/postfix/master.cf:
    smtp      inet  n       -       n       -       -       smtpd -v
</pre>
</blockquote>

<p> To diagnose problems with address rewriting specify a "<b>-v</b>"
option for the cleanup(8) and/or trivial-rewrite(8) daemon, and to
diagnose problems with mail delivery specify a "<b>-v</b>"
option for the qmgr(8) or oqmgr(8) queue manager, or for the lmtp(8),
local(8), pipe(8), smtp(8), or virtual(8) delivery agent.  </p>

<h2><a name="man_trace">Manually tracing a Postfix daemon process</a></h2>

<p> Many systems allow you to inspect a running process with a
system call tracer. For example: </p>

<blockquote>
<pre>
# <b>trace -p process-id</b> (SunOS 4)
# <b>strace -p process-id</b> (Linux and many others)
# <b>truss -p process-id</b> (Solaris, FreeBSD)
# <b>ktrace -p process-id</b> (generic 4.4BSD)
</pre>
</blockquote>

<p> Even more informative are traces of system library calls.
Examples: </p>

<blockquote>
<pre>
# <b>ltrace -p process-id</b> (Linux, also ported to FreeBSD and BSD/OS)
# <b>sotruss -p process-id</b> (Solaris)
</pre>
</blockquote>

<p> See your system documentation for details. </p>

<p> Tracing a running process can give valuable information about
what a process is attempting to do. This is as much information as
you can get without running an interactive debugger program, as
described in a later section. </p>

<h2><a name="auto_trace">Automatically tracing a Postfix daemon
process</a></h2>

<p> Postfix can attach a call tracer whenever a daemon process
starts.  Call tracers come in several kinds. </p>

<ol>

<li> <p> System call tracers such as <b>trace</b>, <b>truss</b>,
<b>strace</b>, or <b>ktrace</b>.  These show the communication
between the process and the kernel. </p>

<li> <p> Library call tracers such as <b>sotruss</b> and <b>ltrace</b>.
These show calls of library routines, and give a better idea of
what is going on within the process. </p>

</ol>

<p> Append a <b>-D</b> option to the suspect command in
/etc/postfix/master.cf, for example: </p>

<blockquote>
<pre>
/etc/postfix/master.cf:
    smtp      inet  n       -       n       -       -       smtpd -D
</pre>
</blockquote>

<p> Edit the debugger_command definition in /etc/postfix/main.cf
so that it invokes the call tracer of your choice, for example:
</p>

<blockquote>
<pre>
/etc/postfix/main.cf:
    debugger_command =
         PATH=/bin:/usr/bin:/usr/local/bin;
         (truss -p $process_id 2&gt;&amp;1 | logger -p mail.info) &amp; sleep 5
</pre>
</blockquote>

<p> Type "<b>postfix reload</b>" and watch the logfile. </p>

<h2><a name="ddd">Running daemon programs with the interactive
ddd debugger</a></h2>

<p> If you have X Windows installed on the Postfix machine, then
an interactive debugger such as <b>ddd</b> can be convenient.
</p>

<p> Edit the debugger_command definition in /etc/postfix/main.cf
so that it invokes <b>ddd</b>: </p>

<blockquote>
<pre>
/etc/postfix/main.cf:
    debugger_command =
         PATH=/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin
         ddd $daemon_directory/$process_name $process_id &amp; sleep 5
</pre>
</blockquote>

<p> Be sure that <b>gdb</b> is in the command search path, and
export <b>XAUTHORITY</b> so that X access control works, for example:
</p>

<blockquote>
<pre>
% <b>setenv XAUTHORITY ~/.Xauthority</b> (csh syntax)
$ <b>export XAUTHORITY=$HOME/.Xauthority</b> (sh syntax)
</pre>
</blockquote>

<p> Append a <b>-D</b> option to the suspect daemon definition in
/etc/postfix/master.cf, for example: </p>

<blockquote>
<pre>
/etc/postfix/master.cf:
    smtp      inet  n       -       n       -       -       smtpd -D
</pre>
</blockquote>

<p> Stop and start the Postfix system.  This is necessary so that
Postfix runs with the proper <b>XAUTHORITY</b> and <b>DISPLAY</b>
settings. </p>

<p> Whenever the suspect daemon process is started, a debugger
window pops up and you can watch in detail what happens. </p>

<h2><a name="screen">Running daemon programs with the interactive
gdb debugger</a></h2>

<p> If you have the screen command installed on the Postfix machine, then
you can run an interactive debugger such as <b>gdb</b> as follows. </p>

<p> Edit the debugger_command definition in /etc/postfix/main.cf
so that it runs <b>gdb</b> inside a detached <b>screen</b> session:
</p>

<blockquote>
<pre>
/etc/postfix/main.cf:
    debugger_command =
        PATH=/bin:/usr/bin:/sbin:/usr/sbin; export PATH; HOME=/root;
        export HOME; screen -e^tt -dmS $process_name gdb
        $daemon_directory/$process_name $process_id &amp; sleep 2
</pre>
</blockquote>

<p> Be sure that <b>gdb</b> is in the command search path. </p>

<p> Append a <b>-D</b> option to the suspect daemon definition in
/etc/postfix/master.cf, for example: </p>

<blockquote>
<pre>
/etc/postfix/master.cf:
    smtp      inet  n       -       n       -       -       smtpd -D
</pre>
</blockquote>

<p> Execute the command "<b>postfix reload</b>" and wait until a
daemon process is started (you can see this in the maillog file).
</p>

<p> Then attach to the screen, and debug away: </p>

<blockquote>
<pre>
# HOME=/root screen -r
gdb) continue
gdb) where
</pre>
</blockquote>

<h2><a name="gdb">Running daemon programs under a non-interactive
debugger</a></h2>

<p> If you do not have X Windows installed on the Postfix machine,
or if you are not familiar with interactive debuggers, then you
can try to run <b>gdb</b> in non-interactive mode, and have it
print a stack trace when the process crashes.  </p>

<p> Edit the debugger_command definition in /etc/postfix/main.cf
so that it invokes the <b>gdb</b> debugger: </p>

<blockquote>
<pre>
/etc/postfix/main.cf:
    debugger_command =
        PATH=/bin:/usr/bin:/usr/local/bin; export PATH; (echo cont; echo
        where; sleep 8640000) | gdb $daemon_directory/$process_name 
        $process_id 2&gt&amp;1
        &gt;$config_directory/$process_name.$process_id.log &amp; sleep 5
</pre>
</blockquote>

<p> Append a <b>-D</b> option to the suspect daemon in
/etc/postfix/master.cf, for example: </p>

<blockquote>
<pre>
/etc/postfix/master.cf:
    smtp      inet  n       -       n       -       -       smtpd -D
</pre>
</blockquote>

<p> Type "<b>postfix reload</b>" to make the configuration changes
effective.  </p>

<p> Whenever a suspect daemon process is started, an output file
is created, named after the daemon and process ID (for example,
smtpd.12345.log). When the process crashes, a stack trace (with
output from the "<b>where</b>" command) is written to its logfile.
</p>

<h2><a name="unreasonable">Unreasonable behavior</a></h2>

<p> Sometimes the behavior exhibited by Postfix just does not match the
source code. Why can a program deviate from the instructions given
by its author? There are two possibilities. </p>

<ul>

<li> <p> The compiler has erred. This rarely happens. </p>

<li> <p> The hardware has erred. Does the machine have ECC memory? </p>

</ul>

<p> In both cases, the program being executed is not the program
that was supposed to be executed, so anything could happen. </p>

<p> There is a third possibility: </p>

<ul>

<li> <p> Bugs in system software (kernel or libraries). </p>

</ul>

<p> Hardware-related failures usually do not reproduce in exactly
the same way after power cycling and rebooting the system.  There's
little Postfix can do about bad hardware.  Be sure to use hardware
that at the very least can detect memory errors.  Otherwise, Postfix
will just be waiting to be hit by a bit error.  Critical systems
deserve real hardware. </p>

<p> When a compiler makes an error, the problem can be reproduced
whenever the resulting program is run. Compiler errors are most
likely to happen in the code optimizer. If a problem is reproducible
across power cycles and system reboots, it can be worthwhile to
rebuild Postfix with optimization disabled, and to see if optimization
makes a difference. </p>

<p> In order to compile Postfix with optimizations turned off: </p>

<blockquote>
<pre>
% <b>make tidy</b>
% <b>make makefiles OPT=</b>
</pre>
</blockquote>

<p> This produces a set of Makefiles that do not request compiler
optimization.  </p>

<p> Once the makefiles are set up, build the software: </p>

<blockquote>
<pre>
% <b>make</b>
% <b>su</b>
Password:
# <b>make install</b>
</pre>
</blockquote>

<p> If the problem goes away, then it is time to ask your vendor
for help. </p>

<h2><a name="mail">Reporting problems to postfix-users@postfix.org</a></h2>

<p> The people who participate on postfix-users@postfix.org
are very helpful, especially if YOU provide them with sufficient
information.  Remember, these volunteers are willing to help, but
their time is limited. </p>

<p> When reporting a problem, be sure to include the following
information. </p>

<ul>

<li> <p> A summary of the problem. Please do not just send some
logging without explanation of what YOU believe is wrong. </p>

<li> <p> Complete error messages. Please use cut-and-paste, or use
attachments, instead of reciting information from memory.
</p>

<li> <p> Postfix logging. See the text at the top of the DEBUG_README
document to find out where logging is stored. Please do not frustrate
the helpers by word wrapping the logging. If the logging is more
than a few kbytes of text, consider posting an URL on a web or ftp
site. </p>

<li> <p> Consider using a test email address so that you don't have
to reveal email addresses or passwords of innocent people. </p>

<li> <p> If you can't use a test email address, please anonymize
email addresses and host names consistently. Replace each letter
by "A", each digit
by "D" so that the helpers can still recognize syntactical errors.
</p>

<li> <p> Command output from:</p>

<ul>

<li> <p> "<b>postconf -n</b>". Please do not send your main.cf file,
or 1000+ lines of <b>postconf</b> command output. </p>

<li> <p> "<b>postconf -Mf</b>" (Postfix 2.9 or later). </p>

</ul>

<li> <p> Better, provide output from the <b>postfinger</b> tool.
This can be found at https://github.com/ford--prefect/postfinger.  </p>

<li> <p> If the problem is SASL related, consider including the
output from the <b>saslfinger</b> tool. This can be found at
https://packages.debian.org/search?keywords=sasl2-bin. </p>

<li> <p> If the problem is about too much mail in the queue, consider
including output from the <b>qshape</b> tool, as described in the
QSHAPE_README file. </p>

<li> <p> If the problem is protocol related (connections time out,
or an SMTP server complains about syntax errors etc.) consider
recording a session with <b>tcpdump</b>, as described in the <a
href="#sniffer">DEBUG_README</a> document.  </ul>

</body>

</html>