<!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN"
        "http://www.w3.org/TR/html4/loose.dtd">

<html>

<head>

<title>Postfix Stress-Dependent Configuration</title>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link rel='stylesheet' type='text/css' href='postfix-doc.css'>

</head>

<body>

<h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix
Stress-Dependent Configuration</h1>

<hr>

<h2>Overview </h2>

<p> This document describes the symptoms of Postfix SMTP server
overload. It presents permanent main.cf changes to avoid overload
during normal operation, and temporary main.cf changes to cope with
an unexpected burst of mail. This document makes specific suggestions
for Postfix 2.5 and later which support stress-adaptive behavior,
and for earlier Postfix versions that don't.  </p>

<p> Topics covered in this document: </p>

<ul>

<li><a href="#overload"> Symptoms of Postfix SMTP server overload </a> 

<li><a href="#adapt"> Automatic stress-adaptive behavior </a>

<li><a href="#concurrency"> Service more SMTP clients at the same time </a> 

<li><a href="#time"> Spend less time per SMTP client </a>

<li><a href="#hangup"> Disconnect suspicious SMTP clients </a>

<li><a href="#legacy"> Temporary measures for older Postfix releases </a>

<li><a href="#feature"> Detecting support for stress-adaptive behavior </a>

<li><a href="#forcing"> Forcing stress-adaptive behavior on or off </a>

<li><a href="#other"> Other measures to off-load zombies </a>

<li><a href="#credits"> Credits </a>

</ul>

<h2><a name="overload"> Symptoms of Postfix SMTP server overload </a></h2>

<p> Under normal conditions, the Postfix SMTP server responds
immediately when an SMTP client connects to it; the time to deliver
mail is noticeable only with large messages.  Performance degrades
dramatically when the number of SMTP clients exceeds the number of
Postfix SMTP server processes.  When an SMTP client connects while
all Postfix SMTP server processes are busy, the client must wait
until a server process becomes available. </p>

<p> SMTP server overload may be caused by a surge of legitimate
mail (example: a DNS registrar opens a new zone for registrations),
by mistake (mail explosion caused by a forwarding loop) or by malice
(worm outbreak, botnet, or other illegitimate activity).  </p>

<p> Symptoms of Postfix SMTP server overload are: </p>

<ul>

<li> <p> Remote SMTP clients experience a long delay before Postfix
sends the "220 hostname.example.com ESMTP Postfix" greeting. </p>

<ul>

<li> <p> NOTE: Broken DNS configurations can also cause lengthy
delays before Postfix sends "220 hostname.example.com ...". These
delays also exist when Postfix is NOT overloaded.  </p>

<li> <p> NOTE:  To avoid "overload" delays for end-user mail
clients, enable the "submission" service entry in master.cf (present
since Postfix 2.1), and tell users to connect to this instead of
the public SMTP service. </p>

</ul>

<li> <p> The Postfix SMTP server logs an increased number of "lost
connection after CONNECT" events. This happens because remote SMTP
clients disconnect before Postfix answers the connection. </p>

<ul>

<li> <p> NOTE: A portscan for open SMTP ports can also result in
"lost connection ..." logfile messages. </p>

</ul>

<li> <p> Postfix 2.3 and later logs a warning that all server ports
are busy: </p>

<pre>
Oct  3 20:39:27 spike postfix/master[28905]: warning: service "smtp"
 (25) has reached its process limit "30": new clients may experience
 noticeable delays
Oct  3 20:39:27 spike postfix/master[28905]: warning: to avoid this
 condition, increase the process count in master.cf or reduce the
 service time per client
Oct  3 20:39:27 spike postfix/master[28905]: warning: see
  <a href="http://www.postfix.org/STRESS_README.html">http://www.postfix.org/STRESS_README.html</a> for examples of
  stress-adapting configuration settings
</pre>

</ul>
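
<p> As a rough way to check for these symptoms, the following sketch
counts the relevant log messages. It assumes that Postfix logs to
/var/log/maillog; the actual logfile location depends on the local
syslog configuration. </p>

<blockquote>
<pre>
# Assumption: Postfix logging goes to /var/log/maillog.
# Count clients that gave up before Postfix sent its greeting.
grep -c 'lost connection after CONNECT' /var/log/maillog

# Show "process limit" warnings from the master(8) daemon.
grep 'has reached its process limit' /var/log/maillog
</pre>
</blockquote>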

<p> Legitimate mail that doesn't get through during an episode of
Postfix SMTP server overload is not necessarily lost. It should
still arrive once the situation returns to normal, as long as the
overload condition is temporary.  </p>

<h2><a name="adapt"> Automatic stress-adaptive behavior </a></h2>

<p> Postfix version 2.5 introduces automatic stress-adaptive behavior.
It works as follows. When a "public" network service such as the
SMTP server runs into an "all server ports are busy" condition, the
Postfix master(8) daemon logs a warning, restarts the service
(without interrupting existing network sessions), and runs the
service with "-o stress=yes" on the server process command line:
</p>

<blockquote>
<pre>
80821  ??  S      0:00.24 smtpd -n smtp -t inet -u -c -o stress=yes
</pre>
</blockquote>

<p> Normally, the Postfix master(8) daemon runs such a service with
"-o stress=" on the command line (i.e.  with an empty parameter
value):  </p>

<blockquote>
<pre>
83326  ??  S      0:00.28 smtpd -n smtp -t inet -u -c -o stress=
</pre>
</blockquote>

<p> You won't see "-o stress" command-line parameters with services
that have local clients only. These include services internal to
Postfix such as the queue manager, and services that listen on a
loopback interface only, such as after-filter SMTP services.  </p>

<p> The "stress" parameter value is the key to making main.cf
parameter settings stress adaptive. The following settings are the
default with Postfix 2.6 and later. </p>

<blockquote>
<pre>
1 smtpd_timeout = ${stress?{10}:{300}}s
2 smtpd_hard_error_limit = ${stress?{1}:{20}}
3 smtpd_junk_command_limit = ${stress?{1}:{100}}
4 # Parameters added after Postfix 2.6:
5 smtpd_per_record_deadline = ${stress?{yes}:{no}}
6 smtpd_starttls_timeout = ${stress?{10}:{300}}s
7 address_verify_poll_count = ${stress?{1}:{3}}
</pre>
</blockquote>

<p> Postfix versions before 3.0 use the older form ${stress?x}${stress:y}
instead of the newer form ${stress?{x}:{y}}. </p>

<p> The syntax of ${name?{value}:{value}}, ${name?value} and
${name:value} is explained at the beginning of the postconf(5)
manual page. </p>
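
<p> For example, with a Postfix version before 3.0, the first three
settings shown above would be written in the older form, as in the
following sketch: </p>

<blockquote>
<pre>
# Older ${name?value}${name:value} form, for Postfix versions before 3.0.
smtpd_timeout = ${stress?10}${stress:300}s
smtpd_hard_error_limit = ${stress?1}${stress:20}
smtpd_junk_command_limit = ${stress?1}${stress:100}
</pre>
</blockquote>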

<p> Translation: </p>

<ul>

<li> <p> Line 1: under conditions of stress, use an smtpd_timeout
value of 10 seconds instead of the default 300 seconds. Experience
on the postfix-users list from a variety of sysadmins shows that
reducing the "normal" smtpd_timeout to 60s is unlikely to affect
legitimate clients. However, it is unlikely to become the Postfix
default because it's not RFC compliant. Setting smtpd_timeout to
10s or even 5s under stress will still allow most
legitimate clients to connect and send mail, but may delay mail
from some clients. No mail should be lost, as long as this measure
is used only temporarily. </p>

<li> <p> Line 2: under conditions of stress, use an smtpd_hard_error_limit
of 1 instead of the default 20. This disconnects clients
after a single error, giving other clients a chance to connect.
However, this may cause significant delays with legitimate mail,
such as a mailing list that contains a few addresses of no-longer-active
users who didn't bother to unsubscribe. No mail should be lost,
as long as this measure is used only temporarily. </p>

<li> <p> Line 3: under conditions of stress, use an
smtpd_junk_command_limit of 1 instead of the default 100. This
prevents clients from keeping connections open by repeatedly
sending HELO, EHLO, NOOP, RSET, VRFY or ETRN commands. </p>

<li> <p> Line 5: under conditions of stress, change the behavior
of smtpd_timeout and smtpd_starttls_timeout, from a time limit per
read or write system call, to a time limit to send or receive a
complete record (an SMTP command line, SMTP response line, SMTP
message content line, or TLS protocol message). </p>

<li> <p> Line 6: under conditions of stress, reduce the time limit
for TLS protocol handshake messages to 10 seconds, from the default
value of 300 seconds. See also the smtpd_timeout discussion above.
</p>

<li> <p> Line 7: under conditions of stress, do not wait up to 6
seconds for the completion of an address verification probe. If the
result is not already in the address verification cache, reply
immediately with $unverified_recipient_tempfail_action or
$unverified_sender_tempfail_action. No mail should be lost, as long
as this measure is used only temporarily.  </p>

</ul>

<p> NOTE: Please keep in mind that the stress-adaptive feature is
a fairly desperate measure to keep <b>some</b> legitimate mail
flowing under overload conditions.  If a site is reaching the SMTP
server process limit when there isn't an attack or bot flood
occurring, then either the process limit needs to be raised or more
hardware needs to be added.  </p>

<h2><a name="concurrency"> Service more SMTP clients at the same time </a> </h2>

<p> This section and the ones that follow discuss permanent measures
against mail server overload.  </p>

<p> One measure to avoid the "all server processes busy" condition
is to service more SMTP clients simultaneously. For this you need
to increase the number of Postfix SMTP server processes. This will
improve the
responsiveness for remote SMTP clients, as long as the server machine
has enough hardware and software resources to run the additional
processes, and as long as the file system can keep up with the
additional load. </p>

<ul>

<li> <p> You increase the number of SMTP server processes either
by increasing the default_process_limit in main.cf (line 3 below),
or by increasing the SMTP server's "maxproc" field in master.cf
(line 10 below).  Either way, you need to issue a "postfix reload"
command to make the change effective.  </p>

<li> <p> Process limits above 1000 require Postfix version 2.4 or
later, and an operating system that supports kernel-based event
filters (BSD kqueue(2), Linux epoll(4), or Solaris /dev/poll).
</p>

<li> <p> More processes use more memory. You can reduce the Postfix
memory footprint by using cdb:
lookup tables instead of Berkeley DB's hash: or btree: tables. </p>

<pre>
 1 /etc/postfix/main.cf:
 2     # Raise the global process limit, 100 since Postfix 2.0.
 3     default_process_limit = 200
 4
 5 /etc/postfix/master.cf:
 6     # =============================================================
 7     # service type  private unpriv  chroot  wakeup  maxproc command
 8     # =============================================================
 9     # Raise the SMTP service process limit only.
10     smtp      inet  n       -       n       -       200     smtpd
</pre>

<li> <p> NOTE: older versions of the SMTPD_POLICY_README document
contain a mistake: they configure a fixed number of policy daemon
processes.  When you raise the SMTP server's "maxproc" field in
master.cf, SMTP server processes will report problems when connecting
to policy server processes, because there aren't enough of them.
Examples of errors are "connection refused" or "operation timed
out".  </p>

<p> To fix, edit master.cf and specify a zero "maxproc" field
in all policy server entries; see line 6 in the example below.
Issue a "postfix reload" command to make the change effective.  </p>

<pre>
1 /etc/postfix/master.cf:
2     # =============================================================
3     # service type  private unpriv  chroot  wakeup  maxproc command
4     # =============================================================
5     # Disable the policy service process limit.
6     policy    unix  -       n       n       -       0       spawn
7         user=nobody argv=/some/where/policy-server
</pre>

</ul>
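
<p> The steps above can be sketched with the commands below. The
"postconf -M smtp/inet" form is an assumption about the local
installation: it requires Postfix 2.11 or later. With older releases,
inspect master.cf directly. </p>

<blockquote>
<pre>
# Show the current global process limit.
postconf default_process_limit

# Show the master.cf entry for the public SMTP service
# (assumes Postfix 2.11 or later).
postconf -M smtp/inet

# After editing main.cf or master.cf, make the change effective.
postfix reload
</pre>
</blockquote>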

<h2><a name="time"> Spend less time per SMTP client </a></h2>

<p> When increasing the number of SMTP server processes is not
practical, you can improve Postfix server responsiveness by eliminating
delays.  When Postfix spends less time per SMTP session, the same
number of SMTP server processes can service more clients in a given
amount of time. </p>

<ul>

<li> <p> Eliminate non-functional RBL lookups (blocklists that are
no longer in operation). These lookups can degrade performance.
Postfix logs a warning when an RBL server does not respond. </p>

<li> <p> Eliminate redundant RBL lookups (people often use multiple
Spamhaus RBLs that include each other).  To find out whether RBLs
include other RBLs, look up the websites that document the RBL's
policies. </p>

<li> <p> Eliminate header_checks and body_checks, and keep just a few
emergency patterns to block the latest worm explosion or backscatter
mail.  See BACKSCATTER_README for examples of the latter. </p>

<li> <p> Group your header_checks and body_checks patterns to avoid
unnecessary pattern matching operations: </p>

<pre>
/etc/postfix/header_checks:
    if /^Subject:/
    /^Subject: virus found in mail from you/ reject
    /^Subject: ..other../ reject
    endif

    if /^Received:/
    /^Received: from (postfix\.org) / reject forged client name in received header: $1
    /^Received: from ..other../ reject ....
    endif
</pre>

</ul>
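
<p> Before deploying grouped patterns like the ones above, it may help
to test them with postmap(1). The sketch below assumes that the file
is used as a regexp: table; substitute pcre: if that is what main.cf
specifies. </p>

<blockquote>
<pre>
# Assumption: main.cf contains
#     header_checks = regexp:/etc/postfix/header_checks
# Query the table with a sample header line; a match prints the action.
postmap -q "Subject: virus found in mail from you" regexp:/etc/postfix/header_checks
</pre>
</blockquote>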

<h2><a name="hangup"> Disconnect suspicious SMTP clients </a></h2>

<p> Under conditions of overload you can improve Postfix SMTP server
responsiveness by hanging up on suspicious clients, so that other
clients get a chance to talk to Postfix.  </p>

<ul>

<li> <p> Use "521" SMTP reply codes (Postfix 2.6 and later) or "421"
(Postfix 2.3-2.5) to hang up on clients that match botnet-related
RBLs (see next bullet) or that match selected non-RBL restrictions
such as SMTP access maps.  The Postfix SMTP server will reject mail
and disconnect without waiting for the remote SMTP client to send
a QUIT command.  </p>

<li> <p> To hang up connections from denylisted zombies, you can
set specific Postfix SMTP server reject codes for specific RBLs,
and for individual responses from specific RBLs. We'll use
zen.spamhaus.org as an example; by the time you read this document,
details may have changed.  Right now, their documents say that a
response of 127.0.0.10 or 127.0.0.11 indicates a dynamic client IP
address, which means that the machine is probably running a bot of
some kind.  To give a 521 response instead of the default 554
response, use something like: </p>

<pre>
 1  /etc/postfix/main.cf:
 2      smtpd_client_restrictions =
 3         permit_mynetworks
 4         reject_rbl_client zen.spamhaus.org=127.0.0.10
 5         reject_rbl_client zen.spamhaus.org=127.0.0.11
 6         reject_rbl_client zen.spamhaus.org
 7  
 8      rbl_reply_maps = hash:/etc/postfix/rbl_reply_maps
 9  
10  /etc/postfix/rbl_reply_maps:
11      # With Postfix 2.3-2.5 use "421" to hang up connections.
12      zen.spamhaus.org=127.0.0.10 521 4.7.1 Service unavailable;
13       $rbl_class [$rbl_what] blocked using
14       $rbl_domain${rbl_reason?; $rbl_reason}
15  
16      zen.spamhaus.org=127.0.0.11 521 4.7.1 Service unavailable;
17       $rbl_class [$rbl_what] blocked using
18       $rbl_domain${rbl_reason?; $rbl_reason}
</pre>

<p> Although the above example shows three RBL lookups (lines 4-6),
Postfix will only do a single DNS query, so it does not affect the
performance. </p>

<li> <p> With Postfix 2.3-2.5, use reply code 421 (521 will not
cause Postfix to disconnect). The down-side of replying with 421
is that it works only for zombies and other malware. If the client
is running a real MTA, then it may connect again several times until
the mail expires in its queue. When this is a problem, stick with
the default 554 reply, and use "smtpd_hard_error_limit = 1" as
described below.  </p>

<li> <p> You can automatically turn on the above overload measure
with Postfix 2.5 and later, or with earlier releases that contain
the stress-adaptive behavior source code patch from the mirrors
listed at http://www.postfix.org/download.html. Simply replace
line 8 above with: </p>

<pre>
 8      rbl_reply_maps = ${stress?hash:/etc/postfix/rbl_reply_maps}
</pre>

</ul>
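
<p> Because the example above uses a hash: table for rbl_reply_maps,
the indexed file has to be built before Postfix can use it. A minimal
sketch, assuming the file names from the example: </p>

<blockquote>
<pre>
# Build the indexed file from /etc/postfix/rbl_reply_maps.
postmap hash:/etc/postfix/rbl_reply_maps

# Make the main.cf changes effective.
postfix reload
</pre>
</blockquote>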

<p> More information about automatic stress-adaptive behavior is
in section "<a href="#adapt">Automatic stress-adaptive behavior</a>".
</p>

<h2><a name="legacy"> Temporary measures for older Postfix releases </a></h2>

<p> See the section "<a href="#adapt">Automatic stress-adaptive
behavior</a>" if you are running Postfix version 2.5 or later, or
if you have applied the source code patch for stress-adaptive
behavior from the mirrors listed at http://www.postfix.org/download.html.
</p>

<p> The following measures can be applied temporarily during overload.
They still allow <b>most</b> legitimate clients to connect and send
mail, but may affect some legitimate clients. </p>

<ul>

<li> <p> Reduce smtpd_timeout (default: 300s). Experience on the
postfix-users list from a variety of sysadmins shows that reducing
the "normal" smtpd_timeout to 60s is unlikely to affect legitimate
clients. However, it is unlikely to become the Postfix default
because it's not RFC compliant. Setting smtpd_timeout to 10s (line
2 below) or even 5s under stress will still allow <b>most</b>
legitimate clients to connect and send mail, but may delay mail
from some clients.  No mail should be lost, as long as this measure
is used only temporarily.  </p>

<li> <p> Reduce smtpd_hard_error_limit (default: 20). Setting this
to 1 under stress (line 3 below) helps by disconnecting clients
after a single error, giving other clients a chance to connect.
However, this may cause significant delays with legitimate mail,
such as a mailing list that contains a few addresses of no-longer-active
users who didn't bother to unsubscribe. No mail should be lost,
as long as this measure is used only temporarily. </p>

<li> <p> Use an smtpd_junk_command_limit of 1 instead of the default
100. This prevents clients from keeping idle connections open by
repeatedly sending NOOP or RSET commands. </p>

</ul>

<blockquote>
<pre>
1  /etc/postfix/main.cf:
2      smtpd_timeout = 10
3      smtpd_hard_error_limit = 1
4      smtpd_junk_command_limit = 1
</pre>
</blockquote>
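
<p> One way to apply and later undo these settings from the command
line is sketched below. The values that are restored afterwards are
the defaults mentioned above; if main.cf contained different values
before the overload, restore those instead. </p>

<blockquote>
<pre>
# During the overload: apply the temporary settings.
postconf -e "smtpd_timeout = 10" "smtpd_hard_error_limit = 1" \
    "smtpd_junk_command_limit = 1"
postfix reload

# After the overload: restore the defaults (assumption: main.cf did
# not override these parameters before).
postconf -e "smtpd_timeout = 300s" "smtpd_hard_error_limit = 20" \
    "smtpd_junk_command_limit = 100"
postfix reload
</pre>
</blockquote>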

<p> No mail should be lost, as long as these measures are used only
temporarily. The section "<a href="#adapt">Automatic stress-adaptive
behavior</a>" describes a way to automate this process. </p>

<h2><a name="feature"> Detecting support for stress-adaptive behavior </a></h2>

<p> To find out if your Postfix installation supports stress-adaptive
behavior, use the "ps" command, and look for the smtpd processes.
Postfix has stress-adaptive support when you see "-o stress=" or
"-o stress=yes" command-line options. Remember that Postfix never
enables stress-adaptive behavior on servers that listen on local
addresses only. </p>

<p> The following example is for FreeBSD or Linux. On Solaris, HP-UX
and other System-V flavors, use "ps -ef" instead of "ps ax". </p>

<blockquote>
<pre>
$ ps ax|grep smtpd
83326  ??  S      0:00.28 smtpd -n smtp -t inet -u -c -o stress=
84345  ??  Ss     0:00.11 /usr/bin/perl /usr/libexec/postfix/smtpd-policy.pl
</pre>
</blockquote>

<p> You can't use postconf(1) to detect stress-adaptive support.
The postconf(1) command ignores the existence of the stress parameter
in main.cf, because the parameter has no effect there.  Command-line
"-o parameter" settings always take precedence over main.cf parameter
settings.  </p>

<p> If you configure stress-adaptive behavior in main.cf when it
isn't supported, nothing bad will happen.  The processes will run
as if the stress parameter always has an empty value. </p>
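
<p> The "ps" check described in this section can be scripted, as in
the following sketch for FreeBSD or Linux. A negative result is not
conclusive: idle smtpd processes terminate after a while, so there
may be no smtpd process to inspect at the time of the check. </p>

<blockquote>
<pre>
# The [s] in the pattern keeps grep from matching its own command line.
if ps ax | grep -q '[s]mtpd .*-o stress='
then
    echo "stress-adaptive support detected"
else
    echo "no smtpd process with -o stress found"
fi
</pre>
</blockquote>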

<h2><a name="forcing"> Forcing stress-adaptive behavior on or off </a></h2>

<p> You can manually force stress-adaptive behavior on, by adding
a "-o stress=yes" command-line option in master.cf. This can be
useful for testing overrides on the SMTP service. Issue "postfix
reload" to make the change effective.  </p>

<p> Note: setting the stress parameter in main.cf has no effect for
services that accept remote connections. </p>

<blockquote>
<pre>
/etc/postfix/master.cf:
    # =============================================================
    # service type  private unpriv  chroot  wakeup  maxproc command
    # =============================================================
    #
    smtp      inet  n       -       n       -       -       smtpd
        -o stress=yes
        -o . . .
</pre>
</blockquote>

<p> To permanently force stress-adaptive behavior off with a specific
service, specify "-o stress=" on its master.cf command line.  This
may be desirable for the "submission" service. Issue "postfix reload"
to make the change effective.  </p>

<p> Note: setting the stress parameter in main.cf has no effect for
services that accept remote connections. </p>

<blockquote>
<pre>
/etc/postfix/master.cf:
    # =============================================================
    # service type  private unpriv  chroot  wakeup  maxproc command
    # =============================================================
    #
    submission inet n       -       n       -       -       smtpd
        -o stress=
        -o . . .
</pre>
</blockquote>
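
<p> Before forcing stress-adaptive behavior on for a live service,
it may be useful to preview how a stress-dependent setting evaluates.
The sketch below assumes a postconf(1) command from Postfix 2.10 or
later, which has the "-x" (expand $name) and "-o name=value" (override)
options; it has not been verified against every Postfix release. </p>

<blockquote>
<pre>
# Assumption: Postfix 2.10 or later.
# Evaluate smtpd_timeout as if the service ran with "-o stress=yes".
postconf -x -o stress=yes smtpd_timeout

# Evaluate smtpd_timeout with an empty stress value (normal operation).
postconf -x -o stress= smtpd_timeout
</pre>
</blockquote>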

<h2><a name="other"> Other measures to off-load zombies </a> </h2>

<p> The postscreen(8) daemon, introduced with Postfix 2.8, provides
additional protection against mail server overload. One postscreen(8)
process handles multiple inbound SMTP connections, and decides which
clients may talk to a Postfix SMTP server process.  By keeping
spambots away, postscreen(8) leaves more SMTP server processes
available for legitimate clients, and delays the onset of server
overload conditions. </p>
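
<p> As a sketch of what enabling postscreen(8) involves, the master.cf
entries below follow the commented-out entries found in stock master.cf
files with Postfix 2.8 and later. Treat the details as assumptions, and
consult POSTSCREEN_README for the release that you are running; for
example, an additional tlsproxy service is needed to support STARTTLS.
</p>

<blockquote>
<pre>
/etc/postfix/master.cf:
    # =============================================================
    # service type  private unpriv  chroot  wakeup  maxproc command
    # =============================================================
    # postscreen(8) replaces smtpd(8) on the public SMTP port.
    smtp      inet  n       -       n       -       1       postscreen
    # smtpd(8) receives the connections that postscreen(8) hands off.
    smtpd     pass  -       -       n       -       -       smtpd
    # Helper service for DNS blocklist lookups.
    dnsblog   unix  -       -       n       -       0       dnsblog
</pre>
</blockquote>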

<h2><a name="credits"> Credits </a></h2>

<ul>

<li>  Thanks to the postfix-users mailing list members for sharing
early experiences with the stress-adaptive feature.

<li>  The RBL example and several other paragraphs of text were
adapted from postfix-users postings by Noel Jones.

<li>  Wietse implemented stress-adaptive behavior as the smallest
possible patch when he should have been working on other things.

</ul>

</body> </html>