summaryrefslogtreecommitdiffstats
path: root/README_FILES/TUNING_README
blob: 10801646c9a8260bd944b5ec3d410f058a0e4c8e (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
 PPoossttffiixx PPeerrffoorrmmaannccee TTuunniinngg

-------------------------------------------------------------------------------

PPuurrppoossee ooff PPoossttffiixx ppeerrffoorrmmaannccee ttuunniinngg

The hints and tips in this document help you improve the performance of Postfix
systems that already work. If your Postfix system is unable to receive or
deliver mail, then you need to solve those problems first, using the
DEBUG_README document as guidance.

For tuning external content filter performance, first read the respective
information in the FILTER_README and SMTPD_PROXY_README documents. Then make
sure to avoid latency in the content filter code. As much as possible avoid
performing queries against external data sources with a high or highly variable
delay. Your content filter will run with a small concurrency to avoid CPU/
memory starvation, and if any latency creeps in, content filter throughput will
suffer. High volume environments should avoid RBL lookups, complex database
queries and so on.

Topics on mail receiving performance:

  * General mail receiving performance tips
  * Doing more work with your SMTP server processes
  * Slowing down SMTP clients that make many errors
  * Measures against clients that make too many connections

Topics on mail delivery performance:

  * General mail delivery performance tips
  * Tuning the frequency of deferred mail delivery attempts
  * Tuning the number of simultaneous deliveries
  * Tuning the number of recipients per delivery

Other Postfix performance tuning topics:

  * Tuning the number of Postfix processes
  * Tuning the number of processes on the system
  * Tuning the number of open files or sockets

The following tools can be used to measure mail system performance under
artificial loads. They are normally not installed with Postfix.

  * smtp-source, SMTP/LMTP message generator
  * smtp-sink, SMTP/LMTP message dump
  * qmqp-source, QMQP message generator
  * qmqp-sink, QMQP message dump

GGeenneerraall mmaaiill rreecceeiivviinngg ppeerrffoorrmmaannccee ttiippss

  * Read and understand the maildrop queue, incoming queue, and active queue
    discussions in the QSHAPE_README document.

  * Run a local name server to reduce slow-down due to DNS lookups. If you run
    multiple Postfix systems, point each local name server to a shared
    forwarding server to reduce the number of lookups across the upstream
    network link.

  * Eliminate unnecessary LDAP lookups, by specifying a domain filter. This
    eliminates lookups for addresses in remote domains, and eliminates lookups
    of partial addresses. See ldap_table(5) for details.

When Postfix responds slowly to SMTP clients:

  * Look for obvious signs of trouble as described in the DEBUG_README
    document, and eliminate those problems first.

  * Turn off your header_checks and body_checks patterns and see if the problem
    goes away.

  * Turn off chroot operation as described in the DEBUG_README document and see
    if the problem goes away.

  * If Postfix logs the SMTP client as "unknown" then you have a name service
    problem: the name server is bad, or the resolv.conf file contains bad
    information, or some packet filter is blocking the DNS requests or replies.

  * If the number of smtpd(8) processes has reached the process limit as
    specified in master.cf, new SMTP clients must wait until a process becomes
    available. See the STRESS_README and POSTSCREEN_README documents for
    measures that help to prevent SMTP server overload.

DDooiinngg mmoorree wwoorrkk wwiitthh yyoouurr SSMMTTPP sseerrvveerr pprroocceesssseess

With Postfix versions 2.0 and earlier, the smtpd(8) server pauses before
reporting an error to an SMTP client. The idea is called tar pitting. However,
these delays also slow down Postfix. When the smtpd(8) server replies slowly,
sessions take more time, so that more smtpd(8) server processes are needed to
handle the load. When your Postfix smtpd(8) server process limit is reached,
new clients must wait until a server process becomes available. This means that
all clients experience poor performance.

You can speed up the handling of smtpd(8) server error replies by turning off
the delay:

    /etc/postfix/main.cf:
        # Not needed with Postfix 2.1
        smtpd_error_sleep_time = 0

With the above setting, Postfix 2.0 and earlier can serve more SMTP clients
with the same number SMTP server processes. The next section describes how
Postfix deals with clients that make a large number of errors.

SSlloowwiinngg ddoowwnn SSMMTTPP cclliieennttss tthhaatt mmaakkee mmaannyy eerrrroorrss

The Postfix smtpd(8) server maintains a per-session error count. The error
count is reset when a message is transferred successfully, and is incremented
when a client request is unrecognized or unimplemented, when a client request
violates access restrictions, or when some other error happens.

As the per-session error count increases, the smtpd(8) server changes behavior
and begins to insert delays into the responses. The idea is to slow down a run-
away client in order to limit resource usage. The behavior is Postfix version
dependent.

IMPORTANT: These delays slow down Postfix, too. When too much delay is
configured, the number of simultaneous SMTP sessions will increase until it
reaches the smtpd(8) server process limit, and new SMTP clients must wait until
an smtpd(8) server process becomes available.

Postfix version 2.1 and later:

  * When the error count reaches $smtpd_soft_error_limit (default: 10), the
    Postfix smtpd(8) server delays all non-error and error responses by
    $smtpd_error_sleep_time seconds (default: 1 second).

  * When the error count reaches $smtpd_hard_error_limit (default: 20) the
    Postfix smtpd(8) server breaks the connection.

Postfix version 2.0 and earlier:

  * When the error count is less than $smtpd_soft_error_limit (default: 10) the
    Postfix smtpd(8) server delays all error replies by $smtpd_error_sleep_time
    (1 second with Postfix 2.0, 5 seconds with Postfix 1.1 and earlier).

  * When the error count reaches $smtpd_soft_error_limit, the Postfix smtpd(8)
    server delays all responses by "error count" seconds or
    $smtpd_error_sleep_time, whichever is more.

  * When the error count reaches $smtpd_hard_error_limit (default: 20) the
    Postfix smtpd(8) server breaks the connection.

MMeeaassuurreess aaggaaiinnsstt cclliieennttss tthhaatt mmaakkee ttoooo mmaannyy ccoonnnneeccttiioonnss

Note: these features use the Postfix anvil(8) service, introduced with Postfix
version 2.2.

The Postfix smtpd(8) server can limit the number of simultaneous connections
from the same SMTP client, as well as the connection rate and the rate of
certain SMTP commands from the same client. These statistics are maintained by
the anvil(8) server (translation: if anvil(8) breaks, then connection limits
stop working).

IMPORTANT: These limits must not be used to regulate legitimate traffic: mail
will suffer grotesque delays if you do so. The limits are designed to protect
the smtpd(8) server against abuse by out-of-control clients.

    smtpd_client_connection_count_limit (default: 50)
        The maximum number of connections that an SMTP client may make
        simultaneously.
    smtpd_client_connection_rate_limit (default: no limit)
        The maximum number of connections that an SMTP client may make in the
        time interval specified with anvil_rate_time_unit (default: 60s).
    smtpd_client_message_rate_limit (default: no limit)
        The maximum number of message delivery requests that an SMTP client may
        make in the time interval specified with anvil_rate_time_unit (default:
        60s).
    smtpd_client_recipient_rate_limit (default: no limit)
        The maximum number of recipient addresses that an SMTP client may
        specify in the time interval specified with anvil_rate_time_unit
        (default: 60s).
    smtpd_client_new_tls_session_rate_limit (default: no limit)
        The maximum number of new TLS sessions (without using the TLS session
        cache) that an SMTP client may negotiate in the time interval specified
        with anvil_rate_time_unit (default: 60s).
    smtpd_client_auth_rate_limit (default: no limit)
        The maximum number of AUTH commands that an SMTP client may send in the
        time interval specified with anvil_rate_time_unit (default: 60s).
        Available in Postfix 3.1 and later.
    smtpd_client_event_limit_exceptions (default: $mynetworks)
        SMTP clients that are excluded from connection and rate limits
        specified above.

GGeenneerraall mmaaiill ddeelliivveerryy ppeerrffoorrmmaannccee ttiippss

  * Read and understand the maildrop queue, incoming queue, active queue and
    deferred queue discussions in the QSHAPE_README document.

  * In case of slow delivery, run the qshape tool as described in the
    QSHAPE_README document.

  * Submit multiple recipients per message instead of submitting messages with
    only a few recipients.

  * Submit mail via SMTP instead of /usr/sbin/sendmail. You may have to adjust
    the smtpd_recipient_limit parameter setting.

  * Don't overwhelm the disk with mail submissions. Optimize the mail
    submission rate by tuning the number of parallel submissions and/or by
    tuning the Postfix in_flow_delay parameter setting.

  * Run a local name server to reduce slow-down due to DNS lookups. If you run
    multiple Postfix systems, point each local name server to a shared
    forwarding server to reduce the number of lookups across the upstream
    network link.

  * Reduce the smtp_connect_timeout and smtp_helo_timeout values so that
    Postfix does not waste lots of time connecting to non-responding remote
    SMTP servers.

  * Use a dedicated mail delivery transport for problematic destinations, with
    reduced timeouts and with adjusted concurrency. See "Tuning the number of
    simultaneous deliveries" below.

  * Use a fallback_relay host for mail that cannot be delivered upon the first
    attempt. This "graveyard" machine can use shorter retry times for difficult
    to reach destinations. See "Tuning the frequency of deferred mail delivery
    attempts" below.

  * Speed up disk updates with a large (64MB) persistent write cache. This
    allows disk updates to be sorted for optimal access speed without
    compromising file system integrity when the system crashes.

  * Use a solid-state disk (a persistent RAM disk). This is an expensive
    solution that should be used in combination with short SMTP timeouts and a
    fallback_relay "graveyard" machine that delivers mail for problem
    destinations.

TTuunniinngg tthhee nnuummbbeerr ooff ssiimmuullttaanneeoouuss ddeelliivveerriieess

Although Postfix can be configured to run 1000 SMTP client processes at the
same time, it is rarely desirable that it makes 1000 simultaneous connections
to the same remote system. For this reason, Postfix has safety mechanisms in
place to avoid this so-called "thundering herd" problem.

The Postfix queue manager implements the analog of the TCP slow start flow
control strategy: when delivering to a site, send a small number of messages
first, then increase the concurrency as long as all goes well; reduce
concurrency in the face of congestion.

  * The initial_destination_concurrency parameter (default: 5) controls how
    many messages are initially sent to the same destination before adapting
    delivery concurrency. Of course, this setting is effective only as long as
    it does not exceed the process limit and the destination concurrency limit
    for the specific mail transport channel.

  * The default_destination_concurrency_limit parameter (default: 20) controls
    how many messages may be sent to the same destination simultaneously. You
    can override this setting for specific message delivery transports by
    taking the name of the master.cf entry and appending
    "_destination_concurrency_limit".

Examples of transport specific concurrency limits are:

  * The local_destination_concurrency_limit parameter (default: 2) controls how
    many messages are delivered simultaneously to the same local recipient. The
    recommended limit is low because delivery to the same mailbox must happen
    sequentially, so massive parallelism is not useful. Another good reason to
    limit delivery concurrency to the same recipient: if the recipient has an
    expensive shell command in her .forward file, or if the recipient is a
    mailing list manager, you don't want to run too many instances of those
    processes at the same time.

  * The default smtp_destination_concurrency_limit of 20 seems enough to
    noticeably load a system without bringing it to its knees. Be careful when
    changing this to a much larger number.

The above default values of the concurrency limits work well in a broad range
of situations. Knee-jerk changes to these parameters in the face of congestion
can actually make problems worse. Specifically, large destination concurrencies
should never be the default. They should be used only for transports that
deliver mail to a small number of high volume domains.

A common situation where high concurrency is called for is on gateways relaying
a high volume of mail between the Internet and an intranet mail environment.
Approximately half the mail (assuming equal volumes inbound and outbound) will
be destined for the internal mail hubs. Since the internal mail hubs will be
receiving all external mail exclusively from the gateway, it is reasonable to
configure the gateway to make greater demands on the capacity of the internal
SMTP servers.

The tuning of the inbound concurrency limits need not be trial and error. A
high volume capable mailhub should be able to easily handle 50 or 100 (rather
than the default 20) simultaneous connections, especially if the gateway
forwards to multiple MX hosts. When all MX hosts are up and accepting
connections in a timely fashion, throughput will be high. If any MX host is
down and completely unresponsive, the average connection latency rises to at
least 1/N * $smtp_connect_timeout, if there are N MX hosts. This limits
throughput to at most the destination concurrency * N / $smtp_connect_timeout.

For example, with a destination concurrency of 100 and 2 MX hosts, each host
will handle up to 50 simultaneous connections. If one MX host is down and the
default SMTP connection timeout is 30s, the throughput limit is 100 * 2 / 30 ~=
6 messages per second. This suggests that high volume destinations with good
connectivity and multiple MX hosts need a lower connection timeout, values as
low as 5s or even 1s can be used to prevent congestion when one or more, but
not all MX hosts are down.

If necessary, set a higher transport_destination_concurrency_limit (in main.cf
since this is a queue manager parameter) and a lower smtp_connect_timeout (with
a "-o" override in master.cf since this parameter has no per-transport name)
for the relay transport and any transports dedicated for specific high volume
destinations.

TTuunniinngg tthhee nnuummbbeerr ooff rreecciippiieennttss ppeerr ddeelliivveerryy

The default_destination_recipient_limit parameter (default: 50) controls how
many recipients a Postfix delivery agent will send with each copy of an email
message. You can override this setting for specific Postfix delivery agents.
For example, "uucp_destination_recipient_limit = 100" would limit the number of
recipients per UUCP delivery to 100.

If an email message exceeds the recipient limit for some destination, the
Postfix queue manager breaks up the list of recipients into smaller lists.
Postfix will attempt to send multiple copies of the message in parallel.

IMPORTANT: Be careful when increasing the recipient limit per message delivery;
some SMTP servers abort the connection when they run out of memory or when a
hard recipient limit is reached, so that the message will never be delivered.

The smtpd_recipient_limit parameter (default: 1000) controls how many
recipients the Postfix smtpd(8) server will take per delivery. The default
limit is more than any reasonable SMTP client would send. The limit exists to
protect the local mail system against a run-away client.

TTuunniinngg tthhee ffrreeqquueennccyy ooff ddeeffeerrrreedd mmaaiill ddeelliivveerryy aatttteemmppttss

When a Postfix delivery agent (smtp(8), local(8), etc.) is unable to deliver a
message it may blame the message itself, or it may blame the receiving party.

  * When the delivery agent blames the message, the queue manager gives the
    queue file a time stamp into the future, so it won't be looked at for a
    while. By default, the amount of time to cool down is the amount of time
    that has passed since the message arrived. This results in so-called
    exponential backoff behavior.

  * When the delivery agent blames the receiving party (for example a local
    recipient user, or a remote host), the queue manager not only advances the
    queue file time stamp, but also puts the receiving party on a "dead" list
    so that it will be skipped for some amount of time.

This process is governed by a bunch of little parameters.

    queue_run_delay (default: 300 seconds; before Postfix 2.4: 1000s)
        How often the queue manager scans the queue for deferred mail.
    minimal_backoff_time (default: 300 seconds; before Postfix 2.4: 1000s)
        The minimal amount of time a message won't be looked at, and the
        minimal amount of time to stay away from a "dead" destination.
    maximal_backoff_time (default: 4000 seconds)
        The maximal amount of time a message won't be looked at after a
        delivery failure.
    maximal_queue_lifetime (default: 5 days)
        How long a message stays in the queue before it is sent back as
        undeliverable. Specify 0 for mail that should be returned immediately
        after the first unsuccessful delivery attempt.
    bounce_queue_lifetime (default: 5 days, available with Postfix version 2.1
    and later)
        How long a MAILER-DAEMON message stays in the queue before it is
        considered undeliverable. Specify 0 for mail that should be tried only
        once.
    qmgr_message_recipient_limit (default: 20000)
        The size of many in-memory queue manager data structures. Among others,
        this parameter limits the size of the short-term, in-memory list of
        "dead" destinations. Destinations that don't fit the list are not
        added.
    transport_destination_concurrency_failed_cohort_limit
        Controls when a destination is considered "dead". This parameter is
        critical with a non-zero transport_destination_rate_delay, with a
        reduced transport_destination_concurrency_limit, or with a reduced
        initial_destination_concurrency.

IMPORTANT: If you increase the frequency of deferred mail delivery attempts, or
if you flush the deferred mail queue frequently, then you may find that Postfix
mail delivery performance actually becomes worse. The symptoms are as follows:

  * The active queue becomes saturated with mail that has delivery problems.
    New mail enters the active queue only when an old message is deferred. This
    is a slow process that usually requires timing out one or more SMTP
    connections.

  * All available Postfix delivery agents become occupied trying to connect to
    unreachable sites etc. New mail has to wait until a delivery agent becomes
    available. This is a slow process that usually requires timing out one or
    more SMTP connections.

When mail is being deferred frequently, fixing the problem is always better
than increasing the frequency of delivery attempts. However, if you can control
only the delivery attempt frequency, consider using a dedicated fallback_relay
"graveyard" machine for bad destinations, so that these destinations do not
ruin the performance of normal mail deliveries.

TTuunniinngg tthhee nnuummbbeerr ooff PPoossttffiixx pprroocceesssseess

The default_process_limit configuration parameter gives direct control over how
many daemon processes Postfix will run. As of Postfix 2.0 the default limit is
100 SMTP client processes, 100 SMTP server processes, and so on. This may
overwhelm systems with little memory, as well as networks with low bandwidth.

You can change the global process limit by specifying a non-default
default_process_limit in the main.cf file. For example, to run up to 10 SMTP
client processes, 10 SMTP server processes, and so on:

    /etc/postfix/main.cf:
        default_process_limit = 10

You need to execute "postfix reload" to make the change effective. This limit
is enforced by the Postfix master(8) daemon which does not automatically read
main.cf when it changes.

You can override the process limit for specific Postfix daemons by editing the
master.cf file. For example, if you do not wish to receive 100 SMTP messages at
the same time, but do not want to change the process limits for other Postfix
daemons, you could specify:

    /etc/postfix/master.cf:
        # ====================================================================
        # service type  private unpriv  chroot  wakeup  maxproc command + args
        #               (yes)   (yes)   (yes)   (never) (100)
        # ====================================================================
        . . .
        smtp      inet  n       -       -       -       10      smtpd
        . . .

TTuunniinngg tthhee nnuummbbeerr ooff pprroocceesssseess oonn tthhee ssyysstteemm

  * MacOS X will run out of process slots when you increase Postfix process
    limits. The following works with OSX 10.4 and OSX 10.5.

    MacOS X kernel parameters can be specified in /etc/sysctl.conf.

    /etc/sysctl.conf:
        kern.maxproc=2048
        kern.maxprocperuid=2048

    Unfortunately these can't simply be set on the fly with "sysctl -w". You
    also have to set the following in /etc/launchd.conf so that the root user
    after boot will have the right process limit (2048). Otherwise you have to
    always run ulimit -u 2048 as root, then start a user shell, and then start
    processes for things to take effect.

    /etc/launchd.conf:
        limit maxproc 2048

    Once these are in place, reboot the system. After that, the limits will
    stay in place.

TTuunniinngg tthhee nnuummbbeerr ooff ooppeenn ffiilleess oorr ssoocckkeettss

When Postfix opens too many files or sockets, processes will abort with fatal
errors, and the system may log "file table full" errors.

  * Depending on your Postfix and operating system versions you may need to
    recompile Postfix if you need more than 1024 file descriptors per process:

      o No recompilation is needed for Postfix version 2.4 and later, when it
        was compiled for systems that support BSD kqueue(2) (FreeBSD 4.1,
        NetBSD 2.0, OpenBSD 2.9), Solaris 8 /dev/poll, or Linux 2.6 epoll(4).

      o Otherwise, Postfix needs to be recompiled to override the default
        FD_SETSIZE value.

  * Reduce the number of processes as described under "Tuning the number of
    Postfix processes" above. Fewer processes need fewer open files and
    sockets.

  * Configure the kernel for more open files and sockets. The details are
    extremely system dependent and change with the operating system version. Be
    sure to verify the following information with your system tuning guide:

      o Some FreeBSD kernel parameters can be specified in /boot/loader.conf,
        and some can be specified in /etc/sysctl.conf or changed with sysctl
        commands. Which is which depends on the version.

        kern.ipc.maxsockets="5000"
        kern.ipc.nmbclusters="65536"
        kern.maxproc="2048"
        kern.maxfiles="16384"
        kern.maxfilesperproc="16384"

      o Linux kernel parameters can be specified in /etc/sysctl.conf or changed
        with sysctl commands:

        fs.file-max=16384
        kernel.threads-max=2048

      o Solaris kernel parameters can be specified in /etc/system, as described
        in the Solaris FAQ entry titled "How can I increase the number of file
        descriptors per process?"

        * set hard limit on file descriptors
        set rlim_fd_max = 4096
        * set soft limit on file descriptors
        set rlim_fd_cur = 1024