diff options
Diffstat (limited to '')
-rw-r--r-- | README_FILES/STRESS_README | 426 |
1 files changed, 426 insertions, 0 deletions
diff --git a/README_FILES/STRESS_README b/README_FILES/STRESS_README new file mode 100644 index 0000000..bae6ad1 --- /dev/null +++ b/README_FILES/STRESS_README @@ -0,0 +1,426 @@ +PPoossttffiixx SSttrreessss--DDeeppeennddeenntt CCoonnffiigguurraattiioonn + +------------------------------------------------------------------------------- + +OOvveerrvviieeww + +This document describes the symptoms of Postfix SMTP server overload. It +presents permanent main.cf changes to avoid overload during normal operation, +and temporary main.cf changes to cope with an unexpected burst of mail. This +document makes specific suggestions for Postfix 2.5 and later which support +stress-adaptive behavior, and for earlier Postfix versions that don't. + +Topics covered in this document: + + * Symptoms of Postfix SMTP server overload + * Automatic stress-adaptive behavior + * Service more SMTP clients at the same time + * Spend less time per SMTP client + * Disconnect suspicious SMTP clients + * Temporary measures for older Postfix releases + * Detecting support for stress-adaptive behavior + * Forcing stress-adaptive behavior on or off + * Other measures to off-load zombies + * Credits + +SSyymmppttoommss ooff PPoossttffiixx SSMMTTPP sseerrvveerr oovveerrllooaadd + +Under normal conditions, the Postfix SMTP server responds immediately when an +SMTP client connects to it; the time to deliver mail is noticeable only with +large messages. Performance degrades dramatically when the number of SMTP +clients exceeds the number of Postfix SMTP server processes. When an SMTP +client connects while all Postfix SMTP server processes are busy, the client +must wait until a server process becomes available. + +SMTP server overload may be caused by a surge of legitimate mail (example: a +DNS registrar opens a new zone for registrations), by mistake (mail explosion +caused by a forwarding loop) or by malice (worm outbreak, botnet, or other +illegitimate activity). + +Symptoms of Postfix SMTP server overload are: + + * Remote SMTP clients experience a long delay before Postfix sends the "220 + hostname.example.com ESMTP Postfix" greeting. + + o NOTE: Broken DNS configurations can also cause lengthy delays before + Postfix sends "220 hostname.example.com ...". These delays also exist + when Postfix is NOT overloaded. + + o NOTE: To avoid "overload" delays for end-user mail clients, enable the + "submission" service entry in master.cf (present since Postfix 2.1), + and tell users to connect to this instead of the public SMTP service. + + * The Postfix SMTP server logs an increased number of "lost connection after + CONNECT" events. This happens because remote SMTP clients disconnect before + Postfix answers the connection. + + o NOTE: A portscan for open SMTP ports can also result in "lost + connection ..." logfile messages. + + * Postfix 2.3 and later logs a warning that all server ports are busy: + + Oct 3 20:39:27 spike postfix/master[28905]: warning: service "smtp" + (25) has reached its process limit "30": new clients may experience + noticeable delays + Oct 3 20:39:27 spike postfix/master[28905]: warning: to avoid this + condition, increase the process count in master.cf or reduce the + service time per client + Oct 3 20:39:27 spike postfix/master[28905]: warning: see + http://www.postfix.org/STRESS_README.html for examples of + stress-adapting configuration settings + +Legitimate mail that doesn't get through during an episode of Postfix SMTP +server overload is not necessarily lost. It should still arrive once the +situation returns to normal, as long as the overload condition is temporary. + +AAuuttoommaattiicc ssttrreessss--aaddaappttiivvee bbeehhaavviioorr + +Postfix version 2.5 introduces automatic stress-adaptive behavior. It works as +follows. When a "public" network service such as the SMTP server runs into an +"all server ports are busy" condition, the Postfix master(8) daemon logs a +warning, restarts the service (without interrupting existing network sessions), +and runs the service with "-o stress=yes" on the server process command line: + + 80821 ?? S 0:00.24 smtpd -n smtp -t inet -u -c -o stress=yes + +Normally, the Postfix master(8) daemon runs such a service with "-o stress=" on +the command line (i.e. with an empty parameter value): + + 83326 ?? S 0:00.28 smtpd -n smtp -t inet -u -c -o stress= + +You won't see "-o stress" command-line parameters with services that have local +clients only. These include services internal to Postfix such as the queue +manager, and services that listen on a loopback interface only, such as after- +filter SMTP services. + +The "stress" parameter value is the key to making main.cf parameter settings +stress adaptive. The following settings are the default with Postfix 2.6 and +later. + + 1 smtpd_timeout = ${stress?{10}:{300}}s + 2 smtpd_hard_error_limit = ${stress?{1}:{20}} + 3 smtpd_junk_command_limit = ${stress?{1}:{100}} + 4 # Parameters added after Postfix 2.6: + 5 smtpd_per_record_deadline = ${stress?{yes}:{no}} + 6 smtpd_starttls_timeout = ${stress?{10}:{300}}s + 7 address_verify_poll_count = ${stress?{1}:{3}} + +Postfix versions before 3.0 use the older form ${stress?x}${stress:y} instead +of the newer form ${stress?{x}:{y}}. + +The syntax of ${name?{value}:{value}}, ${name?value} and ${name:value} is +explained at the beginning of the postconf(5) manual page. + +Translation: + + * Line 1: under conditions of stress, use an smtpd_timeout value of 10 + seconds instead of the default 300 seconds. Experience on the postfix-users + list from a variety of sysadmins shows that reducing the "normal" + smtpd_timeout to 60s is unlikely to affect legitimate clients. However, it + is unlikely to become the Postfix default because it's not RFC compliant. + Setting smtpd_timeout to 10s or even 5s under stress will still allow most + legitimate clients to connect and send mail, but may delay mail from some + clients. No mail should be lost, as long as this measure is used only + temporarily. + + * Line 2: under conditions of stress, use an smtpd_hard_error_limit of 1 + instead of the default 20. This disconnects clients after a single error, + giving other clients a chance to connect. However, this may cause + significant delays with legitimate mail, such as a mailing list that + contains a few no-longer-active user names that didn't bother to + unsubscribe. No mail should be lost, as long as this measure is used only + temporarily. + + * Line 3: under conditions of stress, use an smtpd_junk_command_limit of 1 + instead of the default 100. This prevents clients from keeping connections + open by repeatedly sending HELO, EHLO, NOOP, RSET, VRFY or ETRN commands. + + * Line 5: under conditions of stress, change the behavior of smtpd_timeout + and smtpd_starttls_timeout, from a time limit per read or write system + call, to a time limit to send or receive a complete record (an SMTP command + line, SMTP response line, SMTP message content line, or TLS protocol + message). + + * Line 6: under conditions of stress, reduce the time limit for TLS protocol + handshake messages to 10 seconds, from the default value of 300 seconds. + See also the smtpd_timeout discussion above. + + * Line 7: under conditions of stress, do not wait up to 6 seconds for the + completion of an address verification probe. If the result is not already + in the address verification cache, reply immediately with + $unverified_recipient_tempfail_action or + $unverified_sender_tempfail_action. No mail should be lost, as long as this + measure is used only temporarily. + +NOTE: Please keep in mind that the stress-adaptive feature is a fairly +desperate measure to keep ssoommee legitimate mail flowing under overload +conditions. If a site is reaching the SMTP server process limit when there +isn't an attack or bot flood occurring, then either the process limit needs to +be raised or more hardware needs to be added. + +SSeerrvviiccee mmoorree SSMMTTPP cclliieennttss aatt tthhee ssaammee ttiimmee + +This section and the ones that follow discuss permanent measures against mail +server overload. + +One measure to avoid the "all server processes busy" condition is to service +more SMTP clients simultaneously. For this you need to increase the number of +Postfix SMTP server processes. This will improve the responsiveness for remote +SMTP clients, as long as the server machine has enough hardware and software +resources to run the additional processes, and as long as the file system can +keep up with the additional load. + + * You increase the number of SMTP server processes either by increasing the + default_process_limit in main.cf (line 3 below), or by increasing the SMTP + server's "maxproc" field in master.cf (line 10 below). Either way, you need + to issue a "postfix reload" command to make the change effective. + + * Process limits above 1000 require Postfix version 2.4 or later, and an + operating system that supports kernel-based event filters (BSD kqueue(2), + Linux epoll(4), or Solaris /dev/poll). + + * More processes use more memory. You can reduce the Postfix memory footprint + by using cdb: lookup tables instead of Berkeley DB's hash: or btree: + tables. + + 1 /etc/postfix/main.cf: + 2 # Raise the global process limit, 100 since Postfix 2.0. + 3 default_process_limit = 200 + 4 + 5 /etc/postfix/master.cf: + 6 # ============================================================= + 7 # service type private unpriv chroot wakeup maxproc command + 8 # ============================================================= + 9 # Raise the SMTP service process limit only. + 10 smtp inet n - n - 200 smtpd + + * NOTE: older versions of the SMTPD_POLICY_README document contain a mistake: + they configure a fixed number of policy daemon processes. When you raise + the SMTP server's "maxproc" field in master.cf, SMTP server processes will + report problems when connecting to policy server processes, because there + aren't enough of them. Examples of errors are "connection refused" or + "operation timed out". + + To fix, edit master.cf and specify a zero "maxproc" field in all policy + server entries; see line 6 in the example below. Issue a "postfix reload" + command to make the change effective. + + 1 /etc/postfix/master.cf: + 2 # ============================================================= + 3 # service type private unpriv chroot wakeup maxproc command + 4 # ============================================================= + 5 # Disable the policy service process limit. + 6 policy unix - n n - 0 spawn + 7 user=nobody argv=/some/where/policy-server + +SSppeenndd lleessss ttiimmee ppeerr SSMMTTPP cclliieenntt + +When increasing the number of SMTP server processes is not practical, you can +improve Postfix server responsiveness by eliminating delays. When Postfix +spends less time per SMTP session, the same number of SMTP server processes can +service more clients in a given amount of time. + + * Eliminate non-functional RBL lookups (blocklists that are no longer in + operation). These lookups can degrade performance. Postfix logs a warning + when an RBL server does not respond. + + * Eliminate redundant RBL lookups (people often use multiple Spamhaus RBLs + that include each other). To find out whether RBLs include other RBLs, look + up the websites that document the RBL's policies. + + * Eliminate header_checks and body_checks, and keep just a few emergency + patterns to block the latest worm explosion or backscatter mail. See + BACKSCATTER_README for examples of the latter. + + * Group your header_checks and body_checks patterns to avoid unnecessary + pattern matching operations: + + 1 /etc/postfix/header_checks: + 2 if /^Subject:/ + 3 /^Subject: virus found in mail from you/ reject + 4 /^Subject: ..other../ reject + 5 endif + 6 + 7 if /^Received:/ + 8 /^Received: from (postfix\.org) / reject forged client name in + received header: $1 + 9 /^Received: from ..other../ reject .... + 10 endif + +DDiissccoonnnneecctt ssuussppiicciioouuss SSMMTTPP cclliieennttss + +Under conditions of overload you can improve Postfix SMTP server responsiveness +by hanging up on suspicious clients, so that other clients get a chance to talk +to Postfix. + + * Use "521" SMTP reply codes (Postfix 2.6 and later) or "421" (Postfix 2.3- + 2.5) to hang up on clients that that match botnet-related RBLs (see next + bullet) or that match selected non-RBL restrictions such as SMTP access + maps. The Postfix SMTP server will reject mail and disconnect without + waiting for the remote SMTP client to send a QUIT command. + + * To hang up connections from blacklisted zombies, you can set specific + Postfix SMTP server reject codes for specific RBLs, and for individual + responses from specific RBLs. We'll use zen.spamhaus.org as an example; by + the time you read this document, details may have changed. Right now, their + documents say that a response of 127.0.0.10 or 127.0.0.11 indicates a + dynamic client IP address, which means that the machine is probably running + a bot of some kind. To give a 521 response instead of the default 554 + response, use something like: + + 1 /etc/postfix/main.cf: + 2 smtpd_client_restrictions = + 3 permit_mynetworks + 4 reject_rbl_client zen.spamhaus.org=127.0.0.10 + 5 reject_rbl_client zen.spamhaus.org=127.0.0.11 + 6 reject_rbl_client zen.spamhaus.org + 7 + 8 rbl_reply_maps = hash:/etc/postfix/rbl_reply_maps + 9 + 10 /etc/postfix/rbl_reply_maps: + 11 # With Postfix 2.3-2.5 use "421" to hang up connections. + 12 zen.spamhaus.org=127.0.0.10 521 4.7.1 Service unavailable; + 13 $rbl_class [$rbl_what] blocked using + 14 $rbl_domain${rbl_reason?; $rbl_reason} + 15 + 16 zen.spamhaus.org=127.0.0.11 521 4.7.1 Service unavailable; + 17 $rbl_class [$rbl_what] blocked using + 18 $rbl_domain${rbl_reason?; $rbl_reason} + + Although the above example shows three RBL lookups (lines 4-6), Postfix + will only do a single DNS query, so it does not affect the performance. + + * With Postfix 2.3-2.5, use reply code 421 (521 will not cause Postfix to + disconnect). The down-side of replying with 421 is that it works only for + zombies and other malware. If the client is running a real MTA, then it may + connect again several times until the mail expires in its queue. When this + is a problem, stick with the default 554 reply, and use + "smtpd_hard_error_limit = 1" as described below. + + * You can automatically turn on the above overload measure with Postfix 2.5 + and later, or with earlier releases that contain the stress-adaptive + behavior source code patch from the mirrors listed at http:// + www.postfix.org/download.html. Simply replace line above 8 with: + + 8 rbl_reply_maps = ${stress?hash:/etc/postfix/rbl_reply_maps} + +More information about automatic stress-adaptive behavior is in section +"Automatic stress-adaptive behavior". + +TTeemmppoorraarryy mmeeaassuurreess ffoorr oollddeerr PPoossttffiixx rreelleeaasseess + +See the section "Automatic stress-adaptive behavior" if you are running Postfix +version 2.5 or later, or if you have applied the source code patch for stress- +adaptive behavior from the mirrors listed at http://www.postfix.org/ +download.html. + +The following measures can be applied temporarily during overload. They still +allow mmoosstt legitimate clients to connect and send mail, but may affect some +legitimate clients. + + * Reduce smtpd_timeout (default: 300s). Experience on the postfix-users list + from a variety of sysadmins shows that reducing the "normal" smtpd_timeout + to 60s is unlikely to affect legitimate clients. However, it is unlikely to + become the Postfix default because it's not RFC compliant. Setting + smtpd_timeout to 10s (line 2 below) or even 5s under stress will still + allow mmoosstt legitimate clients to connect and send mail, but may delay mail + from some clients. No mail should be lost, as long as this measure is used + only temporarily. + + * Reduce smtpd_hard_error_limit (default: 20). Setting this to 1 under stress + (line 3 below) helps by disconnecting clients after a single error, giving + other clients a chance to connect. However, this may cause significant + delays with legitimate mail, such as a mailing list that contains a few no- + longer-active user names that didn't bother to unsubscribe. No mail should + be lost, as long as this measure is used only temporarily. + + * Use an smtpd_junk_command_limit of 1 instead of the default 100. This + prevents clients from keeping idle connections open by repeatedly sending + NOOP or RSET commands. + + 1 /etc/postfix/main.cf: + 2 smtpd_timeout = 10 + 3 smtpd_hard_error_limit = 1 + 4 smtpd_junk_command_limit = 1 + +With these measures, no mail should be lost, as long as these measures are used +only temporarily. The next section of this document introduces a way to +automate this process. + +DDeetteeccttiinngg ssuuppppoorrtt ffoorr ssttrreessss--aaddaappttiivvee bbeehhaavviioorr + +To find out if your Postfix installation supports stress-adaptive behavior, use +the "ps" command, and look for the smtpd processes. Postfix has stress-adaptive +support when you see "-o stress=" or "-o stress=yes" command-line options. +Remember that Postfix never enables stress-adaptive behavior on servers that +listen on local addresses only. + +The following example is for FreeBSD or Linux. On Solaris, HP-UX and other +System-V flavors, use "ps -ef" instead of "ps ax". + + $ ps ax|grep smtpd + 83326 ?? S 0:00.28 smtpd -n smtp -t inet -u -c -o stress= + 84345 ?? Ss 0:00.11 /usr/bin/perl /usr/libexec/postfix/smtpd- + policy.pl + +You can't use postconf(1) to detect stress-adaptive support. The postconf(1) +command ignores the existence of the stress parameter in main.cf, because the +parameter has no effect there. Command-line "-o parameter" settings always take +precedence over main.cf parameter settings. + +If you configure stress-adaptive behavior in main.cf when it isn't supported, +nothing bad will happen. The processes will run as if the stress parameter +always has an empty value. + +FFoorrcciinngg ssttrreessss--aaddaappttiivvee bbeehhaavviioorr oonn oorr ooffff + +You can manually force stress-adaptive behavior on, by adding a "-o stress=yes" +command-line option in master.cf. This can be useful for testing overrides on +the SMTP service. Issue "postfix reload" to make the change effective. + +Note: setting the stress parameter in main.cf has no effect for services that +accept remote connections. + + 1 /etc/postfix/master.cf: + 2 # ============================================================= + 3 # service type private unpriv chroot wakeup maxproc command + 4 # ============================================================= + 5 # + 6 smtp inet n - n - - smtpd + 7 -o stress=yes + 8 -o . . . + +To permanently force stress-adaptive behavior off with a specific service, +specify "-o stress=" on its master.cf command line. This may be desirable for +the "submission" service. Issue "postfix reload" to make the change effective. + +Note: setting the stress parameter in main.cf has no effect for services that +accept remote connections. + + 1 /etc/postfix/master.cf: + 2 # ============================================================= + 3 # service type private unpriv chroot wakeup maxproc command + 4 # ============================================================= + 5 # + 6 submission inet n - n - - smtpd + 7 -o stress= + 8 -o . . . + +OOtthheerr mmeeaassuurreess ttoo ooffff--llooaadd zzoommbbiieess + +The postscreen(8) daemon, introduced with Postfix 2.8, provides additional +protection against mail server overload. One postscreen(8) process handles +multiple inbound SMTP connections, and decides which clients may to talk to a +Postfix SMTP server process. By keeping spambots away, postscreen(8) leaves +more SMTP server processes available for legitimate clients, and delays the +onset of server overload conditions. + +CCrreeddiittss + + * Thanks to the postfix-users mailing list members for sharing early + experiences with the stress-adaptive feature. + * The RBL example and several other paragraphs of text were adapted from + postfix-users postings by Noel Jones. + * Wietse implemented stress-adaptive behavior as the smallest possible patch + while he should be working on other things. + |