summaryrefslogtreecommitdiffstats
path: root/README_FILES/STRESS_README
diff options
context:
space:
mode:
Diffstat (limited to 'README_FILES/STRESS_README')
-rw-r--r--README_FILES/STRESS_README426
1 files changed, 426 insertions, 0 deletions
diff --git a/README_FILES/STRESS_README b/README_FILES/STRESS_README
new file mode 100644
index 0000000..8edc148
--- /dev/null
+++ b/README_FILES/STRESS_README
@@ -0,0 +1,426 @@
+PPoossttffiixx SSttrreessss--DDeeppeennddeenntt CCoonnffiigguurraattiioonn
+
+-------------------------------------------------------------------------------
+
+OOvveerrvviieeww
+
+This document describes the symptoms of Postfix SMTP server overload. It
+presents permanent main.cf changes to avoid overload during normal operation,
+and temporary main.cf changes to cope with an unexpected burst of mail. This
+document makes specific suggestions for Postfix 2.5 and later which support
+stress-adaptive behavior, and for earlier Postfix versions that don't.
+
+Topics covered in this document:
+
+ * Symptoms of Postfix SMTP server overload
+ * Automatic stress-adaptive behavior
+ * Service more SMTP clients at the same time
+ * Spend less time per SMTP client
+ * Disconnect suspicious SMTP clients
+ * Temporary measures for older Postfix releases
+ * Detecting support for stress-adaptive behavior
+ * Forcing stress-adaptive behavior on or off
+ * Other measures to off-load zombies
+ * Credits
+
+SSyymmppttoommss ooff PPoossttffiixx SSMMTTPP sseerrvveerr oovveerrllooaadd
+
+Under normal conditions, the Postfix SMTP server responds immediately when an
+SMTP client connects to it; the time to deliver mail is noticeable only with
+large messages. Performance degrades dramatically when the number of SMTP
+clients exceeds the number of Postfix SMTP server processes. When an SMTP
+client connects while all Postfix SMTP server processes are busy, the client
+must wait until a server process becomes available.
+
+SMTP server overload may be caused by a surge of legitimate mail (example: a
+DNS registrar opens a new zone for registrations), by mistake (mail explosion
+caused by a forwarding loop) or by malice (worm outbreak, botnet, or other
+illegitimate activity).
+
+Symptoms of Postfix SMTP server overload are:
+
+ * Remote SMTP clients experience a long delay before Postfix sends the "220
+ hostname.example.com ESMTP Postfix" greeting.
+
+ o NOTE: Broken DNS configurations can also cause lengthy delays before
+ Postfix sends "220 hostname.example.com ...". These delays also exist
+ when Postfix is NOT overloaded.
+
+ o NOTE: To avoid "overload" delays for end-user mail clients, enable the
+ "submission" service entry in master.cf (present since Postfix 2.1),
+ and tell users to connect to this instead of the public SMTP service.
+
+ * The Postfix SMTP server logs an increased number of "lost connection after
+ CONNECT" events. This happens because remote SMTP clients disconnect before
+ Postfix answers the connection.
+
+ o NOTE: A portscan for open SMTP ports can also result in "lost
+ connection ..." logfile messages.
+
+ * Postfix 2.3 and later logs a warning that all server ports are busy:
+
+ Oct 3 20:39:27 spike postfix/master[28905]: warning: service "smtp"
+ (25) has reached its process limit "30": new clients may experience
+ noticeable delays
+ Oct 3 20:39:27 spike postfix/master[28905]: warning: to avoid this
+ condition, increase the process count in master.cf or reduce the
+ service time per client
+ Oct 3 20:39:27 spike postfix/master[28905]: warning: see
+ http://www.postfix.org/STRESS_README.html for examples of
+ stress-adapting configuration settings
+
+Legitimate mail that doesn't get through during an episode of Postfix SMTP
+server overload is not necessarily lost. It should still arrive once the
+situation returns to normal, as long as the overload condition is temporary.
+
+AAuuttoommaattiicc ssttrreessss--aaddaappttiivvee bbeehhaavviioorr
+
+Postfix version 2.5 introduces automatic stress-adaptive behavior. It works as
+follows. When a "public" network service such as the SMTP server runs into an
+"all server ports are busy" condition, the Postfix master(8) daemon logs a
+warning, restarts the service (without interrupting existing network sessions),
+and runs the service with "-o stress=yes" on the server process command line:
+
+ 80821 ?? S 0:00.24 smtpd -n smtp -t inet -u -c -o stress=yes
+
+Normally, the Postfix master(8) daemon runs such a service with "-o stress=" on
+the command line (i.e. with an empty parameter value):
+
+ 83326 ?? S 0:00.28 smtpd -n smtp -t inet -u -c -o stress=
+
+You won't see "-o stress" command-line parameters with services that have local
+clients only. These include services internal to Postfix such as the queue
+manager, and services that listen on a loopback interface only, such as after-
+filter SMTP services.
+
+The "stress" parameter value is the key to making main.cf parameter settings
+stress adaptive. The following settings are the default with Postfix 2.6 and
+later.
+
+ 1 smtpd_timeout = ${stress?{10}:{300}}s
+ 2 smtpd_hard_error_limit = ${stress?{1}:{20}}
+ 3 smtpd_junk_command_limit = ${stress?{1}:{100}}
+ 4 # Parameters added after Postfix 2.6:
+ 5 smtpd_per_record_deadline = ${stress?{yes}:{no}}
+ 6 smtpd_starttls_timeout = ${stress?{10}:{300}}s
+ 7 address_verify_poll_count = ${stress?{1}:{3}}
+
+Postfix versions before 3.0 use the older form ${stress?x}${stress:y} instead
+of the newer form ${stress?{x}:{y}}.
+
+The syntax of ${name?{value}:{value}}, ${name?value} and ${name:value} is
+explained at the beginning of the postconf(5) manual page.
+
+Translation:
+
+ * Line 1: under conditions of stress, use an smtpd_timeout value of 10
+ seconds instead of the default 300 seconds. Experience on the postfix-users
+ list from a variety of sysadmins shows that reducing the "normal"
+ smtpd_timeout to 60s is unlikely to affect legitimate clients. However, it
+ is unlikely to become the Postfix default because it's not RFC compliant.
+ Setting smtpd_timeout to 10s or even 5s under stress will still allow most
+ legitimate clients to connect and send mail, but may delay mail from some
+ clients. No mail should be lost, as long as this measure is used only
+ temporarily.
+
+ * Line 2: under conditions of stress, use an smtpd_hard_error_limit of 1
+ instead of the default 20. This disconnects clients after a single error,
+ giving other clients a chance to connect. However, this may cause
+ significant delays with legitimate mail, such as a mailing list that
+ contains a few no-longer-active user names that didn't bother to
+ unsubscribe. No mail should be lost, as long as this measure is used only
+ temporarily.
+
+ * Line 3: under conditions of stress, use an smtpd_junk_command_limit of 1
+ instead of the default 100. This prevents clients from keeping connections
+ open by repeatedly sending HELO, EHLO, NOOP, RSET, VRFY or ETRN commands.
+
+ * Line 5: under conditions of stress, change the behavior of smtpd_timeout
+ and smtpd_starttls_timeout, from a time limit per read or write system
+ call, to a time limit to send or receive a complete record (an SMTP command
+ line, SMTP response line, SMTP message content line, or TLS protocol
+ message).
+
+ * Line 6: under conditions of stress, reduce the time limit for TLS protocol
+ handshake messages to 10 seconds, from the default value of 300 seconds.
+ See also the smtpd_timeout discussion above.
+
+ * Line 7: under conditions of stress, do not wait up to 6 seconds for the
+ completion of an address verification probe. If the result is not already
+ in the address verification cache, reply immediately with
+ $unverified_recipient_tempfail_action or
+ $unverified_sender_tempfail_action. No mail should be lost, as long as this
+ measure is used only temporarily.
+
+NOTE: Please keep in mind that the stress-adaptive feature is a fairly
+desperate measure to keep ssoommee legitimate mail flowing under overload
+conditions. If a site is reaching the SMTP server process limit when there
+isn't an attack or bot flood occurring, then either the process limit needs to
+be raised or more hardware needs to be added.
+
+SSeerrvviiccee mmoorree SSMMTTPP cclliieennttss aatt tthhee ssaammee ttiimmee
+
+This section and the ones that follow discuss permanent measures against mail
+server overload.
+
+One measure to avoid the "all server processes busy" condition is to service
+more SMTP clients simultaneously. For this you need to increase the number of
+Postfix SMTP server processes. This will improve the responsiveness for remote
+SMTP clients, as long as the server machine has enough hardware and software
+resources to run the additional processes, and as long as the file system can
+keep up with the additional load.
+
+ * You increase the number of SMTP server processes either by increasing the
+ default_process_limit in main.cf (line 3 below), or by increasing the SMTP
+ server's "maxproc" field in master.cf (line 10 below). Either way, you need
+ to issue a "postfix reload" command to make the change effective.
+
+ * Process limits above 1000 require Postfix version 2.4 or later, and an
+ operating system that supports kernel-based event filters (BSD kqueue(2),
+ Linux epoll(4), or Solaris /dev/poll).
+
+ * More processes use more memory. You can reduce the Postfix memory footprint
+ by using cdb: lookup tables instead of Berkeley DB's hash: or btree:
+ tables.
+
+ 1 /etc/postfix/main.cf:
+ 2 # Raise the global process limit, 100 since Postfix 2.0.
+ 3 default_process_limit = 200
+ 4
+ 5 /etc/postfix/master.cf:
+ 6 # =============================================================
+ 7 # service type private unpriv chroot wakeup maxproc command
+ 8 # =============================================================
+ 9 # Raise the SMTP service process limit only.
+ 10 smtp inet n - n - 200 smtpd
+
+ * NOTE: older versions of the SMTPD_POLICY_README document contain a mistake:
+ they configure a fixed number of policy daemon processes. When you raise
+ the SMTP server's "maxproc" field in master.cf, SMTP server processes will
+ report problems when connecting to policy server processes, because there
+ aren't enough of them. Examples of errors are "connection refused" or
+ "operation timed out".
+
+ To fix, edit master.cf and specify a zero "maxproc" field in all policy
+ server entries; see line 6 in the example below. Issue a "postfix reload"
+ command to make the change effective.
+
+ 1 /etc/postfix/master.cf:
+ 2 # =============================================================
+ 3 # service type private unpriv chroot wakeup maxproc command
+ 4 # =============================================================
+ 5 # Disable the policy service process limit.
+ 6 policy unix - n n - 0 spawn
+ 7 user=nobody argv=/some/where/policy-server
+
+SSppeenndd lleessss ttiimmee ppeerr SSMMTTPP cclliieenntt
+
+When increasing the number of SMTP server processes is not practical, you can
+improve Postfix server responsiveness by eliminating delays. When Postfix
+spends less time per SMTP session, the same number of SMTP server processes can
+service more clients in a given amount of time.
+
+ * Eliminate non-functional RBL lookups (blocklists that are no longer in
+ operation). These lookups can degrade performance. Postfix logs a warning
+ when an RBL server does not respond.
+
+ * Eliminate redundant RBL lookups (people often use multiple Spamhaus RBLs
+ that include each other). To find out whether RBLs include other RBLs, look
+ up the websites that document the RBL's policies.
+
+ * Eliminate header_checks and body_checks, and keep just a few emergency
+ patterns to block the latest worm explosion or backscatter mail. See
+ BACKSCATTER_README for examples of the latter.
+
+ * Group your header_checks and body_checks patterns to avoid unnecessary
+ pattern matching operations:
+
+ 1 /etc/postfix/header_checks:
+ 2 if /^Subject:/
+ 3 /^Subject: virus found in mail from you/ reject
+ 4 /^Subject: ..other../ reject
+ 5 endif
+ 6
+ 7 if /^Received:/
+ 8 /^Received: from (postfix\.org) / reject forged client name in
+ received header: $1
+ 9 /^Received: from ..other../ reject ....
+ 10 endif
+
+DDiissccoonnnneecctt ssuussppiicciioouuss SSMMTTPP cclliieennttss
+
+Under conditions of overload you can improve Postfix SMTP server responsiveness
+by hanging up on suspicious clients, so that other clients get a chance to talk
+to Postfix.
+
+ * Use "521" SMTP reply codes (Postfix 2.6 and later) or "421" (Postfix 2.3-
+ 2.5) to hang up on clients that that match botnet-related RBLs (see next
+ bullet) or that match selected non-RBL restrictions such as SMTP access
+ maps. The Postfix SMTP server will reject mail and disconnect without
+ waiting for the remote SMTP client to send a QUIT command.
+
+ * To hang up connections from denylisted zombies, you can set specific
+ Postfix SMTP server reject codes for specific RBLs, and for individual
+ responses from specific RBLs. We'll use zen.spamhaus.org as an example; by
+ the time you read this document, details may have changed. Right now, their
+ documents say that a response of 127.0.0.10 or 127.0.0.11 indicates a
+ dynamic client IP address, which means that the machine is probably running
+ a bot of some kind. To give a 521 response instead of the default 554
+ response, use something like:
+
+ 1 /etc/postfix/main.cf:
+ 2 smtpd_client_restrictions =
+ 3 permit_mynetworks
+ 4 reject_rbl_client zen.spamhaus.org=127.0.0.10
+ 5 reject_rbl_client zen.spamhaus.org=127.0.0.11
+ 6 reject_rbl_client zen.spamhaus.org
+ 7
+ 8 rbl_reply_maps = hash:/etc/postfix/rbl_reply_maps
+ 9
+ 10 /etc/postfix/rbl_reply_maps:
+ 11 # With Postfix 2.3-2.5 use "421" to hang up connections.
+ 12 zen.spamhaus.org=127.0.0.10 521 4.7.1 Service unavailable;
+ 13 $rbl_class [$rbl_what] blocked using
+ 14 $rbl_domain${rbl_reason?; $rbl_reason}
+ 15
+ 16 zen.spamhaus.org=127.0.0.11 521 4.7.1 Service unavailable;
+ 17 $rbl_class [$rbl_what] blocked using
+ 18 $rbl_domain${rbl_reason?; $rbl_reason}
+
+ Although the above example shows three RBL lookups (lines 4-6), Postfix
+ will only do a single DNS query, so it does not affect the performance.
+
+ * With Postfix 2.3-2.5, use reply code 421 (521 will not cause Postfix to
+ disconnect). The down-side of replying with 421 is that it works only for
+ zombies and other malware. If the client is running a real MTA, then it may
+ connect again several times until the mail expires in its queue. When this
+ is a problem, stick with the default 554 reply, and use
+ "smtpd_hard_error_limit = 1" as described below.
+
+ * You can automatically turn on the above overload measure with Postfix 2.5
+ and later, or with earlier releases that contain the stress-adaptive
+ behavior source code patch from the mirrors listed at http://
+ www.postfix.org/download.html. Simply replace line above 8 with:
+
+ 8 rbl_reply_maps = ${stress?hash:/etc/postfix/rbl_reply_maps}
+
+More information about automatic stress-adaptive behavior is in section
+"Automatic stress-adaptive behavior".
+
+TTeemmppoorraarryy mmeeaassuurreess ffoorr oollddeerr PPoossttffiixx rreelleeaasseess
+
+See the section "Automatic stress-adaptive behavior" if you are running Postfix
+version 2.5 or later, or if you have applied the source code patch for stress-
+adaptive behavior from the mirrors listed at http://www.postfix.org/
+download.html.
+
+The following measures can be applied temporarily during overload. They still
+allow mmoosstt legitimate clients to connect and send mail, but may affect some
+legitimate clients.
+
+ * Reduce smtpd_timeout (default: 300s). Experience on the postfix-users list
+ from a variety of sysadmins shows that reducing the "normal" smtpd_timeout
+ to 60s is unlikely to affect legitimate clients. However, it is unlikely to
+ become the Postfix default because it's not RFC compliant. Setting
+ smtpd_timeout to 10s (line 2 below) or even 5s under stress will still
+ allow mmoosstt legitimate clients to connect and send mail, but may delay mail
+ from some clients. No mail should be lost, as long as this measure is used
+ only temporarily.
+
+ * Reduce smtpd_hard_error_limit (default: 20). Setting this to 1 under stress
+ (line 3 below) helps by disconnecting clients after a single error, giving
+ other clients a chance to connect. However, this may cause significant
+ delays with legitimate mail, such as a mailing list that contains a few no-
+ longer-active user names that didn't bother to unsubscribe. No mail should
+ be lost, as long as this measure is used only temporarily.
+
+ * Use an smtpd_junk_command_limit of 1 instead of the default 100. This
+ prevents clients from keeping idle connections open by repeatedly sending
+ NOOP or RSET commands.
+
+ 1 /etc/postfix/main.cf:
+ 2 smtpd_timeout = 10
+ 3 smtpd_hard_error_limit = 1
+ 4 smtpd_junk_command_limit = 1
+
+With these measures, no mail should be lost, as long as these measures are used
+only temporarily. The next section of this document introduces a way to
+automate this process.
+
+DDeetteeccttiinngg ssuuppppoorrtt ffoorr ssttrreessss--aaddaappttiivvee bbeehhaavviioorr
+
+To find out if your Postfix installation supports stress-adaptive behavior, use
+the "ps" command, and look for the smtpd processes. Postfix has stress-adaptive
+support when you see "-o stress=" or "-o stress=yes" command-line options.
+Remember that Postfix never enables stress-adaptive behavior on servers that
+listen on local addresses only.
+
+The following example is for FreeBSD or Linux. On Solaris, HP-UX and other
+System-V flavors, use "ps -ef" instead of "ps ax".
+
+ $ ps ax|grep smtpd
+ 83326 ?? S 0:00.28 smtpd -n smtp -t inet -u -c -o stress=
+ 84345 ?? Ss 0:00.11 /usr/bin/perl /usr/libexec/postfix/smtpd-
+ policy.pl
+
+You can't use postconf(1) to detect stress-adaptive support. The postconf(1)
+command ignores the existence of the stress parameter in main.cf, because the
+parameter has no effect there. Command-line "-o parameter" settings always take
+precedence over main.cf parameter settings.
+
+If you configure stress-adaptive behavior in main.cf when it isn't supported,
+nothing bad will happen. The processes will run as if the stress parameter
+always has an empty value.
+
+FFoorrcciinngg ssttrreessss--aaddaappttiivvee bbeehhaavviioorr oonn oorr ooffff
+
+You can manually force stress-adaptive behavior on, by adding a "-o stress=yes"
+command-line option in master.cf. This can be useful for testing overrides on
+the SMTP service. Issue "postfix reload" to make the change effective.
+
+Note: setting the stress parameter in main.cf has no effect for services that
+accept remote connections.
+
+ 1 /etc/postfix/master.cf:
+ 2 # =============================================================
+ 3 # service type private unpriv chroot wakeup maxproc command
+ 4 # =============================================================
+ 5 #
+ 6 smtp inet n - n - - smtpd
+ 7 -o stress=yes
+ 8 -o . . .
+
+To permanently force stress-adaptive behavior off with a specific service,
+specify "-o stress=" on its master.cf command line. This may be desirable for
+the "submission" service. Issue "postfix reload" to make the change effective.
+
+Note: setting the stress parameter in main.cf has no effect for services that
+accept remote connections.
+
+ 1 /etc/postfix/master.cf:
+ 2 # =============================================================
+ 3 # service type private unpriv chroot wakeup maxproc command
+ 4 # =============================================================
+ 5 #
+ 6 submission inet n - n - - smtpd
+ 7 -o stress=
+ 8 -o . . .
+
+OOtthheerr mmeeaassuurreess ttoo ooffff--llooaadd zzoommbbiieess
+
+The postscreen(8) daemon, introduced with Postfix 2.8, provides additional
+protection against mail server overload. One postscreen(8) process handles
+multiple inbound SMTP connections, and decides which clients may talk to a
+Postfix SMTP server process. By keeping spambots away, postscreen(8) leaves
+more SMTP server processes available for legitimate clients, and delays the
+onset of server overload conditions.
+
+CCrreeddiittss
+
+ * Thanks to the postfix-users mailing list members for sharing early
+ experiences with the stress-adaptive feature.
+ * The RBL example and several other paragraphs of text were adapted from
+ postfix-users postings by Noel Jones.
+ * Wietse implemented stress-adaptive behavior as the smallest possible patch
+ while he should be working on other things.
+