summaryrefslogtreecommitdiffstats
path: root/proto/STRESS_README.html
diff options
context:
space:
mode:
Diffstat (limited to 'proto/STRESS_README.html')
-rw-r--r--proto/STRESS_README.html567
1 files changed, 567 insertions, 0 deletions
diff --git a/proto/STRESS_README.html b/proto/STRESS_README.html
new file mode 100644
index 0000000..4935226
--- /dev/null
+++ b/proto/STRESS_README.html
@@ -0,0 +1,567 @@
+<!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN"
+ "http://www.w3.org/TR/html4/loose.dtd">
+
+<html>
+
+<head>
+
+<title>Postfix Stress-Dependent Configuration</title>
+
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+<link rel='stylesheet' type='text/css' href='postfix-doc.css'>
+
+</head>
+
+<body>
+
+<h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix
+Stress-Dependent Configuration</h1>
+
+<hr>
+
+<h2>Overview </h2>
+
+<p> This document describes the symptoms of Postfix SMTP server
+overload. It presents permanent main.cf changes to avoid overload
+during normal operation, and temporary main.cf changes to cope with
+an unexpected burst of mail. This document makes specific suggestions
+for Postfix 2.5 and later which support stress-adaptive behavior,
+and for earlier Postfix versions that don't. </p>
+
+<p> Topics covered in this document: </p>
+
+<ul>
+
+<li><a href="#overload"> Symptoms of Postfix SMTP server overload </a>
+
+<li><a href="#adapt"> Automatic stress-adaptive behavior </a>
+
+<li><a href="#concurrency"> Service more SMTP clients at the same time </a>
+
+<li><a href="#time"> Spend less time per SMTP client </a>
+
+<li><a href="#hangup"> Disconnect suspicious SMTP clients </a>
+
+<li><a href="#legacy"> Temporary measures for older Postfix releases </a>
+
+<li><a href="#feature"> Detecting support for stress-adaptive behavior </a>
+
+<li><a href="#forcing"> Forcing stress-adaptive behavior on or off </a>
+
+<li><a href="#other"> Other measures to off-load zombies </a>
+
+<li><a href="#credits"> Credits </a>
+
+</ul>
+
+<h2><a name="overload"> Symptoms of Postfix SMTP server overload </a></h2>
+
+<p> Under normal conditions, the Postfix SMTP server responds
+immediately when an SMTP client connects to it; the time to deliver
+mail is noticeable only with large messages. Performance degrades
+dramatically when the number of SMTP clients exceeds the number of
+Postfix SMTP server processes. When an SMTP client connects while
+all Postfix SMTP server processes are busy, the client must wait
+until a server process becomes available. </p>
+
+<p> SMTP server overload may be caused by a surge of legitimate
+mail (example: a DNS registrar opens a new zone for registrations),
+by mistake (mail explosion caused by a forwarding loop) or by malice
+(worm outbreak, botnet, or other illegitimate activity). </p>
+
+<p> Symptoms of Postfix SMTP server overload are: </p>
+
+<ul>
+
+<li> <p> Remote SMTP clients experience a long delay before Postfix
+sends the "220 hostname.example.com ESMTP Postfix" greeting. </p>
+
+<ul>
+
+<li> <p> NOTE: Broken DNS configurations can also cause lengthy
+delays before Postfix sends "220 hostname.example.com ...". These
+delays also exist when Postfix is NOT overloaded. </p>
+
+<li> <p> NOTE: To avoid "overload" delays for end-user mail
+clients, enable the "submission" service entry in master.cf (present
+since Postfix 2.1), and tell users to connect to this instead of
+the public SMTP service. </p>
+
+</ul>
+
+<li> <p> The Postfix SMTP server logs an increased number of "lost
+connection after CONNECT" events. This happens because remote SMTP
+clients disconnect before Postfix answers the connection. </p>
+
+<ul>
+
+<li> <p> NOTE: A portscan for open SMTP ports can also result in
+"lost connection ..." logfile messages. </p>
+
+</ul>
+
+<li> <p> Postfix 2.3 and later logs a warning that all server ports
+are busy: </p>
+
+<pre>
+Oct 3 20:39:27 spike postfix/master[28905]: warning: service "smtp"
+ (25) has reached its process limit "30": new clients may experience
+ noticeable delays
+Oct 3 20:39:27 spike postfix/master[28905]: warning: to avoid this
+ condition, increase the process count in master.cf or reduce the
+ service time per client
+Oct 3 20:39:27 spike postfix/master[28905]: warning: see
+ <a href="http://www.postfix.org/STRESS_README.html">http://www.postfix.org/STRESS_README.html</a> for examples of
+ stress-adapting configuration settings
+</pre>
+
+</ul>
+
+<p> Legitimate mail that doesn't get through during an episode of
+Postfix SMTP server overload is not necessarily lost. It should
+still arrive once the situation returns to normal, as long as the
+overload condition is temporary. </p>
+
+<h2><a name="adapt"> Automatic stress-adaptive behavior </a></h2>
+
+<p> Postfix version 2.5 introduces automatic stress-adaptive behavior.
+It works as follows. When a "public" network service such as the
+SMTP server runs into an "all server ports are busy" condition, the
+Postfix master(8) daemon logs a warning, restarts the service
+(without interrupting existing network sessions), and runs the
+service with "-o stress=yes" on the server process command line:
+</p>
+
+<blockquote>
+<pre>
+80821 ?? S 0:00.24 smtpd -n smtp -t inet -u -c -o stress=yes
+</pre>
+</blockquote>
+
+<p> Normally, the Postfix master(8) daemon runs such a service with
+"-o stress=" on the command line (i.e. with an empty parameter
+value): </p>
+
+<blockquote>
+<pre>
+83326 ?? S 0:00.28 smtpd -n smtp -t inet -u -c -o stress=
+</pre>
+</blockquote>
+
+<p> You won't see "-o stress" command-line parameters with services
+that have local clients only. These include services internal to
+Postfix such as the queue manager, and services that listen on a
+loopback interface only, such as after-filter SMTP services. </p>
+
+<p> The "stress" parameter value is the key to making main.cf
+parameter settings stress adaptive. The following settings are the
+default with Postfix 2.6 and later. </p>
+
+<blockquote>
+<pre>
+1 smtpd_timeout = ${stress?{10}:{300}}s
+2 smtpd_hard_error_limit = ${stress?{1}:{20}}
+3 smtpd_junk_command_limit = ${stress?{1}:{100}}
+4 # Parameters added after Postfix 2.6:
+5 smtpd_per_record_deadline = ${stress?{yes}:{no}}
+6 smtpd_starttls_timeout = ${stress?{10}:{300}}s
+7 address_verify_poll_count = ${stress?{1}:{3}}
+</pre>
+</blockquote>
+
+<p> Postfix versions before 3.0 use the older form ${stress?x}${stress:y}
+instead of the newer form ${stress?{x}:{y}}. </p>
+
+<p> The syntax of ${name?{value}:{value}}, ${name?value} and
+${name:value} is explained at the beginning of the postconf(5)
+manual page. </p>
+
+<p> Translation: <p>
+
+<ul>
+
+<li> <p> Line 1: under conditions of stress, use an smtpd_timeout
+value of 10 seconds instead of the default 300 seconds. Experience
+on the postfix-users list from a variety of sysadmins shows that
+reducing the "normal" smtpd_timeout to 60s is unlikely to affect
+legitimate clients. However, it is unlikely to become the Postfix
+default because it's not RFC compliant. Setting smtpd_timeout to
+10s or even 5s under stress will still allow most
+legitimate clients to connect and send mail, but may delay mail
+from some clients. No mail should be lost, as long as this measure
+is used only temporarily. </p>
+
+<li> <p> Line 2: under conditions of stress, use an smtpd_hard_error_limit
+of 1 instead of the default 20. This disconnects clients
+after a single error, giving other clients a chance to connect.
+However, this may cause significant delays with legitimate mail,
+such as a mailing list that contains a few no-longer-active user
+names that didn't bother to unsubscribe. No mail should be lost,
+as long as this measure is used only temporarily. </p>
+
+<li> <p> Line 3: under conditions of stress, use an
+smtpd_junk_command_limit of 1 instead of the default 100. This
+prevents clients from keeping connections open by repeatedly
+sending HELO, EHLO, NOOP, RSET, VRFY or ETRN commands. </p>
+
+<li> <p> Line 5: under conditions of stress, change the behavior
+of smtpd_timeout and smtpd_starttls_timeout, from a time limit per
+read or write system call, to a time limit to send or receive a
+complete record (an SMTP command line, SMTP response line, SMTP
+message content line, or TLS protocol message). </p>
+
+<li> <p> Line 6: under conditions of stress, reduce the time limit
+for TLS protocol handshake messages to 10 seconds, from the default
+value of 300 seconds. See also the smtpd_timeout discussion above.
+</p>
+
+<li> <p> Line 7: under conditions of stress, do not wait up to 6
+seconds for the completion of an address verification probe. If the
+result is not already in the address verification cache, reply
+immediately with $unverified_recipient_tempfail_action or
+$unverified_sender_tempfail_action. No mail should be lost, as long
+as this measure is used only temporarily. </p>
+
+</ul>
+
+<p> NOTE: Please keep in mind that the stress-adaptive feature is
+a fairly desperate measure to keep <b>some</b> legitimate mail
+flowing under overload conditions. If a site is reaching the SMTP
+server process limit when there isn't an attack or bot flood
+occurring, then either the process limit needs to be raised or more
+hardware needs to be added. </p>
+
+<h2><a name="concurrency"> Service more SMTP clients at the same time </a> </h2>
+
+<p> This section and the ones that follow discuss permanent measures
+against mail server overload. </p>
+
+<p> One measure to avoid the "all server processes busy" condition
+is to service more SMTP clients simultaneously. For this you need
+to increase the number of Postfix SMTP server processes. This will
+improve the
+responsiveness for remote SMTP clients, as long as the server machine
+has enough hardware and software resources to run the additional
+processes, and as long as the file system can keep up with the
+additional load. </p>
+
+<ul>
+
+<li> <p> You increase the number of SMTP server processes either
+by increasing the default_process_limit in main.cf (line 3 below),
+or by increasing the SMTP server's "maxproc" field in master.cf
+(line 10 below). Either way, you need to issue a "postfix reload"
+command to make the change effective. </p>
+
+<li> <p> Process limits above 1000 require Postfix version 2.4 or
+later, and an operating system that supports kernel-based event
+filters (BSD kqueue(2), Linux epoll(4), or Solaris /dev/poll).
+</p>
+
+<li> <p> More processes use more memory. You can reduce the Postfix
+memory footprint by using cdb:
+lookup tables instead of Berkeley DB's hash: or btree: tables. </p>
+
+<pre>
+ 1 /etc/postfix/main.cf:
+ 2 # Raise the global process limit, 100 since Postfix 2.0.
+ 3 default_process_limit = 200
+ 4
+ 5 /etc/postfix/master.cf:
+ 6 # =============================================================
+ 7 # service type private unpriv chroot wakeup maxproc command
+ 8 # =============================================================
+ 9 # Raise the SMTP service process limit only.
+10 smtp inet n - n - 200 smtpd
+</pre>
+
+<li> <p> NOTE: older versions of the SMTPD_POLICY_README document
+contain a mistake: they configure a fixed number of policy daemon
+processes. When you raise the SMTP server's "maxproc" field in
+master.cf, SMTP server processes will report problems when connecting
+to policy server processes, because there aren't enough of them.
+Examples of errors are "connection refused" or "operation timed
+out". </p>
+
+<p> To fix, edit master.cf and specify a zero "maxproc" field
+in all policy server entries; see line 6 in the example below.
+Issue a "postfix reload" command to make the change effective. </p>
+
+<pre>
+1 /etc/postfix/master.cf:
+2 # =============================================================
+3 # service type private unpriv chroot wakeup maxproc command
+4 # =============================================================
+5 # Disable the policy service process limit.
+6 policy unix - n n - 0 spawn
+7 user=nobody argv=/some/where/policy-server
+</pre>
+
+</ul>
+
+<h2><a name="time"> Spend less time per SMTP client </a></h2>
+
+<p> When increasing the number of SMTP server processes is not
+practical, you can improve Postfix server responsiveness by eliminating
+delays. When Postfix spends less time per SMTP session, the same
+number of SMTP server processes can service more clients in a given
+amount of time. </p>
+
+<ul>
+
+<li> <p> Eliminate non-functional RBL lookups (blocklists that are
+no longer in operation). These lookups can degrade performance.
+Postfix logs a warning when an RBL server does not respond. </p>
+
+<li> <p> Eliminate redundant RBL lookups (people often use multiple
+Spamhaus RBLs that include each other). To find out whether RBLs
+include other RBLs, look up the websites that document the RBL's
+policies. </p>
+
+<li> <p> Eliminate header_checks and body_checks, and keep just a few
+emergency patterns to block the latest worm explosion or backscatter
+mail. See BACKSCATTER_README for examples of the latter.
+
+<li> <p> Group your header_checks and body_checks patterns to avoid
+unnecessary pattern matching operations:
+
+<pre>
+ 1 /etc/postfix/header_checks:
+ 2 if /^Subject:/
+ 3 /^Subject: virus found in mail from you/ reject
+ 4 /^Subject: ..other../ reject
+ 5 endif
+ 6
+ 7 if /^Received:/
+ 8 /^Received: from (postfix\.org) / reject forged client name in received header: $1
+ 9 /^Received: from ..other../ reject ....
+10 endif
+</pre>
+
+</ul>
+
+<h2><a name="hangup"> Disconnect suspicious SMTP clients </a></h2>
+
+<p> Under conditions of overload you can improve Postfix SMTP server
+responsiveness by hanging up on suspicious clients, so that other
+clients get a chance to talk to Postfix. </p>
+
+<ul>
+
+<li> <p> Use "521" SMTP reply codes (Postfix 2.6 and later) or "421"
+(Postfix 2.3-2.5) to hang up on clients that that match botnet-related
+RBLs (see next bullet) or that match selected non-RBL restrictions
+such as SMTP access maps. The Postfix SMTP server will reject mail
+and disconnect without waiting for the remote SMTP client to send
+a QUIT command. </p>
+
+<li> <p> To hang up connections from denylisted zombies, you can
+set specific Postfix SMTP server reject codes for specific RBLs,
+and for individual responses from specific RBLs. We'll use
+zen.spamhaus.org as an example; by the time you read this document,
+details may have changed. Right now, their documents say that a
+response of 127.0.0.10 or 127.0.0.11 indicates a dynamic client IP
+address, which means that the machine is probably running a bot of
+some kind. To give a 521 response instead of the default 554
+response, use something like: </p>
+
+<pre>
+ 1 /etc/postfix/main.cf:
+ 2 smtpd_client_restrictions =
+ 3 permit_mynetworks
+ 4 reject_rbl_client zen.spamhaus.org=127.0.0.10
+ 5 reject_rbl_client zen.spamhaus.org=127.0.0.11
+ 6 reject_rbl_client zen.spamhaus.org
+ 7
+ 8 rbl_reply_maps = hash:/etc/postfix/rbl_reply_maps
+ 9
+10 /etc/postfix/rbl_reply_maps:
+11 # With Postfix 2.3-2.5 use "421" to hang up connections.
+12 zen.spamhaus.org=127.0.0.10 521 4.7.1 Service unavailable;
+13 $rbl_class [$rbl_what] blocked using
+14 $rbl_domain${rbl_reason?; $rbl_reason}
+15
+16 zen.spamhaus.org=127.0.0.11 521 4.7.1 Service unavailable;
+17 $rbl_class [$rbl_what] blocked using
+18 $rbl_domain${rbl_reason?; $rbl_reason}
+</pre>
+
+<p> Although the above example shows three RBL lookups (lines 4-6),
+Postfix will only do a single DNS query, so it does not affect the
+performance. </p>
+
+<li> <p> With Postfix 2.3-2.5, use reply code 421 (521 will not
+cause Postfix to disconnect). The down-side of replying with 421
+is that it works only for zombies and other malware. If the client
+is running a real MTA, then it may connect again several times until
+the mail expires in its queue. When this is a problem, stick with
+the default 554 reply, and use "smtpd_hard_error_limit = 1" as
+described below. </p>
+
+<li> <p> You can automatically turn on the above overload measure
+with Postfix 2.5 and later, or with earlier releases that contain
+the stress-adaptive behavior source code patch from the mirrors
+listed at http://www.postfix.org/download.html. Simply replace line
+above 8 with: </p>
+
+<pre>
+ 8 rbl_reply_maps = ${stress?hash:/etc/postfix/rbl_reply_maps}
+</pre>
+
+</ul>
+
+<p> More information about automatic stress-adaptive behavior is
+in section "<a href="#adapt">Automatic stress-adaptive behavior</a>".
+</p>
+
+<h2><a name="legacy"> Temporary measures for older Postfix releases </a></h2>
+
+<p> See the section "<a href="#adapt">Automatic stress-adaptive
+behavior</a>" if you are running Postfix version 2.5 or later, or
+if you have applied the source code patch for stress-adaptive
+behavior from the mirrors listed at http://www.postfix.org/download.html.
+</p>
+
+<p> The following measures can be applied temporarily during overload.
+They still allow <b>most</b> legitimate clients to connect and send
+mail, but may affect some legitimate clients. </p>
+
+<ul>
+
+<li> <p> Reduce smtpd_timeout (default: 300s). Experience on the
+postfix-users list from a variety of sysadmins shows that reducing
+the "normal" smtpd_timeout to 60s is unlikely to affect legitimate
+clients. However, it is unlikely to become the Postfix default
+because it's not RFC compliant. Setting smtpd_timeout to 10s (line
+2 below) or even 5s under stress will still allow <b>most</b>
+legitimate clients to connect and send mail, but may delay mail
+from some clients. No mail should be lost, as long as this measure
+is used only temporarily. </p>
+
+<li> <p> Reduce smtpd_hard_error_limit (default: 20). Setting this
+to 1 under stress (line 3 below) helps by disconnecting clients
+after a single error, giving other clients a chance to connect.
+However, this may cause significant delays with legitimate mail,
+such as a mailing list that contains a few no-longer-active user
+names that didn't bother to unsubscribe. No mail should be lost,
+as long as this measure is used only temporarily. </p>
+
+<li> <p> Use an smtpd_junk_command_limit of 1 instead of the default
+100. This prevents clients from keeping idle connections open by
+repeatedly sending NOOP or RSET commands. </p>
+
+</ul>
+
+<blockquote>
+<pre>
+1 /etc/postfix/main.cf:
+2 smtpd_timeout = 10
+3 smtpd_hard_error_limit = 1
+4 smtpd_junk_command_limit = 1
+</pre>
+</blockquote>
+
+<p> With these measures, no mail should be lost, as long
+as these measures are used only temporarily. The next section of
+this document introduces a way to automate this process. </p>
+
+<h2><a name="feature"> Detecting support for stress-adaptive behavior </a></h2>
+
+<p> To find out if your Postfix installation supports stress-adaptive
+behavior, use the "ps" command, and look for the smtpd processes.
+Postfix has stress-adaptive support when you see "-o stress=" or
+"-o stress=yes" command-line options. Remember that Postfix never
+enables stress-adaptive behavior on servers that listen on local
+addresses only. </p>
+
+<p> The following example is for FreeBSD or Linux. On Solaris, HP-UX
+and other System-V flavors, use "ps -ef" instead of "ps ax". </p>
+
+<blockquote>
+<pre>
+$ ps ax|grep smtpd
+83326 ?? S 0:00.28 smtpd -n smtp -t inet -u -c -o stress=
+84345 ?? Ss 0:00.11 /usr/bin/perl /usr/libexec/postfix/smtpd-policy.pl
+</pre>
+</blockquote>
+
+<p> You can't use postconf(1) to detect stress-adaptive support.
+The postconf(1) command ignores the existence of the stress parameter
+in main.cf, because the parameter has no effect there. Command-line
+"-o parameter" settings always take precedence over main.cf parameter
+settings. <p>
+
+<p> If you configure stress-adaptive behavior in main.cf when it
+isn't supported, nothing bad will happen. The processes will run
+as if the stress parameter always has an empty value. </p>
+
+<h2><a name="forcing"> Forcing stress-adaptive behavior on or off </a></h2>
+
+<p> You can manually force stress-adaptive behavior on, by adding
+a "-o stress=yes" command-line option in master.cf. This can be
+useful for testing overrides on the SMTP service. Issue "postfix
+reload" to make the change effective. </p>
+
+<p> Note: setting the stress parameter in main.cf has no effect for
+services that accept remote connections. </p>
+
+<blockquote>
+<pre>
+1 /etc/postfix/master.cf:
+2 # =============================================================
+3 # service type private unpriv chroot wakeup maxproc command
+4 # =============================================================
+5 #
+6 smtp inet n - n - - smtpd
+7 -o stress=yes
+8 -o . . .
+</pre>
+</blockquote>
+
+<p> To permanently force stress-adaptive behavior off with a specific
+service, specify "-o stress=" on its master.cf command line. This
+may be desirable for the "submission" service. Issue "postfix reload"
+to make the change effective. </p>
+
+<p> Note: setting the stress parameter in main.cf has no effect for
+services that accept remote connections. </p>
+
+<blockquote>
+<pre>
+1 /etc/postfix/master.cf:
+2 # =============================================================
+3 # service type private unpriv chroot wakeup maxproc command
+4 # =============================================================
+5 #
+6 submission inet n - n - - smtpd
+7 -o stress=
+8 -o . . .
+</pre>
+</blockquote>
+
+<h2><a name="other"> Other measures to off-load zombies </a> </h2>
+
+<p> The postscreen(8) daemon, introduced with Postfix 2.8, provides
+additional protection against mail server overload. One postscreen(8)
+process handles multiple inbound SMTP connections, and decides which
+clients may talk to a Postfix SMTP server process. By keeping
+spambots away, postscreen(8) leaves more SMTP server processes
+available for legitimate clients, and delays the onset of server
+overload conditions. </p>
+
+<h2><a name="credits"> Credits </a></h2>
+
+<ul>
+
+<li> Thanks to the postfix-users mailing list members for sharing
+early experiences with the stress-adaptive feature.
+
+<li> The RBL example and several other paragraphs of text were
+adapted from postfix-users postings by Noel Jones.
+
+<li> Wietse implemented stress-adaptive behavior as the smallest
+possible patch while he should be working on other things.
+
+</ul>
+
+</body> </html>