diff options
Diffstat (limited to 'proto/STRESS_README.html')
-rw-r--r-- | proto/STRESS_README.html | 566 |
1 files changed, 566 insertions, 0 deletions
diff --git a/proto/STRESS_README.html b/proto/STRESS_README.html new file mode 100644 index 0000000..30fa5f2 --- /dev/null +++ b/proto/STRESS_README.html @@ -0,0 +1,566 @@ +<!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN" + "http://www.w3.org/TR/html4/loose.dtd"> + +<html> + +<head> + +<title>Postfix Stress-Dependent Configuration</title> + +<meta http-equiv="Content-Type" content="text/html; charset=utf-8"> + +</head> + +<body> + +<h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix +Stress-Dependent Configuration</h1> + +<hr> + +<h2>Overview </h2> + +<p> This document describes the symptoms of Postfix SMTP server +overload. It presents permanent main.cf changes to avoid overload +during normal operation, and temporary main.cf changes to cope with +an unexpected burst of mail. This document makes specific suggestions +for Postfix 2.5 and later which support stress-adaptive behavior, +and for earlier Postfix versions that don't. </p> + +<p> Topics covered in this document: </p> + +<ul> + +<li><a href="#overload"> Symptoms of Postfix SMTP server overload </a> + +<li><a href="#adapt"> Automatic stress-adaptive behavior </a> + +<li><a href="#concurrency"> Service more SMTP clients at the same time </a> + +<li><a href="#time"> Spend less time per SMTP client </a> + +<li><a href="#hangup"> Disconnect suspicious SMTP clients </a> + +<li><a href="#legacy"> Temporary measures for older Postfix releases </a> + +<li><a href="#feature"> Detecting support for stress-adaptive behavior </a> + +<li><a href="#forcing"> Forcing stress-adaptive behavior on or off </a> + +<li><a href="#other"> Other measures to off-load zombies </a> + +<li><a href="#credits"> Credits </a> + +</ul> + +<h2><a name="overload"> Symptoms of Postfix SMTP server overload </a></h2> + +<p> Under normal conditions, the Postfix SMTP server responds +immediately when an SMTP client connects to it; the time to deliver +mail is noticeable only with large messages. Performance degrades +dramatically when the number of SMTP clients exceeds the number of +Postfix SMTP server processes. When an SMTP client connects while +all Postfix SMTP server processes are busy, the client must wait +until a server process becomes available. </p> + +<p> SMTP server overload may be caused by a surge of legitimate +mail (example: a DNS registrar opens a new zone for registrations), +by mistake (mail explosion caused by a forwarding loop) or by malice +(worm outbreak, botnet, or other illegitimate activity). </p> + +<p> Symptoms of Postfix SMTP server overload are: </p> + +<ul> + +<li> <p> Remote SMTP clients experience a long delay before Postfix +sends the "220 hostname.example.com ESMTP Postfix" greeting. </p> + +<ul> + +<li> <p> NOTE: Broken DNS configurations can also cause lengthy +delays before Postfix sends "220 hostname.example.com ...". These +delays also exist when Postfix is NOT overloaded. </p> + +<li> <p> NOTE: To avoid "overload" delays for end-user mail +clients, enable the "submission" service entry in master.cf (present +since Postfix 2.1), and tell users to connect to this instead of +the public SMTP service. </p> + +</ul> + +<li> <p> The Postfix SMTP server logs an increased number of "lost +connection after CONNECT" events. This happens because remote SMTP +clients disconnect before Postfix answers the connection. </p> + +<ul> + +<li> <p> NOTE: A portscan for open SMTP ports can also result in +"lost connection ..." logfile messages. </p> + +</ul> + +<li> <p> Postfix 2.3 and later logs a warning that all server ports +are busy: </p> + +<pre> +Oct 3 20:39:27 spike postfix/master[28905]: warning: service "smtp" + (25) has reached its process limit "30": new clients may experience + noticeable delays +Oct 3 20:39:27 spike postfix/master[28905]: warning: to avoid this + condition, increase the process count in master.cf or reduce the + service time per client +Oct 3 20:39:27 spike postfix/master[28905]: warning: see + <a href="http://www.postfix.org/STRESS_README.html">http://www.postfix.org/STRESS_README.html</a> for examples of + stress-adapting configuration settings +</pre> + +</ul> + +<p> Legitimate mail that doesn't get through during an episode of +Postfix SMTP server overload is not necessarily lost. It should +still arrive once the situation returns to normal, as long as the +overload condition is temporary. </p> + +<h2><a name="adapt"> Automatic stress-adaptive behavior </a></h2> + +<p> Postfix version 2.5 introduces automatic stress-adaptive behavior. +It works as follows. When a "public" network service such as the +SMTP server runs into an "all server ports are busy" condition, the +Postfix master(8) daemon logs a warning, restarts the service +(without interrupting existing network sessions), and runs the +service with "-o stress=yes" on the server process command line: +</p> + +<blockquote> +<pre> +80821 ?? S 0:00.24 smtpd -n smtp -t inet -u -c -o stress=yes +</pre> +</blockquote> + +<p> Normally, the Postfix master(8) daemon runs such a service with +"-o stress=" on the command line (i.e. with an empty parameter +value): </p> + +<blockquote> +<pre> +83326 ?? S 0:00.28 smtpd -n smtp -t inet -u -c -o stress= +</pre> +</blockquote> + +<p> You won't see "-o stress" command-line parameters with services +that have local clients only. These include services internal to +Postfix such as the queue manager, and services that listen on a +loopback interface only, such as after-filter SMTP services. </p> + +<p> The "stress" parameter value is the key to making main.cf +parameter settings stress adaptive. The following settings are the +default with Postfix 2.6 and later. </p> + +<blockquote> +<pre> +1 smtpd_timeout = ${stress?{10}:{300}}s +2 smtpd_hard_error_limit = ${stress?{1}:{20}} +3 smtpd_junk_command_limit = ${stress?{1}:{100}} +4 # Parameters added after Postfix 2.6: +5 smtpd_per_record_deadline = ${stress?{yes}:{no}} +6 smtpd_starttls_timeout = ${stress?{10}:{300}}s +7 address_verify_poll_count = ${stress?{1}:{3}} +</pre> +</blockquote> + +<p> Postfix versions before 3.0 use the older form ${stress?x}${stress:y} +instead of the newer form ${stress?{x}:{y}}. </p> + +<p> The syntax of ${name?{value}:{value}}, ${name?value} and +${name:value} is explained at the beginning of the postconf(5) +manual page. </p> + +<p> Translation: <p> + +<ul> + +<li> <p> Line 1: under conditions of stress, use an smtpd_timeout +value of 10 seconds instead of the default 300 seconds. Experience +on the postfix-users list from a variety of sysadmins shows that +reducing the "normal" smtpd_timeout to 60s is unlikely to affect +legitimate clients. However, it is unlikely to become the Postfix +default because it's not RFC compliant. Setting smtpd_timeout to +10s or even 5s under stress will still allow most +legitimate clients to connect and send mail, but may delay mail +from some clients. No mail should be lost, as long as this measure +is used only temporarily. </p> + +<li> <p> Line 2: under conditions of stress, use an smtpd_hard_error_limit +of 1 instead of the default 20. This disconnects clients +after a single error, giving other clients a chance to connect. +However, this may cause significant delays with legitimate mail, +such as a mailing list that contains a few no-longer-active user +names that didn't bother to unsubscribe. No mail should be lost, +as long as this measure is used only temporarily. </p> + +<li> <p> Line 3: under conditions of stress, use an +smtpd_junk_command_limit of 1 instead of the default 100. This +prevents clients from keeping connections open by repeatedly +sending HELO, EHLO, NOOP, RSET, VRFY or ETRN commands. </p> + +<li> <p> Line 5: under conditions of stress, change the behavior +of smtpd_timeout and smtpd_starttls_timeout, from a time limit per +read or write system call, to a time limit to send or receive a +complete record (an SMTP command line, SMTP response line, SMTP +message content line, or TLS protocol message). </p> + +<li> <p> Line 6: under conditions of stress, reduce the time limit +for TLS protocol handshake messages to 10 seconds, from the default +value of 300 seconds. See also the smtpd_timeout discussion above. +</p> + +<li> <p> Line 7: under conditions of stress, do not wait up to 6 +seconds for the completion of an address verification probe. If the +result is not already in the address verification cache, reply +immediately with $unverified_recipient_tempfail_action or +$unverified_sender_tempfail_action. No mail should be lost, as long +as this measure is used only temporarily. </p> + +</ul> + +<p> NOTE: Please keep in mind that the stress-adaptive feature is +a fairly desperate measure to keep <b>some</b> legitimate mail +flowing under overload conditions. If a site is reaching the SMTP +server process limit when there isn't an attack or bot flood +occurring, then either the process limit needs to be raised or more +hardware needs to be added. </p> + +<h2><a name="concurrency"> Service more SMTP clients at the same time </a> </h2> + +<p> This section and the ones that follow discuss permanent measures +against mail server overload. </p> + +<p> One measure to avoid the "all server processes busy" condition +is to service more SMTP clients simultaneously. For this you need +to increase the number of Postfix SMTP server processes. This will +improve the +responsiveness for remote SMTP clients, as long as the server machine +has enough hardware and software resources to run the additional +processes, and as long as the file system can keep up with the +additional load. </p> + +<ul> + +<li> <p> You increase the number of SMTP server processes either +by increasing the default_process_limit in main.cf (line 3 below), +or by increasing the SMTP server's "maxproc" field in master.cf +(line 10 below). Either way, you need to issue a "postfix reload" +command to make the change effective. </p> + +<li> <p> Process limits above 1000 require Postfix version 2.4 or +later, and an operating system that supports kernel-based event +filters (BSD kqueue(2), Linux epoll(4), or Solaris /dev/poll). +</p> + +<li> <p> More processes use more memory. You can reduce the Postfix +memory footprint by using cdb: +lookup tables instead of Berkeley DB's hash: or btree: tables. </p> + +<pre> + 1 /etc/postfix/main.cf: + 2 # Raise the global process limit, 100 since Postfix 2.0. + 3 default_process_limit = 200 + 4 + 5 /etc/postfix/master.cf: + 6 # ============================================================= + 7 # service type private unpriv chroot wakeup maxproc command + 8 # ============================================================= + 9 # Raise the SMTP service process limit only. +10 smtp inet n - n - 200 smtpd +</pre> + +<li> <p> NOTE: older versions of the SMTPD_POLICY_README document +contain a mistake: they configure a fixed number of policy daemon +processes. When you raise the SMTP server's "maxproc" field in +master.cf, SMTP server processes will report problems when connecting +to policy server processes, because there aren't enough of them. +Examples of errors are "connection refused" or "operation timed +out". </p> + +<p> To fix, edit master.cf and specify a zero "maxproc" field +in all policy server entries; see line 6 in the example below. +Issue a "postfix reload" command to make the change effective. </p> + +<pre> +1 /etc/postfix/master.cf: +2 # ============================================================= +3 # service type private unpriv chroot wakeup maxproc command +4 # ============================================================= +5 # Disable the policy service process limit. +6 policy unix - n n - 0 spawn +7 user=nobody argv=/some/where/policy-server +</pre> + +</ul> + +<h2><a name="time"> Spend less time per SMTP client </a></h2> + +<p> When increasing the number of SMTP server processes is not +practical, you can improve Postfix server responsiveness by eliminating +delays. When Postfix spends less time per SMTP session, the same +number of SMTP server processes can service more clients in a given +amount of time. </p> + +<ul> + +<li> <p> Eliminate non-functional RBL lookups (blocklists that are +no longer in operation). These lookups can degrade performance. +Postfix logs a warning when an RBL server does not respond. </p> + +<li> <p> Eliminate redundant RBL lookups (people often use multiple +Spamhaus RBLs that include each other). To find out whether RBLs +include other RBLs, look up the websites that document the RBL's +policies. </p> + +<li> <p> Eliminate header_checks and body_checks, and keep just a few +emergency patterns to block the latest worm explosion or backscatter +mail. See BACKSCATTER_README for examples of the latter. + +<li> <p> Group your header_checks and body_checks patterns to avoid +unnecessary pattern matching operations: + +<pre> + 1 /etc/postfix/header_checks: + 2 if /^Subject:/ + 3 /^Subject: virus found in mail from you/ reject + 4 /^Subject: ..other../ reject + 5 endif + 6 + 7 if /^Received:/ + 8 /^Received: from (postfix\.org) / reject forged client name in received header: $1 + 9 /^Received: from ..other../ reject .... +10 endif +</pre> + +</ul> + +<h2><a name="hangup"> Disconnect suspicious SMTP clients </a></h2> + +<p> Under conditions of overload you can improve Postfix SMTP server +responsiveness by hanging up on suspicious clients, so that other +clients get a chance to talk to Postfix. </p> + +<ul> + +<li> <p> Use "521" SMTP reply codes (Postfix 2.6 and later) or "421" +(Postfix 2.3-2.5) to hang up on clients that that match botnet-related +RBLs (see next bullet) or that match selected non-RBL restrictions +such as SMTP access maps. The Postfix SMTP server will reject mail +and disconnect without waiting for the remote SMTP client to send +a QUIT command. </p> + +<li> <p> To hang up connections from denylisted zombies, you can +set specific Postfix SMTP server reject codes for specific RBLs, +and for individual responses from specific RBLs. We'll use +zen.spamhaus.org as an example; by the time you read this document, +details may have changed. Right now, their documents say that a +response of 127.0.0.10 or 127.0.0.11 indicates a dynamic client IP +address, which means that the machine is probably running a bot of +some kind. To give a 521 response instead of the default 554 +response, use something like: </p> + +<pre> + 1 /etc/postfix/main.cf: + 2 smtpd_client_restrictions = + 3 permit_mynetworks + 4 reject_rbl_client zen.spamhaus.org=127.0.0.10 + 5 reject_rbl_client zen.spamhaus.org=127.0.0.11 + 6 reject_rbl_client zen.spamhaus.org + 7 + 8 rbl_reply_maps = hash:/etc/postfix/rbl_reply_maps + 9 +10 /etc/postfix/rbl_reply_maps: +11 # With Postfix 2.3-2.5 use "421" to hang up connections. +12 zen.spamhaus.org=127.0.0.10 521 4.7.1 Service unavailable; +13 $rbl_class [$rbl_what] blocked using +14 $rbl_domain${rbl_reason?; $rbl_reason} +15 +16 zen.spamhaus.org=127.0.0.11 521 4.7.1 Service unavailable; +17 $rbl_class [$rbl_what] blocked using +18 $rbl_domain${rbl_reason?; $rbl_reason} +</pre> + +<p> Although the above example shows three RBL lookups (lines 4-6), +Postfix will only do a single DNS query, so it does not affect the +performance. </p> + +<li> <p> With Postfix 2.3-2.5, use reply code 421 (521 will not +cause Postfix to disconnect). The down-side of replying with 421 +is that it works only for zombies and other malware. If the client +is running a real MTA, then it may connect again several times until +the mail expires in its queue. When this is a problem, stick with +the default 554 reply, and use "smtpd_hard_error_limit = 1" as +described below. </p> + +<li> <p> You can automatically turn on the above overload measure +with Postfix 2.5 and later, or with earlier releases that contain +the stress-adaptive behavior source code patch from the mirrors +listed at http://www.postfix.org/download.html. Simply replace line +above 8 with: </p> + +<pre> + 8 rbl_reply_maps = ${stress?hash:/etc/postfix/rbl_reply_maps} +</pre> + +</ul> + +<p> More information about automatic stress-adaptive behavior is +in section "<a href="#adapt">Automatic stress-adaptive behavior</a>". +</p> + +<h2><a name="legacy"> Temporary measures for older Postfix releases </a></h2> + +<p> See the section "<a href="#adapt">Automatic stress-adaptive +behavior</a>" if you are running Postfix version 2.5 or later, or +if you have applied the source code patch for stress-adaptive +behavior from the mirrors listed at http://www.postfix.org/download.html. +</p> + +<p> The following measures can be applied temporarily during overload. +They still allow <b>most</b> legitimate clients to connect and send +mail, but may affect some legitimate clients. </p> + +<ul> + +<li> <p> Reduce smtpd_timeout (default: 300s). Experience on the +postfix-users list from a variety of sysadmins shows that reducing +the "normal" smtpd_timeout to 60s is unlikely to affect legitimate +clients. However, it is unlikely to become the Postfix default +because it's not RFC compliant. Setting smtpd_timeout to 10s (line +2 below) or even 5s under stress will still allow <b>most</b> +legitimate clients to connect and send mail, but may delay mail +from some clients. No mail should be lost, as long as this measure +is used only temporarily. </p> + +<li> <p> Reduce smtpd_hard_error_limit (default: 20). Setting this +to 1 under stress (line 3 below) helps by disconnecting clients +after a single error, giving other clients a chance to connect. +However, this may cause significant delays with legitimate mail, +such as a mailing list that contains a few no-longer-active user +names that didn't bother to unsubscribe. No mail should be lost, +as long as this measure is used only temporarily. </p> + +<li> <p> Use an smtpd_junk_command_limit of 1 instead of the default +100. This prevents clients from keeping idle connections open by +repeatedly sending NOOP or RSET commands. </p> + +</ul> + +<blockquote> +<pre> +1 /etc/postfix/main.cf: +2 smtpd_timeout = 10 +3 smtpd_hard_error_limit = 1 +4 smtpd_junk_command_limit = 1 +</pre> +</blockquote> + +<p> With these measures, no mail should be lost, as long +as these measures are used only temporarily. The next section of +this document introduces a way to automate this process. </p> + +<h2><a name="feature"> Detecting support for stress-adaptive behavior </a></h2> + +<p> To find out if your Postfix installation supports stress-adaptive +behavior, use the "ps" command, and look for the smtpd processes. +Postfix has stress-adaptive support when you see "-o stress=" or +"-o stress=yes" command-line options. Remember that Postfix never +enables stress-adaptive behavior on servers that listen on local +addresses only. </p> + +<p> The following example is for FreeBSD or Linux. On Solaris, HP-UX +and other System-V flavors, use "ps -ef" instead of "ps ax". </p> + +<blockquote> +<pre> +$ ps ax|grep smtpd +83326 ?? S 0:00.28 smtpd -n smtp -t inet -u -c -o stress= +84345 ?? Ss 0:00.11 /usr/bin/perl /usr/libexec/postfix/smtpd-policy.pl +</pre> +</blockquote> + +<p> You can't use postconf(1) to detect stress-adaptive support. +The postconf(1) command ignores the existence of the stress parameter +in main.cf, because the parameter has no effect there. Command-line +"-o parameter" settings always take precedence over main.cf parameter +settings. <p> + +<p> If you configure stress-adaptive behavior in main.cf when it +isn't supported, nothing bad will happen. The processes will run +as if the stress parameter always has an empty value. </p> + +<h2><a name="forcing"> Forcing stress-adaptive behavior on or off </a></h2> + +<p> You can manually force stress-adaptive behavior on, by adding +a "-o stress=yes" command-line option in master.cf. This can be +useful for testing overrides on the SMTP service. Issue "postfix +reload" to make the change effective. </p> + +<p> Note: setting the stress parameter in main.cf has no effect for +services that accept remote connections. </p> + +<blockquote> +<pre> +1 /etc/postfix/master.cf: +2 # ============================================================= +3 # service type private unpriv chroot wakeup maxproc command +4 # ============================================================= +5 # +6 smtp inet n - n - - smtpd +7 -o stress=yes +8 -o . . . +</pre> +</blockquote> + +<p> To permanently force stress-adaptive behavior off with a specific +service, specify "-o stress=" on its master.cf command line. This +may be desirable for the "submission" service. Issue "postfix reload" +to make the change effective. </p> + +<p> Note: setting the stress parameter in main.cf has no effect for +services that accept remote connections. </p> + +<blockquote> +<pre> +1 /etc/postfix/master.cf: +2 # ============================================================= +3 # service type private unpriv chroot wakeup maxproc command +4 # ============================================================= +5 # +6 submission inet n - n - - smtpd +7 -o stress= +8 -o . . . +</pre> +</blockquote> + +<h2><a name="other"> Other measures to off-load zombies </a> </h2> + +<p> The postscreen(8) daemon, introduced with Postfix 2.8, provides +additional protection against mail server overload. One postscreen(8) +process handles multiple inbound SMTP connections, and decides which +clients may talk to a Postfix SMTP server process. By keeping +spambots away, postscreen(8) leaves more SMTP server processes +available for legitimate clients, and delays the onset of server +overload conditions. </p> + +<h2><a name="credits"> Credits </a></h2> + +<ul> + +<li> Thanks to the postfix-users mailing list members for sharing +early experiences with the stress-adaptive feature. + +<li> The RBL example and several other paragraphs of text were +adapted from postfix-users postings by Noel Jones. + +<li> Wietse implemented stress-adaptive behavior as the smallest +possible patch while he should be working on other things. + +</ul> + +</body> </html> |