<!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <html> <head> <title>Postfix Stress-Dependent Configuration</title> <meta http-equiv="Content-Type" content="text/html; charset=us-ascii"> </head> <body> <h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix Stress-Dependent Configuration</h1> <hr> <h2>Overview </h2> <p> This document describes the symptoms of Postfix SMTP server overload. It presents permanent <a href="postconf.5.html">main.cf</a> changes to avoid overload during normal operation, and temporary <a href="postconf.5.html">main.cf</a> changes to cope with an unexpected burst of mail. This document makes specific suggestions for Postfix 2.5 and later which support stress-adaptive behavior, and for earlier Postfix versions that don't. </p> <p> Topics covered in this document: </p> <ul> <li><a href="#overload"> Symptoms of Postfix SMTP server overload </a> <li><a href="#adapt"> Automatic stress-adaptive behavior </a> <li><a href="#concurrency"> Service more SMTP clients at the same time </a> <li><a href="#time"> Spend less time per SMTP client </a> <li><a href="#hangup"> Disconnect suspicious SMTP clients </a> <li><a href="#legacy"> Temporary measures for older Postfix releases </a> <li><a href="#feature"> Detecting support for stress-adaptive behavior </a> <li><a href="#forcing"> Forcing stress-adaptive behavior on or off </a> <li><a href="#other"> Other measures to off-load zombies </a> <li><a href="#credits"> Credits </a> </ul> <h2><a name="overload"> Symptoms of Postfix SMTP server overload </a></h2> <p> Under normal conditions, the Postfix SMTP server responds immediately when an SMTP client connects to it; the time to deliver mail is noticeable only with large messages. Performance degrades dramatically when the number of SMTP clients exceeds the number of Postfix SMTP server processes. When an SMTP client connects while all Postfix SMTP server processes are busy, the client must wait until a server process becomes available. </p> <p> SMTP server overload may be caused by a surge of legitimate mail (example: a DNS registrar opens a new zone for registrations), by mistake (mail explosion caused by a forwarding loop) or by malice (worm outbreak, botnet, or other illegitimate activity). </p> <p> Symptoms of Postfix SMTP server overload are: </p> <ul> <li> <p> Remote SMTP clients experience a long delay before Postfix sends the "220 hostname.example.com ESMTP Postfix" greeting. </p> <ul> <li> <p> NOTE: Broken DNS configurations can also cause lengthy delays before Postfix sends "220 hostname.example.com ...". These delays also exist when Postfix is NOT overloaded. </p> <li> <p> NOTE: To avoid "overload" delays for end-user mail clients, enable the "submission" service entry in <a href="master.5.html">master.cf</a> (present since Postfix 2.1), and tell users to connect to this instead of the public SMTP service. </p> </ul> <li> <p> The Postfix SMTP server logs an increased number of "lost connection after CONNECT" events. This happens because remote SMTP clients disconnect before Postfix answers the connection. </p> <ul> <li> <p> NOTE: A portscan for open SMTP ports can also result in "lost connection ..." logfile messages. </p> </ul> <li> <p> Postfix 2.3 and later logs a warning that all server ports are busy: </p> <pre> Oct 3 20:39:27 spike postfix/master[28905]: warning: service "smtp" (25) has reached its process limit "30": new clients may experience noticeable delays Oct 3 20:39:27 spike postfix/master[28905]: warning: to avoid this condition, increase the process count in <a href="master.5.html">master.cf</a> or reduce the service time per client Oct 3 20:39:27 spike postfix/master[28905]: warning: see <a href="http://www.postfix.org/STRESS_README.html">http://www.postfix.org/STRESS_README.html</a> for examples of stress-adapting configuration settings </pre> </ul> <p> Legitimate mail that doesn't get through during an episode of Postfix SMTP server overload is not necessarily lost. It should still arrive once the situation returns to normal, as long as the overload condition is temporary. </p> <h2><a name="adapt"> Automatic stress-adaptive behavior </a></h2> <p> Postfix version 2.5 introduces automatic stress-adaptive behavior. It works as follows. When a "public" network service such as the SMTP server runs into an "all server ports are busy" condition, the Postfix <a href="master.8.html">master(8)</a> daemon logs a warning, restarts the service (without interrupting existing network sessions), and runs the service with "-o stress=yes" on the server process command line: </p> <blockquote> <pre> 80821 ?? S 0:00.24 smtpd -n smtp -t inet -u -c -o stress=yes </pre> </blockquote> <p> Normally, the Postfix <a href="master.8.html">master(8)</a> daemon runs such a service with "-o stress=" on the command line (i.e. with an empty parameter value): </p> <blockquote> <pre> 83326 ?? S 0:00.28 smtpd -n smtp -t inet -u -c -o stress= </pre> </blockquote> <p> You won't see "-o stress" command-line parameters with services that have local clients only. These include services internal to Postfix such as the queue manager, and services that listen on a loopback interface only, such as after-filter SMTP services. </p> <p> The "stress" parameter value is the key to making <a href="postconf.5.html">main.cf</a> parameter settings stress adaptive. The following settings are the default with Postfix 2.6 and later. </p> <blockquote> <pre> 1 <a href="postconf.5.html#smtpd_timeout">smtpd_timeout</a> = ${stress?{10}:{300}}s 2 <a href="postconf.5.html#smtpd_hard_error_limit">smtpd_hard_error_limit</a> = ${stress?{1}:{20}} 3 <a href="postconf.5.html#smtpd_junk_command_limit">smtpd_junk_command_limit</a> = ${stress?{1}:{100}} 4 # Parameters added after Postfix 2.6: 5 <a href="postconf.5.html#smtpd_per_record_deadline">smtpd_per_record_deadline</a> = ${stress?{yes}:{no}} 6 <a href="postconf.5.html#smtpd_starttls_timeout">smtpd_starttls_timeout</a> = ${stress?{10}:{300}}s 7 <a href="postconf.5.html#address_verify_poll_count">address_verify_poll_count</a> = ${stress?{1}:{3}} </pre> </blockquote> <p> Postfix versions before 3.0 use the older form ${stress?x}${stress:y} instead of the newer form ${stress?{x}:{y}}. </p> <p> The syntax of ${name?{value}:{value}}, ${name?value} and ${name:value} is explained at the beginning of the <a href="postconf.5.html">postconf(5)</a> manual page. </p> <p> Translation: <p> <ul> <li> <p> Line 1: under conditions of stress, use an <a href="postconf.5.html#smtpd_timeout">smtpd_timeout</a> value of 10 seconds instead of the default 300 seconds. Experience on the postfix-users list from a variety of sysadmins shows that reducing the "normal" <a href="postconf.5.html#smtpd_timeout">smtpd_timeout</a> to 60s is unlikely to affect legitimate clients. However, it is unlikely to become the Postfix default because it's not RFC compliant. Setting <a href="postconf.5.html#smtpd_timeout">smtpd_timeout</a> to 10s or even 5s under stress will still allow most legitimate clients to connect and send mail, but may delay mail from some clients. No mail should be lost, as long as this measure is used only temporarily. </p> <li> <p> Line 2: under conditions of stress, use an <a href="postconf.5.html#smtpd_hard_error_limit">smtpd_hard_error_limit</a> of 1 instead of the default 20. This disconnects clients after a single error, giving other clients a chance to connect. However, this may cause significant delays with legitimate mail, such as a mailing list that contains a few no-longer-active user names that didn't bother to unsubscribe. No mail should be lost, as long as this measure is used only temporarily. </p> <li> <p> Line 3: under conditions of stress, use an <a href="postconf.5.html#smtpd_junk_command_limit">smtpd_junk_command_limit</a> of 1 instead of the default 100. This prevents clients from keeping connections open by repeatedly sending HELO, EHLO, NOOP, RSET, VRFY or ETRN commands. </p> <li> <p> Line 5: under conditions of stress, change the behavior of <a href="postconf.5.html#smtpd_timeout">smtpd_timeout</a> and <a href="postconf.5.html#smtpd_starttls_timeout">smtpd_starttls_timeout</a>, from a time limit per read or write system call, to a time limit to send or receive a complete record (an SMTP command line, SMTP response line, SMTP message content line, or TLS protocol message). </p> <li> <p> Line 6: under conditions of stress, reduce the time limit for TLS protocol handshake messages to 10 seconds, from the default value of 300 seconds. See also the <a href="postconf.5.html#smtpd_timeout">smtpd_timeout</a> discussion above. </p> <li> <p> Line 7: under conditions of stress, do not wait up to 6 seconds for the completion of an address verification probe. If the result is not already in the address verification cache, reply immediately with $<a href="postconf.5.html#unverified_recipient_tempfail_action">unverified_recipient_tempfail_action</a> or $<a href="postconf.5.html#unverified_sender_tempfail_action">unverified_sender_tempfail_action</a>. No mail should be lost, as long as this measure is used only temporarily. </p> </ul> <p> NOTE: Please keep in mind that the stress-adaptive feature is a fairly desperate measure to keep <b>some</b> legitimate mail flowing under overload conditions. If a site is reaching the SMTP server process limit when there isn't an attack or bot flood occurring, then either the process limit needs to be raised or more hardware needs to be added. </p> <h2><a name="concurrency"> Service more SMTP clients at the same time </a> </h2> <p> This section and the ones that follow discuss permanent measures against mail server overload. </p> <p> One measure to avoid the "all server processes busy" condition is to service more SMTP clients simultaneously. For this you need to increase the number of Postfix SMTP server processes. This will improve the responsiveness for remote SMTP clients, as long as the server machine has enough hardware and software resources to run the additional processes, and as long as the file system can keep up with the additional load. </p> <ul> <li> <p> You increase the number of SMTP server processes either by increasing the <a href="postconf.5.html#default_process_limit">default_process_limit</a> in <a href="postconf.5.html">main.cf</a> (line 3 below), or by increasing the SMTP server's "maxproc" field in <a href="master.5.html">master.cf</a> (line 10 below). Either way, you need to issue a "postfix reload" command to make the change effective. </p> <li> <p> Process limits above 1000 require Postfix version 2.4 or later, and an operating system that supports kernel-based event filters (BSD kqueue(2), Linux epoll(4), or Solaris /dev/poll). </p> <li> <p> More processes use more memory. You can reduce the Postfix memory footprint by using <a href="CDB_README.html">cdb</a>: lookup tables instead of Berkeley DB's <a href="DATABASE_README.html#types">hash</a>: or <a href="DATABASE_README.html#types">btree</a>: tables. </p> <pre> 1 /etc/postfix/<a href="postconf.5.html">main.cf</a>: 2 # Raise the global process limit, 100 since Postfix 2.0. 3 <a href="postconf.5.html#default_process_limit">default_process_limit</a> = 200 4 5 /etc/postfix/<a href="master.5.html">master.cf</a>: 6 # ============================================================= 7 # service type private unpriv chroot wakeup maxproc command 8 # ============================================================= 9 # Raise the SMTP service process limit only. 10 smtp inet n - n - 200 smtpd </pre> <li> <p> NOTE: older versions of the <a href="SMTPD_POLICY_README.html">SMTPD_POLICY_README</a> document contain a mistake: they configure a fixed number of policy daemon processes. When you raise the SMTP server's "maxproc" field in <a href="master.5.html">master.cf</a>, SMTP server processes will report problems when connecting to policy server processes, because there aren't enough of them. Examples of errors are "connection refused" or "operation timed out". </p> <p> To fix, edit <a href="master.5.html">master.cf</a> and specify a zero "maxproc" field in all policy server entries; see line 6 in the example below. Issue a "postfix reload" command to make the change effective. </p> <pre> 1 /etc/postfix/<a href="master.5.html">master.cf</a>: 2 # ============================================================= 3 # service type private unpriv chroot wakeup maxproc command 4 # ============================================================= 5 # Disable the policy service process limit. 6 policy unix - n n - 0 spawn 7 user=nobody argv=/some/where/policy-server </pre> </ul> <h2><a name="time"> Spend less time per SMTP client </a></h2> <p> When increasing the number of SMTP server processes is not practical, you can improve Postfix server responsiveness by eliminating delays. When Postfix spends less time per SMTP session, the same number of SMTP server processes can service more clients in a given amount of time. </p> <ul> <li> <p> Eliminate non-functional RBL lookups (blocklists that are no longer in operation). These lookups can degrade performance. Postfix logs a warning when an RBL server does not respond. </p> <li> <p> Eliminate redundant RBL lookups (people often use multiple Spamhaus RBLs that include each other). To find out whether RBLs include other RBLs, look up the websites that document the RBL's policies. </p> <li> <p> Eliminate <a href="postconf.5.html#header_checks">header_checks</a> and <a href="postconf.5.html#body_checks">body_checks</a>, and keep just a few emergency patterns to block the latest worm explosion or backscatter mail. See <a href="BACKSCATTER_README.html">BACKSCATTER_README</a> for examples of the latter. <li> <p> Group your <a href="postconf.5.html#header_checks">header_checks</a> and <a href="postconf.5.html#body_checks">body_checks</a> patterns to avoid unnecessary pattern matching operations: <pre> 1 /etc/postfix/header_checks: 2 if /^Subject:/ 3 /^Subject: virus found in mail from you/ reject 4 /^Subject: ..other../ reject 5 endif 6 7 if /^Received:/ 8 /^Received: from (postfix\.org) / reject forged client name in received header: $1 9 /^Received: from ..other../ reject .... 10 endif </pre> </ul> <h2><a name="hangup"> Disconnect suspicious SMTP clients </a></h2> <p> Under conditions of overload you can improve Postfix SMTP server responsiveness by hanging up on suspicious clients, so that other clients get a chance to talk to Postfix. </p> <ul> <li> <p> Use "521" SMTP reply codes (Postfix 2.6 and later) or "421" (Postfix 2.3-2.5) to hang up on clients that that match botnet-related RBLs (see next bullet) or that match selected non-RBL restrictions such as SMTP access maps. The Postfix SMTP server will reject mail and disconnect without waiting for the remote SMTP client to send a QUIT command. </p> <li> <p> To hang up connections from blacklisted zombies, you can set specific Postfix SMTP server reject codes for specific RBLs, and for individual responses from specific RBLs. We'll use zen.spamhaus.org as an example; by the time you read this document, details may have changed. Right now, their documents say that a response of 127.0.0.10 or 127.0.0.11 indicates a dynamic client IP address, which means that the machine is probably running a bot of some kind. To give a 521 response instead of the default 554 response, use something like: </p> <pre> 1 /etc/postfix/<a href="postconf.5.html">main.cf</a>: 2 <a href="postconf.5.html#smtpd_client_restrictions">smtpd_client_restrictions</a> = 3 <a href="postconf.5.html#permit_mynetworks">permit_mynetworks</a> 4 <a href="postconf.5.html#reject_rbl_client">reject_rbl_client</a> zen.spamhaus.org=127.0.0.10 5 <a href="postconf.5.html#reject_rbl_client">reject_rbl_client</a> zen.spamhaus.org=127.0.0.11 6 <a href="postconf.5.html#reject_rbl_client">reject_rbl_client</a> zen.spamhaus.org 7 8 <a href="postconf.5.html#rbl_reply_maps">rbl_reply_maps</a> = <a href="DATABASE_README.html#types">hash</a>:/etc/postfix/rbl_reply_maps 9 10 /etc/postfix/rbl_reply_maps: 11 # With Postfix 2.3-2.5 use "421" to hang up connections. 12 zen.spamhaus.org=127.0.0.10 521 4.7.1 Service unavailable; 13 $rbl_class [$rbl_what] blocked using 14 $rbl_domain${rbl_reason?; $rbl_reason} 15 16 zen.spamhaus.org=127.0.0.11 521 4.7.1 Service unavailable; 17 $rbl_class [$rbl_what] blocked using 18 $rbl_domain${rbl_reason?; $rbl_reason} </pre> <p> Although the above example shows three RBL lookups (lines 4-6), Postfix will only do a single DNS query, so it does not affect the performance. </p> <li> <p> With Postfix 2.3-2.5, use reply code 421 (521 will not cause Postfix to disconnect). The down-side of replying with 421 is that it works only for zombies and other malware. If the client is running a real MTA, then it may connect again several times until the mail expires in its queue. When this is a problem, stick with the default 554 reply, and use "<a href="postconf.5.html#smtpd_hard_error_limit">smtpd_hard_error_limit</a> = 1" as described below. </p> <li> <p> You can automatically turn on the above overload measure with Postfix 2.5 and later, or with earlier releases that contain the stress-adaptive behavior source code patch from the mirrors listed at <a href="http://www.postfix.org/download.html">http://www.postfix.org/download.html</a>. Simply replace line above 8 with: </p> <pre> 8 <a href="postconf.5.html#rbl_reply_maps">rbl_reply_maps</a> = ${stress?<a href="DATABASE_README.html#types">hash</a>:/etc/postfix/rbl_reply_maps} </pre> </ul> <p> More information about automatic stress-adaptive behavior is in section "<a href="#adapt">Automatic stress-adaptive behavior</a>". </p> <h2><a name="legacy"> Temporary measures for older Postfix releases </a></h2> <p> See the section "<a href="#adapt">Automatic stress-adaptive behavior</a>" if you are running Postfix version 2.5 or later, or if you have applied the source code patch for stress-adaptive behavior from the mirrors listed at <a href="http://www.postfix.org/download.html">http://www.postfix.org/download.html</a>. </p> <p> The following measures can be applied temporarily during overload. They still allow <b>most</b> legitimate clients to connect and send mail, but may affect some legitimate clients. </p> <ul> <li> <p> Reduce <a href="postconf.5.html#smtpd_timeout">smtpd_timeout</a> (default: 300s). Experience on the postfix-users list from a variety of sysadmins shows that reducing the "normal" <a href="postconf.5.html#smtpd_timeout">smtpd_timeout</a> to 60s is unlikely to affect legitimate clients. However, it is unlikely to become the Postfix default because it's not RFC compliant. Setting <a href="postconf.5.html#smtpd_timeout">smtpd_timeout</a> to 10s (line 2 below) or even 5s under stress will still allow <b>most</b> legitimate clients to connect and send mail, but may delay mail from some clients. No mail should be lost, as long as this measure is used only temporarily. </p> <li> <p> Reduce <a href="postconf.5.html#smtpd_hard_error_limit">smtpd_hard_error_limit</a> (default: 20). Setting this to 1 under stress (line 3 below) helps by disconnecting clients after a single error, giving other clients a chance to connect. However, this may cause significant delays with legitimate mail, such as a mailing list that contains a few no-longer-active user names that didn't bother to unsubscribe. No mail should be lost, as long as this measure is used only temporarily. </p> <li> <p> Use an <a href="postconf.5.html#smtpd_junk_command_limit">smtpd_junk_command_limit</a> of 1 instead of the default 100. This prevents clients from keeping idle connections open by repeatedly sending NOOP or RSET commands. </p> </ul> <blockquote> <pre> 1 /etc/postfix/<a href="postconf.5.html">main.cf</a>: 2 <a href="postconf.5.html#smtpd_timeout">smtpd_timeout</a> = 10 3 <a href="postconf.5.html#smtpd_hard_error_limit">smtpd_hard_error_limit</a> = 1 4 <a href="postconf.5.html#smtpd_junk_command_limit">smtpd_junk_command_limit</a> = 1 </pre> </blockquote> <p> With these measures, no mail should be lost, as long as these measures are used only temporarily. The next section of this document introduces a way to automate this process. </p> <h2><a name="feature"> Detecting support for stress-adaptive behavior </a></h2> <p> To find out if your Postfix installation supports stress-adaptive behavior, use the "ps" command, and look for the smtpd processes. Postfix has stress-adaptive support when you see "-o stress=" or "-o stress=yes" command-line options. Remember that Postfix never enables stress-adaptive behavior on servers that listen on local addresses only. </p> <p> The following example is for FreeBSD or Linux. On Solaris, HP-UX and other System-V flavors, use "ps -ef" instead of "ps ax". </p> <blockquote> <pre> $ ps ax|grep smtpd 83326 ?? S 0:00.28 smtpd -n smtp -t inet -u -c -o stress= 84345 ?? Ss 0:00.11 /usr/bin/perl /usr/libexec/postfix/smtpd-policy.pl </pre> </blockquote> <p> You can't use <a href="postconf.1.html">postconf(1)</a> to detect stress-adaptive support. The <a href="postconf.1.html">postconf(1)</a> command ignores the existence of the stress parameter in <a href="postconf.5.html">main.cf</a>, because the parameter has no effect there. Command-line "-o parameter" settings always take precedence over <a href="postconf.5.html">main.cf</a> parameter settings. <p> <p> If you configure stress-adaptive behavior in <a href="postconf.5.html">main.cf</a> when it isn't supported, nothing bad will happen. The processes will run as if the stress parameter always has an empty value. </p> <h2><a name="forcing"> Forcing stress-adaptive behavior on or off </a></h2> <p> You can manually force stress-adaptive behavior on, by adding a "-o stress=yes" command-line option in <a href="master.5.html">master.cf</a>. This can be useful for testing overrides on the SMTP service. Issue "postfix reload" to make the change effective. </p> <p> Note: setting the stress parameter in <a href="postconf.5.html">main.cf</a> has no effect for services that accept remote connections. </p> <blockquote> <pre> 1 /etc/postfix/<a href="master.5.html">master.cf</a>: 2 # ============================================================= 3 # service type private unpriv chroot wakeup maxproc command 4 # ============================================================= 5 # 6 smtp inet n - n - - smtpd 7 -o stress=yes 8 -o . . . </pre> </blockquote> <p> To permanently force stress-adaptive behavior off with a specific service, specify "-o stress=" on its <a href="master.5.html">master.cf</a> command line. This may be desirable for the "submission" service. Issue "postfix reload" to make the change effective. </p> <p> Note: setting the stress parameter in <a href="postconf.5.html">main.cf</a> has no effect for services that accept remote connections. </p> <blockquote> <pre> 1 /etc/postfix/<a href="master.5.html">master.cf</a>: 2 # ============================================================= 3 # service type private unpriv chroot wakeup maxproc command 4 # ============================================================= 5 # 6 submission inet n - n - - smtpd 7 -o stress= 8 -o . . . </pre> </blockquote> <h2><a name="other"> Other measures to off-load zombies </a> </h2> <p> The <a href="postscreen.8.html">postscreen(8)</a> daemon, introduced with Postfix 2.8, provides additional protection against mail server overload. One <a href="postscreen.8.html">postscreen(8)</a> process handles multiple inbound SMTP connections, and decides which clients may to talk to a Postfix SMTP server process. By keeping spambots away, <a href="postscreen.8.html">postscreen(8)</a> leaves more SMTP server processes available for legitimate clients, and delays the onset of server overload conditions. </p> <h2><a name="credits"> Credits </a></h2> <ul> <li> Thanks to the postfix-users mailing list members for sharing early experiences with the stress-adaptive feature. <li> The RBL example and several other paragraphs of text were adapted from postfix-users postings by Noel Jones. <li> Wietse implemented stress-adaptive behavior as the smallest possible patch while he should be working on other things. </ul> </body> </html>