Diffstat (limited to '')
-rw-r--r-- | doc/src/sgml/html/kernel-resources.html | 544 |
1 files changed, 544 insertions, 0 deletions
diff --git a/doc/src/sgml/html/kernel-resources.html b/doc/src/sgml/html/kernel-resources.html new file mode 100644 index 0000000..fc2d261 --- /dev/null +++ b/doc/src/sgml/html/kernel-resources.html @@ -0,0 +1,544 @@ +<?xml version="1.0" encoding="UTF-8" standalone="no"?> +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>19.4. Managing Kernel Resources</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="server-start.html" title="19.3. Starting the Database Server" /><link rel="next" href="server-shutdown.html" title="19.5. Shutting Down the Server" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">19.4. Managing Kernel Resources</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="server-start.html" title="19.3. Starting the Database Server">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="runtime.html" title="Chapter 19. Server Setup and Operation">Up</a></td><th width="60%" align="center">Chapter 19. Server Setup and Operation</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 16.2 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="server-shutdown.html" title="19.5. Shutting Down the Server">Next</a></td></tr></table><hr /></div><div class="sect1" id="KERNEL-RESOURCES"><div class="titlepage"><div><div><h2 class="title" style="clear: both">19.4. 
Managing Kernel Resources <a href="#KERNEL-RESOURCES" class="id_link">#</a></h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="kernel-resources.html#SYSVIPC">19.4.1. Shared Memory and Semaphores</a></span></dt><dt><span class="sect2"><a href="kernel-resources.html#SYSTEMD-REMOVEIPC">19.4.2. systemd RemoveIPC</a></span></dt><dt><span class="sect2"><a href="kernel-resources.html#KERNEL-RESOURCES-LIMITS">19.4.3. Resource Limits</a></span></dt><dt><span class="sect2"><a href="kernel-resources.html#LINUX-MEMORY-OVERCOMMIT">19.4.4. Linux Memory Overcommit</a></span></dt><dt><span class="sect2"><a href="kernel-resources.html#LINUX-HUGE-PAGES">19.4.5. Linux Huge Pages</a></span></dt></dl></div><p> + <span class="productname">PostgreSQL</span> can sometimes exhaust various operating system + resource limits, especially when multiple copies of the server are running + on the same system, or in very large installations. This section explains + the kernel resources used by <span class="productname">PostgreSQL</span> and the steps you + can take to resolve problems related to kernel resource consumption. + </p><div class="sect2" id="SYSVIPC"><div class="titlepage"><div><div><h3 class="title">19.4.1. Shared Memory and Semaphores <a href="#SYSVIPC" class="id_link">#</a></h3></div></div></div><a id="id-1.6.6.7.3.2" class="indexterm"></a><a id="id-1.6.6.7.3.3" class="indexterm"></a><p> + <span class="productname">PostgreSQL</span> requires the operating system to provide + inter-process communication (<acronym class="acronym">IPC</acronym>) features, specifically + shared memory and semaphores. Unix-derived systems typically provide + <span class="quote">“<span class="quote"><span class="systemitem">System V</span></span>”</span> <acronym class="acronym">IPC</acronym>, + <span class="quote">“<span class="quote"><span class="systemitem">POSIX</span></span>”</span> <acronym class="acronym">IPC</acronym>, or both. 
+ <span class="systemitem">Windows</span> has its own implementation of + these features and is not discussed here. + </p><p> + By default, <span class="productname">PostgreSQL</span> allocates + a very small amount of System V shared memory, as well as a much larger + amount of anonymous <code class="function">mmap</code> shared memory. + Alternatively, a single large System V shared memory region can be used + (see <a class="xref" href="runtime-config-resource.html#GUC-SHARED-MEMORY-TYPE">shared_memory_type</a>). + + In addition a significant number of semaphores, which can be either + System V or POSIX style, are created at server startup. Currently, + POSIX semaphores are used on Linux and FreeBSD systems while other + platforms use System V semaphores. + </p><p> + System V <acronym class="acronym">IPC</acronym> features are typically constrained by + system-wide allocation limits. + When <span class="productname">PostgreSQL</span> exceeds one of these limits, + the server will refuse to start and + should leave an instructive error message describing the problem + and what to do about it. (See also <a class="xref" href="server-start.html#SERVER-START-FAILURES" title="19.3.1. Server Start-up Failures">Section 19.3.1</a>.) The relevant kernel + parameters are named consistently across different systems; <a class="xref" href="kernel-resources.html#SYSVIPC-PARAMETERS" title="Table 19.1. System V IPC Parameters">Table 19.1</a> gives an overview. The methods to set + them, however, vary. Suggestions for some platforms are given below. + </p><div class="table" id="SYSVIPC-PARAMETERS"><p class="title"><strong>Table 19.1. 
<span class="systemitem">System V</span> <acronym class="acronym">IPC</acronym> Parameters</strong></p><div class="table-contents"><table class="table" summary="System V IPC Parameters" border="1"><colgroup><col class="col1" /><col class="col2" /><col class="col3" /></colgroup><thead><tr><th>Name</th><th>Description</th><th>Values needed to run one <span class="productname">PostgreSQL</span> instance</th></tr></thead><tbody><tr><td><code class="varname">SHMMAX</code></td><td>Maximum size of shared memory segment (bytes)</td><td>at least 1kB, but the default is usually much higher</td></tr><tr><td><code class="varname">SHMMIN</code></td><td>Minimum size of shared memory segment (bytes)</td><td>1</td></tr><tr><td><code class="varname">SHMALL</code></td><td>Total amount of shared memory available (bytes or pages)</td><td>same as <code class="varname">SHMMAX</code> if bytes, + or <code class="literal">ceil(SHMMAX/PAGE_SIZE)</code> if pages, + plus room for other applications</td></tr><tr><td><code class="varname">SHMSEG</code></td><td>Maximum number of shared memory segments per process</td><td>only 1 segment is needed, but the default is much higher</td></tr><tr><td><code class="varname">SHMMNI</code></td><td>Maximum number of shared memory segments system-wide</td><td>like <code class="varname">SHMSEG</code> plus room for other applications</td></tr><tr><td><code class="varname">SEMMNI</code></td><td>Maximum number of semaphore identifiers (i.e., sets)</td><td>at least <code class="literal">ceil((max_connections + autovacuum_max_workers + max_wal_senders + max_worker_processes + 5) / 16)</code> plus room for other applications</td></tr><tr><td><code class="varname">SEMMNS</code></td><td>Maximum number of semaphores system-wide</td><td><code class="literal">ceil((max_connections + autovacuum_max_workers + max_wal_senders + max_worker_processes + 5) / 16) * 17</code> plus room for other applications</td></tr><tr><td><code class="varname">SEMMSL</code></td><td>Maximum 
number of semaphores per set</td><td>at least 17</td></tr><tr><td><code class="varname">SEMMAP</code></td><td>Number of entries in semaphore map</td><td>see text</td></tr><tr><td><code class="varname">SEMVMX</code></td><td>Maximum value of semaphore</td><td>at least 1000 (The default is often 32767; do not change unless necessary)</td></tr></tbody></table></div></div><br class="table-break" /><p> + <span class="productname">PostgreSQL</span> requires a few bytes of System V shared memory + (typically 48 bytes, on 64-bit platforms) for each copy of the server. + On most modern operating systems, this amount can easily be allocated. + However, if you are running many copies of the server or you explicitly + configure the server to use large amounts of System V shared memory (see + <a class="xref" href="runtime-config-resource.html#GUC-SHARED-MEMORY-TYPE">shared_memory_type</a> and <a class="xref" href="runtime-config-resource.html#GUC-DYNAMIC-SHARED-MEMORY-TYPE">dynamic_shared_memory_type</a>), it may be necessary to + increase <code class="varname">SHMALL</code>, which is the total amount of System V shared + memory system-wide. Note that <code class="varname">SHMALL</code> is measured in pages + rather than bytes on many systems. + </p><p> + Less likely to cause problems is the minimum size for shared + memory segments (<code class="varname">SHMMIN</code>), which should be at most + approximately 32 bytes for <span class="productname">PostgreSQL</span> (it is + usually just 1). The maximum number of segments system-wide + (<code class="varname">SHMMNI</code>) or per-process (<code class="varname">SHMSEG</code>) are unlikely + to cause a problem unless your system has them set to zero. 
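+ </p><p>
+ Where <code class="varname">SHMALL</code> is counted in pages, its minimum follows from the
+ <code class="literal">ceil(SHMMAX/PAGE_SIZE)</code> rule in <a class="xref" href="kernel-resources.html#SYSVIPC-PARAMETERS" title="Table 19.1. System V IPC Parameters">Table 19.1</a>. The following is only a sketch: it assumes a page size of 4096 bytes
+ (on many systems <code class="command">getconf PAGESIZE</code> reports the real value) and an
+ illustrative <code class="varname">SHMMAX</code> of 134217728 bytes:
+</p><pre class="programlisting">
+SHMALL = ceil(SHMMAX / PAGE_SIZE)
+       = ceil(134217728 / 4096)
+       = 32768 pages, plus room for other applications
+</pre><p>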
+ </p><p> + When using System V semaphores, + <span class="productname">PostgreSQL</span> uses one semaphore per allowed connection + (<a class="xref" href="runtime-config-connection.html#GUC-MAX-CONNECTIONS">max_connections</a>), allowed autovacuum worker process + (<a class="xref" href="runtime-config-autovacuum.html#GUC-AUTOVACUUM-MAX-WORKERS">autovacuum_max_workers</a>), allowed WAL sender process + (<code class="varname">max_wal_senders</code>), and allowed background + process (<a class="xref" href="runtime-config-resource.html#GUC-MAX-WORKER-PROCESSES">max_worker_processes</a>), in sets of 16. + Each such set will + also contain a 17th semaphore which contains a <span class="quote">“<span class="quote">magic + number</span>”</span>, to detect collision with semaphore sets used by + other applications. The maximum number of semaphores in the system + is set by <code class="varname">SEMMNS</code>, which consequently must be at least + as high as <code class="varname">max_connections</code> plus + <code class="varname">autovacuum_max_workers</code> plus <code class="varname">max_wal_senders</code> + plus <code class="varname">max_worker_processes</code>, plus one extra for each 16 + allowed connections plus workers (see the formula in <a class="xref" href="kernel-resources.html#SYSVIPC-PARAMETERS" title="Table 19.1. System V IPC Parameters">Table 19.1</a>). The parameter <code class="varname">SEMMNI</code> + determines the limit on the number of semaphore sets that can + exist on the system at one time. Hence this parameter must be at + least <code class="literal">ceil((max_connections + autovacuum_max_workers + max_wal_senders + max_worker_processes + 5) / 16)</code>. + Lowering the number + of allowed connections is a temporary workaround for failures of + the function <code class="function">semget</code>, which usually carry the confusingly worded message <span class="quote">“<span class="quote">No space + left on device</span>”</span>. 
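+ </p><p>
+ As a worked illustration of the formulas in <a class="xref" href="kernel-resources.html#SYSVIPC-PARAMETERS" title="Table 19.1. System V IPC Parameters">Table 19.1</a>, the sketch below assumes
+ <code class="varname">max_connections</code> = 100, <code class="varname">autovacuum_max_workers</code> = 3,
+ <code class="varname">max_wal_senders</code> = 10 and <code class="varname">max_worker_processes</code> = 8.
+ These values are assumptions chosen for the example; substitute the settings of your own server:
+</p><pre class="programlisting">
+max_connections + autovacuum_max_workers + max_wal_senders + max_worker_processes + 5
+    = 100 + 3 + 10 + 8 + 5 = 126
+SEMMNI: ceil(126 / 16) = 8 sets
+SEMMNS: 8 * 17         = 136 semaphores
+</pre><p>
+ Both results are minimums for a single server instance; other applications
+ that use System V semaphores need room on top of them.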
+ </p><p> + In some cases it might also be necessary to increase + <code class="varname">SEMMAP</code> to be at least on the order of + <code class="varname">SEMMNS</code>. If the system has this parameter + (many do not), it defines the size of the semaphore + resource map, in which each contiguous block of available semaphores + needs an entry. When a semaphore set is freed it is either added to + an existing entry that is adjacent to the freed block or it is + registered under a new map entry. If the map is full, the freed + semaphores get lost (until reboot). Fragmentation of the semaphore + space could over time lead to fewer available semaphores than there + should be. + </p><p> + Various other settings related to <span class="quote">“<span class="quote">semaphore undo</span>”</span>, such as + <code class="varname">SEMMNU</code> and <code class="varname">SEMUME</code>, do not affect + <span class="productname">PostgreSQL</span>. + </p><p> + When using POSIX semaphores, the number of semaphores needed is the + same as for System V, that is, one semaphore per allowed connection + (<a class="xref" href="runtime-config-connection.html#GUC-MAX-CONNECTIONS">max_connections</a>), allowed autovacuum worker process + (<a class="xref" href="runtime-config-autovacuum.html#GUC-AUTOVACUUM-MAX-WORKERS">autovacuum_max_workers</a>), allowed WAL sender process + (<code class="varname">max_wal_senders</code>), and allowed background + process (<a class="xref" href="runtime-config-resource.html#GUC-MAX-WORKER-PROCESSES">max_worker_processes</a>). + On the platforms where this option is preferred, there is no specific + kernel limit on the number of POSIX semaphores. 
+ </p><div class="variablelist"><dl class="variablelist"><dt><span class="term"><span class="systemitem">AIX</span> + <a id="id-1.6.6.7.3.14.1.1.2" class="indexterm"></a> + </span></dt><dd><p> + It should not be necessary to do + any special configuration for such parameters as + <code class="varname">SHMMAX</code>, as it appears this is configured to + allow all memory to be used as shared memory. That is the + sort of configuration commonly used for other databases such + as <span class="application">DB/2</span>.</p><p> It might, however, be necessary to modify the global + <code class="command">ulimit</code> information in + <code class="filename">/etc/security/limits</code>, as the default hard + limits for file sizes (<code class="varname">fsize</code>) and numbers of + files (<code class="varname">nofiles</code>) might be too low. + </p></dd><dt><span class="term"><span class="systemitem">FreeBSD</span> + <a id="id-1.6.6.7.3.14.2.1.2" class="indexterm"></a> + </span></dt><dd><p> + The default shared memory settings are usually good enough, unless + you have set <code class="literal">shared_memory_type</code> to <code class="literal">sysv</code>. + System V semaphores are not used on this platform. + </p><p> + The default IPC settings can be changed using + the <code class="command">sysctl</code> or + <code class="command">loader</code> interfaces. The following + parameters can be set using <code class="command">sysctl</code>: +</p><pre class="screen"> +<code class="prompt">#</code> <strong class="userinput"><code>sysctl kern.ipc.shmall=32768</code></strong> +<code class="prompt">#</code> <strong class="userinput"><code>sysctl kern.ipc.shmmax=134217728</code></strong> +</pre><p> + To make these settings persist over reboots, modify + <code class="filename">/etc/sysctl.conf</code>. 
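+ </p><p>
+ As a minimal sketch, the two values from the <code class="command">sysctl</code> commands above
+ could be kept in that file as plain <code class="literal">name=value</code> lines. The numbers
+ are the illustrative ones already shown, not tuning recommendations:
+</p><pre class="programlisting">
+kern.ipc.shmall=32768
+kern.ipc.shmmax=134217728
+</pre><p>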
+ </p><p> + If you have set <code class="literal">shared_memory_type</code> to + <code class="literal">sysv</code>, you might also want to configure your kernel + to lock System V shared memory into RAM and prevent it from being paged + out to swap. This can be accomplished using the <code class="command">sysctl</code> + setting <code class="literal">kern.ipc.shm_use_phys</code>. + </p><p> + If running in a FreeBSD jail, you should set its + <code class="literal">sysvshm</code> parameter to <code class="literal">new</code>, so that + it has its own separate System V shared memory namespace. + (Before FreeBSD 11.0, it was necessary to enable shared access to + the host's IPC namespace from jails, and take measures to avoid + collisions.) + </p></dd><dt><span class="term"><span class="systemitem">NetBSD</span> + <a id="id-1.6.6.7.3.14.3.1.2" class="indexterm"></a> + </span></dt><dd><p> + The default shared memory settings are usually good enough, unless + you have set <code class="literal">shared_memory_type</code> to <code class="literal">sysv</code>. + You will usually want to increase <code class="literal">kern.ipc.semmni</code> + and <code class="literal">kern.ipc.semmns</code>, + as <span class="systemitem">NetBSD</span>'s default settings + for these are uncomfortably small. + </p><p> + IPC parameters can be adjusted using <code class="command">sysctl</code>, + for example: +</p><pre class="screen"> +<code class="prompt">#</code> <strong class="userinput"><code>sysctl -w kern.ipc.semmni=100</code></strong> +</pre><p> + To make these settings persist over reboots, modify + <code class="filename">/etc/sysctl.conf</code>. + </p><p> + If you have set <code class="literal">shared_memory_type</code> to + <code class="literal">sysv</code>, you might also want to configure your kernel + to lock System V shared memory into RAM and prevent it from being paged + out to swap. 
This can be accomplished using the <code class="command">sysctl</code> + setting <code class="literal">kern.ipc.shm_use_phys</code>. + </p></dd><dt><span class="term"><span class="systemitem">OpenBSD</span> + <a id="id-1.6.6.7.3.14.4.1.2" class="indexterm"></a> + </span></dt><dd><p> + The default shared memory settings are usually good enough, unless + you have set <code class="literal">shared_memory_type</code> to <code class="literal">sysv</code>. + You will usually want to + increase <code class="literal">kern.seminfo.semmni</code> + and <code class="literal">kern.seminfo.semmns</code>, + as <span class="systemitem">OpenBSD</span>'s default settings + for these are uncomfortably small. + </p><p> + IPC parameters can be adjusted using <code class="command">sysctl</code>, + for example: +</p><pre class="screen"> +<code class="prompt">#</code> <strong class="userinput"><code>sysctl kern.seminfo.semmni=100</code></strong> +</pre><p> + To make these settings persist over reboots, modify + <code class="filename">/etc/sysctl.conf</code>. + </p></dd><dt><span class="term"><span class="systemitem">Linux</span> + <a id="id-1.6.6.7.3.14.5.1.2" class="indexterm"></a> + </span></dt><dd><p> + The default shared memory settings are usually good enough, unless + you have set <code class="literal">shared_memory_type</code> to <code class="literal">sysv</code>, + and even then only on older kernel versions that shipped with low defaults. + System V semaphores are not used on this platform. + </p><p> + The shared memory size settings can be changed via the + <code class="command">sysctl</code> interface. 
For example, to allow 16 GB: +</p><pre class="screen"> +<code class="prompt">$</code> <strong class="userinput"><code>sysctl -w kernel.shmmax=17179869184</code></strong> +<code class="prompt">$</code> <strong class="userinput"><code>sysctl -w kernel.shmall=4194304</code></strong> +</pre><p> + To make these settings persist over reboots, see + <code class="filename">/etc/sysctl.conf</code>. + </p></dd><dt><span class="term"><span class="systemitem">macOS</span> + <a id="id-1.6.6.7.3.14.6.1.2" class="indexterm"></a> + </span></dt><dd><p> + The default shared memory and semaphore settings are usually good enough, unless + you have set <code class="literal">shared_memory_type</code> to <code class="literal">sysv</code>. + </p><p> + The recommended method for configuring shared memory in macOS + is to create a file named <code class="filename">/etc/sysctl.conf</code>, + containing variable assignments such as: +</p><pre class="programlisting"> +kern.sysv.shmmax=4194304 +kern.sysv.shmmin=1 +kern.sysv.shmmni=32 +kern.sysv.shmseg=8 +kern.sysv.shmall=1024 +</pre><p> + Note that in some macOS versions, + <span class="emphasis"><em>all five</em></span> shared-memory parameters must be set in + <code class="filename">/etc/sysctl.conf</code>, else the values will be ignored. + </p><p> + <code class="varname">SHMMAX</code> can only be set to a multiple of 4096. + </p><p> + <code class="varname">SHMALL</code> is measured in 4 kB pages on this platform. + </p><p> + It is possible to change all but <code class="varname">SHMMNI</code> on the fly, using + <span class="application">sysctl</span>. But it's still best to set up your preferred + values via <code class="filename">/etc/sysctl.conf</code>, so that the values will be + kept across reboots. 
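+ </p><p>
+ A hedged sketch of such an on-the-fly change, run as root and reusing the
+ illustrative values from the file above (so <code class="varname">SHMMAX</code> stays a
+ multiple of 4096 and <code class="varname">SHMALL</code> is given in 4 kB pages):
+</p><pre class="screen">
+<code class="prompt">#</code> <strong class="userinput"><code>sysctl -w kern.sysv.shmmax=4194304</code></strong>
+<code class="prompt">#</code> <strong class="userinput"><code>sysctl -w kern.sysv.shmall=1024</code></strong>
+</pre><p>
+ Values set this way are assumed to last only until the next reboot.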
+ </p></dd><dt><span class="term"><span class="systemitem">Solaris</span><br /></span><span class="term"><span class="systemitem">illumos</span></span></dt><dd><p> + The default shared memory and semaphore settings are usually good enough for most + <span class="productname">PostgreSQL</span> applications. Solaris defaults + to a <code class="varname">SHMMAX</code> of one-quarter of system <acronym class="acronym">RAM</acronym>. + To further adjust this setting, use a project setting associated + with the <code class="literal">postgres</code> user. For example, run the + following as <code class="literal">root</code>: +</p><pre class="programlisting"> +projadd -c "PostgreSQL DB User" -K "project.max-shm-memory=(privileged,8GB,deny)" -U postgres -G postgres user.postgres +</pre><p> + </p><p> + This command adds the <code class="literal">user.postgres</code> project and + sets the shared memory maximum for the <code class="literal">postgres</code> + user to 8GB, and takes effect the next time that user logs + in, or when you restart <span class="productname">PostgreSQL</span> (not reload). + The above assumes that <span class="productname">PostgreSQL</span> is run by + the <code class="literal">postgres</code> user in the <code class="literal">postgres</code> + group. No server reboot is required. + </p><p> + Other recommended kernel setting changes for database servers which will + have a large number of connections are: +</p><pre class="programlisting"> +project.max-shm-ids=(priv,32768,deny) +project.max-sem-ids=(priv,4096,deny) +project.max-msg-ids=(priv,4096,deny) +</pre><p> + </p><p> + Additionally, if you are running <span class="productname">PostgreSQL</span> + inside a zone, you may need to raise the zone resource usage + limits as well. See "Chapter2: Projects and Tasks" in the + <em class="citetitle">System Administrator's Guide</em> for more + information on <code class="literal">projects</code> and <code class="command">prctl</code>. 
+ </p></dd></dl></div></div><div class="sect2" id="SYSTEMD-REMOVEIPC"><div class="titlepage"><div><div><h3 class="title">19.4.2. systemd RemoveIPC <a href="#SYSTEMD-REMOVEIPC" class="id_link">#</a></h3></div></div></div><a id="id-1.6.6.7.4.2" class="indexterm"></a><p> + If <span class="productname">systemd</span> is in use, some care must be taken + that IPC resources (including shared memory) are not prematurely + removed by the operating system. This is especially of concern when + installing PostgreSQL from source. Users of distribution packages of + PostgreSQL are less likely to be affected, as + the <code class="literal">postgres</code> user is then normally created as a system + user. + </p><p> + The setting <code class="literal">RemoveIPC</code> + in <code class="filename">logind.conf</code> controls whether IPC objects are + removed when a user fully logs out. System users are exempt. This + setting defaults to on in stock <span class="productname">systemd</span>, but + some operating system distributions default it to off. + </p><p> + A typical observed effect when this setting is on is that shared memory + objects used for parallel query execution are removed at apparently random + times, leading to errors and warnings while attempting to open and remove + them, like +</p><pre class="screen"> +WARNING: could not remove shared memory segment "/PostgreSQL.1450751626": No such file or directory +</pre><p> + Different types of IPC objects (shared memory vs. semaphores, System V + vs. POSIX) are treated slightly differently + by <span class="productname">systemd</span>, so one might observe that some IPC + resources are not removed in the same way as others. But it is not + advisable to rely on these subtle differences. 
+ </p><p> + A <span class="quote">“<span class="quote">user logging out</span>”</span> might happen as part of a maintenance + job or manually when an administrator logs in as + the <code class="literal">postgres</code> user or something similar, so it is hard + to prevent in general. + </p><p> + What counts as a <span class="quote">“<span class="quote">system user</span>”</span> is determined + at <span class="productname">systemd</span> compile time from + the <code class="symbol">SYS_UID_MAX</code> setting + in <code class="filename">/etc/login.defs</code>. + </p><p> + Packaging and deployment scripts should be careful to create + the <code class="literal">postgres</code> user as a system user by + using <code class="literal">useradd -r</code>, <code class="literal">adduser --system</code>, + or equivalent. + </p><p> + Alternatively, if the user account was created incorrectly or cannot be + changed, it is recommended to set +</p><pre class="programlisting"> +RemoveIPC=no +</pre><p> + in <code class="filename">/etc/systemd/logind.conf</code> or another appropriate + configuration file. + </p><div class="caution"><h3 class="title">Caution</h3><p> + At least one of these two things has to be ensured, or the PostgreSQL + server will be very unreliable. + </p></div></div><div class="sect2" id="KERNEL-RESOURCES-LIMITS"><div class="titlepage"><div><div><h3 class="title">19.4.3. Resource Limits <a href="#KERNEL-RESOURCES-LIMITS" class="id_link">#</a></h3></div></div></div><p> + Unix-like operating systems enforce various kinds of resource limits + that might interfere with the operation of your + <span class="productname">PostgreSQL</span> server. Of particular + importance are limits on the number of processes per user, the + number of open files per process, and the amount of memory available + to each process. Each of these has a <span class="quote">“<span class="quote">hard</span>”</span> and a + <span class="quote">“<span class="quote">soft</span>”</span> limit. 
The soft limit is what actually counts + but it can be changed by the user up to the hard limit. The hard + limit can only be changed by the root user. The system call + <code class="function">setrlimit</code> is responsible for setting these + parameters. The shell's built-in command <code class="command">ulimit</code> + (Bourne shells) or <code class="command">limit</code> (<span class="application">csh</span>) is + used to control the resource limits from the command line. On + BSD-derived systems the file <code class="filename">/etc/login.conf</code> + controls the various resource limits set during login. See the + operating system documentation for details. The relevant + parameters are <code class="varname">maxproc</code>, + <code class="varname">openfiles</code>, and <code class="varname">datasize</code>. For + example: +</p><pre class="programlisting"> +default:\ +... + :datasize-cur=256M:\ + :maxproc-cur=256:\ + :openfiles-cur=256:\ +... +</pre><p> + (<code class="literal">-cur</code> is the soft limit. Append + <code class="literal">-max</code> to set the hard limit.) + </p><p> + Kernels can also have system-wide limits on some resources. + </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p> + On <span class="productname">Linux</span> the kernel parameter + <code class="varname">fs.file-max</code> determines the maximum number of open + files that the kernel will support. It can be changed with + <code class="literal">sysctl -w fs.file-max=<em class="replaceable"><code>N</code></em></code>. + To make the setting persist across reboots, add an assignment + in <code class="filename">/etc/sysctl.conf</code>. + The maximum limit of files per process is fixed at the time the + kernel is compiled; see + <code class="filename">/usr/src/linux/Documentation/proc.txt</code> for + more information. 
+ </p></li></ul></div><p> + </p><p> + The <span class="productname">PostgreSQL</span> server uses one process + per connection so you should provide for at least as many processes + as allowed connections, in addition to what you need for the rest + of your system. This is usually not a problem but if you run + several servers on one machine things might get tight. + </p><p> + The factory default limit on open files is often set to + <span class="quote">“<span class="quote">socially friendly</span>”</span> values that allow many users to + coexist on a machine without using an inappropriate fraction of + the system resources. If you run many servers on a machine this + is perhaps what you want, but on dedicated servers you might want to + raise this limit. + </p><p> + On the other side of the coin, some systems allow individual + processes to open large numbers of files; if more than a few + processes do so then the system-wide limit can easily be exceeded. + If you find this happening, and you do not want to alter the + system-wide limit, you can set <span class="productname">PostgreSQL</span>'s <a class="xref" href="runtime-config-resource.html#GUC-MAX-FILES-PER-PROCESS">max_files_per_process</a> configuration parameter to + limit the consumption of open files. + </p><p> + Another kernel limit that may be of concern when supporting large + numbers of client connections is the maximum socket connection queue + length. If more than that many connection requests arrive within a very + short period, some may get rejected before the <span class="productname">PostgreSQL</span> server can service + the requests, with those clients receiving unhelpful connection failure + errors such as <span class="quote">“<span class="quote">Resource temporarily unavailable</span>”</span> or + <span class="quote">“<span class="quote">Connection refused</span>”</span>. The default queue length limit is 128 + on many platforms. 
To raise it, adjust the appropriate kernel parameter + via <span class="application">sysctl</span>, then restart the <span class="productname">PostgreSQL</span> server. + The parameter is variously named <code class="varname">net.core.somaxconn</code> + on Linux, <code class="varname">kern.ipc.soacceptqueue</code> on newer FreeBSD, + and <code class="varname">kern.ipc.somaxconn</code> on macOS and other BSD + variants. + </p></div><div class="sect2" id="LINUX-MEMORY-OVERCOMMIT"><div class="titlepage"><div><div><h3 class="title">19.4.4. Linux Memory Overcommit <a href="#LINUX-MEMORY-OVERCOMMIT" class="id_link">#</a></h3></div></div></div><a id="id-1.6.6.7.6.2" class="indexterm"></a><a id="id-1.6.6.7.6.3" class="indexterm"></a><a id="id-1.6.6.7.6.4" class="indexterm"></a><p> + The default virtual memory behavior on Linux is not + optimal for <span class="productname">PostgreSQL</span>. Because of the + way that the kernel implements memory overcommit, the kernel might + terminate the <span class="productname">PostgreSQL</span> postmaster (the + supervisor server process) if the memory demands of either + <span class="productname">PostgreSQL</span> or another process cause the + system to run out of virtual memory. + </p><p> + If this happens, you will see a kernel message that looks like + this (consult your system documentation and configuration on where + to look for such a message): +</p><pre class="programlisting"> +Out of Memory: Killed process 12345 (postgres). +</pre><p> + This indicates that the <code class="filename">postgres</code> process + has been terminated due to memory pressure. + Although existing database connections will continue to function + normally, no new connections will be accepted. To recover, + <span class="productname">PostgreSQL</span> will need to be restarted. 
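+ </p><p>
+ Where the kernel message ends up depends on the system. As a sketch only, on
+ many Linux systems the kernel ring buffer can be searched with
+ <code class="command">dmesg</code>; reading it may require root, and the search pattern
+ below is an assumption modeled on the sample message above:
+</p><pre class="screen">
+<code class="prompt">#</code> <strong class="userinput"><code>dmesg | grep -i "killed process"</code></strong>
+</pre><p>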
+ </p><p> + One way to avoid this problem is to run + <span class="productname">PostgreSQL</span> on a machine where you can + be sure that other processes will not run the machine out of + memory. If memory is tight, increasing the swap space of the + operating system can help avoid the problem, because the + out-of-memory (OOM) killer is invoked only when physical memory and + swap space are exhausted. + </p><p> + If <span class="productname">PostgreSQL</span> itself is the cause of the + system running out of memory, you can avoid the problem by changing + your configuration. In some cases, it may help to lower memory-related + configuration parameters, particularly + <a class="link" href="runtime-config-resource.html#GUC-SHARED-BUFFERS"><code class="varname">shared_buffers</code></a>, + <a class="link" href="runtime-config-resource.html#GUC-WORK-MEM"><code class="varname">work_mem</code></a>, and + <a class="link" href="runtime-config-resource.html#GUC-HASH-MEM-MULTIPLIER"><code class="varname">hash_mem_multiplier</code></a>. + In other cases, the problem may be caused by allowing too many + connections to the database server itself. In many cases, it may + be better to reduce + <a class="link" href="runtime-config-connection.html#GUC-MAX-CONNECTIONS"><code class="varname">max_connections</code></a> + and instead make use of external connection-pooling software. + </p><p> + It is possible to modify the + kernel's behavior so that it will not <span class="quote">“<span class="quote">overcommit</span>”</span> memory. + Although this setting will not prevent the <a class="ulink" href="https://lwn.net/Articles/104179/" target="_top">OOM killer</a> from being invoked + altogether, it will lower the chances significantly and will therefore + lead to more robust system behavior. 
This is done by selecting strict + overcommit mode via <code class="command">sysctl</code>: +</p><pre class="programlisting"> +sysctl -w vm.overcommit_memory=2 +</pre><p> + or placing an equivalent entry in <code class="filename">/etc/sysctl.conf</code>. + You might also wish to modify the related setting + <code class="varname">vm.overcommit_ratio</code>. For details see the kernel documentation + file <a class="ulink" href="https://www.kernel.org/doc/Documentation/vm/overcommit-accounting" target="_top">https://www.kernel.org/doc/Documentation/vm/overcommit-accounting</a>. + </p><p> + Another approach, which can be used with or without altering + <code class="varname">vm.overcommit_memory</code>, is to set the process-specific + <em class="firstterm">OOM score adjustment</em> value for the postmaster process to + <code class="literal">-1000</code>, thereby guaranteeing it will not be targeted by the OOM + killer. The simplest way to do this is to execute +</p><pre class="programlisting"> +echo -1000 > /proc/self/oom_score_adj +</pre><p> + in the <span class="productname">PostgreSQL</span> startup script just before + invoking <code class="filename">postgres</code>. + Note that this action must be done as root, or it will have no effect; + so a root-owned startup script is the easiest place to do it. If you + do this, you should also set these environment variables in the startup + script before invoking <code class="filename">postgres</code>: +</p><pre class="programlisting"> +export PG_OOM_ADJUST_FILE=/proc/self/oom_score_adj +export PG_OOM_ADJUST_VALUE=0 +</pre><p> + These settings will cause postmaster child processes to run with the + normal OOM score adjustment of zero, so that the OOM killer can still + target them at need. You could use some other value for + <code class="envar">PG_OOM_ADJUST_VALUE</code> if you want the child processes to run + with some other OOM score adjustment. 
(<code class="envar">PG_OOM_ADJUST_VALUE</code> + can also be omitted, in which case it defaults to zero.) If you do not + set <code class="envar">PG_OOM_ADJUST_FILE</code>, the child processes will run with the + same OOM score adjustment as the postmaster, which is unwise since the + whole point is to ensure that the postmaster has a preferential setting. + </p></div><div class="sect2" id="LINUX-HUGE-PAGES"><div class="titlepage"><div><div><h3 class="title">19.4.5. Linux Huge Pages <a href="#LINUX-HUGE-PAGES" class="id_link">#</a></h3></div></div></div><p> + Using huge pages reduces overhead when using large contiguous chunks of + memory, as <span class="productname">PostgreSQL</span> does, particularly when + using large values of <a class="xref" href="runtime-config-resource.html#GUC-SHARED-BUFFERS">shared_buffers</a>. To use this + feature in <span class="productname">PostgreSQL</span> you need a kernel + with <code class="varname">CONFIG_HUGETLBFS=y</code> and + <code class="varname">CONFIG_HUGETLB_PAGE=y</code>. You will also have to configure + the operating system to provide enough huge pages of the desired size. + To determine the number of huge pages needed, use the + <code class="command">postgres</code> command to see the value of + <a class="xref" href="runtime-config-preset.html#GUC-SHARED-MEMORY-SIZE-IN-HUGE-PAGES">shared_memory_size_in_huge_pages</a>. Note that the + server must be shut down to view this runtime-computed parameter. 
+ This might look like: +</p><pre class="programlisting"> +$ <strong class="userinput"><code>postgres -D $PGDATA -C shared_memory_size_in_huge_pages</code></strong> +3170 +$ <strong class="userinput"><code>grep ^Hugepagesize /proc/meminfo</code></strong> +Hugepagesize: 2048 kB +$ <strong class="userinput"><code>ls /sys/kernel/mm/hugepages</code></strong> +hugepages-1048576kB hugepages-2048kB +</pre><p> + + In this example the default is 2MB, but you can also explicitly request + either 2MB or 1GB with <a class="xref" href="runtime-config-resource.html#GUC-HUGE-PAGE-SIZE">huge_page_size</a> to adapt + the number of pages calculated by + <code class="varname">shared_memory_size_in_huge_pages</code>. + + While we need at least <code class="literal">3170</code> huge pages in this example, + a larger setting would be appropriate if other programs on the machine + also need huge pages. + We can set this with: +</p><pre class="programlisting"> +# <strong class="userinput"><code>sysctl -w vm.nr_hugepages=3170</code></strong> +</pre><p> + Don't forget to add this setting to <code class="filename">/etc/sysctl.conf</code> + so that it is reapplied after reboots. For non-default huge page sizes, + we can instead use: +</p><pre class="programlisting"> +# <strong class="userinput"><code>echo 3170 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages</code></strong> +</pre><p> + It is also possible to provide these settings at boot time using + kernel parameters such as <code class="literal">hugepagesz=2M hugepages=3170</code>. + </p><p> + Sometimes the kernel is not able to allocate the desired number of huge + pages immediately due to fragmentation, so it might be necessary + to repeat the command or to reboot. (Immediately after a reboot, most of + the machine's memory should be available to convert into huge pages.) 
+ To verify the huge page allocation situation for a given size, use: +</p><pre class="programlisting"> +$ <strong class="userinput"><code>cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages</code></strong> +</pre><p> + </p><p> + It may also be necessary to give the database server's operating system + user permission to use huge pages by setting + <code class="varname">vm.hugetlb_shm_group</code> via <span class="application">sysctl</span>, and/or + give permission to lock memory with <code class="command">ulimit -l</code>. + </p><p> + The default behavior for huge pages in + <span class="productname">PostgreSQL</span> is to use them when possible, with + the system's default huge page size, and + to fall back to normal pages on failure. To enforce the use of huge + pages, you can set <a class="xref" href="runtime-config-resource.html#GUC-HUGE-PAGES">huge_pages</a> + to <code class="literal">on</code> in <code class="filename">postgresql.conf</code>. + Note that with this setting <span class="productname">PostgreSQL</span> will fail to + start if not enough huge pages are available. + </p><p> + For a detailed description of the <span class="productname">Linux</span> huge + pages feature have a look + at <a class="ulink" href="https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt" target="_top">https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt</a>. + </p></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="server-start.html" title="19.3. Starting the Database Server">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="runtime.html" title="Chapter 19. Server Setup and Operation">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="server-shutdown.html" title="19.5. Shutting Down the Server">Next</a></td></tr><tr><td width="40%" align="left" valign="top">19.3. 
Starting the Database Server </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 16.2 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 19.5. Shutting Down the Server</td></tr></table></div></body></html>
\ No newline at end of file |