Diffstat
-rw-r--r-- doc/src/sgml/html/kernel-resources.html | 544
1 files changed, 544 insertions, 0 deletions
diff --git a/doc/src/sgml/html/kernel-resources.html b/doc/src/sgml/html/kernel-resources.html
new file mode 100644
index 0000000..fc2d261
--- /dev/null
+++ b/doc/src/sgml/html/kernel-resources.html
@@ -0,0 +1,544 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>19.4. Managing Kernel Resources</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="server-start.html" title="19.3. Starting the Database Server" /><link rel="next" href="server-shutdown.html" title="19.5. Shutting Down the Server" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">19.4. Managing Kernel Resources</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="server-start.html" title="19.3. Starting the Database Server">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="runtime.html" title="Chapter 19. Server Setup and Operation">Up</a></td><th width="60%" align="center">Chapter 19. Server Setup and Operation</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 16.2 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="server-shutdown.html" title="19.5. Shutting Down the Server">Next</a></td></tr></table><hr /></div><div class="sect1" id="KERNEL-RESOURCES"><div class="titlepage"><div><div><h2 class="title" style="clear: both">19.4. Managing Kernel Resources <a href="#KERNEL-RESOURCES" class="id_link">#</a></h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="kernel-resources.html#SYSVIPC">19.4.1. Shared Memory and Semaphores</a></span></dt><dt><span class="sect2"><a href="kernel-resources.html#SYSTEMD-REMOVEIPC">19.4.2. 
systemd RemoveIPC</a></span></dt><dt><span class="sect2"><a href="kernel-resources.html#KERNEL-RESOURCES-LIMITS">19.4.3. Resource Limits</a></span></dt><dt><span class="sect2"><a href="kernel-resources.html#LINUX-MEMORY-OVERCOMMIT">19.4.4. Linux Memory Overcommit</a></span></dt><dt><span class="sect2"><a href="kernel-resources.html#LINUX-HUGE-PAGES">19.4.5. Linux Huge Pages</a></span></dt></dl></div><p>
+ <span class="productname">PostgreSQL</span> can sometimes exhaust various operating system
+ resource limits, especially when multiple copies of the server are running
+ on the same system, or in very large installations. This section explains
+ the kernel resources used by <span class="productname">PostgreSQL</span> and the steps you
+ can take to resolve problems related to kernel resource consumption.
+ </p><div class="sect2" id="SYSVIPC"><div class="titlepage"><div><div><h3 class="title">19.4.1. Shared Memory and Semaphores <a href="#SYSVIPC" class="id_link">#</a></h3></div></div></div><a id="id-1.6.6.7.3.2" class="indexterm"></a><a id="id-1.6.6.7.3.3" class="indexterm"></a><p>
+ <span class="productname">PostgreSQL</span> requires the operating system to provide
+ inter-process communication (<acronym class="acronym">IPC</acronym>) features, specifically
+ shared memory and semaphores. Unix-derived systems typically provide
+ <span class="quote">“<span class="quote"><span class="systemitem">System V</span></span>”</span> <acronym class="acronym">IPC</acronym>,
+ <span class="quote">“<span class="quote"><span class="systemitem">POSIX</span></span>”</span> <acronym class="acronym">IPC</acronym>, or both.
+ <span class="systemitem">Windows</span> has its own implementation of
+ these features and is not discussed here.
+ </p><p>
+ By default, <span class="productname">PostgreSQL</span> allocates
+ a very small amount of System V shared memory, as well as a much larger
+ amount of anonymous <code class="function">mmap</code> shared memory.
+ Alternatively, a single large System V shared memory region can be used
+ (see <a class="xref" href="runtime-config-resource.html#GUC-SHARED-MEMORY-TYPE">shared_memory_type</a>).
+
+ In addition, a significant number of semaphores, which can be either
+ System V or POSIX style, are created at server startup. Currently,
+ POSIX semaphores are used on Linux and FreeBSD systems while other
+ platforms use System V semaphores.
+ </p><p>
+ System V <acronym class="acronym">IPC</acronym> features are typically constrained by
+ system-wide allocation limits.
+ When <span class="productname">PostgreSQL</span> exceeds one of these limits,
+ the server will refuse to start and
+ should leave an instructive error message describing the problem
+ and what to do about it. (See also <a class="xref" href="server-start.html#SERVER-START-FAILURES" title="19.3.1. Server Start-up Failures">Section 19.3.1</a>.) The relevant kernel
+ parameters are named consistently across different systems; <a class="xref" href="kernel-resources.html#SYSVIPC-PARAMETERS" title="Table 19.1. System V IPC Parameters">Table 19.1</a> gives an overview. The methods to set
+ them, however, vary. Suggestions for some platforms are given below.
+ </p><div class="table" id="SYSVIPC-PARAMETERS"><p class="title"><strong>Table 19.1. <span class="systemitem">System V</span> <acronym class="acronym">IPC</acronym> Parameters</strong></p><div class="table-contents"><table class="table" summary="System V IPC Parameters" border="1"><colgroup><col class="col1" /><col class="col2" /><col class="col3" /></colgroup><thead><tr><th>Name</th><th>Description</th><th>Values needed to run one <span class="productname">PostgreSQL</span> instance</th></tr></thead><tbody><tr><td><code class="varname">SHMMAX</code></td><td>Maximum size of shared memory segment (bytes)</td><td>at least 1kB, but the default is usually much higher</td></tr><tr><td><code class="varname">SHMMIN</code></td><td>Minimum size of shared memory segment (bytes)</td><td>1</td></tr><tr><td><code class="varname">SHMALL</code></td><td>Total amount of shared memory available (bytes or pages)</td><td>same as <code class="varname">SHMMAX</code> if bytes,
+ or <code class="literal">ceil(SHMMAX/PAGE_SIZE)</code> if pages,
+ plus room for other applications</td></tr><tr><td><code class="varname">SHMSEG</code></td><td>Maximum number of shared memory segments per process</td><td>only 1 segment is needed, but the default is much higher</td></tr><tr><td><code class="varname">SHMMNI</code></td><td>Maximum number of shared memory segments system-wide</td><td>like <code class="varname">SHMSEG</code> plus room for other applications</td></tr><tr><td><code class="varname">SEMMNI</code></td><td>Maximum number of semaphore identifiers (i.e., sets)</td><td>at least <code class="literal">ceil((max_connections + autovacuum_max_workers + max_wal_senders + max_worker_processes + 5) / 16)</code> plus room for other applications</td></tr><tr><td><code class="varname">SEMMNS</code></td><td>Maximum number of semaphores system-wide</td><td><code class="literal">ceil((max_connections + autovacuum_max_workers + max_wal_senders + max_worker_processes + 5) / 16) * 17</code> plus room for other applications</td></tr><tr><td><code class="varname">SEMMSL</code></td><td>Maximum number of semaphores per set</td><td>at least 17</td></tr><tr><td><code class="varname">SEMMAP</code></td><td>Number of entries in semaphore map</td><td>see text</td></tr><tr><td><code class="varname">SEMVMX</code></td><td>Maximum value of semaphore</td><td>at least 1000 (The default is often 32767; do not change unless necessary)</td></tr></tbody></table></div></div><br class="table-break" /><p>
+ <span class="productname">PostgreSQL</span> requires a few bytes of System V shared memory
+ (typically 48 bytes, on 64-bit platforms) for each copy of the server.
+ On most modern operating systems, this amount can easily be allocated.
+ However, if you are running many copies of the server or you explicitly
+ configure the server to use large amounts of System V shared memory (see
+ <a class="xref" href="runtime-config-resource.html#GUC-SHARED-MEMORY-TYPE">shared_memory_type</a> and <a class="xref" href="runtime-config-resource.html#GUC-DYNAMIC-SHARED-MEMORY-TYPE">dynamic_shared_memory_type</a>), it may be necessary to
+ increase <code class="varname">SHMALL</code>, which is the total amount of System V shared
+ memory system-wide. Note that <code class="varname">SHMALL</code> is measured in pages
+ rather than bytes on many systems.
+ </p><p>
+ Less likely to cause problems is the minimum size for shared
+ memory segments (<code class="varname">SHMMIN</code>), which should be at most
+ approximately 32 bytes for <span class="productname">PostgreSQL</span> (it is
+ usually just 1). The limits on the number of segments system-wide
+ (<code class="varname">SHMMNI</code>) and per process (<code class="varname">SHMSEG</code>) are unlikely
+ to cause a problem unless your system has them set to zero.
+ </p><p>
+ When using System V semaphores,
+ <span class="productname">PostgreSQL</span> uses one semaphore per allowed connection
+ (<a class="xref" href="runtime-config-connection.html#GUC-MAX-CONNECTIONS">max_connections</a>), allowed autovacuum worker process
+ (<a class="xref" href="runtime-config-autovacuum.html#GUC-AUTOVACUUM-MAX-WORKERS">autovacuum_max_workers</a>), allowed WAL sender process
+ (<code class="varname">max_wal_senders</code>), and allowed background
+ process (<a class="xref" href="runtime-config-resource.html#GUC-MAX-WORKER-PROCESSES">max_worker_processes</a>), in sets of 16.
+ Each such set will
+ also contain a 17th semaphore which contains a <span class="quote">“<span class="quote">magic
+ number</span>”</span>, to detect collision with semaphore sets used by
+ other applications. The maximum number of semaphores in the system
+ is set by <code class="varname">SEMMNS</code>, which consequently must be at least
+ as high as <code class="varname">max_connections</code> plus
+ <code class="varname">autovacuum_max_workers</code> plus <code class="varname">max_wal_senders</code>
+ plus <code class="varname">max_worker_processes</code>, plus one extra for each 16
+ allowed connections plus workers (see the formula in <a class="xref" href="kernel-resources.html#SYSVIPC-PARAMETERS" title="Table 19.1. System V IPC Parameters">Table 19.1</a>). The parameter <code class="varname">SEMMNI</code>
+ determines the limit on the number of semaphore sets that can
+ exist on the system at one time. Hence this parameter must be at
+ least <code class="literal">ceil((max_connections + autovacuum_max_workers + max_wal_senders + max_worker_processes + 5) / 16)</code>.
+ Lowering the number
+ of allowed connections is a temporary workaround for failures
+ of the function <code class="function">semget</code>, which are usually reported
+ with the confusing message <span class="quote">“<span class="quote">No space left on device</span>”</span>.
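+ </p><p>
+ As a worked example of the formulas in <a class="xref" href="kernel-resources.html#SYSVIPC-PARAMETERS" title="Table 19.1. System V IPC Parameters">Table 19.1</a>, assume a server running with
+ <code class="literal">max_connections = 100</code>,
+ <code class="literal">autovacuum_max_workers = 3</code>,
+ <code class="literal">max_wal_senders = 10</code> and
+ <code class="literal">max_worker_processes = 8</code>. These values are
+ illustrative assumptions; substitute the settings of your own installation:
+</p><pre class="programlisting">
+semaphore sets (SEMMNI) = ceil((100 + 3 + 10 + 8 + 5) / 16) = ceil(126 / 16) = 8
+semaphores     (SEMMNS) = 8 * 17 = 136
+</pre><p>
+ These figures cover one such server only; other servers and applications
+ on the same system need additional room.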
+ </p><p>
+ In some cases it might also be necessary to increase
+ <code class="varname">SEMMAP</code> to be at least on the order of
+ <code class="varname">SEMMNS</code>. If the system has this parameter
+ (many do not), it defines the size of the semaphore
+ resource map, in which each contiguous block of available semaphores
+ needs an entry. When a semaphore set is freed it is either added to
+ an existing entry that is adjacent to the freed block or it is
+ registered under a new map entry. If the map is full, the freed
+ semaphores get lost (until reboot). Fragmentation of the semaphore
+ space could over time lead to fewer available semaphores than there
+ should be.
+ </p><p>
+ Various other settings related to <span class="quote">“<span class="quote">semaphore undo</span>”</span>, such as
+ <code class="varname">SEMMNU</code> and <code class="varname">SEMUME</code>, do not affect
+ <span class="productname">PostgreSQL</span>.
+ </p><p>
+ When using POSIX semaphores, the number of semaphores needed is the
+ same as for System V, that is, one semaphore per allowed connection
+ (<a class="xref" href="runtime-config-connection.html#GUC-MAX-CONNECTIONS">max_connections</a>), allowed autovacuum worker process
+ (<a class="xref" href="runtime-config-autovacuum.html#GUC-AUTOVACUUM-MAX-WORKERS">autovacuum_max_workers</a>), allowed WAL sender process
+ (<code class="varname">max_wal_senders</code>), and allowed background
+ process (<a class="xref" href="runtime-config-resource.html#GUC-MAX-WORKER-PROCESSES">max_worker_processes</a>).
+ On the platforms where this option is preferred, there is no specific
+ kernel limit on the number of POSIX semaphores.
+ </p><div class="variablelist"><dl class="variablelist"><dt><span class="term"><span class="systemitem">AIX</span>
+ <a id="id-1.6.6.7.3.14.1.1.2" class="indexterm"></a>
+ </span></dt><dd><p>
+ It should not be necessary to do
+ any special configuration for such parameters as
+ <code class="varname">SHMMAX</code>, as it appears this is configured to
+ allow all memory to be used as shared memory. That is the
+ sort of configuration commonly used for other databases such
+ as <span class="application">DB2</span>.</p><p> It might, however, be necessary to modify the global
+ <code class="command">ulimit</code> information in
+ <code class="filename">/etc/security/limits</code>, as the default hard
+ limits for file sizes (<code class="varname">fsize</code>) and numbers of
+ files (<code class="varname">nofiles</code>) might be too low.
+ </p></dd><dt><span class="term"><span class="systemitem">FreeBSD</span>
+ <a id="id-1.6.6.7.3.14.2.1.2" class="indexterm"></a>
+ </span></dt><dd><p>
+ The default shared memory settings are usually good enough, unless
+ you have set <code class="literal">shared_memory_type</code> to <code class="literal">sysv</code>.
+ System V semaphores are not used on this platform.
+ </p><p>
+ The default IPC settings can be changed using
+ the <code class="command">sysctl</code> or
+ <code class="command">loader</code> interfaces. The following
+ parameters can be set using <code class="command">sysctl</code>:
+</p><pre class="screen">
+<code class="prompt">#</code> <strong class="userinput"><code>sysctl kern.ipc.shmall=32768</code></strong>
+<code class="prompt">#</code> <strong class="userinput"><code>sysctl kern.ipc.shmmax=134217728</code></strong>
+</pre><p>
+ To make these settings persist over reboots, modify
+ <code class="filename">/etc/sysctl.conf</code>.
+ </p><p>
+ If you have set <code class="literal">shared_memory_type</code> to
+ <code class="literal">sysv</code>, you might also want to configure your kernel
+ to lock System V shared memory into RAM and prevent it from being paged
+ out to swap. This can be accomplished using the <code class="command">sysctl</code>
+ setting <code class="literal">kern.ipc.shm_use_phys</code>.
+ </p><p>
+ If running in a FreeBSD jail, you should set its
+ <code class="literal">sysvshm</code> parameter to <code class="literal">new</code>, so that
+ it has its own separate System V shared memory namespace.
+ (Before FreeBSD 11.0, it was necessary to enable shared access to
+ the host's IPC namespace from jails, and take measures to avoid
+ collisions.)
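+ </p><p>
+ As a minimal sketch, a <code class="filename">/etc/jail.conf</code> entry along the
+ following lines gives a jail its own System V shared memory namespace.
+ The jail name <code class="literal">pgjail</code> is hypothetical, and the other
+ parameters that the jail needs are omitted:
+</p><pre class="programlisting">
+pgjail {
+    sysvshm = new;
+}
+</pre><p>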
+ </p></dd><dt><span class="term"><span class="systemitem">NetBSD</span>
+ <a id="id-1.6.6.7.3.14.3.1.2" class="indexterm"></a>
+ </span></dt><dd><p>
+ The default shared memory settings are usually good enough, unless
+ you have set <code class="literal">shared_memory_type</code> to <code class="literal">sysv</code>.
+ You will usually want to increase <code class="literal">kern.ipc.semmni</code>
+ and <code class="literal">kern.ipc.semmns</code>,
+ as <span class="systemitem">NetBSD</span>'s default settings
+ for these are uncomfortably small.
+ </p><p>
+ IPC parameters can be adjusted using <code class="command">sysctl</code>,
+ for example:
+</p><pre class="screen">
+<code class="prompt">#</code> <strong class="userinput"><code>sysctl -w kern.ipc.semmni=100</code></strong>
+</pre><p>
+ To make these settings persist over reboots, modify
+ <code class="filename">/etc/sysctl.conf</code>.
+ </p><p>
+ If you have set <code class="literal">shared_memory_type</code> to
+ <code class="literal">sysv</code>, you might also want to configure your kernel
+ to lock System V shared memory into RAM and prevent it from being paged
+ out to swap. This can be accomplished using the <code class="command">sysctl</code>
+ setting <code class="literal">kern.ipc.shm_use_phys</code>.
+ </p></dd><dt><span class="term"><span class="systemitem">OpenBSD</span>
+ <a id="id-1.6.6.7.3.14.4.1.2" class="indexterm"></a>
+ </span></dt><dd><p>
+ The default shared memory settings are usually good enough, unless
+ you have set <code class="literal">shared_memory_type</code> to <code class="literal">sysv</code>.
+ You will usually want to
+ increase <code class="literal">kern.seminfo.semmni</code>
+ and <code class="literal">kern.seminfo.semmns</code>,
+ as <span class="systemitem">OpenBSD</span>'s default settings
+ for these are uncomfortably small.
+ </p><p>
+ IPC parameters can be adjusted using <code class="command">sysctl</code>,
+ for example:
+</p><pre class="screen">
+<code class="prompt">#</code> <strong class="userinput"><code>sysctl kern.seminfo.semmni=100</code></strong>
+</pre><p>
+ To make these settings persist over reboots, modify
+ <code class="filename">/etc/sysctl.conf</code>.
+ </p></dd><dt><span class="term"><span class="systemitem">Linux</span>
+ <a id="id-1.6.6.7.3.14.5.1.2" class="indexterm"></a>
+ </span></dt><dd><p>
+ The default shared memory settings are usually good enough, unless
+ you have set <code class="literal">shared_memory_type</code> to <code class="literal">sysv</code>,
+ and even then only on older kernel versions that shipped with low defaults.
+ System V semaphores are not used on this platform.
+ </p><p>
+ The shared memory size settings can be changed via the
+ <code class="command">sysctl</code> interface. For example, to allow 16 GB:
+</p><pre class="screen">
+<code class="prompt">#</code> <strong class="userinput"><code>sysctl -w kernel.shmmax=17179869184</code></strong>
+<code class="prompt">#</code> <strong class="userinput"><code>sysctl -w kernel.shmall=4194304</code></strong>
+</pre><p>
+ To make these settings persist over reboots, modify
+ <code class="filename">/etc/sysctl.conf</code>.
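+ </p><p>
+ The two values in the example above are related: on Linux,
+ <code class="literal">kernel.shmall</code> is counted in pages, so 4194304 is
+ 17179869184 divided by a page size of 4096 bytes. The following sketch
+ derives the page count from the running system instead of assuming the
+ page size; the 16 GB figure is only the example size used above:
+</p><pre class="screen">
+<code class="prompt">#</code> <strong class="userinput"><code>sysctl -w kernel.shmall=$((17179869184 / $(getconf PAGE_SIZE)))</code></strong>
+</pre><p>
+ The corresponding entries in <code class="filename">/etc/sysctl.conf</code>, again
+ assuming 4096-byte pages, might look like this:
+</p><pre class="programlisting">
+kernel.shmmax = 17179869184
+kernel.shmall = 4194304
+</pre><p>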
+ </p></dd><dt><span class="term"><span class="systemitem">macOS</span>
+ <a id="id-1.6.6.7.3.14.6.1.2" class="indexterm"></a>
+ </span></dt><dd><p>
+ The default shared memory and semaphore settings are usually good enough, unless
+ you have set <code class="literal">shared_memory_type</code> to <code class="literal">sysv</code>.
+ </p><p>
+ The recommended method for configuring shared memory in macOS
+ is to create a file named <code class="filename">/etc/sysctl.conf</code>,
+ containing variable assignments such as:
+</p><pre class="programlisting">
+kern.sysv.shmmax=4194304
+kern.sysv.shmmin=1
+kern.sysv.shmmni=32
+kern.sysv.shmseg=8
+kern.sysv.shmall=1024
+</pre><p>
+ Note that in some macOS versions,
+ <span class="emphasis"><em>all five</em></span> shared-memory parameters must be set in
+ <code class="filename">/etc/sysctl.conf</code>, else the values will be ignored.
+ </p><p>
+ <code class="varname">SHMMAX</code> can only be set to a multiple of 4096.
+ </p><p>
+ <code class="varname">SHMALL</code> is measured in 4 kB pages on this platform.
+ </p><p>
+ It is possible to change all but <code class="varname">SHMMNI</code> on the fly, using
+ <span class="application">sysctl</span>. But it's still best to set up your preferred
+ values via <code class="filename">/etc/sysctl.conf</code>, so that the values will be
+ kept across reboots.
+ </p></dd><dt><span class="term"><span class="systemitem">Solaris</span><br /></span><span class="term"><span class="systemitem">illumos</span></span></dt><dd><p>
+ The default shared memory and semaphore settings are usually good enough for most
+ <span class="productname">PostgreSQL</span> applications. Solaris defaults
+ to a <code class="varname">SHMMAX</code> of one-quarter of system <acronym class="acronym">RAM</acronym>.
+ To further adjust this setting, use a project setting associated
+ with the <code class="literal">postgres</code> user. For example, run the
+ following as <code class="literal">root</code>:
+</p><pre class="programlisting">
+projadd -c "PostgreSQL DB User" -K "project.max-shm-memory=(privileged,8GB,deny)" -U postgres -G postgres user.postgres
+</pre><p>
+ </p><p>
+ This command adds the <code class="literal">user.postgres</code> project and
+ sets the shared memory maximum for the <code class="literal">postgres</code>
+ user to 8GB, and takes effect the next time that user logs
+ in, or when you restart <span class="productname">PostgreSQL</span> (not reload).
+ The above assumes that <span class="productname">PostgreSQL</span> is run by
+ the <code class="literal">postgres</code> user in the <code class="literal">postgres</code>
+ group. No server reboot is required.
+ </p><p>
+ Other recommended kernel setting changes for database servers which will
+ have a large number of connections are:
+</p><pre class="programlisting">
+project.max-shm-ids=(priv,32768,deny)
+project.max-sem-ids=(priv,4096,deny)
+project.max-msg-ids=(priv,4096,deny)
+</pre><p>
+ </p><p>
+ Additionally, if you are running <span class="productname">PostgreSQL</span>
+ inside a zone, you may need to raise the zone resource usage
+ limits as well. See "Chapter 2: Projects and Tasks" in the
+ <em class="citetitle">System Administrator's Guide</em> for more
+ information on <code class="literal">projects</code> and <code class="command">prctl</code>.
+ </p></dd></dl></div></div><div class="sect2" id="SYSTEMD-REMOVEIPC"><div class="titlepage"><div><div><h3 class="title">19.4.2. systemd RemoveIPC <a href="#SYSTEMD-REMOVEIPC" class="id_link">#</a></h3></div></div></div><a id="id-1.6.6.7.4.2" class="indexterm"></a><p>
+ If <span class="productname">systemd</span> is in use, some care must be taken
+ that IPC resources (including shared memory) are not prematurely
+ removed by the operating system. This is especially of concern when
+ installing PostgreSQL from source. Users of distribution packages of
+ PostgreSQL are less likely to be affected, as
+ the <code class="literal">postgres</code> user is then normally created as a system
+ user.
+ </p><p>
+ The setting <code class="literal">RemoveIPC</code>
+ in <code class="filename">logind.conf</code> controls whether IPC objects are
+ removed when a user fully logs out. System users are exempt. This
+ setting defaults to on in stock <span class="productname">systemd</span>, but
+ some operating system distributions default it to off.
+ </p><p>
+ A typical observed effect when this setting is on is that shared memory
+ objects used for parallel query execution are removed at apparently random
+ times, leading to errors and warnings while attempting to open and remove
+ them, like
+</p><pre class="screen">
+WARNING: could not remove shared memory segment "/PostgreSQL.1450751626": No such file or directory
+</pre><p>
+ Different types of IPC objects (shared memory vs. semaphores, System V
+ vs. POSIX) are treated slightly differently
+ by <span class="productname">systemd</span>, so one might observe that some IPC
+ resources are not removed in the same way as others. But it is not
+ advisable to rely on these subtle differences.
+ </p><p>
+ A <span class="quote">“<span class="quote">user logging out</span>”</span> might happen as part of a maintenance
+ job or manually when an administrator logs in as
+ the <code class="literal">postgres</code> user or something similar, so it is hard
+ to prevent in general.
+ </p><p>
+ What is a <span class="quote">“<span class="quote">system user</span>”</span> is determined
+ at <span class="productname">systemd</span> compile time from
+ the <code class="symbol">SYS_UID_MAX</code> setting
+ in <code class="filename">/etc/login.defs</code>.
+ </p><p>
+ Packaging and deployment scripts should be careful to create
+ the <code class="literal">postgres</code> user as a system user by
+ using <code class="literal">useradd -r</code>, <code class="literal">adduser --system</code>,
+ or equivalent.
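+ </p><p>
+ One way to check an existing account, assuming that it is named
+ <code class="literal">postgres</code> and that your distribution defines
+ <code class="symbol">SYS_UID_MAX</code> in <code class="filename">/etc/login.defs</code>,
+ is to compare the two numbers:
+</p><pre class="screen">
+<code class="prompt">$</code> <strong class="userinput"><code>id -u postgres</code></strong>
+<code class="prompt">$</code> <strong class="userinput"><code>grep SYS_UID_MAX /etc/login.defs</code></strong>
+</pre><p>
+ A user ID no greater than <code class="symbol">SYS_UID_MAX</code> indicates a
+ system user. Because the threshold is fixed when
+ <span class="productname">systemd</span> is compiled, the value it actually uses
+ can differ from the one currently in the file.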
+ </p><p>
+ Alternatively, if the user account was created incorrectly or cannot be
+ changed, it is recommended to set
+</p><pre class="programlisting">
+RemoveIPC=no
+</pre><p>
+ in <code class="filename">/etc/systemd/logind.conf</code> or another appropriate
+ configuration file.
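+ </p><p>
+ For instance, rather than editing the main file, the setting can be
+ placed in a drop-in file. The file name below is hypothetical; the
+ <code class="literal">[Login]</code> section header is the one that
+ <code class="filename">logind.conf</code> uses:
+</p><pre class="programlisting">
+# /etc/systemd/logind.conf.d/postgresql.conf (hypothetical file name)
+[Login]
+RemoveIPC=no
+</pre><p>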
+ </p><div class="caution"><h3 class="title">Caution</h3><p>
+ At least one of these two things has to be ensured, or the PostgreSQL
+ server will be very unreliable.
+ </p></div></div><div class="sect2" id="KERNEL-RESOURCES-LIMITS"><div class="titlepage"><div><div><h3 class="title">19.4.3. Resource Limits <a href="#KERNEL-RESOURCES-LIMITS" class="id_link">#</a></h3></div></div></div><p>
+ Unix-like operating systems enforce various kinds of resource limits
+ that might interfere with the operation of your
+ <span class="productname">PostgreSQL</span> server. Of particular
+ importance are limits on the number of processes per user, the
+ number of open files per process, and the amount of memory available
+ to each process. Each of these has a <span class="quote">“<span class="quote">hard</span>”</span> and a
+ <span class="quote">“<span class="quote">soft</span>”</span> limit. The soft limit is what actually counts,
+ but it can be changed by the user up to the hard limit. The hard
+ limit can only be changed by the root user. The system call
+ <code class="function">setrlimit</code> is responsible for setting these
+ parameters. The shell's built-in command <code class="command">ulimit</code>
+ (Bourne shells) or <code class="command">limit</code> (<span class="application">csh</span>) is
+ used to control the resource limits from the command line. On
+ BSD-derived systems the file <code class="filename">/etc/login.conf</code>
+ controls the various resource limits set during login. See the
+ operating system documentation for details. The relevant
+ parameters are <code class="varname">maxproc</code>,
+ <code class="varname">openfiles</code>, and <code class="varname">datasize</code>. For
+ example:
+</p><pre class="programlisting">
+default:\
+...
+ :datasize-cur=256M:\
+ :maxproc-cur=256:\
+ :openfiles-cur=256:\
+...
+</pre><p>
+ (<code class="literal">-cur</code> is the soft limit. Append
+ <code class="literal">-max</code> to set the hard limit.)
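+ </p><p>
+ To see which limits currently apply, <code class="command">ulimit</code> can be
+ run in a shell started as the user that runs the server. The option
+ letters below are those of <span class="application">bash</span>; other shells
+ may differ:
+</p><pre class="screen">
+<code class="prompt">$</code> <strong class="userinput"><code>ulimit -S -n</code></strong>
+<code class="prompt">$</code> <strong class="userinput"><code>ulimit -H -n</code></strong>
+<code class="prompt">$</code> <strong class="userinput"><code>ulimit -S -u</code></strong>
+</pre><p>
+ The first two commands print the soft and hard limits on open files, and
+ the third prints the soft limit on processes per user. A server started
+ by an init system may run with different limits than a login shell.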
+ </p><p>
+ Kernels can also have system-wide limits on some resources.
+ </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
+ On <span class="productname">Linux</span> the kernel parameter
+ <code class="varname">fs.file-max</code> determines the maximum number of open
+ files that the kernel will support. It can be changed with
+ <code class="literal">sysctl -w fs.file-max=<em class="replaceable"><code>N</code></em></code>.
+ To make the setting persist across reboots, add an assignment
+ in <code class="filename">/etc/sysctl.conf</code>.
+ The maximum limit of files per process is fixed at the time the
+ kernel is compiled; see
+ <code class="filename">/usr/src/linux/Documentation/proc.txt</code> for
+ more information.
+ </p></li></ul></div><p>
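+ </p><p>
+ On <span class="productname">Linux</span>, current usage can be compared with
+ the limit before changing anything, for example:
+</p><pre class="screen">
+<code class="prompt">$</code> <strong class="userinput"><code>sysctl fs.file-max</code></strong>
+<code class="prompt">$</code> <strong class="userinput"><code>cat /proc/sys/fs/file-nr</code></strong>
+</pre><p>
+ <code class="filename">file-nr</code> reports three numbers: the allocated file
+ handles, the allocated but unused handles, and the maximum, which is the
+ same value as <code class="varname">fs.file-max</code>.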
+ </p><p>
+ The <span class="productname">PostgreSQL</span> server uses one process
+ per connection, so you should provide for at least as many processes
+ as allowed connections, in addition to what you need for the rest
+ of your system. This is usually not a problem, but if you run
+ several servers on one machine, things might get tight.
+ </p><p>
+ The factory default limit on open files is often set to
+ <span class="quote">“<span class="quote">socially friendly</span>”</span> values that allow many users to
+ coexist on a machine without using an inappropriate fraction of
+ the system resources. If you run many servers on a machine this
+ is perhaps what you want, but on dedicated servers you might want to
+ raise this limit.
+ </p><p>
+ On the other side of the coin, some systems allow individual
+ processes to open large numbers of files; if more than a few
+ processes do so then the system-wide limit can easily be exceeded.
+ If you find this happening, and you do not want to alter the
+ system-wide limit, you can set <span class="productname">PostgreSQL</span>'s <a class="xref" href="runtime-config-resource.html#GUC-MAX-FILES-PER-PROCESS">max_files_per_process</a> configuration parameter to
+ limit the consumption of open files.
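+ </p><p>
+ For example, in <code class="filename">postgresql.conf</code> (the value 500 is
+ purely illustrative; choose one that suits your system-wide limit and
+ the number of server processes you expect):
+</p><pre class="programlisting">
+max_files_per_process = 500
+</pre><p>
+ This parameter can only be set at server start.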
+ </p><p>
+ Another kernel limit that may be of concern when supporting large
+ numbers of client connections is the maximum socket connection queue
+ length. If more than that many connection requests arrive within a very
+ short period, some may get rejected before the <span class="productname">PostgreSQL</span> server can service
+ the requests, with those clients receiving unhelpful connection failure
+ errors such as <span class="quote">“<span class="quote">Resource temporarily unavailable</span>”</span> or
+ <span class="quote">“<span class="quote">Connection refused</span>”</span>. The default queue length limit is 128
+ on many platforms. To raise it, adjust the appropriate kernel parameter
+ via <span class="application">sysctl</span>, then restart the <span class="productname">PostgreSQL</span> server.
+ The parameter is variously named <code class="varname">net.core.somaxconn</code>
+ on Linux, <code class="varname">kern.ipc.soacceptqueue</code> on newer FreeBSD,
+ and <code class="varname">kern.ipc.somaxconn</code> on macOS and other BSD
+ variants.
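+ </p><p>
+ On Linux, for instance, the limit could be raised as follows; 1024 is an
+ arbitrary illustrative value:
+</p><pre class="screen">
+<code class="prompt">#</code> <strong class="userinput"><code>sysctl -w net.core.somaxconn=1024</code></strong>
+</pre><p>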
+ </p></div><div class="sect2" id="LINUX-MEMORY-OVERCOMMIT"><div class="titlepage"><div><div><h3 class="title">19.4.4. Linux Memory Overcommit <a href="#LINUX-MEMORY-OVERCOMMIT" class="id_link">#</a></h3></div></div></div><a id="id-1.6.6.7.6.2" class="indexterm"></a><a id="id-1.6.6.7.6.3" class="indexterm"></a><a id="id-1.6.6.7.6.4" class="indexterm"></a><p>
+ The default virtual memory behavior on Linux is not
+ optimal for <span class="productname">PostgreSQL</span>. Because of the
+ way that the kernel implements memory overcommit, the kernel might
+ terminate the <span class="productname">PostgreSQL</span> postmaster (the
+ supervisor server process) if the memory demands of either
+ <span class="productname">PostgreSQL</span> or another process cause the
+ system to run out of virtual memory.
+ </p><p>
+ If this happens, you will see a kernel message that looks like
+ this (consult your system documentation and configuration on where
+ to look for such a message):
+</p><pre class="programlisting">
+Out of Memory: Killed process 12345 (postgres).
+</pre><p>
+ This indicates that the <code class="filename">postgres</code> process
+ has been terminated due to memory pressure.
+ Although existing database connections will continue to function
+ normally, no new connections will be accepted. To recover,
+ <span class="productname">PostgreSQL</span> will need to be restarted.
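+ </p><p>
+ Where the kernel log ends up varies by system. Two commonly available
+ ways to search it for the message shown above, neither of which is
+ guaranteed to exist on your system, are:
+</p><pre class="screen">
+<code class="prompt">#</code> <strong class="userinput"><code>dmesg | grep -i "killed process"</code></strong>
+<code class="prompt">#</code> <strong class="userinput"><code>journalctl -k | grep -i "killed process"</code></strong>
+</pre><p>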
+ </p><p>
+ One way to avoid this problem is to run
+ <span class="productname">PostgreSQL</span> on a machine where you can
+ be sure that other processes will not run the machine out of
+ memory. If memory is tight, increasing the swap space of the
+ operating system can help avoid the problem, because the
+ out-of-memory (OOM) killer is invoked only when physical memory and
+ swap space are exhausted.
+ </p><p>
+ If <span class="productname">PostgreSQL</span> itself is the cause of the
+ system running out of memory, you can avoid the problem by changing
+ your configuration. In some cases, it may help to lower memory-related
+ configuration parameters, particularly
+ <a class="link" href="runtime-config-resource.html#GUC-SHARED-BUFFERS"><code class="varname">shared_buffers</code></a>,
+ <a class="link" href="runtime-config-resource.html#GUC-WORK-MEM"><code class="varname">work_mem</code></a>, and
+ <a class="link" href="runtime-config-resource.html#GUC-HASH-MEM-MULTIPLIER"><code class="varname">hash_mem_multiplier</code></a>.
+ In other cases, the problem may be caused by allowing too many
+ connections to the database server itself. It is then usually
+ better to reduce
+ <a class="link" href="runtime-config-connection.html#GUC-MAX-CONNECTIONS"><code class="varname">max_connections</code></a>
+ and use external connection-pooling software instead.
+ </p><p>
+ It is possible to modify the
+ kernel's behavior so that it will not <span class="quote">“<span class="quote">overcommit</span>”</span> memory.
+ Although this setting will not prevent the <a class="ulink" href="https://lwn.net/Articles/104179/" target="_top">OOM killer</a> from being invoked
+ altogether, it will lower the chances significantly and will therefore
+ lead to more robust system behavior. This is done by selecting strict
+ overcommit mode via <code class="command">sysctl</code>:
+</p><pre class="programlisting">
+sysctl -w vm.overcommit_memory=2
+</pre><p>
+ or placing an equivalent entry in <code class="filename">/etc/sysctl.conf</code>.
+ You might also wish to modify the related setting
+ <code class="varname">vm.overcommit_ratio</code>. For details, see the kernel documentation
+ file <a class="ulink" href="https://www.kernel.org/doc/Documentation/vm/overcommit-accounting" target="_top">https://www.kernel.org/doc/Documentation/vm/overcommit-accounting</a>.
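+ </p><p>
+ As a rough sketch of how the two settings interact: in strict mode the
+ kernel caps total committed memory at approximately the swap space plus
+ <code class="varname">vm.overcommit_ratio</code> percent of physical RAM
+ (the ratio defaults to 50). The value <code class="literal">80</code> below
+ is purely illustrative and is not a recommendation. The resulting limit and
+ the amount currently committed can be read back from
+ <code class="filename">/proc/meminfo</code>:
+</p><pre class="programlisting">
+# Illustrative values only; choose a ratio that suits your RAM and swap.
+sysctl -w vm.overcommit_memory=2
+sysctl -w vm.overcommit_ratio=80
+
+# CommitLimit is the cap now in force; Committed_AS is what is committed.
+grep -E '^(CommitLimit|Committed_AS)' /proc/meminfo
+</pre><p>
+ If <code class="literal">Committed_AS</code> is already close to
+ <code class="literal">CommitLimit</code>, further allocation requests can be
+ expected to fail with out-of-memory errors, so it is worth checking
+ these two figures before making the setting permanent.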
+ </p><p>
+ Another approach, which can be used with or without altering
+ <code class="varname">vm.overcommit_memory</code>, is to set the process-specific
+ <em class="firstterm">OOM score adjustment</em> value for the postmaster process to
+ <code class="literal">-1000</code>, thereby guaranteeing it will not be targeted by the OOM
+ killer. The simplest way to do this is to execute
+</p><pre class="programlisting">
+echo -1000 &gt; /proc/self/oom_score_adj
+</pre><p>
+ in the <span class="productname">PostgreSQL</span> startup script just before
+ invoking <code class="filename">postgres</code>.
+ Note that this action must be done as root, or it will have no effect;
+ so a root-owned startup script is the easiest place to do it. If you
+ do this, you should also set these environment variables in the startup
+ script before invoking <code class="filename">postgres</code>:
+</p><pre class="programlisting">
+export PG_OOM_ADJUST_FILE=/proc/self/oom_score_adj
+export PG_OOM_ADJUST_VALUE=0
+</pre><p>
+ These settings will cause postmaster child processes to run with the
+ normal OOM score adjustment of zero, so that the OOM killer can still
+ target them at need. You could use some other value for
+ <code class="envar">PG_OOM_ADJUST_VALUE</code> if you want the child processes to run
+ with some other OOM score adjustment. (<code class="envar">PG_OOM_ADJUST_VALUE</code>
+ can also be omitted, in which case it defaults to zero.) If you do not
+ set <code class="envar">PG_OOM_ADJUST_FILE</code>, the child processes will run with the
+ same OOM score adjustment as the postmaster, which is unwise since the
+ whole point is to ensure that the postmaster has a preferential setting.
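+ </p><p>
+ Put together, the relevant part of a root-owned startup script might look
+ like the following minimal sketch. The installation and data directory
+ paths, the <code class="literal">postgres</code> operating system user name,
+ and the use of <code class="command">su</code> are assumptions made for
+ illustration; adapt them to your installation:
+</p><pre class="programlisting">
+#!/bin/sh
+# Must run as root, otherwise the write below has no effect.
+echo -1000 &gt; /proc/self/oom_score_adj
+
+# Ask the postmaster to reset the adjustment to 0 in its child processes.
+export PG_OOM_ADJUST_FILE=/proc/self/oom_score_adj
+export PG_OOM_ADJUST_VALUE=0
+
+# Hypothetical paths and user name. The adjustment set above is inherited
+# by the processes started from this script.
+exec su postgres -c '/usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data'
+</pre><p>
+ Once the server is running, the outcome can be checked by reading the
+ <code class="filename">oom_score_adj</code> file under
+ <code class="filename">/proc</code> for the postmaster process and for one of
+ its children: with the settings above, the former should show
+ <code class="literal">-1000</code> and the latter <code class="literal">0</code>.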
+ </p></div><div class="sect2" id="LINUX-HUGE-PAGES"><div class="titlepage"><div><div><h3 class="title">19.4.5. Linux Huge Pages <a href="#LINUX-HUGE-PAGES" class="id_link">#</a></h3></div></div></div><p>
+ Using huge pages reduces overhead when using large contiguous chunks of
+ memory, as <span class="productname">PostgreSQL</span> does, particularly when
+ using large values of <a class="xref" href="runtime-config-resource.html#GUC-SHARED-BUFFERS">shared_buffers</a>. To use this
+ feature in <span class="productname">PostgreSQL</span> you need a kernel
+ with <code class="varname">CONFIG_HUGETLBFS=y</code> and
+ <code class="varname">CONFIG_HUGETLB_PAGE=y</code>. You will also have to configure
+ the operating system to provide enough huge pages of the desired size.
+ To determine the number of huge pages needed, use the
+ <code class="command">postgres</code> command to see the value of
+ <a class="xref" href="runtime-config-preset.html#GUC-SHARED-MEMORY-SIZE-IN-HUGE-PAGES">shared_memory_size_in_huge_pages</a>. Note that the
+ server must be shut down to view this runtime-computed parameter.
+ This might look like:
+</p><pre class="programlisting">
+$ <strong class="userinput"><code>postgres -D $PGDATA -C shared_memory_size_in_huge_pages</code></strong>
+3170
+$ <strong class="userinput"><code>grep ^Hugepagesize /proc/meminfo</code></strong>
+Hugepagesize: 2048 kB
+$ <strong class="userinput"><code>ls /sys/kernel/mm/hugepages</code></strong>
+hugepages-1048576kB hugepages-2048kB
+</pre><p>
+
+ In this example the default huge page size is 2MB, but you can also
+ explicitly request either 2MB or 1GB with <a class="xref" href="runtime-config-resource.html#GUC-HUGE-PAGE-SIZE">huge_page_size</a>;
+ the number of pages reported by
+ <code class="varname">shared_memory_size_in_huge_pages</code> adapts to the size chosen.
+
+ While we need at least <code class="literal">3170</code> huge pages in this example,
+ a larger setting would be appropriate if other programs on the machine
+ also need huge pages.
+ We can set this with:
+</p><pre class="programlisting">
+# <strong class="userinput"><code>sysctl -w vm.nr_hugepages=3170</code></strong>
+</pre><p>
+ Don't forget to add this setting to <code class="filename">/etc/sysctl.conf</code>
+ so that it is reapplied after reboots. For non-default huge page sizes,
+ we can instead use:
+</p><pre class="programlisting">
+# <strong class="userinput"><code>echo 3170 &gt; /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages</code></strong>
+</pre><p>
+ It is also possible to provide these settings at boot time using
+ kernel parameters such as <code class="literal">hugepagesz=2M hugepages=3170</code>.
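+ </p><p>
+ Keep in mind that pages placed in the huge page pool are set aside and are
+ no longer available for ordinary allocations. As a quick worked check of
+ how much memory the example reserves, multiply the page count by the page
+ size in kB (both figures are taken from the example output above):
+</p><pre class="programlisting">
+$ <strong class="userinput"><code>echo $((3170 * 2048))</code></strong>
+6492160
+</pre><p>
+ That is 6492160 kB, or roughly 6.2 GiB, which should be compared against
+ the machine's physical memory before the setting is applied.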
+ </p><p>
+ Sometimes the kernel is not able to allocate the desired number of huge
+ pages immediately due to fragmentation, so it might be necessary
+ to repeat the command or to reboot. (Immediately after a reboot, most of
+ the machine's memory should be available to convert into huge pages.)
+ To verify the huge page allocation situation for a given size, use:
+</p><pre class="programlisting">
+$ <strong class="userinput"><code>cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages</code></strong>
+</pre><p>
+ It may also be necessary to give the database server's operating system
+ user permission to use huge pages by setting
+ <code class="varname">vm.hugetlb_shm_group</code> via <span class="application">sysctl</span>, and/or
+ give permission to lock memory with <code class="command">ulimit -l</code>.
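+ </p><p>
+ As a hedged sketch of what these two steps can look like, the commands
+ below assume that the server runs under an operating system user named
+ <code class="literal">postgres</code>; substitute the actual user on your
+ system. <code class="varname">vm.hugetlb_shm_group</code> expects a numeric
+ group ID, which <code class="command">id -g</code> supplies:
+</p><pre class="programlisting">
+# Run as root. Assumes the server's operating system user is "postgres".
+sysctl -w vm.hugetlb_shm_group=$(id -g postgres)
+
+# Run as the server's user: shows the current locked-memory limit.
+ulimit -l
+</pre><p>
+ How the locked-memory limit is raised permanently depends on how the
+ server is started (for example from a login session or from a service
+ manager), so consult your system documentation for the appropriate place.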
+ </p><p>
+ The default behavior for huge pages in
+ <span class="productname">PostgreSQL</span> is to use them when possible, with
+ the system's default huge page size, and
+ to fall back to normal pages on failure. To enforce the use of huge
+ pages, you can set <a class="xref" href="runtime-config-resource.html#GUC-HUGE-PAGES">huge_pages</a>
+ to <code class="literal">on</code> in <code class="filename">postgresql.conf</code>.
+ Note that with this setting <span class="productname">PostgreSQL</span> will fail to
+ start if not enough huge pages are available.
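+ </p><p>
+ One way to see whether a running server actually obtained huge pages is to
+ compare the kernel's counters before and after server start. This is an
+ informal check rather than an official procedure:
+</p><pre class="programlisting">
+# Shows HugePages_Total, HugePages_Free, HugePages_Rsvd and HugePages_Surp.
+grep ^HugePages_ /proc/meminfo
+</pre><p>
+ If neither <code class="literal">HugePages_Free</code> nor
+ <code class="literal">HugePages_Rsvd</code> changes when the server starts
+ with the default setting, the server has most likely fallen back to
+ normal pages.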
+ </p><p>
+ For a detailed description of the <span class="productname">Linux</span> huge
+ pages feature, see
+ <a class="ulink" href="https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt" target="_top">https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt</a>.
+ </p></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="server-start.html" title="19.3. Starting the Database Server">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="runtime.html" title="Chapter 19. Server Setup and Operation">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="server-shutdown.html" title="19.5. Shutting Down the Server">Next</a></td></tr><tr><td width="40%" align="left" valign="top">19.3. Starting the Database Server </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 16.2 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 19.5. Shutting Down the Server</td></tr></table></div></body></html> \ No newline at end of file