19.2. Creating a Database Cluster

19.2. Creating a Database Cluster
Prev	Up	Chapter 19. Server Setup and Operation	Home	Next

19.2. Creating a Database Cluster #

19.2.1. Use of Secondary File Systems
19.2.2. File Systems

+ Before you can do anything, you must initialize a database storage + area on disk. We call this a database cluster. + (The SQL standard uses the term catalog cluster.) A + database cluster is a collection of databases that is managed by a + single instance of a running database server. After initialization, a + database cluster will contain a database named postgres, + which is meant as a default database for use by utilities, users and third + party applications. The database server itself does not require the + postgres database to exist, but many external utility + programs assume it exists. There are two more databases created within + each cluster during initialization, named template1 + and template0. As the names suggest, these will be + used as templates for subsequently-created databases; they should not be + used for actual work. (See Chapter 23 for + information about creating new databases within a cluster.) +

+ In file system terms, a database cluster is a single directory + under which all data will be stored. We call this the data + directory or data area. It is + completely up to you where you choose to store your data. There is no + default, although locations such as + /usr/local/pgsql/data or + /var/lib/pgsql/data are popular. + The data directory must be initialized before being used, using the program + initdb + which is installed with PostgreSQL. +

+ If you are using a pre-packaged version + of PostgreSQL, it may well have a specific + convention for where to place the data directory, and it may also + provide a script for creating the data directory. In that case you + should use that script in preference to + running initdb directly. + Consult the package-level documentation for details. +

+ To initialize a database cluster manually, + run initdb and specify the desired + file system location of the database cluster with the + -D option, for example: +

+$ initdb -D /usr/local/pgsql/data
+

+ Note that you must execute this command while logged into the + PostgreSQL user account, which is + described in the previous section. +

Tip

+ As an alternative to the -D option, you can set + the environment variable PGDATA. + +

+ Alternatively, you can run initdb via + the pg_ctl + program like so: +

+$ pg_ctl -D /usr/local/pgsql/data initdb
+

+ This may be more intuitive if you are + using pg_ctl for starting and stopping the + server (see Section 19.3), so + that pg_ctl would be the sole command you use + for managing the database server instance. +

+ initdb will attempt to create the directory you + specify if it does not already exist. Of course, this will fail if + initdb does not have permissions to write in the + parent directory. It's generally recommendable that the + PostgreSQL user own not just the data + directory but its parent directory as well, so that this should not + be a problem. If the desired parent directory doesn't exist either, + you will need to create it first, using root privileges if the + grandparent directory isn't writable. So the process might look + like this: +

+root# mkdir /usr/local/pgsql
+root# chown postgres /usr/local/pgsql
+root# su postgres
+postgres$ initdb -D /usr/local/pgsql/data
+

+ initdb will refuse to run if the data directory + exists and already contains files; this is to prevent accidentally + overwriting an existing installation. +

+ Because the data directory contains all the data stored in the + database, it is essential that it be secured from unauthorized + access. initdb therefore revokes access + permissions from everyone but the + PostgreSQL user, and optionally, group. + Group access, when enabled, is read-only. This allows an unprivileged + user in the same group as the cluster owner to take a backup of the + cluster data or perform other operations that only require read access. +

+ Note that enabling or disabling group access on an existing cluster requires + the cluster to be shut down and the appropriate mode to be set on all + directories and files before restarting + PostgreSQL. Otherwise, a mix of modes might + exist in the data directory. For clusters that allow access only by the + owner, the appropriate modes are 0700 for directories + and 0600 for files. For clusters that also allow + reads by the group, the appropriate modes are 0750 + for directories and 0640 for files. +

+ However, while the directory contents are secure, the default + client authentication setup allows any local user to connect to the + database and even become the database superuser. If you do not + trust other local users, we recommend you use one of + initdb's -W, --pwprompt + or --pwfile options to assign a password to the + database superuser. + Also, specify -A scram-sha-256 + so that the default trust authentication + mode is not used; or modify the generated pg_hba.conf + file after running initdb, but + before you start the server for the first time. (Other + reasonable approaches include using peer authentication + or file system permissions to restrict connections. See Chapter 21 for more information.) +

+ initdb also initializes the default + locale for the database cluster. + Normally, it will just take the locale settings in the environment + and apply them to the initialized database. It is possible to + specify a different locale for the database; more information about + that can be found in Section 24.1. The default sort order used + within the particular database cluster is set by + initdb, and while you can create new databases using + different sort order, the order used in the template databases that initdb + creates cannot be changed without dropping and recreating them. + There is also a performance impact for using locales + other than C or POSIX. Therefore, it is + important to make this choice correctly the first time. +

+ initdb also sets the default character set encoding + for the database cluster. Normally this should be chosen to match the + locale setting. For details see Section 24.3. +

+ Non-C and non-POSIX locales rely on the + operating system's collation library for character set ordering. + This controls the ordering of keys stored in indexes. For this reason, + a cluster cannot switch to an incompatible collation library version, + either through snapshot restore, binary streaming replication, a + different operating system, or an operating system upgrade. +

19.2.1. Use of Secondary File Systems #

+ Many installations create their database clusters on file systems + (volumes) other than the machine's “root” volume. If you + choose to do this, it is not advisable to try to use the secondary + volume's topmost directory (mount point) as the data directory. + Best practice is to create a directory within the mount-point + directory that is owned by the PostgreSQL + user, and then create the data directory within that. This avoids + permissions problems, particularly for operations such + as pg_upgrade, and it also ensures clean failures if + the secondary volume is taken offline. +

19.2.2. File Systems #

+ Generally, any file system with POSIX semantics can be used for + PostgreSQL. Users prefer different file systems for a variety of reasons, + including vendor support, performance, and familiarity. Experience + suggests that, all other things being equal, one should not expect major + performance or behavior changes merely from switching file systems or + making minor file system configuration changes. +

19.2.2.1. NFS #

+ It is possible to use an NFS file system for storing + the PostgreSQL data directory. + PostgreSQL does nothing special for + NFS file systems, meaning it assumes + NFS behaves exactly like locally-connected drives. + PostgreSQL does not use any functionality that + is known to have nonstandard behavior on NFS, such as + file locking. +

+ The only firm requirement for using NFS with + PostgreSQL is that the file system is mounted + using the hard option. With the + hard option, processes can “hang” + indefinitely if there are network problems, so this configuration will + require a careful monitoring setup. The soft option + will interrupt system calls in case of network problems, but + PostgreSQL will not repeat system calls + interrupted in this way, so any such interruption will result in an I/O + error being reported. +

+ It is not necessary to use the sync mount option. The + behavior of the async option is sufficient, since + PostgreSQL issues fsync + calls at appropriate times to flush the write caches. (This is analogous + to how it works on a local file system.) However, it is strongly + recommended to use the sync export option on the NFS + server on systems where it exists (mainly Linux). + Otherwise, an fsync or equivalent on the NFS client is + not actually guaranteed to reach permanent storage on the server, which + could cause corruption similar to running with the parameter fsync off. The defaults of these mount and export + options differ between vendors and versions, so it is recommended to + check and perhaps specify them explicitly in any case to avoid any + ambiguity. +

+ In some cases, an external storage product can be accessed either via NFS + or a lower-level protocol such as iSCSI. In the latter case, the storage + appears as a block device and any available file system can be created on + it. That approach might relieve the DBA from having to deal with some of + the idiosyncrasies of NFS, but of course the complexity of managing + remote storage then happens at other levels. +

Prev	Up	Next
19.1. The PostgreSQL User Account	Home	19.3. Starting the Database Server