Multi-Version/Multi-Cluster PostgreSQL architecture
===================================================
2004, Oliver Elphick, Martin Pitt

Solving a problem
-----------------

When a new major version of PostgreSQL is released, it is necessary to dump and
reload the database. The old software must be used for the dump, and the new
software for the reload.

This was a major problem for RedHat and Debian, because a dump and reload was
not required by every upgrade and by the time the need for a dump is realised,
the old software might have been deleted. Debian had certain rather unreliable
procedures to save the old software and use it to do a dump, but these
procedures often went wrong. RedHat's installation environment is so rigid that
it is not practicable for the RedHat packages to attempt an automatic upgrade.
Debian offered a debconf choice for whether to attempt automatic upgrading; if
it failed or was not allowed, a manual upgrade had to be done, either from a
pre-existing dump or by manual invocation of the postgresql-dump script.

It is possible to run different versions of PostgreSQL simultaneously, and
indeed to run the same version on separate database clusters simultaneously. To
do so, each postgres instance must listen on a different port, so each client
must specify the correct port. By having two separate versions of the
PostgreSQL packages installed simultaneously, it is simple to do database
upgrades by dumping from the old version and uploading to the new. The
PostgreSQL client wrapper is designed to permit this.

General Architecture idea
-------------------------

The Debian packaging has been changed to create a new package for each major
version. The criterion for creating a new package is that initdb is required
when upgrading from the previous version. Thus, there are now source packages
`postgresql-8.1` and `postgresql-8.3` (and similarly for all the binary
packages).

The legacy postgresql and the other existing binary package names have become
dummy packages depending on one of the versioned equivalents. Their only
purpose is now to ensure a smooth upgrade and to register the existing database
cluster to the new architecture. These packages will be removed from the
archive as soon as the next Debian release after Sarge (Etch) is released.

Each versioned package installs into `/usr/lib/postgresql/version`.  In order
to allow users easily to select the right version and cluster when working, the
`postgresql-common` package provides the `pg_wrapper` program, which reads the
per-user and system wide configuration file and forks the correct executable
with the correct library versions according to those preferences.  `/usr/bin`
provides executables soft-linked to `pg_wrapper`.

This architecture also allows separate database clusters to be maintained for
the use of different groups of users; these clusters need not all be of the
same major version.  This allows much greater flexibility for those people who
need to make application software changes consequent on a PostgreSQL upgrade.

Detailed structure
------------------

### Configuration hierarchy

* `/etc/postgresql-common/user_clusters`: maps users against clusters and
  default databases

* `$HOME/.postgresqlrc`: per-user preferences for default version/cluster and
  database; overrides `/etc/postgresql-common/user_clusters`

* `/etc/postgresql/version/clustername`: cluster-specific configuration files:

  * `postgresql.conf`, `pg_hba.conf`, `pg_ident.conf`
  * optionally `start.conf`: startup mode of the cluster: `auto` (start/stop in
    init script), `manual` (do not start/stop in init script, but manual
    control with `pg_ctlcluster` is possible), `disabled` (`pg_ctlcluster`
    is not allowed).
  * optionally `pg_ctl.conf`: options to be passed to `pg_ctl`.
  * optionally a symbolic link `log` which points to the postgres log file.
    Defaults to `/var/log/postgresql/postgresql-version-cluster.conf`.
    Explicitly setting `log_directory` and/or `log_filename` in
    `postgresql.conf` overrides this.

### Per-version files and programs

* `/usr/lib/postgresql/version`
* `/usr/share/postgresql/version`
* `/usr/share/doc/postgresql/postgresql-doc-version`:
version specific program and data files

### Common programs

* `/usr/share/postgresql-common/pg_wrapper`: environment chooser and program selector
* `/usr/bin/program`: symbolic links to pg_wrapper, for all client programs
* `/usr/bin/pg_lsclusters`: list all available clusters with their status and configuration
* `/usr/bin/pg_createcluster: wrapper for `initdb`, sets up the necessary configuration structure
* `/usr/bin/pg_ctlcluster`: wrapper for `pg_ctl`, control the cluster postgres server
* `/usr/bin/pg_upgradecluster`: upgrade a cluster to a newer major version
* `/usr/bin/pg_dropcluster`: remove a cluster and its configuration

### /etc/init.d/postgresql

This script handles the postgres server processes for each version and all
their clusters. However, most of the actual work is done by the new
`pg_ctlcluster` program.

### pg_upgradecluster

This program replaces postgresql-dump (a Debian specific program).

It is used to migrate a cluster from one major version to another.

Usage: `pg_upgradecluster [-v newversion] version name [data_dir]`

`-v`: specifies the version to upgrade to; defaults to the newest available version.

 -- The Debian PostgreSQL maintainers