diff options
Diffstat (limited to '')
-rw-r--r-- | docs/advanced/pool.rst | 332 |
1 files changed, 332 insertions, 0 deletions
diff --git a/docs/advanced/pool.rst b/docs/advanced/pool.rst new file mode 100644 index 0000000..adea0a7 --- /dev/null +++ b/docs/advanced/pool.rst @@ -0,0 +1,332 @@ +.. currentmodule:: psycopg_pool + +.. _connection-pools: + +Connection pools +================ + +A `connection pool`__ is an object managing a set of connections and allowing +their use in functions needing one. Because the time to establish a new +connection can be relatively long, keeping connections open can reduce latency. + +.. __: https://en.wikipedia.org/wiki/Connection_pool + +This page explains a few basic concepts of Psycopg connection pool's +behaviour. Please refer to the `ConnectionPool` object API for details about +the pool operations. + +.. note:: The connection pool objects are distributed in a package separate + from the main `psycopg` package: use ``pip install "psycopg[pool]"`` or ``pip + install psycopg_pool`` to make the `psycopg_pool` package available. See + :ref:`pool-installation`. + + +Pool life cycle +--------------- + +A simple way to use the pool is to create a single instance of it, as a +global object, and to use this object in the rest of the program, allowing +other functions, modules, threads to use it:: + + # module db.py in your program + from psycopg_pool import ConnectionPool + + pool = ConnectionPool(conninfo, **kwargs) + # the pool starts connecting immediately. + + # in another module + from .db import pool + + def my_function(): + with pool.connection() as conn: + conn.execute(...) + +Ideally you may want to call `~ConnectionPool.close()` when the use of the +pool is finished. Failing to call `!close()` at the end of the program is not +terribly bad: probably it will just result in some warnings printed on stderr. +However, if you think that it's sloppy, you could use the `atexit` module to +have `!close()` called at the end of the program. + +If you want to avoid starting to connect to the database at import time, and +want to wait for the application to be ready, you can create the pool using +`!open=False`, and call the `~ConnectionPool.open()` and +`~ConnectionPool.close()` methods when the conditions are right. Certain +frameworks provide callbacks triggered when the program is started and stopped +(for instance `FastAPI startup/shutdown events`__): they are perfect to +initiate and terminate the pool operations:: + + pool = ConnectionPool(conninfo, open=False, **kwargs) + + @app.on_event("startup") + def open_pool(): + pool.open() + + @app.on_event("shutdown") + def close_pool(): + pool.close() + +.. __: https://fastapi.tiangolo.com/advanced/events/#events-startup-shutdown + +Creating a single pool as a global variable is not the mandatory use: your +program can create more than one pool, which might be useful to connect to +more than one database, or to provide different types of connections, for +instance to provide separate read/write and read-only connections. The pool +also acts as a context manager and is open and closed, if necessary, on +entering and exiting the context block:: + + from psycopg_pool import ConnectionPool + + with ConnectionPool(conninfo, **kwargs) as pool: + run_app(pool) + + # the pool is now closed + +When the pool is open, the pool's background workers start creating the +requested `!min_size` connections, while the constructor (or the `!open()` +method) returns immediately. This allows the program some leeway to start +before the target database is up and running. However, if your application is +misconfigured, or the network is down, it means that the program will be able +to start, but the threads requesting a connection will fail with a +`PoolTimeout` only after the timeout on `~ConnectionPool.connection()` is +expired. If this behaviour is not desirable (and you prefer your program to +crash hard and fast, if the surrounding conditions are not right, because +something else will respawn it) you should call the `~ConnectionPool.wait()` +method after creating the pool, or call `!open(wait=True)`: these methods will +block until the pool is full, or will raise a `PoolTimeout` exception if the +pool isn't ready within the allocated time. + + +Connections life cycle +---------------------- + +The pool background workers create connections according to the parameters +`!conninfo`, `!kwargs`, and `!connection_class` passed to `ConnectionPool` +constructor, invoking something like :samp:`{connection_class}({conninfo}, +**{kwargs})`. Once a connection is created it is also passed to the +`!configure()` callback, if provided, after which it is put in the pool (or +passed to a client requesting it, if someone is already knocking at the door). + +If a connection expires (it passes `!max_lifetime`), or is returned to the pool +in broken state, or is found closed by `~ConnectionPool.check()`), then the +pool will dispose of it and will start a new connection attempt in the +background. + + +Using connections from the pool +------------------------------- + +The pool can be used to request connections from multiple threads or +concurrent tasks - it is hardly useful otherwise! If more connections than the +ones available in the pool are requested, the requesting threads are queued +and are served a connection as soon as one is available, either because +another client has finished using it or because the pool is allowed to grow +(when `!max_size` > `!min_size`) and a new connection is ready. + +The main way to use the pool is to obtain a connection using the +`~ConnectionPool.connection()` context, which returns a `~psycopg.Connection` +or subclass:: + + with my_pool.connection() as conn: + conn.execute("what you want") + +The `!connection()` context behaves like the `~psycopg.Connection` object +context: at the end of the block, if there is a transaction open, it will be +committed, or rolled back if the context is exited with as exception. + +At the end of the block the connection is returned to the pool and shouldn't +be used anymore by the code which obtained it. If a `!reset()` function is +specified in the pool constructor, it is called on the connection before +returning it to the pool. Note that the `!reset()` function is called in a +worker thread, so that the thread which used the connection can keep its +execution without being slowed down by it. + + +Pool connection and sizing +-------------------------- + +A pool can have a fixed size (specifying no `!max_size` or `!max_size` = +`!min_size`) or a dynamic size (when `!max_size` > `!min_size`). In both +cases, as soon as the pool is created, it will try to acquire `!min_size` +connections in the background. + +If an attempt to create a connection fails, a new attempt will be made soon +after, using an exponential backoff to increase the time between attempts, +until a maximum of `!reconnect_timeout` is reached. When that happens, the pool +will call the `!reconnect_failed()` function, if provided to the pool, and just +start a new connection attempt. You can use this function either to send +alerts or to interrupt the program and allow the rest of your infrastructure +to restart it. + +If more than `!min_size` connections are requested concurrently, new ones are +created, up to `!max_size`. Note that the connections are always created by the +background workers, not by the thread asking for the connection: if a client +requests a new connection, and a previous client terminates its job before the +new connection is ready, the waiting client will be served the existing +connection. This is especially useful in scenarios where the time to establish +a connection dominates the time for which the connection is used (see `this +analysis`__, for instance). + +.. __: https://github.com/brettwooldridge/HikariCP/blob/dev/documents/ + Welcome-To-The-Jungle.md + +If a pool grows above `!min_size`, but its usage decreases afterwards, a number +of connections are eventually closed: one every time a connection is unused +after the `!max_idle` time specified in the pool constructor. + + +What's the right size for the pool? +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Big question. Who knows. However, probably not as large as you imagine. Please +take a look at `this analysis`__ for some ideas. + +.. __: https://github.com/brettwooldridge/HikariCP/wiki/About-Pool-Sizing + +Something useful you can do is probably to use the +`~ConnectionPool.get_stats()` method and monitor the behaviour of your program +to tune the configuration parameters. The size of the pool can also be changed +at runtime using the `~ConnectionPool.resize()` method. + + +.. _null-pool: + +Null connection pools +--------------------- + +.. versionadded:: 3.1 + +Sometimes you may want leave the choice of using or not using a connection +pool as a configuration parameter of your application. For instance, you might +want to use a pool if you are deploying a "large instance" of your application +and can dedicate it a handful of connections; conversely you might not want to +use it if you deploy the application in several instances, behind a load +balancer, and/or using an external connection pool process such as PgBouncer. + +Switching between using or not using a pool requires some code change, because +the `ConnectionPool` API is different from the normal `~psycopg.connect()` +function and because the pool can perform additional connection configuration +(in the `!configure` parameter) that, if the pool is removed, should be +performed in some different code path of your application. + +The `!psycopg_pool` 3.1 package introduces the `NullConnectionPool` class. +This class has the same interface, and largely the same behaviour, of the +`!ConnectionPool`, but doesn't create any connection beforehand. When a +connection is returned, unless there are other clients already waiting, it +is closed immediately and not kept in the pool state. + +A null pool is not only a configuration convenience, but can also be used to +regulate the access to the server by a client program. If `!max_size` is set to +a value greater than 0, the pool will make sure that no more than `!max_size` +connections are created at any given time. If more clients ask for further +connections, they will be queued and served a connection as soon as a previous +client has finished using it, like for the basic pool. Other mechanisms to +throttle client requests (such as `!timeout` or `!max_waiting`) are respected +too. + +.. note:: + + Queued clients will be handed an already established connection, as soon + as a previous client has finished using it (and after the pool has + returned it to idle state and called `!reset()` on it, if necessary). + +Because normally (i.e. unless queued) every client will be served a new +connection, the time to obtain the connection is paid by the waiting client; +background workers are not normally involved in obtaining new connections. + + +Connection quality +------------------ + +The state of the connection is verified when a connection is returned to the +pool: if a connection is broken during its usage it will be discarded on +return and a new connection will be created. + +.. warning:: + + The health of the connection is not checked when the pool gives it to a + client. + +Why not? Because doing so would require an extra network roundtrip: we want to +save you from its latency. Before getting too angry about it, just think that +the connection can be lost any moment while your program is using it. As your +program should already be able to cope with a loss of a connection during its +process, it should be able to tolerate to be served a broken connection: +unpleasant but not the end of the world. + +.. warning:: + + The health of the connection is not checked when the connection is in the + pool. + +Does the pool keep a watchful eye on the quality of the connections inside it? +No, it doesn't. Why not? Because you will do it for us! Your program is only +a big ruse to make sure the connections are still alive... + +Not (entirely) trolling: if you are using a connection pool, we assume that +you are using and returning connections at a good pace. If the pool had to +check for the quality of a broken connection before your program notices it, +it should be polling each connection even faster than your program uses them. +Your database server wouldn't be amused... + +Can you do something better than that? Of course you can, there is always a +better way than polling. You can use the same recipe of :ref:`disconnections`, +reserving a connection and using a thread to monitor for any activity +happening on it. If any activity is detected, you can call the pool +`~ConnectionPool.check()` method, which will run a quick check on each +connection in the pool, removing the ones found in broken state, and using the +background workers to replace them with fresh ones. + +If you set up a similar check in your program, in case the database connection +is temporarily lost, we cannot do anything for the threads which had taken +already a connection from the pool, but no other thread should be served a +broken connection, because `!check()` would empty the pool and refill it with +working connections, as soon as they are available. + +Faster than you can say poll. Or pool. + + +.. _pool-stats: + +Pool stats +---------- + +The pool can return information about its usage using the methods +`~ConnectionPool.get_stats()` or `~ConnectionPool.pop_stats()`. Both methods +return the same values, but the latter reset the counters after its use. The +values can be sent to a monitoring system such as Graphite_ or Prometheus_. + +.. _Graphite: https://graphiteapp.org/ +.. _Prometheus: https://prometheus.io/ + +The following values should be provided, but please don't consider them as a +rigid interface: it is possible that they might change in the future. Keys +whose value is 0 may not be returned. + + +======================= ===================================================== +Metric Meaning +======================= ===================================================== + ``pool_min`` Current value for `~ConnectionPool.min_size` + ``pool_max`` Current value for `~ConnectionPool.max_size` + ``pool_size`` Number of connections currently managed by the pool + (in the pool, given to clients, being prepared) + ``pool_available`` Number of connections currently idle in the pool + ``requests_waiting`` Number of requests currently waiting in a queue to + receive a connection + ``usage_ms`` Total usage time of the connections outside the pool + ``requests_num`` Number of connections requested to the pool + ``requests_queued`` Number of requests queued because a connection wasn't + immediately available in the pool + ``requests_wait_ms`` Total time in the queue for the clients waiting + ``requests_errors`` Number of connection requests resulting in an error + (timeouts, queue full...) + ``returns_bad`` Number of connections returned to the pool in a bad + state + ``connections_num`` Number of connection attempts made by the pool to the + server + ``connections_ms`` Total time spent to establish connections with the + server + ``connections_errors`` Number of failed connection attempts + ``connections_lost`` Number of connections lost identified by + `~ConnectionPool.check()` +======================= ===================================================== |