From 5e45211a64149b3c659b90ff2de6fa982a5a93ed Mon Sep 17 00:00:00 2001
From: Daniel Baumann
Date: Sat, 4 May 2024 14:17:33 +0200
Subject: Adding upstream version 15.5.

Signed-off-by: Daniel Baumann
---
 doc/src/sgml/html/replication-origins.html | 68 ++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)
 create mode 100644 doc/src/sgml/html/replication-origins.html

(limited to 'doc/src/sgml/html/replication-origins.html')

diff --git a/doc/src/sgml/html/replication-origins.html b/doc/src/sgml/html/replication-origins.html
new file mode 100644
index 0000000..ab687ef
--- /dev/null
+++ b/doc/src/sgml/html/replication-origins.html
@@ -0,0 +1,68 @@

Chapter 50. Replication Progress Tracking

Replication origins are intended to make it easier to implement logical replication solutions on top of logical decoding. They provide a solution to two common problems:

  • How to safely keep track of replication progress

  • How to change replication behavior based on the origin of a row; for example, to prevent loops in bi-directional replication setups

Replication origins have just two properties, a name and an ID. The name, which is what should be used to refer to the origin across systems, is free-form text. It should be chosen so that conflicts between replication origins created by different replication solutions are unlikely; e.g., by prefixing it with the replication solution's name. The ID is used only to avoid having to store the long name in situations where space efficiency is important. It should never be shared across systems.

Replication origins can be created using the function pg_replication_origin_create(); dropped using pg_replication_origin_drop(); and seen in the pg_replication_origin system catalog.
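
The life cycle described above can be sketched in SQL as follows. This is only an illustration: the origin name myrepl_node_a (with the prefix myrepl standing in for a replication solution's name) is hypothetical, and the session is assumed to have permission to execute the replication origin functions, which by default are restricted to superusers.

```sql
-- Create an origin. The name is free-form text; "myrepl_" is a
-- hypothetical prefix used to avoid clashes with other solutions.
-- The function returns the internal ID assigned to the origin.
SELECT pg_replication_origin_create('myrepl_node_a');

-- Inspect the catalog: roident is the local ID, roname the name.
SELECT roident, roname FROM pg_replication_origin;

-- Drop the origin again once it is no longer needed.
SELECT pg_replication_origin_drop('myrepl_node_a');
```

Only the name is meaningful across systems; the roident value is local to the cluster in which the origin was created.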

One nontrivial part of building a replication solution is to keep track of replay progress in a safe manner. When the applying process, or the whole cluster, dies, it needs to be possible to find out up to where data has successfully been replicated. Naive solutions to this, such as updating a row in a table for every replayed transaction, have problems like run-time overhead and database bloat.

Using the replication origin infrastructure, a session can be marked as replaying from a remote node (using the pg_replication_origin_session_setup() function). Additionally, the LSN and commit time stamp of every source transaction can be configured on a per-transaction basis using pg_replication_origin_xact_setup(). If that is done, replication progress persists in a crash-safe manner. Replay progress for all replication origins can be seen in the pg_replication_origin_status view. An individual origin's progress, e.g., when resuming replication, can be acquired using pg_replication_origin_progress() for any origin or pg_replication_origin_session_progress() for the origin configured in the current session.
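
As a rough sketch of how an apply process might use these functions, consider the session below. The origin name, the table replicated_items, and the LSN and time stamp values are all made up for illustration; a real apply process would take the LSN and commit time stamp from the source transaction it is replaying.

```sql
-- One-time setup (hypothetical names).
SELECT pg_replication_origin_create('myrepl_node_a');
CREATE TABLE replicated_items (id int PRIMARY KEY, payload text);

-- Mark this session as replaying from the remote node.
SELECT pg_replication_origin_session_setup('myrepl_node_a');

BEGIN;
-- Record the source transaction's commit LSN and commit time stamp
-- (placeholder values) for this local transaction.
SELECT pg_replication_origin_xact_setup('0/AABBCCDD', '2024-05-04 14:17:33+02');
-- Apply the replicated change.
INSERT INTO replicated_items VALUES (1, 'replayed row');
COMMIT;

-- Progress of all origins.
SELECT * FROM pg_replication_origin_status;

-- Progress of one named origin; the second argument asks for the
-- corresponding local transaction to be flushed to disk first.
SELECT pg_replication_origin_progress('myrepl_node_a', true);

-- Progress of the origin configured in this session.
SELECT pg_replication_origin_session_progress(true);

-- Detach the session from the origin when done.
SELECT pg_replication_origin_session_reset();
```

Because the progress is recorded together with the commit of the applying transaction, no separate bookkeeping table has to be updated for each replayed transaction.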

In replication topologies more complex than replication from exactly one system to one other system, another problem can be that it is hard to avoid replicating replayed rows again. That can lead both to cycles in the replication and to inefficiencies. Replication origins provide an optional mechanism to recognize and prevent that. When configured using the functions referenced in the previous paragraph, every change and transaction generated by the session and passed to output plugin callbacks (see Section 49.6) is tagged with the replication origin of the generating session. This allows treating them differently in the output plugin, e.g., ignoring all but locally-originating rows. Additionally, the filter_by_origin_cb callback can be used to filter the logical decoding change stream based on the source. While less flexible, filtering via that callback is considerably more efficient than doing it in the output plugin.
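
One way to observe origin-based filtering from SQL is with the test_decoding example plugin, which accepts an only-local option. The sketch below assumes that wal_level is set to logical, that the test_decoding contrib module is installed, and that the slot, table, and origin names (all hypothetical) are not already in use; the LSN passed to pg_replication_origin_xact_setup() is a placeholder.

```sql
-- Hypothetical slot, table and origin used only for this demonstration.
SELECT pg_create_logical_replication_slot('origin_demo_slot', 'test_decoding');
CREATE TABLE origin_demo (id int PRIMARY KEY, src text);
SELECT pg_replication_origin_create('myrepl_node_b');

-- A locally-originating change: no origin is set up in the session.
INSERT INTO origin_demo VALUES (1, 'local');

-- A "replayed" change: the session is tagged with the origin.
SELECT pg_replication_origin_session_setup('myrepl_node_b');
BEGIN;
SELECT pg_replication_origin_xact_setup('0/AABBCCDD', now());
INSERT INTO origin_demo VALUES (2, 'replayed');
COMMIT;
SELECT pg_replication_origin_session_reset();

-- Decode the stream, asking test_decoding to skip changes that carry
-- a replication origin.
SELECT data
FROM pg_logical_slot_get_changes('origin_demo_slot', NULL, NULL,
                                 'skip-empty-xacts', '1',
                                 'only-local', '1');

-- Clean up.
SELECT pg_drop_replication_slot('origin_demo_slot');
SELECT pg_replication_origin_drop('myrepl_node_b');
DROP TABLE origin_demo;
```

With only-local enabled, the decoded output should omit the row inserted while the origin was set up. An output plugin written for a real replication solution would make the equivalent decision in its own callbacks rather than rely on this test option.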
