Diffstat (limited to 'doc/src/sgml/replication-origins.sgml')
-rw-r--r-- doc/src/sgml/replication-origins.sgml 96
1 files changed, 96 insertions, 0 deletions
diff --git a/doc/src/sgml/replication-origins.sgml b/doc/src/sgml/replication-origins.sgml
new file mode 100644
index 0000000..bb0fb62
--- /dev/null
+++ b/doc/src/sgml/replication-origins.sgml
@@ -0,0 +1,96 @@
+<!-- doc/src/sgml/replication-origins.sgml -->
+<chapter id="replication-origins">
+ <title>Replication Progress Tracking</title>
+
+ <indexterm zone="replication-origins">
+ <primary>Replication Progress Tracking</primary>
+ </indexterm>
+ <indexterm zone="replication-origins">
+ <primary>Replication Origins</primary>
+ </indexterm>
+
+ <para>
+ Replication origins are intended to make it easier to implement
+ logical replication solutions on top
+ of <link linkend="logicaldecoding">logical decoding</link>.
+ They provide a solution to two common problems:
+ <itemizedlist>
+ <listitem>
+ <para>How to safely keep track of replication progress</para>
+ </listitem>
+ <listitem>
+ <para>How to change replication behavior based on the
+ origin of a row; for example, to prevent loops in bi-directional
+ replication setups</para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+ Replication origins have just two properties, a name and an ID. The name,
+ which is what should be used to refer to the origin across systems, is
+ free-form <type>text</type>. It should be used in a way that makes conflicts
+ between replication origins created by different replication solutions
+ unlikely; e.g., by prefixing the replication solution's name to it.
+ The ID is used only to avoid having to store the full name
+ in situations where space efficiency is important. It should never be shared
+ across systems.
+ </para>
+
+ <para>
+ Replication origins can be created using the function
+ <link linkend="pg-replication-origin-create"><function>pg_replication_origin_create()</function></link>;
+ dropped using
+ <link linkend="pg-replication-origin-drop"><function>pg_replication_origin_drop()</function></link>;
+ and seen in the
+ <link linkend="catalog-pg-replication-origin"><structname>pg_replication_origin</structname></link>
+ system catalog.
+ </para>
+
+ <para>
+ One nontrivial part of building a replication solution is to keep track of
+ replay progress in a safe manner. When the applying process, or the whole
+ cluster, dies, it needs to be possible to find out up to where data has
+ successfully been replicated. Naive solutions to this, such as updating a
+ row in a table for every replayed transaction, have problems like run-time
+ overhead and database bloat.
+ </para>
+
+ <para>
+ Using the replication origin infrastructure, a session can be
+ marked as replaying from a remote node (using the
+ <link linkend="pg-replication-origin-session-setup"><function>pg_replication_origin_session_setup()</function></link>
+ function). Additionally, the <acronym>LSN</acronym> and commit
+ time stamp of every source transaction can be configured on a
+ per-transaction basis using
+ <link linkend="pg-replication-origin-xact-setup"><function>pg_replication_origin_xact_setup()</function></link>.
+ If that is done, replication progress will persist in a crash-safe
+ manner. Replay progress for all replication origins can be seen in the
+ <link linkend="view-pg-replication-origin-status">
+ <structname>pg_replication_origin_status</structname>
+ </link> view. The progress of an individual origin, e.g., when resuming
+ replication, can be obtained using
+ <link linkend="pg-replication-origin-progress"><function>pg_replication_origin_progress()</function></link>
+ for any origin or
+ <link linkend="pg-replication-origin-session-progress"><function>pg_replication_origin_session_progress()</function></link>
+ for the origin configured in the current session.
+ </para>
+
+ <para>
+ In replication topologies more complex than replication from exactly one
+ system to one other system, another problem is that it can be hard to avoid
+ replicating replayed rows again. That can lead both to cycles in the
+ replication and to inefficiencies. Replication origins provide an optional
+ mechanism to recognize and prevent that. When configured using the functions
+ referenced in the previous paragraph, every change and transaction passed to
+ output plugin callbacks (see <xref linkend="logicaldecoding-output-plugin"/>)
+ generated by the session is tagged with the replication origin of the
+ generating session. This allows treating them differently in the output
+ plugin, e.g., ignoring all but locally originating rows. Additionally,
+ the <link linkend="logicaldecoding-output-plugin-filter-origin">
+ <function>filter_by_origin_cb</function></link> callback can be used
+ to filter the logical decoding change stream based on the
+ source. While less flexible, filtering via that callback is
+ considerably more efficient than doing it in the output plugin.
+ </para>
+</chapter>