Diffstat (limited to 'doc/radosgw/cloud-sync-module.rst')
-rw-r--r-- doc/radosgw/cloud-sync-module.rst 244
1 files changed, 244 insertions, 0 deletions
diff --git a/doc/radosgw/cloud-sync-module.rst b/doc/radosgw/cloud-sync-module.rst
new file mode 100644
index 000000000..a601bd503
--- /dev/null
+++ b/doc/radosgw/cloud-sync-module.rst
@@ -0,0 +1,244 @@
+=========================
+Cloud Sync Module
+=========================
+
+.. versionadded:: Mimic
+
+This module syncs zone data to a remote cloud service. The sync is unidirectional; data is not synced back from the
+remote zone. The goal of this module is to enable syncing data to multiple cloud providers. The currently supported
+cloud providers are those that are compatible with AWS (S3).
+
+User credentials for the remote cloud object store service need to be configured. Since many cloud services impose limits
+on the number of buckets that each user can create, the mapping of source objects and buckets is configurable.
+It is possible to configure different targets for different buckets and bucket prefixes. Note that source ACLs are not
+preserved. However, permissions of specific source users can be mapped to specific destination users.
+
+Due to API limitations, there is no way to preserve the original object modification time and ETag. The cloud sync
+module stores these as metadata attributes on the destination objects.
+
+
+
+Cloud Sync Tier Type Configuration
+-------------------------------------
+
+Trivial Configuration:
+~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+ {
+   "connection": {
+     "access_key": <access>,
+     "secret": <secret>,
+     "endpoint": <endpoint>,
+     "host_style": <path | virtual>,
+   },
+   "acls": [ { "type": <id | email | uri>,
+               "source_id": <source_id>,
+               "dest_id": <dest_id> } ... ],
+   "target_path": <target_path>,
+ }
+
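+As an illustration only, a filled-in trivial configuration might look like the following. Every value is a
+hypothetical placeholder (the keys, the endpoint URL, and the user IDs are assumptions, not defaults):
+
+::
+
+ {
+   "connection": {
+     "access_key": "EXAMPLEACCESSKEY",
+     "secret": "EXAMPLESECRETKEY",
+     "endpoint": "https://s3.example.com",
+     "host_style": "path"
+   },
+   "acls": [ { "type": "id",
+               "source_id": "alice",
+               "dest_id": "cloud-alice" } ],
+   "target_path": "rgwx-${zone}-${sid}/${owner}/${bucket}"
+ }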
+
+Non Trivial Configuration:
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+ {
+   "default": {
+     "connection": {
+       "access_key": <access>,
+       "secret": <secret>,
+       "endpoint": <endpoint>,
+       "host_style": <path | virtual>,
+     },
+     "acls": [
+       {
+         "type": <id | email | uri>,   # optional, default is id
+         "source_id": <id>,
+         "dest_id": <id>
+       } ... ],
+     "target_path": <path>   # optional
+   },
+   "connections": [
+     {
+       "connection_id": <id>,
+       "access_key": <access>,
+       "secret": <secret>,
+       "endpoint": <endpoint>,
+       "host_style": <path | virtual>,   # optional
+     } ... ],
+   "acl_profiles": [
+     {
+       "acls_id": <id>,   # acl mappings
+       "acls": [ {
+         "type": <id | email | uri>,
+         "source_id": <id>,
+         "dest_id": <id>
+       } ... ]
+     }
+   ],
+   "profiles": [
+     {
+       "source_bucket": <source>,
+       "connection_id": <connection_id>,
+       "acls_id": <mappings_id>,
+       "target_path": <dest>,   # optional
+     } ... ],
+ }
+
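+As a sketch of how these pieces fit together, assume a default target plus one extra connection that receives
+buckets whose names start with ``logs-``. All IDs, keys, endpoints, and bucket names below are hypothetical:
+
+::
+
+ {
+   "default": {
+     "connection": {
+       "access_key": "DEFAULTACCESSKEY",
+       "secret": "DEFAULTSECRETKEY",
+       "endpoint": "https://s3.example.com",
+       "host_style": "path"
+     },
+     "acls": [ { "type": "id", "source_id": "alice", "dest_id": "cloud-alice" } ],
+     "target_path": "rgwx-${zone}-${sid}/${owner}/${bucket}"
+   },
+   "connections": [
+     {
+       "connection_id": "archive",
+       "access_key": "ARCHIVEACCESSKEY",
+       "secret": "ARCHIVESECRETKEY",
+       "endpoint": "https://archive.example.com",
+       "host_style": "virtual"
+     }
+   ],
+   "acl_profiles": [
+     {
+       "acls_id": "archive-acls",
+       "acls": [ { "type": "id", "source_id": "alice", "dest_id": "archive-alice" } ]
+     }
+   ],
+   "profiles": [
+     {
+       "source_bucket": "logs-*",
+       "connection_id": "archive",
+       "acls_id": "archive-acls",
+       "target_path": "archive-${zone}/${bucket}"
+     }
+   ]
+ }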
+
+.. Note:: Trivial configuration can coincide with the non-trivial one.
+
+
+* ``connection`` (container)
+
+Represents a connection to the remote cloud service. Contains ``connection_id``, ``access_key``,
+``secret``, ``endpoint``, and ``host_style``.
+
+* ``access_key`` (string)
+
+The remote cloud access key that will be used for a specific connection.
+
+* ``secret`` (string)
+
+The secret key for the remote cloud service.
+
+* ``endpoint`` (string)
+
+URL of remote cloud service endpoint.
+
+* ``host_style`` (path | virtual)
+
+Type of host style to be used when accessing remote cloud endpoint (default: ``path``).
+
+* ``acls`` (array)
+
+Contains a list of ``acl_mappings``.
+
+* ``acl_mapping`` (container)
+
+Each ``acl_mapping`` structure contains ``type``, ``source_id``, and ``dest_id``. These
+define the ACL mutation that is applied to each object. An ACL mutation converts a source
+user ID to a destination ID.
+
+* ``type`` (id | email | uri)
+
+ACL type: ``id`` defines user id, ``email`` defines user by email, and ``uri`` defines user by ``uri`` (group).
+
+* ``source_id`` (string)
+
+ID of user in the source zone.
+
+* ``dest_id`` (string)
+
+ID of user in the destination.
+
+* ``target_path`` (string)
+
+A string that defines how the target path is created. The target path specifies a prefix to which
+the source object name is appended. The ``target_path`` configurable can include any of the following
+variables:
+
+- ``sid``: unique string that represents the sync instance ID
+- ``zonegroup``: the zonegroup name
+- ``zonegroup_id``: the zonegroup ID
+- ``zone``: the zone name
+- ``zone_id``: the zone ID
+- ``bucket``: source bucket name
+- ``owner``: source bucket owner ID
+For example: ``target_path = rgwx-${zone}-${sid}/${owner}/${bucket}``
+
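+To illustrate the substitution, assume a hypothetical zone named ``cloud-east``, a sync instance ID of ``1a2b``,
+a bucket owner ``alice``, and a source bucket ``photos``. These are all made-up values, and the real ``sid`` format
+may differ. The example ``target_path`` above would then expand as:
+
+::
+
+ rgwx-${zone}-${sid}/${owner}/${bucket}   ->   rgwx-cloud-east-1a2b/alice/photos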
+
+* ``acl_profiles`` (array)
+
+An array of ``acl_profile``.
+
+* ``acl_profile`` (container)
+
+Each profile contains an ``acls_id`` (string) that identifies the profile, and an ``acls`` array that
+holds a list of ``acl_mappings``.
+
+* ``profiles`` (array)
+
+A list of profiles. Each profile contains the following:
+
+- ``source_bucket``: either a bucket name, or a bucket prefix (if it ends with ``*``) that defines the source bucket(s) for this profile
+- ``target_path``: as defined above
+- ``connection_id``: ID of the connection that will be used for this profile
+- ``acls_id``: ID of the ACL profile that will be used for this profile
+
+
+S3 Specific Configurables:
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Currently cloud sync will only work with backends that are compatible with AWS S3. There are
+a few configurables that can be used to tweak its behavior when accessing these cloud services:
+
+::
+
+ {
+   "multipart_sync_threshold": {object_size},
+   "multipart_min_part_size": {part_size}
+ }
+
+
+* ``multipart_sync_threshold`` (integer)
+
+Objects this size or larger will be synced to the cloud using multipart upload.
+
+* ``multipart_min_part_size`` (integer)
+
+Minimum part size to use when syncing objects using multipart upload.
+
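+For instance, assuming both values are expressed in bytes (an assumption; the unit is not stated here), the
+following would sync objects of 64 MiB or more using multipart upload, with parts of at least 16 MiB. The numbers
+are illustrative, not defaults:
+
+::
+
+ {
+   "multipart_sync_threshold": 67108864,
+   "multipart_min_part_size": 16777216
+ }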
+
+How to Configure
+~~~~~~~~~~~~~~~~
+
+See :ref:`multisite` for multisite configuration instructions. The cloud sync module requires the creation of a
+new zone. The zone tier type needs to be defined as ``cloud``:
+
+::
+
+ # radosgw-admin zone create --rgw-zonegroup={zone-group-name} \
+ --rgw-zone={zone-name} \
+ --endpoints={http://fqdn}[,{http://fqdn}] \
+ --tier-type=cloud
+
+
+The tier configuration can then be set using the following command:
+
+::
+
+ # radosgw-admin zone modify --rgw-zonegroup={zone-group-name} \
+ --rgw-zone={zone-name} \
+ --tier-config={key}={val}[,{key}={val}]
+
+The ``key`` in the configuration specifies the config variable that needs to be updated, and
+the ``val`` specifies its new value. Nested values can be accessed using a period. For example:
+
+::
+
+ # radosgw-admin zone modify --rgw-zonegroup={zone-group-name} \
+ --rgw-zone={zone-name} \
+ --tier-config=connection.access_key={key},connection.secret={secret}
+
+
+Configuration array entries can be accessed by enclosing the index of the entry in square brackets,
+and a new array entry can be added by using ``[]``. An index value of ``-1`` references
+the last entry in the array. At the moment it is not possible to create a new entry and reference it
+again in the same command.
+For example, creating a new profile for buckets starting with ``{prefix}``:
+
+::
+
+ # radosgw-admin zone modify --rgw-zonegroup={zone-group-name} \
+ --rgw-zone={zone-name} \
+ --tier-config=profiles[].source_bucket={prefix}'*'
+
+ # radosgw-admin zone modify --rgw-zonegroup={zone-group-name} \
+ --rgw-zone={zone-name} \
+ --tier-config=profiles[-1].connection_id={conn_id},profiles[-1].acls_id={acls_id}
+
+
+An entry can be removed by using ``--tier-config-rm={key}``.