diff options
Diffstat (limited to 'doc/radosgw/cloud-sync-module.rst')
-rw-r--r-- | doc/radosgw/cloud-sync-module.rst | 244 |
1 files changed, 244 insertions, 0 deletions
diff --git a/doc/radosgw/cloud-sync-module.rst b/doc/radosgw/cloud-sync-module.rst new file mode 100644 index 000000000..bea7f441b --- /dev/null +++ b/doc/radosgw/cloud-sync-module.rst @@ -0,0 +1,244 @@ +========================= +Cloud Sync Module +========================= + +.. versionadded:: Mimic + +This module syncs zone data to a remote cloud service. The sync is unidirectional; data is not synced back from the +remote zone. The goal of this module is to enable syncing data to multiple cloud providers. The currently supported +cloud providers are those that are compatible with AWS (S3). + +User credentials for the remote cloud object store service need to be configured. Since many cloud services impose limits +on the number of buckets that each user can create, the mapping of source objects and buckets is configurable. +It is possible to configure different targets to different buckets and bucket prefixes. Note that source ACLs will not +be preserved. It is possible to map permissions of specific source users to specific destination users. + +Due to API limitations there is no way to preserve original object modification time and ETag. The cloud sync module +stores these as metadata attributes on the destination objects. + + + +Cloud Sync Tier Type Configuration +------------------------------------- + +Trivial Configuration: +~~~~~~~~~~~~~~~~~~~~~~ + +:: + + { + "connection": { + "access_key": <access>, + "secret": <secret>, + "endpoint": <endpoint>, + "host_style": <path | virtual>, + }, + "acls": [ { "type": <id | email | uri>, + "source_id": <source_id>, + "dest_id": <dest_id> } ... ], + "target_path": <target_path>, + } + + +Non Trivial Configuration: +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +:: + + { + "default": { + "connection": { + "access_key": <access>, + "secret": <secret>, + "endpoint": <endpoint>, + "host_style" <path | virtual>, + }, + "acls": [ + { + "type" : <id | email | uri>, # optional, default is id + "source_id": <id>, + "dest_id": <id> + } ... ] + "target_path": <path> # optional + }, + "connections": [ + { + "connection_id": <id>, + "access_key": <access>, + "secret": <secret>, + "endpoint": <endpoint>, + "host_style" <path | virtual>, # optional + } ... ], + "acl_profiles": [ + { + "acls_id": <id>, # acl mappings + "acls": [ { + "type": <id | email | uri>, + "source_id": <id>, + "dest_id": <id> + } ... ] + } + ], + "profiles": [ + { + "source_bucket": <source>, + "connection_id": <connection_id>, + "acls_id": <mappings_id>, + "target_path": <dest>, # optional + } ... ], + } + + +.. Note:: Trivial configuration can coincide with the non-trivial one. + + +* ``connection`` (container) + +Represents a connection to the remote cloud service. Contains ``conection_id`, ``access_key``, +``secret``, ``endpoint``, and ``host_style``. + +* ``access_key`` (string) + +The remote cloud access key that will be used for a specific connection. + +* ``secret`` (string) + +The secret key for the remote cloud service. + +* ``endpoint`` (string) + +URL of remote cloud service endpoint. + +* ``host_style`` (path | virtual) + +Type of host style to be used when accessing remote cloud endpoint (default: ``path``). + +* ``acls`` (array) + +Contains a list of ``acl_mappings``. + +* ``acl_mapping`` (container) + +Each ``acl_mapping`` structure contains ``type``, ``source_id``, and ``dest_id``. These +will define the ACL mutation that will be done on each object. An ACL mutation allows converting source +user id to a destination id. + +* ``type`` (id | email | uri) + +ACL type: ``id`` defines user id, ``email`` defines user by email, and ``uri`` defines user by ``uri`` (group). + +* ``source_id`` (string) + +ID of user in the source zone. + +* ``dest_id`` (string) + +ID of user in the destination. + +* ``target_path`` (string) + +A string that defines how the target path is created. The target path specifies a prefix to which +the source object name is appended. The target path configurable can include any of the following +variables: +- ``sid``: unique string that represents the sync instance ID +- ``zonegroup``: the zonegroup name +- ``zonegroup_id``: the zonegroup ID +- ``zone``: the zone name +- ``zone_id``: the zone id +- ``bucket``: source bucket name +- ``owner``: source bucket owner ID + +For example: ``target_path = rgwx-${zone}-${sid}/${owner}/${bucket}`` + + +* ``acl_profiles`` (array) + +An array of ``acl_profile``. + +* ``acl_profile`` (container) + +Each profile contains ``acls_id`` (string) that represents the profile, and ``acls`` array that +holds a list of ``acl_mappings``. + +* ``profiles`` (array) + +A list of profiles. Each profile contains the following: +- ``source_bucket``: either a bucket name, or a bucket prefix (if ends with ``*``) that defines the source bucket(s) for this profile +- ``target_path``: as defined above +- ``connection_id``: ID of the connection that will be used for this profile +- ``acls_id``: ID of ACLs profile that will be used for this profile + + +S3 Specific Configurables: +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Currently cloud sync will only work with backends that are compatible with AWS S3. There are +a few configurables that can be used to tweak its behavior when accessing these cloud services: + +:: + + { + "multipart_sync_threshold": {object_size}, + "multipart_min_part_size": {part_size} + } + + +* ``multipart_sync_threshold`` (integer) + +Objects this size or larger will be synced to the cloud using multipart upload. + +* ``multipart_min_part_size`` (integer) + +Minimum parts size to use when syncing objects using multipart upload. + + +How to Configure +~~~~~~~~~~~~~~~~ + +See :ref:`multisite` for how to multisite config instructions. The cloud sync module requires a creation of a new zone. The zone +tier type needs to be defined as ``cloud``: + +:: + + # radosgw-admin zone create --rgw-zonegroup={zone-group-name} \ + --rgw-zone={zone-name} \ + --endpoints={http://fqdn}[,{http://fqdn}] + --tier-type=cloud + + +The tier configuration can be then done using the following command + +:: + + # radosgw-admin zone modify --rgw-zonegroup={zone-group-name} \ + --rgw-zone={zone-name} \ + --tier-config={key}={val}[,{key}={val}] + +The ``key`` in the configuration specifies the config variable that needs to be updated, and +the ``val`` specifies its new value. Nested values can be accessed using period. For example: + +:: + + # radosgw-admin zone modify --rgw-zonegroup={zone-group-name} \ + --rgw-zone={zone-name} \ + --tier-config=connection.access_key={key},connection.secret={secret} + + +Configuration array entries can be accessed by specifying the specific entry to be referenced enclosed +in square brackets, and adding new array entry can be done by using `[]`. Index value of `-1` references +the last entry in the array. At the moment it is not possible to create a new entry and reference it +again at the same command. +For example, creating a new profile for buckets starting with {prefix}: + +:: + + # radosgw-admin zone modify --rgw-zonegroup={zone-group-name} \ + --rgw-zone={zone-name} \ + --tier-config=profiles[].source_bucket={prefix}'*' + + # radosgw-admin zone modify --rgw-zonegroup={zone-group-name} \ + --rgw-zone={zone-name} \ + --tier-config=profiles[-1].connection_id={conn_id},profiles[-1].acls_id={acls_id} + + +An entry can be removed by using ``--tier-config-rm={key}``. |