=========================
Cloud Sync Module
=========================

.. versionadded:: Mimic

This module syncs zone data to a remote cloud service. The sync is unidirectional; data is not synced back from the
remote zone. The goal of this module is to enable syncing data to multiple cloud providers. The currently supported
cloud providers are those that are compatible with AWS (S3).

User credentials for the remote cloud object store service need to be configured. Since many cloud services impose limits
on the number of buckets that each user can create, the mapping of source objects and buckets is configurable.
It is possible to configure different targets for different buckets and bucket prefixes. Note that source ACLs will not
be preserved. It is possible to map permissions of specific source users to specific destination users.

Due to API limitations, there is no way to preserve the original object modification time and ETag. The cloud sync module
stores these as metadata attributes on the destination objects.


Cloud Sync Tier Type Configuration
-------------------------------------

Trivial Configuration:
~~~~~~~~~~~~~~~~~~~~~~

::

    {
      "connection": {
        "access_key": <access>,
        "secret": <secret>,
        "endpoint": <endpoint>,
        "host_style": <path | virtual>,
      },
      "acls": [ { "type": <id | email | uri>,
                  "source_id": <source_id>,
                  "dest_id": <dest_id> } ... ],
      "target_path": <target_path>,
    }


Non Trivial Configuration:
~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    {
      "default": {
        "connection": {
            "access_key": <access>,
            "secret": <secret>,
            "endpoint": <endpoint>,
            "host_style": <path | virtual>,
        },
        "acls": [
        {
          "type" : <id | email | uri>,   #  optional, default is id
          "source_id": <id>,
          "dest_id": <id>
        } ... ],
        "target_path": <path> # optional
      },
      "connections": [
          {
            "connection_id": <id>,
            "access_key": <access>,
            "secret": <secret>,
            "endpoint": <endpoint>,
            "host_style": <path | virtual>,  # optional
          } ... ],
      "acl_profiles": [
          {
            "acls_id": <id>, # acl mappings
            "acls": [ {
                "type": <id | email | uri>,
                "source_id": <id>,
                "dest_id": <id>
              } ... ]
          }
      ],
      "profiles": [
          {
           "source_bucket": <source>,
           "connection_id": <connection_id>,
           "acls_id": <mappings_id>,
           "target_path": <dest>,          # optional
          } ... ],
    }


.. Note:: Trivial configuration can coincide with the non-trivial one.


* ``connection`` (container)

Represents a connection to the remote cloud service. Contains ``connection_id``, ``access_key``,
``secret``, ``endpoint``, and ``host_style``.

* ``access_key`` (string)

The remote cloud access key that will be used for a specific connection.

* ``secret`` (string)

The secret key for the remote cloud service.

* ``endpoint`` (string)

URL of remote cloud service endpoint.

* ``host_style`` (path | virtual)

Type of host style to be used when accessing remote cloud endpoint (default: ``path``).

* ``acls`` (array)

Contains a list of ``acl_mappings``.

* ``acl_mapping`` (container)

Each ``acl_mapping`` structure contains ``type``, ``source_id``, and ``dest_id``. These
will define the ACL mutation that will be done on each object. An ACL mutation allows converting source
user id to a destination id.

* ``type`` (id | email | uri)

ACL type: ``id`` defines user id, ``email`` defines user by email, and ``uri`` defines user by ``uri`` (group).

* ``source_id`` (string)

ID of user in the source zone.

* ``dest_id`` (string)

ID of user in the destination.
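
The ACL mutation described above can be sketched in Python as a lookup over the ``acls`` array.
This is only an illustration, not the module's implementation: the function name ``map_grantee``
and the sample IDs are hypothetical, and returning ``None`` for a grantee that has no mapping is
an assumption made for this sketch.

.. code-block:: python

    def map_grantee(acls, grantee_type, source_id):
        """Return the dest_id mapped to a source grantee, or None if no entry matches."""
        for mapping in acls:
            # "type" is optional in the configuration and defaults to "id".
            mapping_type = mapping.get("type", "id")
            if mapping_type == grantee_type and mapping["source_id"] == source_id:
                return mapping["dest_id"]
        return None


    # Hypothetical mappings, shaped like the "acls" array shown earlier.
    acls = [
        {"type": "id", "source_id": "alice", "dest_id": "cloud-alice"},
        {"source_id": "bob", "dest_id": "cloud-bob"},  # type omitted, treated as "id"
    ]

    print(map_grantee(acls, "id", "bob"))
    print(map_grantee(acls, "email", "alice"))

The second lookup finds no entry because the mapping for ``alice`` is of type ``id``, not ``email``.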

* ``target_path`` (string)

A string that defines how the target path is created. The target path specifies a prefix to which
the source object name is appended. The target path configurable can include any of the following
variables:

- ``sid``: unique string that represents the sync instance ID
- ``zonegroup``: the zonegroup name
- ``zonegroup_id``: the zonegroup ID
- ``zone``: the zone name
- ``zone_id``: the zone ID
- ``bucket``: source bucket name
- ``owner``: source bucket owner ID

For example: ``target_path = rgwx-${zone}-${sid}/${owner}/${bucket}``
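
To make the variable substitution concrete, here is a minimal Python sketch. It relies on the
fact that Python's ``string.Template`` uses the same ``${name}`` placeholder syntax; it is not
how the sync module itself expands the path. The function name ``expand_target_path`` and all
sample values (zone name, sync instance ID, owner, bucket) are hypothetical.

.. code-block:: python

    from string import Template


    def expand_target_path(target_path, variables):
        """Replace ${name} placeholders in target_path with the given values."""
        # substitute() raises KeyError if a placeholder has no value,
        # which surfaces typos in variable names early.
        return Template(target_path).substitute(variables)


    # Hypothetical values for the variables listed above.
    variables = {
        "sid": "1a2b",
        "zonegroup": "us",
        "zonegroup_id": "zg-01",
        "zone": "us-east",
        "zone_id": "z-01",
        "bucket": "photos",
        "owner": "alice",
    }

    print(expand_target_path("rgwx-${zone}-${sid}/${owner}/${bucket}", variables))

With these sample values, the example path from above expands to ``rgwx-us-east-1a2b/alice/photos``.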


* ``acl_profiles`` (array)

An array of ``acl_profile``.

* ``acl_profile`` (container)
 
Each profile contains ``acls_id`` (string) that represents the profile, and ``acls`` array that
holds a list of ``acl_mappings``.

* ``profiles`` (array)

A list of profiles. Each profile contains the following:

- ``source_bucket``: either a bucket name, or a bucket prefix (if ends with ``*``) that defines the source bucket(s) for this profile
- ``target_path``: as defined above
- ``connection_id``: ID of the connection that will be used for this profile
- ``acls_id``: ID of ACLs profile that will be used for this profile
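
The ``source_bucket`` rule (exact name, or prefix when the value ends with ``*``) can be sketched
in Python as follows. This is an illustration only: the helper names are hypothetical, and the
choice to return the first matching profile is an assumption, since the order in which the module
resolves overlapping profiles is not described here.

.. code-block:: python

    def bucket_matches(source_bucket, bucket_name):
        """Check a bucket name against a source_bucket value."""
        if source_bucket.endswith("*"):
            # A trailing "*" turns the value into a prefix match.
            return bucket_name.startswith(source_bucket[:-1])
        return bucket_name == source_bucket


    def find_profile(profiles, bucket_name):
        """Return the first profile whose source_bucket matches, else None."""
        for profile in profiles:
            if bucket_matches(profile["source_bucket"], bucket_name):
                return profile
        return None


    # Hypothetical profiles, shaped like the "profiles" array shown earlier.
    profiles = [
        {"source_bucket": "logs-*", "connection_id": "conn-a", "acls_id": "acls-a"},
        {"source_bucket": "photos", "connection_id": "conn-b", "acls_id": "acls-b"},
    ]

    print(find_profile(profiles, "logs-2024"))
    print(find_profile(profiles, "videos"))

A bucket that matches no profile yields ``None`` in this sketch; in the configuration above, such
buckets would presumably fall back to the ``default`` section.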


S3 Specific Configurables:
~~~~~~~~~~~~~~~~~~~~~~~~~~

Currently cloud sync will only work with backends that are compatible with AWS S3. There are
a few configurables that can be used to tweak its behavior when accessing these cloud services:

::

    {
      "multipart_sync_threshold": {object_size},
      "multipart_min_part_size": {part_size}
    }


* ``multipart_sync_threshold`` (integer)

Objects this size or larger will be synced to the cloud using multipart upload.

* ``multipart_min_part_size`` (integer)

Minimum parts size to use when syncing objects using multipart upload.
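
How the two settings interact can be sketched in Python. This is a hedged illustration rather
than the module's logic: it assumes both values are byte counts, and it derives an upper bound on
the number of parts by assuming every part is exactly ``multipart_min_part_size`` (the module may
well choose larger parts). The function name ``plan_upload`` and the sample sizes are hypothetical.

.. code-block:: python

    import math

    MIB = 1024 * 1024


    def plan_upload(object_size, multipart_sync_threshold, multipart_min_part_size):
        """Decide whether an object would use multipart upload and bound the part count."""
        # Objects at or above the threshold are synced with multipart upload.
        if object_size < multipart_sync_threshold:
            return {"multipart": False, "max_parts": 1}
        # With parts no smaller than the minimum, this is the largest possible count.
        max_parts = max(1, math.ceil(object_size / multipart_min_part_size))
        return {"multipart": True, "max_parts": max_parts}


    # Hypothetical settings: 32 MiB threshold, 8 MiB minimum part size.
    for size in (10 * MIB, 32 * MIB, 33 * MIB):
        print(size, plan_upload(size, 32 * MIB, 8 * MIB))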


How to Configure
~~~~~~~~~~~~~~~~

See :ref:`multisite` for multisite configuration instructions. The cloud sync module requires the creation of a new zone. The zone
tier type needs to be defined as ``cloud``:

::

    # radosgw-admin zone create --rgw-zonegroup={zone-group-name} \
                                --rgw-zone={zone-name} \
                                --endpoints={http://fqdn}[,{http://fqdn}] \
                                --tier-type=cloud


The tier configuration can then be done using the following command:

::

    # radosgw-admin zone modify --rgw-zonegroup={zone-group-name} \
                                --rgw-zone={zone-name} \
                                --tier-config={key}={val}[,{key}={val}]

The ``key`` in the configuration specifies the config variable that needs to be updated, and
the ``val`` specifies its new value. Nested values can be accessed using a period (``.``). For example:

::

    # radosgw-admin zone modify --rgw-zonegroup={zone-group-name} \
                                --rgw-zone={zone-name} \
                                --tier-config=connection.access_key={key},connection.secret={secret}
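
When the desired settings already exist as a nested structure, the dotted ``{key}={val}`` list
can be generated rather than typed by hand. The following Python sketch does that; the helper
names are hypothetical, the credentials are placeholders, and values that themselves contain
commas are not handled.

.. code-block:: python

    def flatten(config, prefix=""):
        """Turn a nested dict into a list of dotted key=value strings."""
        pairs = []
        for key, value in config.items():
            path = f"{prefix}.{key}" if prefix else key
            if isinstance(value, dict):
                # Descend into nested containers, extending the dotted path.
                pairs.extend(flatten(value, path))
            else:
                pairs.append(f"{path}={value}")
        return pairs


    def tier_config_arg(config):
        """Build the --tier-config argument for radosgw-admin zone modify."""
        return "--tier-config=" + ",".join(flatten(config))


    # Placeholder credentials, for illustration only.
    config = {"connection": {"access_key": "ACCESS", "secret": "SECRET"}}
    print(tier_config_arg(config))

For the sample dictionary this produces
``--tier-config=connection.access_key=ACCESS,connection.secret=SECRET``, matching the form of the
command above.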


Configuration array entries can be accessed by enclosing the index of the entry in square brackets, and a new
array entry can be added by using ``[]``. An index value of ``-1`` references the last entry in the array.
At the moment it is not possible to create a new entry and reference it again in the same command.
For example, creating a new profile for buckets starting with ``{prefix}``:

::

    # radosgw-admin zone modify --rgw-zonegroup={zone-group-name} \
                                --rgw-zone={zone-name} \
                                --tier-config=profiles[].source_bucket={prefix}'*'

    # radosgw-admin zone modify --rgw-zonegroup={zone-group-name} \
                                --rgw-zone={zone-name} \
                                --tier-config=profiles[-1].connection_id={conn_id},profiles[-1].acls_id={acls_id}
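
The effect of ``[]`` and ``[-1]`` on the stored configuration can be modelled with a short Python
sketch. It only mimics the semantics described above on an in-memory dictionary; it is not the
parser used by ``radosgw-admin``, and the function name ``apply_array_key`` and the sample values
are hypothetical.

.. code-block:: python

    import re

    # Matches keys of the form name[index].field, where index may be empty.
    _ARRAY_KEY = re.compile(r"^(\w+)\[(-?\d+)?\]\.(\w+)$")


    def apply_array_key(config, key, value):
        """Apply one array-style key=value update to a config dict."""
        match = _ARRAY_KEY.match(key)
        if match is None:
            raise ValueError(f"unsupported key: {key}")
        name, index, field = match.groups()
        entries = config.setdefault(name, [])
        if index is None:
            # "[]" appends a new entry, which then becomes the last one.
            entries.append({})
            index = -1
        entries[int(index)][field] = value
        return config


    config = {}
    # First command: create the new profile.
    apply_array_key(config, "profiles[].source_bucket", "logs-*")
    # Second command: fill in the entry that is now last in the array.
    apply_array_key(config, "profiles[-1].connection_id", "conn-a")
    apply_array_key(config, "profiles[-1].acls_id", "acls-a")
    print(config)

The two steps mirror the two commands above: the first appends an entry, and the later updates
address it through index ``-1``.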


An entry can be removed by using ``--tier-config-rm={key}``.