=========================
ElasticSearch Sync Module
=========================

.. versionadded:: Kraken

.. note::
     As of 31 May 2020, only Elasticsearch 6 and lower are supported. ElasticSearch 7 is not supported.

This sync module writes the metadata from other zones to `ElasticSearch`_. As of
Luminous, the following JSON document shows the data fields that are currently
stored in ElasticSearch.

::

   {
     "_index" : "rgw-gold-ee5863d6",
     "_type" : "object",
     "_id" : "34137443-8592-48d9-8ca7-160255d52ade.34137.1:object1:null",
     "_score" : 1.0,
     "_source" : {
       "bucket" : "testbucket123",
       "name" : "object1",
       "instance" : "null",
       "versioned_epoch" : 0,
       "owner" : {
         "id" : "user1",
         "display_name" : "user1"
       },
       "permissions" : [
         "user1"
       ],
       "meta" : {
         "size" : 712354,
         "mtime" : "2017-05-04T12:54:16.462Z",
         "etag" : "7ac66c0f148de9519b8bd264312c4d64"
       }
     }
   }
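
As a minimal sketch of how a client might read such a document, the following
Python snippet parses a trimmed copy of the example above and pulls out a few
fields. The variable names are illustrative only, and the timestamp format
string is an assumption based on the single ``mtime`` value shown.

```python
import json
from datetime import datetime

# A trimmed copy of the example document shown above.
hit = json.loads("""
{
  "_id": "34137443-8592-48d9-8ca7-160255d52ade.34137.1:object1:null",
  "_source": {
    "bucket": "testbucket123",
    "name": "object1",
    "owner": {"id": "user1", "display_name": "user1"},
    "meta": {
      "size": 712354,
      "mtime": "2017-05-04T12:54:16.462Z",
      "etag": "7ac66c0f148de9519b8bd264312c4d64"
    }
  }
}
""")

# The indexed object metadata lives under "_source".
source = hit["_source"]

# Assumed format: ISO 8601 with fractional seconds and a literal "Z" suffix.
mtime = datetime.strptime(source["meta"]["mtime"], "%Y-%m-%dT%H:%M:%S.%fZ")

summary = (source["bucket"], source["name"], source["meta"]["size"], mtime.year)
print(summary)
```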


ElasticSearch tier type configurables
-------------------------------------

* ``endpoint``

Specifies the Elasticsearch server endpoint to access.

* ``num_shards`` (integer)

The number of shards that Elasticsearch will be configured with on
data sync initialization. Note that this cannot be changed after
initialization. Any change here requires rebuilding the Elasticsearch
index and reinitializing the data sync process.

* ``num_replicas`` (integer)

The number of replicas that Elasticsearch will be configured with
on data sync initialization.

* ``explicit_custom_meta`` (true | false)

Specifies whether all user custom metadata will be indexed, or whether
the user will need to configure (at the bucket level) which custom
metadata entries should be indexed. This is false by default.

* ``index_buckets_list`` (comma separated list of strings)

If empty, all buckets will be indexed. Otherwise, only buckets
specified here will be indexed. It is possible to provide bucket
prefixes (e.g., foo\*), or bucket suffixes (e.g., \*bar).

* ``approved_owners_list`` (comma separated list of strings)

If empty, buckets of all owners will be indexed (subject to other
restrictions). Otherwise, only buckets owned by the specified owners
will be indexed. Suffixes and prefixes can also be provided.

* ``override_index_path`` (string)

If not empty, this string will be used as the Elasticsearch index
path. Otherwise the index path will be determined and generated on
sync initialization.
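
As a rough illustration, these options are commonly supplied as comma-separated
``key=value`` pairs through the ``--tier-config`` argument of
``radosgw-admin zone modify``. The Python sketch below uses a hypothetical
helper, ``build_tier_config``, to assemble such an argument. The zone name and
endpoint URL are placeholder assumptions, and list-valued options such as
``index_buckets_list`` are deliberately left out because their quoting is not
described here.

```python
def build_tier_config(options):
    """Join tier options into the key=value[,key=value] form.

    Hypothetical helper for illustration; not part of Ceph.
    """
    parts = []
    for key, value in options.items():
        # Render Python booleans as the lowercase true/false used in this page.
        if isinstance(value, bool):
            value = "true" if value else "false"
        parts.append(f"{key}={value}")
    return ",".join(parts)


options = {
    "endpoint": "http://localhost:9200",  # assumed local Elasticsearch endpoint
    "num_shards": 10,
    "num_replicas": 1,
    "explicit_custom_meta": False,
}

tier_config = build_tier_config(options)

# "us-east-es" is a placeholder zone name.
command = [
    "radosgw-admin", "zone", "modify",
    "--rgw-zone=us-east-es",
    f"--tier-config={tier_config}",
]
print(" ".join(command))
```

The sketch only prints the command line; running it, and committing the period
afterwards, is left to the administrator.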


End user metadata queries
-------------------------

.. versionadded:: Luminous

Since the ElasticSearch cluster now stores object metadata, it is important that
the ElasticSearch endpoint is not exposed to the public and is accessible only to
the cluster administrators. This makes it difficult to expose metadata queries
to end users: each user should be able to query only their own metadata and not
that of any other user, which would require the ElasticSearch cluster to
authenticate users in a way similar to RGW.

As of Luminous, RGW in the metadata master zone can service end user
requests. This avoids exposing the Elasticsearch endpoint publicly and
also solves the authentication and authorization problem, since RGW itself can
authenticate end user requests. For this purpose, RGW introduces a new query
in the bucket APIs that can service Elasticsearch requests. All of these
requests must be sent to the metadata master zone.

Syntax
~~~~~~

Get an elasticsearch query
``````````````````````````

::

   GET /{bucket}?query={query-expr}

request params:
 - max-keys: max number of entries to return
 - marker: pagination marker

``expression := [(]<arg> <op> <value> [)][<and|or> ...]``

op is one of the following:
<, <=, ==, >=, >

For example ::

  GET /?query=name==foo

This returns all the indexed keys that are named 'foo' and that the
user has read permission to.

The output will be a list of keys in XML that is similar to the S3
list buckets response.
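
The following Python sketch shows one way a client might assemble the request
path for such a query. The helper names ``condition`` and
``build_search_path`` are hypothetical, as is the bucket name. The request
still has to be authenticated by RGW like any other request, and signing is
outside the scope of this sketch.

```python
from urllib.parse import quote, urlencode

# Operators listed in the syntax description above.
ALLOWED_OPS = ("<", "<=", "==", ">=", ">")


def condition(arg, op, value):
    """Build a single <arg><op><value> term (hypothetical helper)."""
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported operator: {op}")
    return f"{arg}{op}{value}"


def build_search_path(bucket, expression, max_keys=None, marker=None):
    """Return the path and query string for GET /{bucket}?query=..."""
    params = {"query": expression}
    if max_keys is not None:
        params["max-keys"] = str(max_keys)
    if marker is not None:
        params["marker"] = marker
    # urlencode percent-encodes reserved characters such as "=" in the value.
    return f"/{quote(bucket)}?{urlencode(params)}"


expr = condition("name", "==", "foo")
path = build_search_path("testbucket123", expr, max_keys=100)
print(path)
```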

Configure custom metadata fields
````````````````````````````````

Define which custom metadata entries should be indexed (under the
specified bucket), and the types of these keys. If explicit
custom metadata indexing is configured, this is needed so that RGW
will index the specified custom metadata values. Otherwise it is
needed in cases where the indexed metadata keys are of a type other
than string.

::

   POST /{bucket}?mdsearch
   x-amz-meta-search: <key [; type]> [, ...]

Multiple metadata fields must be comma separated. A type can be forced for a
field with a ``;``. The currently allowed types are string (default), integer
and date.

For example, to index the custom object metadata x-amz-meta-year as int,
x-amz-meta-release-date as date and x-amz-meta-title as string, you would do

::

   POST /mybooks?mdsearch
   x-amz-meta-search: x-amz-meta-year;int, x-amz-meta-release-date;date, x-amz-meta-title;string
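
The header value can also be generated programmatically. The sketch below uses
a hypothetical helper, ``build_mdsearch_header``, that joins keys and optional
types in the ``<key [; type]> [, ...]`` form described above. The type tokens
are copied from the example request and are not validated here.

```python
def build_mdsearch_header(fields):
    """Format {key: type or None} as an x-amz-meta-search header value.

    Hypothetical helper for illustration; not part of RGW or any SDK.
    """
    entries = []
    for key, type_name in fields.items():
        # A key without a type falls back to the default type (string).
        entries.append(key if type_name is None else f"{key};{type_name}")
    return ", ".join(entries)


fields = {
    "x-amz-meta-year": "int",
    "x-amz-meta-release-date": "date",
    "x-amz-meta-title": "string",
}

headers = {"x-amz-meta-search": build_mdsearch_header(fields)}
print(headers["x-amz-meta-search"])
```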


Delete custom metadata configuration
````````````````````````````````````

Delete custom metadata bucket configuration.

::

   DELETE /{bucket}?mdsearch

Get custom metadata configuration
`````````````````````````````````

Retrieve custom metadata bucket configuration.

::

   GET /{bucket}?mdsearch


.. _`Elasticsearch`: https://github.com/elastic/elasticsearch