summaryrefslogtreecommitdiffstats
path: root/doc/radosgw/elastic-sync-module.rst
diff options
context:
space:
mode:
Diffstat (limited to 'doc/radosgw/elastic-sync-module.rst')
-rw-r--r--doc/radosgw/elastic-sync-module.rst181
1 files changed, 181 insertions, 0 deletions
diff --git a/doc/radosgw/elastic-sync-module.rst b/doc/radosgw/elastic-sync-module.rst
new file mode 100644
index 000000000..60c806e87
--- /dev/null
+++ b/doc/radosgw/elastic-sync-module.rst
@@ -0,0 +1,181 @@
+=========================
+ElasticSearch Sync Module
+=========================
+
+.. versionadded:: Kraken
+
+.. note::
+ As of 31 May 2020, only Elasticsearch 6 and lower are supported. ElasticSearch 7 is not supported.
+
+This sync module writes the metadata from other zones to `ElasticSearch`_. As of
+luminous this is a json of data fields we currently store in ElasticSearch.
+
+::
+
+ {
+ "_index" : "rgw-gold-ee5863d6",
+ "_type" : "object",
+ "_id" : "34137443-8592-48d9-8ca7-160255d52ade.34137.1:object1:null",
+ "_score" : 1.0,
+ "_source" : {
+ "bucket" : "testbucket123",
+ "name" : "object1",
+ "instance" : "null",
+ "versioned_epoch" : 0,
+ "owner" : {
+ "id" : "user1",
+ "display_name" : "user1"
+ },
+ "permissions" : [
+ "user1"
+ ],
+ "meta" : {
+ "size" : 712354,
+ "mtime" : "2017-05-04T12:54:16.462Z",
+ "etag" : "7ac66c0f148de9519b8bd264312c4d64"
+ }
+ }
+ }
+
+
+
+ElasticSearch tier type configurables
+-------------------------------------
+
+* ``endpoint``
+
+Specifies the Elasticsearch server endpoint to access
+
+* ``num_shards`` (integer)
+
+The number of shards that Elasticsearch will be configured with on
+data sync initialization. Note that this cannot be changed after init.
+Any change here requires rebuild of the Elasticsearch index and reinit
+of the data sync process.
+
+* ``num_replicas`` (integer)
+
+The number of the replicas that Elasticsearch will be configured with
+on data sync initialization.
+
+* ``explicit_custom_meta`` (true | false)
+
+Specifies whether all user custom metadata will be indexed, or whether
+user will need to configure (at the bucket level) what custom
+metadata entries should be indexed. This is false by default
+
+* ``index_buckets_list`` (comma separated list of strings)
+
+If empty, all buckets will be indexed. Otherwise, only buckets
+specified here will be indexed. It is possible to provide bucket
+prefixes (e.g., foo\*), or bucket suffixes (e.g., \*bar).
+
+* ``approved_owners_list`` (comma separated list of strings)
+
+If empty, buckets of all owners will be indexed (subject to other
+restrictions), otherwise, only buckets owned by specified owners will
+be indexed. Suffixes and prefixes can also be provided.
+
+* ``override_index_path`` (string)
+
+if not empty, this string will be used as the elasticsearch index
+path. Otherwise the index path will be determined and generated on
+sync initialization.
+
+
+End user metadata queries
+-------------------------
+
+.. versionadded:: Luminous
+
+Since the ElasticSearch cluster now stores object metadata, it is important that
+the ElasticSearch endpoint is not exposed to the public and only accessible to
+the cluster administrators. For exposing metadata queries to the end user itself
+this poses a problem since we'd want the user to only query their metadata and
+not of any other users, this would require the ElasticSearch cluster to
+authenticate users in a way similar to RGW does which poses a problem.
+
+As of Luminous RGW in the metadata master zone can now service end user
+requests. This allows for not exposing the elasticsearch endpoint in public and
+also solves the authentication and authorization problem since RGW itself can
+authenticate the end user requests. For this purpose RGW introduces a new query
+in the bucket APIs that can service elasticsearch requests. All these requests
+must be sent to the metadata master zone.
+
+Syntax
+~~~~~~
+
+Get an elasticsearch query
+``````````````````````````
+
+::
+
+ GET /{bucket}?query={query-expr}
+
+request params:
+ - max-keys: max number of entries to return
+ - marker: pagination marker
+
+``expression := [(]<arg> <op> <value> [)][<and|or> ...]``
+
+op is one of the following:
+<, <=, ==, >=, >
+
+For example ::
+
+ GET /?query=name==foo
+
+Will return all the indexed keys that user has read permission to, and
+are named 'foo'.
+
+The output will be a list of keys in XML that is similar to the S3
+list buckets response.
+
+Configure custom metadata fields
+````````````````````````````````
+
+Define which custom metadata entries should be indexed (under the
+specified bucket), and what are the types of these keys. If explicit
+custom metadata indexing is configured, this is needed so that rgw
+will index the specified custom metadata values. Otherwise it is
+needed in cases where the indexed metadata keys are of a type other
+than string.
+
+::
+
+ POST /{bucket}?mdsearch
+ x-amz-meta-search: <key [; type]> [, ...]
+
+Multiple metadata fields must be comma separated, a type can be forced for a
+field with a `;`. The currently allowed types are string(default), integer and
+date
+
+eg. if you want to index a custom object metadata x-amz-meta-year as int,
+x-amz-meta-date as type date and x-amz-meta-title as string, you'd do
+
+::
+
+ POST /mybooks?mdsearch
+ x-amz-meta-search: x-amz-meta-year;int, x-amz-meta-release-date;date, x-amz-meta-title;string
+
+
+Delete custom metadata configuration
+````````````````````````````````````
+
+Delete custom metadata bucket configuration.
+
+::
+
+ DELETE /<bucket>?mdsearch
+
+Get custom metadata configuration
+`````````````````````````````````
+
+Retrieve custom metadata bucket configuration.
+
+::
+
+ GET /<bucket>?mdsearch
+
+
+.. _`Elasticsearch`: https://github.com/elastic/elasticsearch