Diffstat
-rw-r--r-- | doc/rados/api/index.rst | 25
-rw-r--r-- | doc/rados/api/libcephsqlite.rst | 454
-rw-r--r-- | doc/rados/api/librados-intro.rst | 1051
-rw-r--r-- | doc/rados/api/librados.rst | 187
-rw-r--r-- | doc/rados/api/libradospp.rst | 9
-rw-r--r-- | doc/rados/api/objclass-sdk.rst | 39
-rw-r--r-- | doc/rados/api/python.rst | 428
7 files changed, 2193 insertions, 0 deletions
diff --git a/doc/rados/api/index.rst b/doc/rados/api/index.rst new file mode 100644 index 000000000..5422ce871 --- /dev/null +++ b/doc/rados/api/index.rst @@ -0,0 +1,25 @@ +.. _rados api: + +=========================== + Ceph Storage Cluster APIs +=========================== + +The :term:`Ceph Storage Cluster` has a messaging layer protocol that enables +clients to interact with a :term:`Ceph Monitor` and a :term:`Ceph OSD Daemon`. +``librados`` provides this functionality to :term:`Ceph Client`\s in the form of +a library. All Ceph Clients either use ``librados`` or the same functionality +encapsulated in ``librados`` to interact with the object store. For example, +``librbd`` and ``libcephfs`` leverage this functionality. You may use +``librados`` to interact with Ceph directly (e.g., an application that talks to +Ceph, your own interface to Ceph, etc.). + + +.. toctree:: + :maxdepth: 2 + + Introduction to librados <librados-intro> + librados (C) <librados> + librados (C++) <libradospp> + librados (Python) <python> + libcephsqlite (SQLite) <libcephsqlite> + object class <objclass-sdk> diff --git a/doc/rados/api/libcephsqlite.rst b/doc/rados/api/libcephsqlite.rst new file mode 100644 index 000000000..beee4a466 --- /dev/null +++ b/doc/rados/api/libcephsqlite.rst @@ -0,0 +1,454 @@ +.. _libcephsqlite: + +================ + Ceph SQLite VFS +================ + +This `SQLite VFS`_ may be used for storing and accessing a `SQLite`_ database +backed by RADOS. This allows you to fully decentralize your database using +Ceph's object store for improved availability, accessibility, and use of +storage. + +Note what this is not: a distributed SQL engine. SQLite on RADOS can be thought +of like RBD as compared to CephFS: RBD puts a disk image on RADOS for the +purposes of exclusive access by a machine and generally does not allow parallel +access by other machines; on the other hand, CephFS allows fully distributed +access to a file system from many client mounts. SQLite on RADOS is meant to be +accessed by a single SQLite client database connection at a given time. The +database may be manipulated safely by multiple clients only in a serial fashion +controlled by RADOS locks managed by the Ceph SQLite VFS. + + +Usage +^^^^^ + +Normal unmodified applications (including the sqlite command-line toolset +binary) may load the *ceph* VFS using the `SQLite Extension Loading API`_. + +.. code:: sql + + .LOAD libcephsqlite.so + +or during the invocation of ``sqlite3`` + +.. code:: sh + + sqlite3 -cmd '.load libcephsqlite.so' + +A database file is formatted as a SQLite URI:: + + file:///<"*"poolid|poolname>:[namespace]/<dbname>?vfs=ceph + +The RADOS ``namespace`` is optional. Note the triple ``///`` in the path. The URI +authority must be empty or localhost in SQLite. Only the path part of the URI +is parsed. For this reason, the URI will not parse properly if you only use two +``//``. + +A complete example of (optionally) creating a database and opening: + +.. code:: sh + + sqlite3 -cmd '.load libcephsqlite.so' -cmd '.open file:///foo:bar/baz.db?vfs=ceph' + +Note you cannot specify the database file as the normal positional argument to +``sqlite3``. This is because the ``.load libcephsqlite.so`` command is applied +after opening the database, but opening the database depends on the extension +being loaded first. + +An example passing the pool integer id and no RADOS namespace: + +.. 
code:: sh + + sqlite3 -cmd '.load libcephsqlite.so' -cmd '.open file:///*2:/baz.db?vfs=ceph' + +Like other Ceph tools, the *ceph* VFS looks at some environment variables that +help with configuring which Ceph cluster to communicate with and which +credential to use. Here would be a typical configuration: + +.. code:: sh + + export CEPH_CONF=/path/to/ceph.conf + export CEPH_KEYRING=/path/to/ceph.keyring + export CEPH_ARGS='--id myclientid' + ./runmyapp + # or + sqlite3 -cmd '.load libcephsqlite.so' -cmd '.open file:///foo:bar/baz.db?vfs=ceph' + +The default operation would look at the standard Ceph configuration file path +using the ``client.admin`` user. + + +User +^^^^ + +The *ceph* VFS requires a user credential with read access to the monitors, the +ability to blocklist dead clients of the database, and access to the OSDs +hosting the database. This can be done with authorizations as simply as: + +.. code:: sh + + ceph auth get-or-create client.X mon 'allow r, allow command "osd blocklist" with blocklistop=add' osd 'allow rwx' + +.. note:: The terminology change from ``blacklist`` to ``blocklist``; older clusters may require using the old terms. + +You may also simplify using the ``simple-rados-client-with-blocklist`` profile: + +.. code:: sh + + ceph auth get-or-create client.X mon 'profile simple-rados-client-with-blocklist' osd 'allow rwx' + +To learn why blocklisting is necessary, see :ref:`libcephsqlite-corrupt`. + + +Page Size +^^^^^^^^^ + +SQLite allows configuring the page size prior to creating a new database. It is +advisable to increase this config to 65536 (64K) when using RADOS backed +databases to reduce the number of OSD reads/writes and thereby improve +throughput and latency. + +.. code:: sql + + PRAGMA page_size = 65536 + +You may also try other values according to your application needs but note that +64K is the max imposed by SQLite. + + +Cache +^^^^^ + +The ceph VFS does not do any caching of reads or buffering of writes. Instead, +and more appropriately, the SQLite page cache is used. You may find it is too small +for most workloads and should therefore increase it significantly: + + +.. code:: sql + + PRAGMA cache_size = 4096 + +Which will cache 4096 pages or 256MB (with 64K ``page_cache``). + + +Journal Persistence +^^^^^^^^^^^^^^^^^^^ + +By default, SQLite deletes the journal for every transaction. This can be +expensive as the *ceph* VFS must delete every object backing the journal for each +transaction. For this reason, it is much faster and simpler to ask SQLite to +**persist** the journal. In this mode, SQLite will invalidate the journal via a +write to its header. This is done as: + +.. code:: sql + + PRAGMA journal_mode = PERSIST + +The cost of this may be increased unused space according to the high-water size +of the rollback journal (based on transaction type and size). + + +Exclusive Lock Mode +^^^^^^^^^^^^^^^^^^^ + +SQLite operates in a ``NORMAL`` locking mode where each transaction requires +locking the backing database file. This can add unnecessary overhead to +transactions when you know there's only ever one user of the database at a +given time. You can have SQLite lock the database once for the duration of the +connection using: + +.. code:: sql + + PRAGMA locking_mode = EXCLUSIVE + +This can more than **halve** the time taken to perform a transaction. Keep in +mind this prevents other clients from accessing the database. 
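
Putting the preceding pragmas together, a session opened against a RADOS-backed
database might be configured as follows. This is only an illustrative sketch:
the pool ``foo``, namespace ``bar``, and database name ``baz.db`` are
placeholder values, and ``page_size`` only takes effect if it is set before the
database is first created.

.. code:: sql

    -- assumes the database was opened through the ceph VFS, e.g.:
    --   sqlite3 -cmd '.load libcephsqlite.so' -cmd '.open file:///foo:bar/baz.db?vfs=ceph'
    PRAGMA page_size = 65536;        -- 64K pages; only honored for a new database
    PRAGMA cache_size = 4096;        -- roughly 256MB of page cache at 64K per page
    PRAGMA journal_mode = PERSIST;   -- avoid deleting journal objects per transaction
    PRAGMA locking_mode = EXCLUSIVE; -- lock once for the duration of the connection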
+ +In this locking mode, each write transaction to the database requires 3 +synchronization events: once to write to the journal, another to write to the +database file, and a final write to invalidate the journal header (in +``PERSIST`` journaling mode). + + +WAL Journal +^^^^^^^^^^^ + +The `WAL Journal Mode`_ is only available when SQLite is operating in exclusive +lock mode. This is because it requires shared memory communication with other +readers and writers when in the ``NORMAL`` locking mode. + +As with local disk databases, WAL mode may significantly reduce small +transaction latency. Testing has shown it can provide more than 50% speedup +over persisted rollback journals in exclusive locking mode. You can expect +around 150-250 transactions per second depending on size. + + +Performance Notes +^^^^^^^^^^^^^^^^^ + +The filing backend for the database on RADOS is asynchronous as much as +possible. Still, performance can be anywhere from 3x-10x slower than a local +database on SSD. Latency can be a major factor. It is advisable to be familiar +with SQL transactions and other strategies for efficient database updates. +Depending on the performance of the underlying pool, you can expect small +transactions to take up to 30 milliseconds to complete. If you use the +``EXCLUSIVE`` locking mode, it can be reduced further to 15 milliseconds per +transaction. A WAL journal in ``EXCLUSIVE`` locking mode can further reduce +this as low as ~2-5 milliseconds (or the time to complete a RADOS write; you +won't get better than that!). + +There is no limit to the size of a SQLite database on RADOS imposed by the Ceph +VFS. There are standard `SQLite Limits`_ to be aware of, notably the maximum +database size of 281 TB. Large databases may or may not be performant on Ceph. +Experimentation for your own use-case is advised. + +Be aware that read-heavy queries could take significant amounts of time as +reads are necessarily synchronous (due to the VFS API). No readahead is yet +performed by the VFS. + + +Recommended Use-Cases +^^^^^^^^^^^^^^^^^^^^^ + +The original purpose of this module was to support saving relational or large +data in RADOS which needs to span multiple objects. Many current applications +with trivial state try to use RADOS omap storage on a single object but this +cannot scale without striping data across multiple objects. Unfortunately, it +is non-trivial to design a store spanning multiple objects which is consistent +and also simple to use. SQLite can be used to bridge that gap. + + +Parallel Access +^^^^^^^^^^^^^^^ + +The VFS does not yet support concurrent readers. All database access is protected +by a single exclusive lock. + + +Export or Extract Database out of RADOS +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The database is striped on RADOS and can be extracted using the RADOS cli toolset. + +.. code:: sh + + rados --pool=foo --striper get bar.db local-bar.db + rados --pool=foo --striper get bar.db-journal local-bar.db-journal + sqlite3 local-bar.db ... + +Keep in mind the rollback journal is also striped and will need to be extracted +as well if the database was in the middle of a transaction. If you're using +WAL, that journal will need to be extracted as well. + +Keep in mind that extracting the database using the striper uses the same RADOS +locks as those used by the *ceph* VFS. However, the journal file locks are not +used by the *ceph* VFS (SQLite only locks the main database file) so there is a +potential race with other SQLite clients when extracting both files. 
That could +result in fetching a corrupt journal. + +Instead of manually extracting the files, it would be more advisable to use the +`SQLite Backup`_ mechanism instead. + + +Temporary Tables +^^^^^^^^^^^^^^^^ + +Temporary tables backed by the ceph VFS are not supported. The main reason for +this is that the VFS lacks context about where it should put the database, i.e. +which RADOS pool. The persistent database associated with the temporary +database is not communicated via the SQLite VFS API. + +Instead, it's suggested to attach a secondary local or `In-Memory Database`_ +and put the temporary tables there. Alternatively, you may set a connection +pragma: + +.. code:: sql + + PRAGMA temp_store=memory + + +.. _libcephsqlite-breaking-locks: + +Breaking Locks +^^^^^^^^^^^^^^ + +Access to the database file is protected by an exclusive lock on the first +object stripe of the database. If the application fails without unlocking the +database (e.g. a segmentation fault), the lock is not automatically unlocked, +even if the client connection is blocklisted afterward. Eventually, the lock +will timeout subject to the configurations:: + + cephsqlite_lock_renewal_timeout = 30000 + +The timeout is in milliseconds. Once the timeout is reached, the OSD will +expire the lock and allow clients to relock. When this occurs, the database +will be recovered by SQLite and the in-progress transaction rolled back. The +new client recovering the database will also blocklist the old client to +prevent potential database corruption from rogue writes. + +The holder of the exclusive lock on the database will periodically renew the +lock so it does not lose the lock. This is necessary for large transactions or +database connections operating in ``EXCLUSIVE`` locking mode. The lock renewal +interval is adjustable via:: + + cephsqlite_lock_renewal_interval = 2000 + +This configuration is also in units of milliseconds. + +It is possible to break the lock early if you know the client is gone for good +(e.g. blocklisted). This allows restoring database access to clients +immediately. For example: + +.. code:: sh + + $ rados --pool=foo --namespace bar lock info baz.db.0000000000000000 striper.lock + {"name":"striper.lock","type":"exclusive","tag":"","lockers":[{"name":"client.4463","cookie":"555c7208-db39-48e8-a4d7-3ba92433a41a","description":"SimpleRADOSStriper","expiration":"0.000000","addr":"127.0.0.1:0/1831418345"}]} + + $ rados --pool=foo --namespace bar lock break baz.db.0000000000000000 striper.lock client.4463 --lock-cookie 555c7208-db39-48e8-a4d7-3ba92433a41a + +.. _libcephsqlite-corrupt: + +How to Corrupt Your Database +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +There is the usual reading on `How to Corrupt Your SQLite Database`_ that you +should review before using this tool. To add to that, the most likely way you +may corrupt your database is by a rogue process transiently losing network +connectivity and then resuming its work. The exclusive RADOS lock it held will +be lost but it cannot know that immediately. Any work it might do after +regaining network connectivity could corrupt the database. + +The *ceph* VFS library defaults do not allow for this scenario to occur. The Ceph +VFS will blocklist the last owner of the exclusive lock on the database if it +detects incomplete cleanup. + +By blocklisting the old client, it's no longer possible for the old client to +resume its work on the database when it returns (subject to blocklist +expiration, 3600 seconds by default). 
To turn off blocklisting the prior client, change:: + + cephsqlite_blocklist_dead_locker = false + +Do NOT do this unless you know database corruption cannot result due to other +guarantees. If this config is true (the default), the *ceph* VFS will cowardly +fail if it cannot blocklist the prior instance (due to lack of authorization, +for example). + +One example where out-of-band mechanisms exist to blocklist the last dead +holder of the exclusive lock on the database is in the ``ceph-mgr``. The +monitors are made aware of the RADOS connection used for the *ceph* VFS and will +blocklist the instance during ``ceph-mgr`` failover. This prevents a zombie +``ceph-mgr`` from continuing work and potentially corrupting the database. For +this reason, it is not necessary for the *ceph* VFS to do the blocklist command +in the new instance of the ``ceph-mgr`` (but it still does so, harmlessly). + +To blocklist the *ceph* VFS manually, you may see the instance address of the +*ceph* VFS using the ``ceph_status`` SQL function: + +.. code:: sql + + SELECT ceph_status(); + +.. code:: + + {"id":788461300,"addr":"172.21.10.4:0/1472139388"} + +You may easily manipulate that information using the `JSON1 extension`_: + +.. code:: sql + + SELECT json_extract(ceph_status(), '$.addr'); + +.. code:: + + 172.21.10.4:0/3563721180 + +This is the address you would pass to the ceph blocklist command: + +.. code:: sh + + ceph osd blocklist add 172.21.10.4:0/3082314560 + + +Performance Statistics +^^^^^^^^^^^^^^^^^^^^^^ + +The *ceph* VFS provides a SQLite function, ``ceph_perf``, for querying the +performance statistics of the VFS. The data is from "performance counters" as +in other Ceph services normally queried via an admin socket. + +.. code:: sql + + SELECT ceph_perf(); + +.. code:: + + {"libcephsqlite_vfs":{"op_open":{"avgcount":2,"sum":0.150001291,"avgtime":0.075000645},"op_delete":{"avgcount":0,"sum":0.000000000,"avgtime":0.000000000},"op_access":{"avgcount":1,"sum":0.003000026,"avgtime":0.003000026},"op_fullpathname":{"avgcount":1,"sum":0.064000551,"avgtime":0.064000551},"op_currenttime":{"avgcount":0,"sum":0.000000000,"avgtime":0.000000000},"opf_close":{"avgcount":1,"sum":0.000000000,"avgtime":0.000000000},"opf_read":{"avgcount":3,"sum":0.036000310,"avgtime":0.012000103},"opf_write":{"avgcount":0,"sum":0.000000000,"avgtime":0.000000000},"opf_truncate":{"avgcount":0,"sum":0.000000000,"avgtime":0.000000000},"opf_sync":{"avgcount":0,"sum":0.000000000,"avgtime":0.000000000},"opf_filesize":{"avgcount":2,"sum":0.000000000,"avgtime":0.000000000},"opf_lock":{"avgcount":1,"sum":0.158001360,"avgtime":0.158001360},"opf_unlock":{"avgcount":1,"sum":0.101000871,"avgtime":0.101000871},"opf_checkreservedlock":{"avgcount":1,"sum":0.002000017,"avgtime":0.002000017},"opf_filecontrol":{"avgcount":4,"sum":0.000000000,"avgtime":0.000000000},"opf_sectorsize":{"avgcount":0,"sum":0.000000000,"avgtime":0.000000000},"opf_devicecharacteristics":{"avgcount":4,"sum":0.000000000,"avgtime":0.000000000}},"libcephsqlite_striper":{"update_metadata":0,"update_allocated":0,"update_size":0,"update_version":0,"shrink":0,"shrink_bytes":0,"lock":1,"unlock":1}} + +You may easily manipulate that information using the `JSON1 extension`_: + +.. code:: sql + + SELECT json_extract(ceph_perf(), '$.libcephsqlite_vfs.opf_sync.avgcount'); + +.. code:: + + 776 + +That tells you the number of times SQLite has called the xSync method of the +`SQLite IO Methods`_ of the VFS (for **all** open database connections in the +process). 
You could analyze the performance stats before and after a number of +queries to see the number of file system syncs required (this would just be +proportional to the number of transactions). Alternatively, you may be more +interested in the average latency to complete a write: + +.. code:: sql + + SELECT json_extract(ceph_perf(), '$.libcephsqlite_vfs.opf_write'); + +.. code:: + + {"avgcount":7873,"sum":0.675005797,"avgtime":0.000085736} + +Which would tell you there have been 7873 writes with an average +time-to-complete of 85 microseconds. That clearly shows the calls are executed +asynchronously. Returning to sync: + +.. code:: sql + + SELECT json_extract(ceph_perf(), '$.libcephsqlite_vfs.opf_sync'); + +.. code:: + + {"avgcount":776,"sum":4.802041199,"avgtime":0.006188197} + +6 milliseconds were spent on average executing a sync call. This gathers all of +the asynchronous writes as well as an asynchronous update to the size of the +striped file. + + +Debugging +^^^^^^^^^ + +Debugging libcephsqlite can be turned on via:: + + debug_cephsqlite + +If running the ``sqlite3`` command-line tool, use: + +.. code:: sh + + env CEPH_ARGS='--log_to_file true --log-file sqlite3.log --debug_cephsqlite 20 --debug_ms 1' sqlite3 ... + +This will save all the usual Ceph debugging to a file ``sqlite3.log`` for inspection. + + +.. _SQLite: https://sqlite.org/index.html +.. _SQLite VFS: https://www.sqlite.org/vfs.html +.. _SQLite Backup: https://www.sqlite.org/backup.html +.. _SQLite Limits: https://www.sqlite.org/limits.html +.. _SQLite Extension Loading API: https://sqlite.org/c3ref/load_extension.html +.. _In-Memory Database: https://www.sqlite.org/inmemorydb.html +.. _WAL Journal Mode: https://sqlite.org/wal.html +.. _How to Corrupt Your SQLite Database: https://www.sqlite.org/howtocorrupt.html +.. _JSON1 Extension: https://www.sqlite.org/json1.html +.. _SQLite IO Methods: https://www.sqlite.org/c3ref/io_methods.html diff --git a/doc/rados/api/librados-intro.rst b/doc/rados/api/librados-intro.rst new file mode 100644 index 000000000..5174188b4 --- /dev/null +++ b/doc/rados/api/librados-intro.rst @@ -0,0 +1,1051 @@ +========================== + Introduction to librados +========================== + +The :term:`Ceph Storage Cluster` provides the basic storage service that allows +:term:`Ceph` to uniquely deliver **object, block, and file storage** in one +unified system. However, you are not limited to using the RESTful, block, or +POSIX interfaces. Based upon :abbr:`RADOS (Reliable Autonomic Distributed Object +Store)`, the ``librados`` API enables you to create your own interface to the +Ceph Storage Cluster. + +The ``librados`` API enables you to interact with the two types of daemons in +the Ceph Storage Cluster: + +- The :term:`Ceph Monitor`, which maintains a master copy of the cluster map. +- The :term:`Ceph OSD Daemon` (OSD), which stores data as objects on a storage node. + +.. ditaa:: + +---------------------------------+ + | Ceph Storage Cluster Protocol | + | (librados) | + +---------------------------------+ + +---------------+ +---------------+ + | OSDs | | Monitors | + +---------------+ +---------------+ + +This guide provides a high-level introduction to using ``librados``. +Refer to :doc:`../../architecture` for additional details of the Ceph +Storage Cluster. To use the API, you need a running Ceph Storage Cluster. +See `Installation (Quick)`_ for details. 
+ + +Step 1: Getting librados +======================== + +Your client application must bind with ``librados`` to connect to the Ceph +Storage Cluster. You must install ``librados`` and any required packages to +write applications that use ``librados``. The ``librados`` API is written in +C++, with additional bindings for C, Python, Java and PHP. + + +Getting librados for C/C++ +-------------------------- + +To install ``librados`` development support files for C/C++ on Debian/Ubuntu +distributions, execute the following: + +.. prompt:: bash $ + + sudo apt-get install librados-dev + +To install ``librados`` development support files for C/C++ on RHEL/CentOS +distributions, execute the following: + +.. prompt:: bash $ + + sudo yum install librados2-devel + +Once you install ``librados`` for developers, you can find the required +headers for C/C++ under ``/usr/include/rados``: + +.. prompt:: bash $ + + ls /usr/include/rados + + +Getting librados for Python +--------------------------- + +The ``rados`` module provides ``librados`` support to Python +applications. You may install ``python3-rados`` for Debian, Ubuntu, SLE or +openSUSE or the ``python-rados`` package for CentOS/RHEL. + +To install ``librados`` development support files for Python on Debian/Ubuntu +distributions, execute the following: + +.. prompt:: bash $ + + sudo apt-get install python3-rados + +To install ``librados`` development support files for Python on RHEL/CentOS +distributions, execute the following: + +.. prompt:: bash $ + + sudo yum install python-rados + +To install ``librados`` development support files for Python on SLE/openSUSE +distributions, execute the following: + +.. prompt:: bash $ + + sudo zypper install python3-rados + +You can find the module under ``/usr/share/pyshared`` on Debian systems, +or under ``/usr/lib/python*/site-packages`` on CentOS/RHEL systems. + + +Getting librados for Java +------------------------- + +To install ``librados`` for Java, you need to execute the following procedure: + +#. Install ``jna.jar``. For Debian/Ubuntu, execute: + + .. prompt:: bash $ + + sudo apt-get install libjna-java + + For CentOS/RHEL, execute: + + .. prompt:: bash $ + + sudo yum install jna + + The JAR files are located in ``/usr/share/java``. + +#. Clone the ``rados-java`` repository: + + .. prompt:: bash $ + + git clone --recursive https://github.com/ceph/rados-java.git + +#. Build the ``rados-java`` repository: + + .. prompt:: bash $ + + cd rados-java + ant + + The JAR file is located under ``rados-java/target``. + +#. Copy the JAR for RADOS to a common location (e.g., ``/usr/share/java``) and + ensure that it and the JNA JAR are in your JVM's classpath. For example: + + .. prompt:: bash $ + + sudo cp target/rados-0.1.3.jar /usr/share/java/rados-0.1.3.jar + sudo ln -s /usr/share/java/jna-3.2.7.jar /usr/lib/jvm/default-java/jre/lib/ext/jna-3.2.7.jar + sudo ln -s /usr/share/java/rados-0.1.3.jar /usr/lib/jvm/default-java/jre/lib/ext/rados-0.1.3.jar + +To build the documentation, execute the following: + +.. prompt:: bash $ + + ant docs + + +Getting librados for PHP +------------------------- + +To install the ``librados`` extension for PHP, you need to execute the following procedure: + +#. Install php-dev. For Debian/Ubuntu, execute: + + .. prompt:: bash $ + + sudo apt-get install php5-dev build-essential + + For CentOS/RHEL, execute: + + .. prompt:: bash $ + + sudo yum install php-devel + +#. Clone the ``phprados`` repository: + + .. prompt:: bash $ + + git clone https://github.com/ceph/phprados.git + +#. 
Build ``phprados``: + + .. prompt:: bash $ + + cd phprados + phpize + ./configure + make + sudo make install + +#. Enable ``phprados`` by adding the following line to ``php.ini``:: + + extension=rados.so + + +Step 2: Configuring a Cluster Handle +==================================== + +A :term:`Ceph Client`, via ``librados``, interacts directly with OSDs to store +and retrieve data. To interact with OSDs, the client app must invoke +``librados`` and connect to a Ceph Monitor. Once connected, ``librados`` +retrieves the :term:`Cluster Map` from the Ceph Monitor. When the client app +wants to read or write data, it creates an I/O context and binds to a +:term:`Pool`. The pool has an associated :term:`CRUSH rule` that defines how it +will place data in the storage cluster. Via the I/O context, the client +provides the object name to ``librados``, which takes the object name +and the cluster map (i.e., the topology of the cluster) and `computes`_ the +placement group and `OSD`_ for locating the data. Then the client application +can read or write data. The client app doesn't need to learn about the topology +of the cluster directly. + +.. ditaa:: + +--------+ Retrieves +---------------+ + | Client |------------>| Cluster Map | + +--------+ +---------------+ + | + v Writes + /-----\ + | obj | + \-----/ + | To + v + +--------+ +---------------+ + | Pool |---------->| CRUSH Rule | + +--------+ Selects +---------------+ + + +The Ceph Storage Cluster handle encapsulates the client configuration, including: + +- The `user ID`_ for ``rados_create()`` or user name for ``rados_create2()`` + (preferred). +- The :term:`cephx` authentication key +- The monitor ID and IP address +- Logging levels +- Debugging levels + +Thus, the first steps in using the cluster from your app are to 1) create +a cluster handle that your app will use to connect to the storage cluster, +and then 2) use that handle to connect. To connect to the cluster, the +app must supply a monitor address, a username and an authentication key +(cephx is enabled by default). + +.. tip:: Talking to different Ceph Storage Clusters – or to the same cluster + with different users – requires different cluster handles. + +RADOS provides a number of ways for you to set the required values. For +the monitor and encryption key settings, an easy way to handle them is to ensure +that your Ceph configuration file contains a ``keyring`` path to a keyring file +and at least one monitor address (e.g., ``mon_host``). For example:: + + [global] + mon_host = 192.168.1.1 + keyring = /etc/ceph/ceph.client.admin.keyring + +Once you create the handle, you can read a Ceph configuration file to configure +the handle. You can also pass arguments to your app and parse them with the +function for parsing command line arguments (e.g., ``rados_conf_parse_argv()``), +or parse Ceph environment variables (e.g., ``rados_conf_parse_env()``). Some +wrappers may not implement convenience methods, so you may need to implement +these capabilities. The following diagram provides a high-level flow for the +initial connection. + + +.. ditaa:: + +---------+ +---------+ + | Client | | Monitor | + +---------+ +---------+ + | | + |-----+ create | + | | cluster | + |<----+ handle | + | | + |-----+ read | + | | config | + |<----+ file | + | | + | connect | + |-------------->| + | | + |<--------------| + | connected | + | | + + +Once connected, your app can invoke functions that affect the whole cluster +with only the cluster handle. 
For example, once you have a cluster +handle, you can: + +- Get cluster statistics +- Use Pool Operation (exists, create, list, delete) +- Get and set the configuration + + +One of the powerful features of Ceph is the ability to bind to different pools. +Each pool may have a different number of placement groups, object replicas and +replication strategies. For example, a pool could be set up as a "hot" pool that +uses SSDs for frequently used objects or a "cold" pool that uses erasure coding. + +The main difference in the various ``librados`` bindings is between C and +the object-oriented bindings for C++, Java and Python. The object-oriented +bindings use objects to represent cluster handles, IO Contexts, iterators, +exceptions, etc. + + +C Example +--------- + +For C, creating a simple cluster handle using the ``admin`` user, configuring +it and connecting to the cluster might look something like this: + +.. code-block:: c + + #include <stdio.h> + #include <stdlib.h> + #include <string.h> + #include <rados/librados.h> + + int main (int argc, const char **argv) + { + + /* Declare the cluster handle and required arguments. */ + rados_t cluster; + char cluster_name[] = "ceph"; + char user_name[] = "client.admin"; + uint64_t flags = 0; + + /* Initialize the cluster handle with the "ceph" cluster name and the "client.admin" user */ + int err; + err = rados_create2(&cluster, cluster_name, user_name, flags); + + if (err < 0) { + fprintf(stderr, "%s: Couldn't create the cluster handle! %s\n", argv[0], strerror(-err)); + exit(EXIT_FAILURE); + } else { + printf("\nCreated a cluster handle.\n"); + } + + + /* Read a Ceph configuration file to configure the cluster handle. */ + err = rados_conf_read_file(cluster, "/etc/ceph/ceph.conf"); + if (err < 0) { + fprintf(stderr, "%s: cannot read config file: %s\n", argv[0], strerror(-err)); + exit(EXIT_FAILURE); + } else { + printf("\nRead the config file.\n"); + } + + /* Read command line arguments */ + err = rados_conf_parse_argv(cluster, argc, argv); + if (err < 0) { + fprintf(stderr, "%s: cannot parse command line arguments: %s\n", argv[0], strerror(-err)); + exit(EXIT_FAILURE); + } else { + printf("\nRead the command line arguments.\n"); + } + + /* Connect to the cluster */ + err = rados_connect(cluster); + if (err < 0) { + fprintf(stderr, "%s: cannot connect to cluster: %s\n", argv[0], strerror(-err)); + exit(EXIT_FAILURE); + } else { + printf("\nConnected to the cluster.\n"); + } + + } + +Compile your client and link to ``librados`` using ``-lrados``. For example: + +.. prompt:: bash $ + + gcc ceph-client.c -lrados -o ceph-client + + +C++ Example +----------- + +The Ceph project provides a C++ example in the ``ceph/examples/librados`` +directory. For C++, a simple cluster handle using the ``admin`` user requires +you to initialize a ``librados::Rados`` cluster handle object: + +.. code-block:: c++ + + #include <iostream> + #include <string> + #include <rados/librados.hpp> + + int main(int argc, const char **argv) + { + + int ret = 0; + + /* Declare the cluster handle and required variables. */ + librados::Rados cluster; + char cluster_name[] = "ceph"; + char user_name[] = "client.admin"; + uint64_t flags = 0; + + /* Initialize the cluster handle with the "ceph" cluster name and "client.admin" user */ + { + ret = cluster.init2(user_name, cluster_name, flags); + if (ret < 0) { + std::cerr << "Couldn't initialize the cluster handle! error " << ret << std::endl; + return EXIT_FAILURE; + } else { + std::cout << "Created a cluster handle." 
<< std::endl; + } + } + + /* Read a Ceph configuration file to configure the cluster handle. */ + { + ret = cluster.conf_read_file("/etc/ceph/ceph.conf"); + if (ret < 0) { + std::cerr << "Couldn't read the Ceph configuration file! error " << ret << std::endl; + return EXIT_FAILURE; + } else { + std::cout << "Read the Ceph configuration file." << std::endl; + } + } + + /* Read command line arguments */ + { + ret = cluster.conf_parse_argv(argc, argv); + if (ret < 0) { + std::cerr << "Couldn't parse command line options! error " << ret << std::endl; + return EXIT_FAILURE; + } else { + std::cout << "Parsed command line options." << std::endl; + } + } + + /* Connect to the cluster */ + { + ret = cluster.connect(); + if (ret < 0) { + std::cerr << "Couldn't connect to cluster! error " << ret << std::endl; + return EXIT_FAILURE; + } else { + std::cout << "Connected to the cluster." << std::endl; + } + } + + return 0; + } + + +Compile the source; then, link ``librados`` using ``-lrados``. +For example: + +.. prompt:: bash $ + + g++ -g -c ceph-client.cc -o ceph-client.o + g++ -g ceph-client.o -lrados -o ceph-client + + + +Python Example +-------------- + +Python uses the ``admin`` id and the ``ceph`` cluster name by default, and +will read the standard ``ceph.conf`` file if the conffile parameter is +set to the empty string. The Python binding converts C++ errors +into exceptions. + + +.. code-block:: python + + import rados + + try: + cluster = rados.Rados(conffile='') + except TypeError as e: + print('Argument validation error: {}'.format(e)) + raise e + + print("Created cluster handle.") + + try: + cluster.connect() + except Exception as e: + print("connection error: {}".format(e)) + raise e + finally: + print("Connected to the cluster.") + + +Execute the example to verify that it connects to your cluster: + +.. prompt:: bash $ + + python ceph-client.py + + +Java Example +------------ + +Java requires you to specify the user ID (``admin``) or user name +(``client.admin``), and uses the ``ceph`` cluster name by default . The Java +binding converts C++-based errors into exceptions. + +.. code-block:: java + + import com.ceph.rados.Rados; + import com.ceph.rados.RadosException; + + import java.io.File; + + public class CephClient { + public static void main (String args[]){ + + try { + Rados cluster = new Rados("admin"); + System.out.println("Created cluster handle."); + + File f = new File("/etc/ceph/ceph.conf"); + cluster.confReadFile(f); + System.out.println("Read the configuration file."); + + cluster.connect(); + System.out.println("Connected to the cluster."); + + } catch (RadosException e) { + System.out.println(e.getMessage() + ": " + e.getReturnValue()); + } + } + } + + +Compile the source; then, run it. If you have copied the JAR to +``/usr/share/java`` and sym linked from your ``ext`` directory, you won't need +to specify the classpath. For example: + +.. prompt:: bash $ + + javac CephClient.java + java CephClient + + +PHP Example +------------ + +With the RADOS extension enabled in PHP you can start creating a new cluster handle very easily: + +.. code-block:: php + + <?php + + $r = rados_create(); + rados_conf_read_file($r, '/etc/ceph/ceph.conf'); + if (!rados_connect($r)) { + echo "Failed to connect to Ceph cluster"; + } else { + echo "Successfully connected to Ceph cluster"; + } + + +Save this as rados.php and run the code: + +.. 
prompt:: bash $ + + php rados.php + + +Step 3: Creating an I/O Context +=============================== + +Once your app has a cluster handle and a connection to a Ceph Storage Cluster, +you may create an I/O Context and begin reading and writing data. An I/O Context +binds the connection to a specific pool. The user must have appropriate +`CAPS`_ permissions to access the specified pool. For example, a user with read +access but not write access will only be able to read data. I/O Context +functionality includes: + +- Write/read data and extended attributes +- List and iterate over objects and extended attributes +- Snapshot pools, list snapshots, etc. + + +.. ditaa:: + +---------+ +---------+ +---------+ + | Client | | Monitor | | OSD | + +---------+ +---------+ +---------+ + | | | + |-----+ create | | + | | I/O | | + |<----+ context | | + | | | + | write data | | + |---------------+-------------->| + | | | + | write ack | | + |<--------------+---------------| + | | | + | write xattr | | + |---------------+-------------->| + | | | + | xattr ack | | + |<--------------+---------------| + | | | + | read data | | + |---------------+-------------->| + | | | + | read ack | | + |<--------------+---------------| + | | | + | remove data | | + |---------------+-------------->| + | | | + | remove ack | | + |<--------------+---------------| + + + +RADOS enables you to interact both synchronously and asynchronously. Once your +app has an I/O Context, read/write operations only require you to know the +object/xattr name. The CRUSH algorithm encapsulated in ``librados`` uses the +cluster map to identify the appropriate OSD. OSD daemons handle the replication, +as described in `Smart Daemons Enable Hyperscale`_. The ``librados`` library also +maps objects to placement groups, as described in `Calculating PG IDs`_. + +The following examples use the default ``data`` pool. However, you may also +use the API to list pools, ensure they exist, or create and delete pools. For +the write operations, the examples illustrate how to use synchronous mode. For +the read operations, the examples illustrate how to use asynchronous mode. + +.. important:: Use caution when deleting pools with this API. If you delete + a pool, the pool and ALL DATA in the pool will be lost. + + +C Example +--------- + + +.. code-block:: c + + #include <stdio.h> + #include <stdlib.h> + #include <string.h> + #include <rados/librados.h> + + int main (int argc, const char **argv) + { + /* + * Continued from previous C example, where cluster handle and + * connection are established. First declare an I/O Context. + */ + + rados_ioctx_t io; + char *poolname = "data"; + + err = rados_ioctx_create(cluster, poolname, &io); + if (err < 0) { + fprintf(stderr, "%s: cannot open rados pool %s: %s\n", argv[0], poolname, strerror(-err)); + rados_shutdown(cluster); + exit(EXIT_FAILURE); + } else { + printf("\nCreated I/O context.\n"); + } + + /* Write data to the cluster synchronously. 
*/ + err = rados_write(io, "hw", "Hello World!", 12, 0); + if (err < 0) { + fprintf(stderr, "%s: Cannot write object \"hw\" to pool %s: %s\n", argv[0], poolname, strerror(-err)); + rados_ioctx_destroy(io); + rados_shutdown(cluster); + exit(1); + } else { + printf("\nWrote \"Hello World\" to object \"hw\".\n"); + } + + char xattr[] = "en_US"; + err = rados_setxattr(io, "hw", "lang", xattr, 5); + if (err < 0) { + fprintf(stderr, "%s: Cannot write xattr to pool %s: %s\n", argv[0], poolname, strerror(-err)); + rados_ioctx_destroy(io); + rados_shutdown(cluster); + exit(1); + } else { + printf("\nWrote \"en_US\" to xattr \"lang\" for object \"hw\".\n"); + } + + /* + * Read data from the cluster asynchronously. + * First, set up asynchronous I/O completion. + */ + rados_completion_t comp; + err = rados_aio_create_completion(NULL, NULL, NULL, &comp); + if (err < 0) { + fprintf(stderr, "%s: Could not create aio completion: %s\n", argv[0], strerror(-err)); + rados_ioctx_destroy(io); + rados_shutdown(cluster); + exit(1); + } else { + printf("\nCreated AIO completion.\n"); + } + + /* Next, read data using rados_aio_read. */ + char read_res[100]; + err = rados_aio_read(io, "hw", comp, read_res, 12, 0); + if (err < 0) { + fprintf(stderr, "%s: Cannot read object. %s %s\n", argv[0], poolname, strerror(-err)); + rados_ioctx_destroy(io); + rados_shutdown(cluster); + exit(1); + } else { + printf("\nRead object \"hw\". The contents are:\n %s \n", read_res); + } + + /* Wait for the operation to complete */ + rados_aio_wait_for_complete(comp); + + /* Release the asynchronous I/O complete handle to avoid memory leaks. */ + rados_aio_release(comp); + + + char xattr_res[100]; + err = rados_getxattr(io, "hw", "lang", xattr_res, 5); + if (err < 0) { + fprintf(stderr, "%s: Cannot read xattr. %s %s\n", argv[0], poolname, strerror(-err)); + rados_ioctx_destroy(io); + rados_shutdown(cluster); + exit(1); + } else { + printf("\nRead xattr \"lang\" for object \"hw\". The contents are:\n %s \n", xattr_res); + } + + err = rados_rmxattr(io, "hw", "lang"); + if (err < 0) { + fprintf(stderr, "%s: Cannot remove xattr. %s %s\n", argv[0], poolname, strerror(-err)); + rados_ioctx_destroy(io); + rados_shutdown(cluster); + exit(1); + } else { + printf("\nRemoved xattr \"lang\" for object \"hw\".\n"); + } + + err = rados_remove(io, "hw"); + if (err < 0) { + fprintf(stderr, "%s: Cannot remove object. %s %s\n", argv[0], poolname, strerror(-err)); + rados_ioctx_destroy(io); + rados_shutdown(cluster); + exit(1); + } else { + printf("\nRemoved object \"hw\".\n"); + } + + } + + + +C++ Example +----------- + + +.. code-block:: c++ + + #include <iostream> + #include <string> + #include <rados/librados.hpp> + + int main(int argc, const char **argv) + { + + /* Continued from previous C++ example, where cluster handle and + * connection are established. First declare an I/O Context. + */ + + librados::IoCtx io_ctx; + const char *pool_name = "data"; + + { + ret = cluster.ioctx_create(pool_name, io_ctx); + if (ret < 0) { + std::cerr << "Couldn't set up ioctx! error " << ret << std::endl; + exit(EXIT_FAILURE); + } else { + std::cout << "Created an ioctx for the pool." << std::endl; + } + } + + + /* Write an object synchronously. */ + { + librados::bufferlist bl; + bl.append("Hello World!"); + ret = io_ctx.write_full("hw", bl); + if (ret < 0) { + std::cerr << "Couldn't write object! error " << ret << std::endl; + exit(EXIT_FAILURE); + } else { + std::cout << "Wrote new object 'hw' " << std::endl; + } + } + + + /* + * Add an xattr to the object. 
+ */ + { + librados::bufferlist lang_bl; + lang_bl.append("en_US"); + ret = io_ctx.setxattr("hw", "lang", lang_bl); + if (ret < 0) { + std::cerr << "failed to set xattr version entry! error " + << ret << std::endl; + exit(EXIT_FAILURE); + } else { + std::cout << "Set the xattr 'lang' on our object!" << std::endl; + } + } + + + /* + * Read the object back asynchronously. + */ + { + librados::bufferlist read_buf; + int read_len = 4194304; + + //Create I/O Completion. + librados::AioCompletion *read_completion = librados::Rados::aio_create_completion(); + + //Send read request. + ret = io_ctx.aio_read("hw", read_completion, &read_buf, read_len, 0); + if (ret < 0) { + std::cerr << "Couldn't start read object! error " << ret << std::endl; + exit(EXIT_FAILURE); + } + + // Wait for the request to complete, and check that it succeeded. + read_completion->wait_for_complete(); + ret = read_completion->get_return_value(); + if (ret < 0) { + std::cerr << "Couldn't read object! error " << ret << std::endl; + exit(EXIT_FAILURE); + } else { + std::cout << "Read object hw asynchronously with contents.\n" + << read_buf.c_str() << std::endl; + } + } + + + /* + * Read the xattr. + */ + { + librados::bufferlist lang_res; + ret = io_ctx.getxattr("hw", "lang", lang_res); + if (ret < 0) { + std::cerr << "failed to get xattr version entry! error " + << ret << std::endl; + exit(EXIT_FAILURE); + } else { + std::cout << "Got the xattr 'lang' from object hw!" + << lang_res.c_str() << std::endl; + } + } + + + /* + * Remove the xattr. + */ + { + ret = io_ctx.rmxattr("hw", "lang"); + if (ret < 0) { + std::cerr << "Failed to remove xattr! error " + << ret << std::endl; + exit(EXIT_FAILURE); + } else { + std::cout << "Removed the xattr 'lang' from our object!" << std::endl; + } + } + + /* + * Remove the object. + */ + { + ret = io_ctx.remove("hw"); + if (ret < 0) { + std::cerr << "Couldn't remove object! error " << ret << std::endl; + exit(EXIT_FAILURE); + } else { + std::cout << "Removed object 'hw'." << std::endl; + } + } + } + + + +Python Example +-------------- + +.. code-block:: python + + print("\n\nI/O Context and Object Operations") + print("=================================") + + print("\nCreating a context for the 'data' pool") + if not cluster.pool_exists('data'): + raise RuntimeError('No data pool exists') + ioctx = cluster.open_ioctx('data') + + print("\nWriting object 'hw' with contents 'Hello World!' to pool 'data'.") + ioctx.write("hw", b"Hello World!") + print("Writing XATTR 'lang' with value 'en_US' to object 'hw'") + ioctx.set_xattr("hw", "lang", b"en_US") + + + print("\nWriting object 'bm' with contents 'Bonjour tout le monde!' to pool + 'data'.") + ioctx.write("bm", b"Bonjour tout le monde!") + print("Writing XATTR 'lang' with value 'fr_FR' to object 'bm'") + ioctx.set_xattr("bm", "lang", b"fr_FR") + + print("\nContents of object 'hw'\n------------------------") + print(ioctx.read("hw")) + + print("\n\nGetting XATTR 'lang' from object 'hw'") + print(ioctx.get_xattr("hw", "lang")) + + print("\nContents of object 'bm'\n------------------------") + print(ioctx.read("bm")) + + print("\n\nGetting XATTR 'lang' from object 'bm'") + print(ioctx.get_xattr("bm", "lang")) + + + print("\nRemoving object 'hw'") + ioctx.remove_object("hw") + + print("Removing object 'bm'") + ioctx.remove_object("bm") + + +Java-Example +------------ + +.. 
code-block:: java + + import com.ceph.rados.Rados; + import com.ceph.rados.RadosException; + + import java.io.File; + import com.ceph.rados.IoCTX; + + public class CephClient { + public static void main (String args[]){ + + try { + Rados cluster = new Rados("admin"); + System.out.println("Created cluster handle."); + + File f = new File("/etc/ceph/ceph.conf"); + cluster.confReadFile(f); + System.out.println("Read the configuration file."); + + cluster.connect(); + System.out.println("Connected to the cluster."); + + IoCTX io = cluster.ioCtxCreate("data"); + + String oidone = "hw"; + String contentone = "Hello World!"; + io.write(oidone, contentone); + + String oidtwo = "bm"; + String contenttwo = "Bonjour tout le monde!"; + io.write(oidtwo, contenttwo); + + String[] objects = io.listObjects(); + for (String object: objects) + System.out.println(object); + + io.remove(oidone); + io.remove(oidtwo); + + cluster.ioCtxDestroy(io); + + } catch (RadosException e) { + System.out.println(e.getMessage() + ": " + e.getReturnValue()); + } + } + } + + +PHP Example +----------- + +.. code-block:: php + + <?php + + $io = rados_ioctx_create($r, "mypool"); + rados_write_full($io, "oidOne", "mycontents"); + rados_remove("oidOne"); + rados_ioctx_destroy($io); + + +Step 4: Closing Sessions +======================== + +Once your app finishes with the I/O Context and cluster handle, the app should +close the connection and shutdown the handle. For asynchronous I/O, the app +should also ensure that pending asynchronous operations have completed. + + +C Example +--------- + +.. code-block:: c + + rados_ioctx_destroy(io); + rados_shutdown(cluster); + + +C++ Example +----------- + +.. code-block:: c++ + + io_ctx.close(); + cluster.shutdown(); + + +Java Example +-------------- + +.. code-block:: java + + cluster.ioCtxDestroy(io); + cluster.shutDown(); + + +Python Example +-------------- + +.. code-block:: python + + print("\nClosing the connection.") + ioctx.close() + + print("Shutting down the handle.") + cluster.shutdown() + +PHP Example +----------- + +.. code-block:: php + + rados_shutdown($r); + + + +.. _user ID: ../../operations/user-management#command-line-usage +.. _CAPS: ../../operations/user-management#authorization-capabilities +.. _Installation (Quick): ../../../start +.. _Smart Daemons Enable Hyperscale: ../../../architecture#smart-daemons-enable-hyperscale +.. _Calculating PG IDs: ../../../architecture#calculating-pg-ids +.. _computes: ../../../architecture#calculating-pg-ids +.. _OSD: ../../../architecture#mapping-pgs-to-osds diff --git a/doc/rados/api/librados.rst b/doc/rados/api/librados.rst new file mode 100644 index 000000000..3e202bd4b --- /dev/null +++ b/doc/rados/api/librados.rst @@ -0,0 +1,187 @@ +============== + Librados (C) +============== + +.. highlight:: c + +`librados` provides low-level access to the RADOS service. For an +overview of RADOS, see :doc:`../../architecture`. 
+ + +Example: connecting and writing an object +========================================= + +To use `Librados`, you instantiate a :c:type:`rados_t` variable (a cluster handle) and +call :c:func:`rados_create()` with a pointer to it:: + + int err; + rados_t cluster; + + err = rados_create(&cluster, NULL); + if (err < 0) { + fprintf(stderr, "%s: cannot create a cluster handle: %s\n", argv[0], strerror(-err)); + exit(1); + } + +Then you configure your :c:type:`rados_t` to connect to your cluster, +either by setting individual values (:c:func:`rados_conf_set()`), +using a configuration file (:c:func:`rados_conf_read_file()`), using +command line options (:c:func:`rados_conf_parse_argv`), or an +environment variable (:c:func:`rados_conf_parse_env()`):: + + err = rados_conf_read_file(cluster, "/path/to/myceph.conf"); + if (err < 0) { + fprintf(stderr, "%s: cannot read config file: %s\n", argv[0], strerror(-err)); + exit(1); + } + +Once the cluster handle is configured, you can connect to the cluster with :c:func:`rados_connect()`:: + + err = rados_connect(cluster); + if (err < 0) { + fprintf(stderr, "%s: cannot connect to cluster: %s\n", argv[0], strerror(-err)); + exit(1); + } + +Then you open an "IO context", a :c:type:`rados_ioctx_t`, with :c:func:`rados_ioctx_create()`:: + + rados_ioctx_t io; + char *poolname = "mypool"; + + err = rados_ioctx_create(cluster, poolname, &io); + if (err < 0) { + fprintf(stderr, "%s: cannot open rados pool %s: %s\n", argv[0], poolname, strerror(-err)); + rados_shutdown(cluster); + exit(1); + } + +Note that the pool you try to access must exist. + +Then you can use the RADOS data manipulation functions, for example +write into an object called ``greeting`` with +:c:func:`rados_write_full()`:: + + err = rados_write_full(io, "greeting", "hello", 5); + if (err < 0) { + fprintf(stderr, "%s: cannot write pool %s: %s\n", argv[0], poolname, strerror(-err)); + rados_ioctx_destroy(io); + rados_shutdown(cluster); + exit(1); + } + +In the end, you will want to close your IO context and connection to RADOS with :c:func:`rados_ioctx_destroy()` and :c:func:`rados_shutdown()`:: + + rados_ioctx_destroy(io); + rados_shutdown(cluster); + + +Asynchronous IO +=============== + +When doing lots of IO, you often don't need to wait for one operation +to complete before starting the next one. `Librados` provides +asynchronous versions of several operations: + +* :c:func:`rados_aio_write` +* :c:func:`rados_aio_append` +* :c:func:`rados_aio_write_full` +* :c:func:`rados_aio_read` + +For each operation, you must first create a +:c:type:`rados_completion_t` that represents what to do when the +operation is safe or complete by calling +:c:func:`rados_aio_create_completion`. 
If you don't need anything +special to happen, you can pass NULL:: + + rados_completion_t comp; + err = rados_aio_create_completion(NULL, NULL, NULL, &comp); + if (err < 0) { + fprintf(stderr, "%s: could not create aio completion: %s\n", argv[0], strerror(-err)); + rados_ioctx_destroy(io); + rados_shutdown(cluster); + exit(1); + } + +Now you can call any of the aio operations, and wait for it to +be in memory or on disk on all replicas:: + + err = rados_aio_write(io, "foo", comp, "bar", 3, 0); + if (err < 0) { + fprintf(stderr, "%s: could not schedule aio write: %s\n", argv[0], strerror(-err)); + rados_aio_release(comp); + rados_ioctx_destroy(io); + rados_shutdown(cluster); + exit(1); + } + rados_aio_wait_for_complete(comp); // in memory + rados_aio_wait_for_safe(comp); // on disk + +Finally, we need to free the memory used by the completion with :c:func:`rados_aio_release`:: + + rados_aio_release(comp); + +You can use the callbacks to tell your application when writes are +durable, or when read buffers are full. For example, if you wanted to +measure the latency of each operation when appending to several +objects, you could schedule several writes and store the ack and +commit time in the corresponding callback, then wait for all of them +to complete using :c:func:`rados_aio_flush` before analyzing the +latencies:: + + typedef struct { + struct timeval start; + struct timeval ack_end; + struct timeval commit_end; + } req_duration; + + void ack_callback(rados_completion_t comp, void *arg) { + req_duration *dur = (req_duration *) arg; + gettimeofday(&dur->ack_end, NULL); + } + + void commit_callback(rados_completion_t comp, void *arg) { + req_duration *dur = (req_duration *) arg; + gettimeofday(&dur->commit_end, NULL); + } + + int output_append_latency(rados_ioctx_t io, const char *data, size_t len, size_t num_writes) { + req_duration times[num_writes]; + rados_completion_t comps[num_writes]; + for (size_t i = 0; i < num_writes; ++i) { + gettimeofday(×[i].start, NULL); + int err = rados_aio_create_completion((void*) ×[i], ack_callback, commit_callback, &comps[i]); + if (err < 0) { + fprintf(stderr, "Error creating rados completion: %s\n", strerror(-err)); + return err; + } + char obj_name[100]; + snprintf(obj_name, sizeof(obj_name), "foo%ld", (unsigned long)i); + err = rados_aio_append(io, obj_name, comps[i], data, len); + if (err < 0) { + fprintf(stderr, "Error from rados_aio_append: %s", strerror(-err)); + return err; + } + } + // wait until all requests finish *and* the callbacks complete + rados_aio_flush(io); + // the latencies can now be analyzed + printf("Request # | Ack latency (s) | Commit latency (s)\n"); + for (size_t i = 0; i < num_writes; ++i) { + // don't forget to free the completions + rados_aio_release(comps[i]); + struct timeval ack_lat, commit_lat; + timersub(×[i].ack_end, ×[i].start, &ack_lat); + timersub(×[i].commit_end, ×[i].start, &commit_lat); + printf("%9ld | %8ld.%06ld | %10ld.%06ld\n", (unsigned long) i, ack_lat.tv_sec, ack_lat.tv_usec, commit_lat.tv_sec, commit_lat.tv_usec); + } + return 0; + } + +Note that all the :c:type:`rados_completion_t` must be freed with :c:func:`rados_aio_release` to avoid leaking memory. + + +API calls +========= + + .. autodoxygenfile:: rados_types.h + .. autodoxygenfile:: librados.h diff --git a/doc/rados/api/libradospp.rst b/doc/rados/api/libradospp.rst new file mode 100644 index 000000000..08483c8d4 --- /dev/null +++ b/doc/rados/api/libradospp.rst @@ -0,0 +1,9 @@ +================== + LibradosPP (C++) +================== + +.. 
note:: The librados C++ API is not guaranteed to be API+ABI stable + between major releases. All applications using the librados C++ API must + be recompiled and relinked against a specific Ceph release. + +.. todo:: write me! diff --git a/doc/rados/api/objclass-sdk.rst b/doc/rados/api/objclass-sdk.rst new file mode 100644 index 000000000..90b8eb018 --- /dev/null +++ b/doc/rados/api/objclass-sdk.rst @@ -0,0 +1,39 @@ +.. _`rados-objclass-api-sdk`: + +=========================== +SDK for Ceph Object Classes +=========================== + +`Ceph` can be extended by creating shared object classes called `Ceph Object +Classes`. The existing framework to build these object classes has dependencies +on the internal functionality of `Ceph`, which restricts users to build object +classes within the tree. The aim of this project is to create an independent +object class interface, which can be used to build object classes outside the +`Ceph` tree. This allows us to have two types of object classes, 1) those that +have in-tree dependencies and reside in the tree and 2) those that can make use +of the `Ceph Object Class SDK framework` and can be built outside of the `Ceph` +tree because they do not depend on any internal implementation of `Ceph`. This +project decouples object class development from Ceph and encourages creation +and distribution of object classes as packages. + +In order to demonstrate the use of this framework, we have provided an example +called ``cls_sdk``, which is a very simple object class that makes use of the +SDK framework. This object class resides in the ``src/cls`` directory. + +Installing objclass.h +--------------------- + +The object class interface that enables out-of-tree development of object +classes resides in ``src/include/rados/`` and gets installed with `Ceph` +installation. After running ``make install``, you should be able to see it +in ``<prefix>/include/rados``. :: + + ls /usr/local/include/rados + +Using the SDK example +--------------------- + +The ``cls_sdk`` object class resides in ``src/cls/sdk/``. This gets built and +loaded into Ceph, with the Ceph build process. You can run the +``ceph_test_cls_sdk`` unittest, which resides in ``src/test/cls_sdk/``, +to test this class. diff --git a/doc/rados/api/python.rst b/doc/rados/api/python.rst new file mode 100644 index 000000000..346653a3d --- /dev/null +++ b/doc/rados/api/python.rst @@ -0,0 +1,428 @@ +=================== + Librados (Python) +=================== + +The ``rados`` module is a thin Python wrapper for ``librados``. + +Installation +============ + +To install Python libraries for Ceph, see `Getting librados for Python`_. + + +Getting Started +=============== + +You can create your own Ceph client using Python. The following tutorial will +show you how to import the Ceph Python module, connect to a Ceph cluster, and +perform object operations as a ``client.admin`` user. + +.. note:: To use the Ceph Python bindings, you must have access to a + running Ceph cluster. To set one up quickly, see `Getting Started`_. + +First, create a Python source file for your Ceph client. + +.. prompt:: bash + + vim client.py + + +Import the Module +----------------- + +To use the ``rados`` module, import it into your source file. + +.. code-block:: python + :linenos: + + import rados + + +Configure a Cluster Handle +-------------------------- + +Before connecting to the Ceph Storage Cluster, create a cluster handle. 
+Manage Pools
+------------
+
+When connected to the cluster, the ``Rados`` API allows you to manage pools. You
+can list pools, check for the existence of a pool, create a pool, and delete a
+pool.
+
+.. code-block:: python
+   :linenos:
+   :emphasize-lines: 6, 13, 18, 25
+
+   print("\n\nPool Operations")
+   print("===============")
+
+   print("\nAvailable Pools")
+   print("----------------")
+   pools = cluster.list_pools()
+
+   for pool in pools:
+       print(pool)
+
+   print("\nCreate 'test' Pool")
+   print("------------------")
+   cluster.create_pool('test')
+
+   print("\nPool named 'test' exists: {}".format(str(cluster.pool_exists('test'))))
+   print("\nVerify 'test' Pool Exists")
+   print("-------------------------")
+   pools = cluster.list_pools()
+
+   for pool in pools:
+       print(pool)
+
+   print("\nDelete 'test' Pool")
+   print("------------------")
+   cluster.delete_pool('test')
+   print("\nPool named 'test' exists: {}".format(str(cluster.pool_exists('test'))))
+
+
+Input/Output Context
+--------------------
+
+Reading from and writing to the Ceph Storage Cluster requires an input/output
+context (ioctx). You can create an ioctx with the ``open_ioctx()`` or
+``open_ioctx2()`` method of the ``Rados`` class. The ``ioctx_name`` parameter
+is the name of the pool and ``pool_id`` is the ID of the pool you wish to use.
+
+.. code-block:: python
+   :linenos:
+
+   ioctx = cluster.open_ioctx('data')
+
+
+or
+
+.. code-block:: python
+   :linenos:
+
+   ioctx = cluster.open_ioctx2(pool_id)
+
+
+Once you have an I/O context, you can read and write objects and their extended
+attributes, and perform a number of other operations. After you complete your
+operations, ensure that you close the I/O context. For example:
+
+.. code-block:: python
+   :linenos:
+
+   print("\nClosing the I/O context.")
+   ioctx.close()
+
+
+Writing, Reading and Removing Objects
+-------------------------------------
+
+Once you create an I/O context, you can write objects to the cluster. If you
+write to an object that doesn't exist, Ceph creates it. If you write to an
+object that exists, Ceph overwrites it (except when you specify a range, and
+then it only overwrites the range). You may read objects (and object ranges)
+from the cluster. You may also remove objects from the cluster. Note that
+object data must be provided as ``bytes``. For example:
+
+.. code-block:: python
+   :linenos:
+   :emphasize-lines: 2, 5, 8
+
+   print("\nWriting object 'hw' with contents 'Hello World!' to pool 'data'.")
+   ioctx.write_full("hw", b"Hello World!")
+
+   print("\n\nContents of object 'hw'\n------------------------\n")
+   print(ioctx.read("hw"))
+
+   print("\nRemoving object 'hw'")
+   ioctx.remove_object("hw")
+
+
+Writing and Reading XATTRs
+--------------------------
+
+Once you create an object, you can write extended attributes (XATTRs) to
+the object and read XATTRs from the object. For example:
+
+.. code-block:: python
+   :linenos:
+   :emphasize-lines: 2, 5
+
+   print("\n\nWriting XATTR 'lang' with value 'en_US' to object 'hw'")
+   ioctx.set_xattr("hw", "lang", b"en_US")
+
+   print("\n\nGetting XATTR 'lang' from object 'hw'\n")
+   print(ioctx.get_xattr("hw", "lang"))
+
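+You can also inspect an object's metadata. The following minimal sketch stats
+the ``hw`` object written above and then walks all of its XATTRs with
+``get_xattrs()``; it assumes the object still exists and that the iterator
+yields ``(name, value)`` pairs, as described for ``XattrIterator`` below.
+
+.. code-block:: python
+   :linenos:
+
+   # Size and modification time of the object.
+   size, mtime = ioctx.stat("hw")
+   print("Object 'hw' is {} bytes, last modified at {}".format(size, mtime))
+
+   # Iterate over every extended attribute set on the object.
+   for name, value in ioctx.get_xattrs("hw"):
+       print("XATTR {} = {}".format(name, value))
+
+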
+Listing Objects
+---------------
+
+If you want to examine the list of objects in a pool, you may
+retrieve the list of objects and iterate over them with the object iterator.
+For example:
+
+.. code-block:: python
+   :linenos:
+   :emphasize-lines: 1, 6, 7, 13, 14
+
+   object_iterator = ioctx.list_objects()
+
+   while True:
+
+       try:
+           rados_object = object_iterator.__next__()
+           print("Object contents = {}".format(rados_object.read()))
+
+       except StopIteration:
+           break
+
+   # Or, more idiomatically:
+   for obj in ioctx.list_objects():
+       print("Object contents = {}".format(obj.read()))
+
+The ``Object`` class provides a file-like interface to an object, allowing
+you to read and write content and extended attributes. Object operations using
+the I/O context provide additional functionality and asynchronous capabilities.
+
+
+Cluster Handle API
+==================
+
+The ``Rados`` class provides an interface into the Ceph Storage Cluster.
+
+
+Configuration
+-------------
+
+The ``Rados`` class provides methods for getting and setting configuration
+values, reading the Ceph configuration file, and parsing arguments. You
+do not need to be connected to the Ceph Storage Cluster to invoke the following
+methods. See `Storage Cluster Configuration`_ for details on settings.
+
+.. currentmodule:: rados
+.. automethod:: Rados.conf_get(option)
+.. automethod:: Rados.conf_set(option, val)
+.. automethod:: Rados.conf_read_file(path=None)
+.. automethod:: Rados.conf_parse_argv(args)
+.. automethod:: Rados.version()
+
+
+Connection Management
+---------------------
+
+Once you configure your cluster handle, you may connect to the cluster, check
+the cluster ``fsid``, retrieve cluster statistics, and disconnect (shutdown)
+from the cluster. You may also assert that the cluster handle is in a particular
+state (e.g., "configuring", "connecting", etc.).
+
+.. automethod:: Rados.connect(timeout=0)
+.. automethod:: Rados.shutdown()
+.. automethod:: Rados.get_fsid()
+.. automethod:: Rados.get_cluster_stats()
+
+.. documented manually because it raises warnings because of *args usage in the
+.. signature
+
+.. py:class:: Rados
+
+   .. py:method:: require_state(*args)
+
+      Checks whether the Rados object is in one of the given states.
+
+      :param args: Any number of states to check as separate arguments
+      :raises: :class:`RadosStateError`
+
+
+Pool Operations
+---------------
+
+To use pool operation methods, you must connect to the Ceph Storage Cluster
+first. You may list the available pools, create a pool, check to see if a pool
+exists, and delete a pool.
+
+.. automethod:: Rados.list_pools()
+.. automethod:: Rados.create_pool(pool_name, crush_rule=None)
+.. automethod:: Rados.pool_exists()
+.. automethod:: Rados.delete_pool(pool_name)
+
+
+CLI Commands
+------------
+
+The Ceph CLI uses the following librados Python binding methods internally.
+
+To send a command, choose the method that matches the target daemon type
+(monitor, OSD, manager, or placement group).
+
+.. automethod:: Rados.mon_command
+.. automethod:: Rados.osd_command
+.. automethod:: Rados.mgr_command
+.. automethod:: Rados.pg_command
+
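+For example, a JSON-formatted command can be sent to the monitors with
+``mon_command()``. The following minimal sketch issues a cluster ``status``
+command; it assumes a connected ``cluster`` handle and that the call returns a
+``(return_code, output_buffer, status_string)`` tuple.
+
+.. code-block:: python
+   :linenos:
+
+   import json
+
+   # Build the command as JSON; this command takes no input buffer.
+   cmd = json.dumps({"prefix": "status", "format": "json"})
+   ret, outbuf, outs = cluster.mon_command(cmd, b'')
+
+   if ret == 0:
+       status = json.loads(outbuf)
+       print("Cluster health: {}".format(status.get("health", {})))
+   else:
+       print("mon_command failed: {} ({})".format(ret, outs))
+
+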
+Input/Output Context API
+========================
+
+To write data to and read data from the Ceph Object Store, you must create
+an Input/Output context (ioctx). The ``Rados`` class provides ``open_ioctx()``
+and ``open_ioctx2()`` methods. The remaining ``ioctx`` operations involve
+invoking methods of the ``Ioctx`` and other classes.
+
+.. automethod:: Rados.open_ioctx(ioctx_name)
+.. automethod:: Ioctx.require_ioctx_open()
+.. automethod:: Ioctx.get_stats()
+.. automethod:: Ioctx.get_last_version()
+.. automethod:: Ioctx.close()
+
+
+.. Pool Snapshots
+.. --------------
+
+.. The Ceph Storage Cluster allows you to make a snapshot of a pool's state.
+.. Whereas basic pool operations only require a connection to the cluster,
+.. snapshots require an I/O context.
+
+.. Ioctx.create_snap(self, snap_name)
+.. Ioctx.list_snaps(self)
+.. SnapIterator.next(self)
+.. Snap.get_timestamp(self)
+.. Ioctx.lookup_snap(self, snap_name)
+.. Ioctx.remove_snap(self, snap_name)
+
+.. not published. This doesn't seem ready yet.
+
+Object Operations
+-----------------
+
+The Ceph Storage Cluster stores data as objects. You can read and write objects
+synchronously or asynchronously. You can read and write from offsets. An object
+has a name (or key) and data.
+
+
+.. automethod:: Ioctx.aio_write(object_name, to_write, offset=0, oncomplete=None, onsafe=None)
+.. automethod:: Ioctx.aio_write_full(object_name, to_write, oncomplete=None, onsafe=None)
+.. automethod:: Ioctx.aio_append(object_name, to_append, oncomplete=None, onsafe=None)
+.. automethod:: Ioctx.write(key, data, offset=0)
+.. automethod:: Ioctx.write_full(key, data)
+.. automethod:: Ioctx.aio_flush()
+.. automethod:: Ioctx.set_locator_key(loc_key)
+.. automethod:: Ioctx.aio_read(object_name, length, offset, oncomplete)
+.. automethod:: Ioctx.read(key, length=8192, offset=0)
+.. automethod:: Ioctx.stat(key)
+.. automethod:: Ioctx.trunc(key, size)
+.. automethod:: Ioctx.remove_object(key)
+
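+The asynchronous methods return immediately and invoke the supplied callbacks
+once the operation has completed. A minimal sketch of an asynchronous write
+follows; it assumes an open ``ioctx``, that object data is passed as ``bytes``,
+and that the completion callback receives the completion object (which exposes
+``get_return_value()``) as its only argument.
+
+.. code-block:: python
+   :linenos:
+
+   def on_complete(completion):
+       # Invoked from a librados callback thread once the write has completed.
+       print("aio_write finished with return value {}".format(
+           completion.get_return_value()))
+
+   ioctx.aio_write("hw_async", b"Hello World, asynchronously!",
+                   oncomplete=on_complete)
+
+   # Block until all pending asynchronous operations (and their callbacks)
+   # have finished before moving on.
+   ioctx.aio_flush()
+
+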
+Object Extended Attributes
+--------------------------
+
+You may set extended attributes (XATTRs) on an object, retrieve the XATTRs set
+on it, and iterate over them.
+
+.. automethod:: Ioctx.set_xattr(key, xattr_name, xattr_value)
+.. automethod:: Ioctx.get_xattrs(oid)
+.. automethod:: XattrIterator.__next__()
+.. automethod:: Ioctx.get_xattr(key, xattr_name)
+.. automethod:: Ioctx.rm_xattr(key, xattr_name)
+
+
+
+Object Interface
+================
+
+From an I/O context, you can retrieve a list of objects from a pool and iterate
+over them. The object interface makes each object look like a file, and you may
+perform synchronous operations on the objects. For asynchronous operations, you
+should use the I/O context methods.
+
+.. automethod:: Ioctx.list_objects()
+.. automethod:: ObjectIterator.__next__()
+.. automethod:: Object.read(length = 1024*1024)
+.. automethod:: Object.write(string_to_write)
+.. automethod:: Object.get_xattrs()
+.. automethod:: Object.get_xattr(xattr_name)
+.. automethod:: Object.set_xattr(xattr_name, xattr_value)
+.. automethod:: Object.rm_xattr(xattr_name)
+.. automethod:: Object.stat()
+.. automethod:: Object.remove()
+
+
+
+
+.. _Getting Started: ../../../start
+.. _Storage Cluster Configuration: ../../configuration
+.. _Getting librados for Python: ../librados-intro#getting-librados-for-python