diff options
Diffstat (limited to 'doc/dev/developer_guide/tests-integration-tests.rst')
-rw-r--r-- | doc/dev/developer_guide/tests-integration-tests.rst | 522 |
1 files changed, 522 insertions, 0 deletions
diff --git a/doc/dev/developer_guide/tests-integration-tests.rst b/doc/dev/developer_guide/tests-integration-tests.rst new file mode 100644 index 000000000..c8e6dbcd4 --- /dev/null +++ b/doc/dev/developer_guide/tests-integration-tests.rst @@ -0,0 +1,522 @@ +.. _testing-integration-tests: + +Testing - Integration Tests +=========================== + +Ceph has two types of tests: :ref:`make check <make-check>` tests and integration tests. +When a test requires multiple machines, root access or lasts for a +longer time (for example, to simulate a realistic Ceph deployment), it +is deemed to be an integration test. Integration tests are organized into +"suites", which are defined in the `ceph/qa sub-directory`_ and run with +the ``teuthology-suite`` command. + +The ``teuthology-suite`` command is part of the `teuthology framework`_. +In the sections that follow we attempt to provide a detailed introduction +to that framework from the perspective of a beginning Ceph developer. + +Teuthology consumes packages +---------------------------- + +It may take some time to understand the significance of this fact, but it +is `very` significant. It means that automated tests can be conducted on +multiple platforms using the same packages (RPM, DEB) that can be +installed on any machine running those platforms. + +Teuthology has a `list of platforms that it supports +<https://github.com/ceph/ceph/tree/master/qa/distros/supported>`_ (as +of September 2020 the list consisted of "RHEL/CentOS 8" and "Ubuntu 18.04"). It +expects to be provided pre-built Ceph packages for these platforms. +Teuthology deploys these platforms on machines (bare-metal or +cloud-provisioned), installs the packages on them, and deploys Ceph +clusters on them - all as called for by the test. + +The Nightlies +------------- + +A number of integration tests are run on a regular basis in the `Sepia +lab`_ against the official Ceph repositories (on the ``master`` development +branch and the stable branches). Traditionally, these tests are called "the +nightlies" because the Ceph core developers used to live and work in +the same time zone and from their perspective the tests were run overnight. + +The results of the nightlies are published at http://pulpito.ceph.com/. The +developer nick shows in the +test results URL and in the first column of the Pulpito dashboard. The +results are also reported on the `ceph-qa mailing list +<https://ceph.com/irc/>`_ for analysis. + +Testing Priority +---------------- + +The ``teuthology-suite`` command includes an almost mandatory option ``-p <N>`` +which specifies the priority of the jobs submitted to the queue. The lower +the value of ``N``, the higher the priority. The option is almost mandatory +because the default is ``1000`` which matches the priority of the nightlies. +Nightlies are often half-finished and cancelled due to the volume of testing +done so your jobs may never finish. Therefore, it is common to select a +priority less than 1000. + +Job priority should be selected based on the following recommendations: + +* **Priority < 10:** Use this if the sky is falling and some group of tests + must be run ASAP. + +* **10 <= Priority < 50:** Use this if your tests are urgent and blocking + other important development. + +* **50 <= Priority < 75:** Use this if you are testing a particular + feature/fix and running fewer than about 25 jobs. This range can also be + used for urgent release testing. + +* **75 <= Priority < 100:** Tech Leads will regularly schedule integration + tests with this priority to verify pull requests against master. + +* **100 <= Priority < 150:** This priority is to be used for QE validation of + point releases. + +* **150 <= Priority < 200:** Use this priority for 100 jobs or fewer of a + particular feature/fix that you'd like results on in a day or so. + +* **200 <= Priority < 1000:** Use this priority for large test runs that can + be done over the course of a week. + +In case you don't know how many jobs would be triggered by +``teuthology-suite`` command, use ``--dry-run`` to get a count first and then +issue ``teuthology-suite`` command again, this time without ``--dry-run`` and +with ``-p`` and an appropriate number as an argument to it. + +To skip the priority check, use ``--force-priority``. In order to be sensitive +to the runs of other developers who also need to do testing, please use it in +emergency only. + +Suites Inventory +---------------- + +The ``suites`` directory of the `ceph/qa sub-directory`_ contains +all the integration tests, for all the Ceph components. + +`ceph-deploy <https://github.com/ceph/ceph/tree/master/qa/suites/ceph-deploy>`_ + install a Ceph cluster with ``ceph-deploy`` (:ref:`ceph-deploy man page <ceph-deploy>`) + +`dummy <https://github.com/ceph/ceph/tree/master/qa/suites/dummy>`_ + get a machine, do nothing and return success (commonly used to + verify the :ref:`testing-integration-tests` infrastructure works as expected) + +`fs <https://github.com/ceph/ceph/tree/master/qa/suites/fs>`_ + test CephFS mounted using kernel and FUSE clients, also with multiple MDSs. + +`krbd <https://github.com/ceph/ceph/tree/master/qa/suites/krbd>`_ + test the RBD kernel module + +`powercycle <https://github.com/ceph/ceph/tree/master/qa/suites/powercycle>`_ + verify the Ceph cluster behaves when machines are powered off + and on again + +`rados <https://github.com/ceph/ceph/tree/master/qa/suites/rados>`_ + run Ceph clusters including OSDs and MONs, under various conditions of + stress + +`rbd <https://github.com/ceph/ceph/tree/master/qa/suites/rbd>`_ + run RBD tests using actual Ceph clusters, with and without qemu + +`rgw <https://github.com/ceph/ceph/tree/master/qa/suites/rgw>`_ + run RGW tests using actual Ceph clusters + +`smoke <https://github.com/ceph/ceph/tree/master/qa/suites/smoke>`_ + run tests that exercise the Ceph API with an actual Ceph cluster + +`teuthology <https://github.com/ceph/ceph/tree/master/qa/suites/teuthology>`_ + verify that teuthology can run integration tests, with and without OpenStack + +`upgrade <https://github.com/ceph/ceph/tree/master/qa/suites/upgrade>`_ + for various versions of Ceph, verify that upgrades can happen + without disrupting an ongoing workload + +.. _`ceph-deploy man page`: ../../man/8/ceph-deploy + +teuthology-describe-tests +------------------------- + +In February 2016, a new feature called ``teuthology-describe-tests`` was +added to the `teuthology framework`_ to facilitate documentation and better +understanding of integration tests (`feature announcement +<http://article.gmane.org/gmane.comp.file-systems.ceph.devel/29287>`_). + +The upshot is that tests can be documented by embedding ``meta:`` +annotations in the yaml files used to define the tests. The results can be +seen in the `ceph-qa-suite wiki +<http://tracker.ceph.com/projects/ceph-qa-suite/wiki/>`_. + +Since this is a new feature, many yaml files have yet to be annotated. +Developers are encouraged to improve the documentation, in terms of both +coverage and quality. + +How integration tests are run +----------------------------- + +Given that - as a new Ceph developer - you will typically not have access +to the `Sepia lab`_, you may rightly ask how you can run the integration +tests in your own environment. + +One option is to set up a teuthology cluster on bare metal. Though this is +a non-trivial task, it `is` possible. Here are `some notes +<http://docs.ceph.com/teuthology/docs/LAB_SETUP.html>`_ to get you started +if you decide to go this route. + +If you have access to an OpenStack tenant, you have another option: the +`teuthology framework`_ has an OpenStack backend, which is documented `here +<https://github.com/dachary/teuthology/tree/openstack#openstack-backend>`__. +This OpenStack backend can build packages from a given git commit or +branch, provision VMs, install the packages and run integration tests +on those VMs. This process is controlled using a tool called +``ceph-workbench ceph-qa-suite``. This tool also automates publishing of +test results at http://teuthology-logs.public.ceph.com. + +Running integration tests on your code contributions and publishing the +results allows reviewers to verify that changes to the code base do not +cause regressions, or to analyze test failures when they do occur. + +Every teuthology cluster, whether bare-metal or cloud-provisioned, has a +so-called "teuthology machine" from which tests suites are triggered using the +``teuthology-suite`` command. + +A detailed and up-to-date description of each `teuthology-suite`_ option is +available by running the following command on the teuthology machine + +.. prompt:: bash $ + + teuthology-suite --help + +.. _teuthology-suite: http://docs.ceph.com/teuthology/docs/teuthology.suite.html + +How integration tests are defined +--------------------------------- + +Integration tests are defined by yaml files found in the ``suites`` +subdirectory of the `ceph/qa sub-directory`_ and implemented by python +code found in the ``tasks`` subdirectory. Some tests ("standalone tests") +are defined in a single yaml file, while other tests are defined by a +directory tree containing yaml files that are combined, at runtime, into a +larger yaml file. + +Reading a standalone test +------------------------- + +Let us first examine a standalone test, or "singleton". + +Here is a commented example using the integration test +`rados/singleton/all/admin-socket.yaml +<https://github.com/ceph/ceph/blob/master/qa/suites/rados/singleton/all/admin-socket.yaml>`_ + +.. code-block:: yaml + + roles: + - - mon.a + - osd.0 + - osd.1 + tasks: + - install: + - ceph: + - admin_socket: + osd.0: + version: + git_version: + help: + config show: + config set filestore_dump_file /tmp/foo: + perf dump: + perf schema: + +The ``roles`` array determines the composition of the cluster (how +many MONs, OSDs, etc.) on which this test is designed to run, as well +as how these roles will be distributed over the machines in the +testing cluster. In this case, there is only one element in the +top-level array: therefore, only one machine is allocated to the +test. The nested array declares that this machine shall run a MON with +id ``a`` (that is the ``mon.a`` in the list of roles) and two OSDs +(``osd.0`` and ``osd.1``). + +The body of the test is in the ``tasks`` array: each element is +evaluated in order, causing the corresponding python file found in the +``tasks`` subdirectory of the `teuthology repository`_ or +`ceph/qa sub-directory`_ to be run. "Running" in this case means calling +the ``task()`` function defined in that file. + +In this case, the `install +<https://github.com/ceph/teuthology/blob/master/teuthology/task/install/__init__.py>`_ +task comes first. It installs the Ceph packages on each machine (as +defined by the ``roles`` array). A full description of the ``install`` +task is `found in the python file +<https://github.com/ceph/teuthology/blob/master/teuthology/task/install/__init__.py>`_ +(search for "def task"). + +The ``ceph`` task, which is documented `here +<https://github.com/ceph/ceph/blob/master/qa/tasks/ceph.py>`__ (again, +search for "def task"), starts OSDs and MONs (and possibly MDSs as well) +as required by the ``roles`` array. In this example, it will start one MON +(``mon.a``) and two OSDs (``osd.0`` and ``osd.1``), all on the same +machine. Control moves to the next task when the Ceph cluster reaches +``HEALTH_OK`` state. + +The next task is ``admin_socket`` (`source code +<https://github.com/ceph/ceph/blob/master/qa/tasks/admin_socket.py>`_). +The parameter of the ``admin_socket`` task (and any other task) is a +structure which is interpreted as documented in the task. In this example +the parameter is a set of commands to be sent to the admin socket of +``osd.0``. The task verifies that each of them returns on success (i.e. +exit code zero). + +This test can be run with + +.. prompt:: bash $ + + teuthology-suite --machine-type smithi --suite rados/singleton/all/admin-socket.yaml fs/ext4.yaml + +Test descriptions +----------------- + +Each test has a "test description", which is similar to a directory path, +but not the same. In the case of a standalone test, like the one in +`Reading a standalone test`_, the test description is identical to the +relative path (starting from the ``suites/`` directory of the +`ceph/qa sub-directory`_) of the yaml file defining the test. + +Much more commonly, tests are defined not by a single yaml file, but by a +`directory tree of yaml files`. At runtime, the tree is walked and all yaml +files (facets) are combined into larger yaml "programs" that define the +tests. A full listing of the yaml defining the test is included at the +beginning of every test log. + +In these cases, the description of each test consists of the +subdirectory under `suites/ +<https://github.com/ceph/ceph/tree/master/qa/suites>`_ containing the +yaml facets, followed by an expression in curly braces (``{}``) consisting of +a list of yaml facets in order of concatenation. For instance the +test description:: + + ceph-deploy/basic/{distros/centos_7.0.yaml tasks/ceph-deploy.yaml} + +signifies the concatenation of two files: + +* ceph-deploy/basic/distros/centos_7.0.yaml +* ceph-deploy/basic/tasks/ceph-deploy.yaml + +How tests are built from directories +------------------------------------ + +As noted in the previous section, most tests are not defined in a single +yaml file, but rather as a `combination` of files collected from a +directory tree within the ``suites/`` subdirectory of the `ceph/qa sub-directory`_. + +The set of all tests defined by a given subdirectory of ``suites/`` is +called an "integration test suite", or a "teuthology suite". + +Combination of yaml facets is controlled by special files (``%`` and +``+``) that are placed within the directory tree and can be thought of as +operators. The ``%`` file is the "convolution" operator and ``+`` +signifies concatenation. + +Convolution operator +^^^^^^^^^^^^^^^^^^^^ + +The convolution operator, implemented as an empty file called ``%``, tells +teuthology to construct a test matrix from yaml facets found in +subdirectories below the directory containing the operator. + +For example, the `ceph-deploy suite +<https://github.com/ceph/ceph/tree/master/qa/suites/ceph-deploy/>`_ is +defined by the ``suites/ceph-deploy/`` tree, which consists of the files and +subdirectories in the following structure + +.. code-block:: none + + qa/suites/ceph-deploy + ├── % + ├── distros + │ ├── centos_latest.yaml + │ └── ubuntu_latest.yaml + └── tasks + ├── ceph-admin-commands.yaml + └── rbd_import_export.yaml + +This is interpreted as a 2x1 matrix consisting of two tests: + +1. ceph-deploy/basic/{distros/centos_7.0.yaml tasks/ceph-deploy.yaml} +2. ceph-deploy/basic/{distros/ubuntu_16.04.yaml tasks/ceph-deploy.yaml} + +i.e. the concatenation of centos_7.0.yaml and ceph-deploy.yaml and +the concatenation of ubuntu_16.04.yaml and ceph-deploy.yaml, respectively. +In human terms, this means that the task found in ``ceph-deploy.yaml`` is +intended to run on both CentOS 7.0 and Ubuntu 16.04. + +Without the file percent, the ``ceph-deploy`` tree would be interpreted as +three standalone tests: + +* ceph-deploy/basic/distros/centos_7.0.yaml +* ceph-deploy/basic/distros/ubuntu_16.04.yaml +* ceph-deploy/basic/tasks/ceph-deploy.yaml + +(which would of course be wrong in this case). + +Referring to the `ceph/qa sub-directory`_, you will notice that the +``centos_7.0.yaml`` and ``ubuntu_16.04.yaml`` files in the +``suites/ceph-deploy/basic/distros/`` directory are implemented as symlinks. +By using symlinks instead of copying, a single file can appear in multiple +suites. This eases the maintenance of the test framework as a whole. + +All the tests generated from the ``suites/ceph-deploy/`` directory tree +(also known as the "ceph-deploy suite") can be run with + +.. prompt:: bash $ + + teuthology-suite --machine-type smithi --suite ceph-deploy + +An individual test from the `ceph-deploy suite`_ can be run by adding the +``--filter`` option + +.. prompt:: bash $ + + teuthology-suite \ + --machine-type smithi \ + --suite ceph-deploy/basic \ + --filter 'ceph-deploy/basic/{distros/ubuntu_16.04.yaml tasks/ceph-deploy.yaml}' + +.. note:: To run a standalone test like the one in `Reading a standalone + test`_, ``--suite`` alone is sufficient. If you want to run a single + test from a suite that is defined as a directory tree, ``--suite`` must + be combined with ``--filter``. This is because the ``--suite`` option + understands POSIX relative paths only. + +Concatenation operator +^^^^^^^^^^^^^^^^^^^^^^ + +For even greater flexibility in sharing yaml files between suites, the +special file plus (``+``) can be used to concatenate files within a +directory. For instance, consider the `suites/rbd/thrash +<https://github.com/ceph/ceph/tree/master/qa/suites/rbd/thrash>`_ +tree + +.. code-block:: none + + qa/suites/rbd/thrash + ├── % + ├── clusters + │ ├── + + │ ├── fixed-2.yaml + │ └── openstack.yaml + └── workloads + ├── rbd_api_tests_copy_on_read.yaml + ├── rbd_api_tests.yaml + └── rbd_fsx_rate_limit.yaml + +This creates two tests: + +* rbd/thrash/{clusters/fixed-2.yaml clusters/openstack.yaml workloads/rbd_api_tests_copy_on_read.yaml} +* rbd/thrash/{clusters/fixed-2.yaml clusters/openstack.yaml workloads/rbd_api_tests.yaml} + +Because the ``clusters/`` subdirectory contains the special file plus +(``+``), all the other files in that subdirectory (``fixed-2.yaml`` and +``openstack.yaml`` in this case) are concatenated together +and treated as a single file. Without the special file plus, they would +have been convolved with the files from the workloads directory to create +a 2x2 matrix: + +* rbd/thrash/{clusters/openstack.yaml workloads/rbd_api_tests_copy_on_read.yaml} +* rbd/thrash/{clusters/openstack.yaml workloads/rbd_api_tests.yaml} +* rbd/thrash/{clusters/fixed-2.yaml workloads/rbd_api_tests_copy_on_read.yaml} +* rbd/thrash/{clusters/fixed-2.yaml workloads/rbd_api_tests.yaml} + +The ``clusters/fixed-2.yaml`` file is shared among many suites to +define the following ``roles`` + +.. code-block:: yaml + + roles: + - [mon.a, mon.c, osd.0, osd.1, osd.2, client.0] + - [mon.b, osd.3, osd.4, osd.5, client.1] + +The ``rbd/thrash`` suite as defined above, consisting of two tests, +can be run with + +.. prompt:: bash $ + + teuthology-suite --machine-type smithi --suite rbd/thrash + +A single test from the rbd/thrash suite can be run by adding the +``--filter`` option + +.. prompt:: bash $ + + teuthology-suite \ + --machine-type smithi \ + --suite rbd/thrash \ + --filter 'rbd/thrash/{clusters/fixed-2.yaml clusters/openstack.yaml workloads/rbd_api_tests_copy_on_read.yaml}' + +Filtering tests by their description +------------------------------------ + +When a few jobs fail and need to be run again, the ``--filter`` option +can be used to select tests with a matching description. For instance, if the +``rados`` suite fails the `all/peer.yaml <https://github.com/ceph/ceph/blob/master/qa/suites/rados/singleton/all/peer.yaml>`_ test, the following will only +run the tests that contain this file + +.. prompt:: bash $ + + teuthology-suite --machine-type smithi --suite rados --filter all/peer.yaml + +The ``--filter-out`` option does the opposite (it matches tests that do `not` +contain a given string), and can be combined with the ``--filter`` option. + +Both ``--filter`` and ``--filter-out`` take a comma-separated list of strings +(which means the comma character is implicitly forbidden in filenames found in +the `ceph/qa sub-directory`_). For instance + +.. prompt:: bash $ + + teuthology-suite --machine-type smithi --suite rados --filter all/peer.yaml,all/rest-api.yaml + +will run tests that contain either +`all/peer.yaml <https://github.com/ceph/ceph/blob/master/qa/suites/rados/singleton/all/peer.yaml>`_ +or +`all/rest-api.yaml <https://github.com/ceph/ceph/blob/master/qa/suites/rados/singleton/all/rest-api.yaml>`_ + +Each string is looked up anywhere in the test description and has to +be an exact match: they are not regular expressions. + +Reducing the number of tests +---------------------------- + +The ``rados`` suite generates tens or even hundreds of thousands of tests out +of a few hundred files. This happens because teuthology constructs test +matrices from subdirectories wherever it encounters a file named ``%``. For +instance, all tests in the `rados/basic suite +<https://github.com/ceph/ceph/tree/master/qa/suites/rados/basic>`_ run with +different messenger types: ``simple``, ``async`` and ``random``, because they +are combined (via the special file ``%``) with the `msgr directory +<https://github.com/ceph/ceph/tree/master/qa/suites/rados/basic/msgr>`_ + +All integration tests are required to be run before a Ceph release is +published. When merely verifying whether a contribution can be merged without +risking a trivial regression, it is enough to run a subset. The ``--subset`` +option can be used to reduce the number of tests that are triggered. For +instance + +.. prompt:: bash $ + + teuthology-suite --machine-type smithi --suite rados --subset 0/4000 + +will run as few tests as possible. The tradeoff in this case is that +not all combinations of test variations will together, +but no matter how small a ratio is provided in the ``--subset``, +teuthology will still ensure that all files in the suite are in at +least one test. Understanding the actual logic that drives this +requires reading the teuthology source code. + +The ``--limit`` option only runs the first ``N`` tests in the suite: +this is rarely useful, however, because there is no way to control which +test will be first. + +.. _ceph/qa sub-directory: https://github.com/ceph/ceph/tree/master/qa +.. _Sepia Lab: https://wiki.sepia.ceph.com/doku.php +.. _teuthology repository: https://github.com/ceph/teuthology +.. _teuthology framework: https://github.com/ceph/teuthology |