diff options
author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-19 02:57:58 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-19 02:57:58 +0000 |
commit | be1c7e50e1e8809ea56f2c9d472eccd8ffd73a97 (patch) | |
tree | 9754ff1ca740f6346cf8483ec915d4054bc5da2d /fluent-bit/lib/librdkafka-2.1.0/tests/README.md | |
parent | Initial commit. (diff) | |
download | netdata-upstream.tar.xz netdata-upstream.zip |
Adding upstream version 1.44.3.upstream/1.44.3upstream
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'fluent-bit/lib/librdkafka-2.1.0/tests/README.md')
-rw-r--r-- | fluent-bit/lib/librdkafka-2.1.0/tests/README.md | 505 |
1 files changed, 505 insertions, 0 deletions
diff --git a/fluent-bit/lib/librdkafka-2.1.0/tests/README.md b/fluent-bit/lib/librdkafka-2.1.0/tests/README.md new file mode 100644 index 00000000..b0d99b0b --- /dev/null +++ b/fluent-bit/lib/librdkafka-2.1.0/tests/README.md @@ -0,0 +1,505 @@ +# Automated regression tests for librdkafka + + +## Supported test environments + +While the standard test suite works well on OSX and Windows, +the full test suite (which must be run for PRs and releases) will +only run on recent Linux distros due to its use of ASAN, Kerberos, etc. + + +## Automated broker cluster setup using trivup + +A local broker cluster can be set up using +[trivup](https://github.com/edenhill/trivup), which is a Python package +available on PyPi. +These self-contained clusters are used to run the librdkafka test suite +on a number of different broker versions or with specific broker configs. + +trivup will download the specified Kafka version into its root directory, +the root directory is also used for cluster instances, where Kafka will +write messages, logs, etc. +The trivup root directory is by default `tmp` in the current directory but +may be specified by setting the `TRIVUP_ROOT` environment variable +to alternate directory, e.g., `TRIVUP_ROOT=$HOME/trivup make full`. + +First install required Python packages (trivup with friends): + + $ python3 -m pip install -U -r requirements.txt + +Bring up a Kafka cluster (with the specified version) and start an interactive +shell, when the shell is exited the cluster is brought down and deleted. + + $ python3 -m trivup.clusters.KafkaCluster 2.3.0 # Broker version + # You can also try adding: + # --ssl To enable SSL listeners + # --sasl <mechanism> To enable SASL authentication + # --sr To provide a Schema-Registry instance + # .. and so on, see --help for more. + +In the trivup shell, run the test suite: + + $ make + + +If you'd rather use an existing cluster, you may omit trivup and +provide a `test.conf` file that specifies the brokers and possibly other +librdkafka configuration properties: + + $ cp test.conf.example test.conf + $ $EDITOR test.conf + + + +## Run specific tests + +To run tests: + + # Run tests in parallel (quicker, but harder to troubleshoot) + $ make + + # Run a condensed test suite (quickest) + # This is what is run on CI builds. + $ make quick + + # Run tests in sequence + $ make run_seq + + # Run specific test + $ TESTS=0004 make + + # Run test(s) with helgrind, valgrind, gdb + $ TESTS=0009 ./run-test.sh valgrind|helgrind|gdb + + +All tests in the 0000-0999 series are run automatically with `make`. + +Tests 1000-1999 are subject to specific non-standard setups or broker +configuration, these tests are run with `TESTS=1nnn make`. +See comments in the test's source file for specific requirements. + +To insert test results into SQLite database make sure the `sqlite3` utility +is installed, then add this to `test.conf`: + + test.sql.command=sqlite3 rdktests + + + +## Adding a new test + +The simplest way to add a new test is to copy one of the recent +(higher `0nnn-..` number) tests to the next free +`0nnn-<what-is-tested>` file. + +If possible and practical, try to use the C++ API in your test as that will +cover both the C and C++ APIs and thus provide better test coverage. +Do note that the C++ test framework is not as feature rich as the C one, +so if you need message verification, etc, you're better off with a C test. + +After creating your test file it needs to be added in a couple of places: + + * Add to [tests/CMakeLists.txt](tests/CMakeLists.txt) + * Add to [win32/tests/tests.vcxproj](win32/tests/tests.vcxproj) + * Add to both locations in [tests/test.c](tests/test.c) - search for an + existing test number to see what needs to be done. + +You don't need to add the test to the Makefile, it is picked up automatically. + +Some additional guidelines: + * If your test depends on a minimum broker version, make sure to specify it + in test.c using `TEST_BRKVER()` (see 0091 as an example). + * If your test can run without an active cluster, flag the test + with `TEST_F_LOCAL`. + * If your test runs for a long time or produces/consumes a lot of messages + it might not be suitable for running on CI (which should run quickly + and are bound by both time and resources). In this case it is preferred + if you modify your test to be able to run quicker and/or with less messages + if the `test_quick` variable is true. + * There's plenty of helper wrappers in test.c for common librdkafka functions + that makes tests easier to write by not having to deal with errors, etc. + * Fail fast, use `TEST_ASSERT()` et.al., the sooner an error is detected + the better since it makes troubleshooting easier. + * Use `TEST_SAY()` et.al. to inform the developer what your test is doing, + making it easier to troubleshoot upon failure. But try to keep output + down to reasonable levels. There is a `TEST_LEVEL` environment variable + that can be used with `TEST_SAYL()` to only emit certain printouts + if the test level is increased. The default test level is 2. + * The test runner will automatically adjust timeouts (it knows about) + if running under valgrind, on CI, or similar environment where the + execution speed may be slower. + To make sure your test remains sturdy in these type of environments, make + sure to use the `tmout_multip(milliseconds)` macro when passing timeout + values to non-test functions, e.g, `rd_kafka_poll(rk, tmout_multip(3000))`. + * If your test file contains multiple separate sub-tests, use the + `SUB_TEST()`, `SUB_TEST_QUICK()` and `SUB_TEST_PASS()` from inside + the test functions to help differentiate test failures. + + +## Test scenarios + +A test scenario defines the cluster configuration used by tests. +The majority of tests use the "default" scenario which matches the +Apache Kafka default broker configuration (topic auto creation enabled, etc). + +If a test relies on cluster configuration that is mutually exclusive with +the default configuration an alternate scenario must be defined in +`scenarios/<scenario>.json` which is a configuration object which +is passed to [trivup](https://github.com/edenhill/trivup). + +Try to reuse an existing test scenario as far as possible to speed up +test times, since each new scenario will require a new cluster incarnation. + + +## A guide to testing, verifying, and troubleshooting, librdkafka + + +### Creating a development build + +The [dev-conf.sh](../dev-conf.sh) script configures and builds librdkafka and +the test suite for development use, enabling extra runtime +checks (`ENABLE_DEVEL`, `rd_dassert()`, etc), disabling optimization +(to get accurate stack traces and line numbers), enable ASAN, etc. + + # Reconfigure librdkafka for development use and rebuild. + $ ./dev-conf.sh + +**NOTE**: Performance tests and benchmarks should not use a development build. + + +### Controlling the test framework + +A test run may be dynamically set up using a number of environment variables. +These environment variables work for all different ways of invocing the tests, +be it `make`, `run-test.sh`, `until-fail.sh`, etc. + + * `TESTS=0nnn` - only run a single test identified by its full number, e.g. + `TESTS=0102 make`. (Yes, the var should have been called TEST) + * `SUBTESTS=...` - only run sub-tests (tests that are using `SUB_TEST()`) + that contains this string. + * `TESTS_SKIP=...` - skip these tests. + * `TEST_DEBUG=...` - this will automatically set the `debug` config property + of all instantiated clients to the value. + E.g.. `TEST_DEBUG=broker,protocol TESTS=0001 make` + * `TEST_LEVEL=n` - controls the `TEST_SAY()` output level, a higher number + yields more test output. Default level is 2. + * `RD_UT_TEST=name` - only run unittest containing `name`, should be used + with `TESTS=0000`. + See [../src/rdunittest.c](../src/rdunittest.c) for + unit test names. + + +Let's say that you run the full test suite and get a failure in test 0061, +which is a consumer test. You want to quickly reproduce the issue +and figure out what is wrong, so limit the tests to just 0061, and provide +the relevant debug options (which is typically `cgrp,fetch` for consumers): + + $ TESTS=0061 TEST_DEBUG=cgrp,fetch make + +If the test did not fail you've found an intermittent issue, this is where +[until-fail.sh](until-fail.sh) comes in to play, so run the test until it fails: + + # bare means to run the test without valgrind + $ TESTS=0061 TEST_DEBUG=cgrp,fetch ./until-fail.sh bare + + +### How to run tests + +The standard way to run the test suite is firing up a trivup cluster +in an interactive shell: + + $ ./interactive_broker_version.py 2.3.0 # Broker version + + +And then running the test suite in parallel: + + $ make + + +Run one test at a time: + + $ make run_seq + + +Run a single test: + + $ TESTS=0034 make + + +Run test suite with valgrind (see instructions below): + + $ ./run-test.sh valgrind # memory checking + +or with helgrind (the valgrind thread checker): + + $ ./run-test.sh helgrind # thread checking + + +To run the tests in gdb: + +**NOTE**: gdb support is flaky on OSX due to signing issues. + + $ ./run-test.sh gdb + (gdb) run + + # wait for test to crash, or interrupt with Ctrl-C + + # backtrace of current thread + (gdb) bt + # move up or down a stack frame + (gdb) up + (gdb) down + # select specific stack frame + (gdb) frame 3 + # show code at location + (gdb) list + + # print variable content + (gdb) p rk.rk_conf.group_id + (gdb) p *rkb + + # continue execution (if interrupted) + (gdb) cont + + # single-step one instruction + (gdb) step + + # restart + (gdb) run + + # see all threads + (gdb) info threads + + # see backtraces of all threads + (gdb) thread apply all bt + + # exit gdb + (gdb) exit + + +If a test crashes and produces a core file (make sure your shell has +`ulimit -c unlimited` set!), do: + + # On linux + $ LD_LIBRARY_PATH=../src:../src-cpp gdb ./test-runner <core-file> + (gdb) bt + + # On OSX + $ DYLD_LIBRARY_PATH=../src:../src-cpp gdb ./test-runner /cores/core.<pid> + (gdb) bt + + +To run all tests repeatedly until one fails, this is a good way of finding +intermittent failures, race conditions, etc: + + $ ./until-fail.sh bare # bare is to run the test without valgrind, + # may also be one or more of the modes supported + # by run-test.sh: + # bare valgrind helgrind gdb, etc.. + +To run a single test repeatedly with valgrind until failure: + + $ TESTS=0103 ./until-fail.sh valgrind + + + +### Finding memory leaks, memory corruption, etc. + +There are two ways to verifying there are no memory leaks, out of bound +memory accesses, use after free, etc. ASAN or valgrind. + +#### ASAN - AddressSanitizer + +The first option is using AddressSanitizer, this is build-time instrumentation +provided by clang and gcc to insert memory checks in the build library. + +To enable AddressSanitizer (ASAN), run `./dev-conf.sh asan` from the +librdkafka root directory. +This script will rebuild librdkafka and the test suite with ASAN enabled. + +Then run tests as usual. Memory access issues will be reported on stderr +in real time as they happen (and the test will fail eventually), while +memory leaks will be reported on stderr when the test run exits successfully, +i.e., no tests failed. + +Test failures will typically cause the current test to exit hard without +cleaning up, in which case there will be a large number of reported memory +leaks, these shall be ignored. The memory leak report is only relevant +when the test suite passes. + +**NOTE**: The OSX version of ASAN does not provide memory leak protection, + you will need to run the test suite on Linux (native or in Docker). + +**NOTE**: ASAN, TSAN and valgrind are mutually exclusive. + + +#### Valgrind - memory checker + +Valgrind is a powerful virtual machine that intercepts all memory accesses +of an unmodified program, reporting memory access violations, use after free, +memory leaks, etc. + +Valgrind provides additional checks over ASAN and is mostly useful +for troubleshooting crashes, memory issues and leaks when ASAN falls short. + +To use valgrind, make sure librdkafka and the test suite is built without +ASAN or TSAN, it must be a clean build without any other instrumentation, +then simply run: + + $ ./run-test.sh valgrind + +Valgrind will report to stderr, just like ASAN. + + +**NOTE**: Valgrind only runs on Linux. + +**NOTE**: ASAN, TSAN and valgrind are mutually exclusive. + + +### TSAN - Thread and locking issues + +librdkafka uses a number of internal threads which communicate and share state +through op queues, conditional variables, mutexes and atomics. + +While the docstrings in the librdkafka source code specify what locking is +required it is very hard to manually verify that the correct locks +are acquired, and in the correct order (to avoid deadlocks). + +TSAN, ThreadSanitizer, is of great help here. As with ASAN, TSAN is a +build-time option: run `./dev-conf.sh tsan` to rebuild with TSAN. + +Run the test suite as usual, preferably in parallel. TSAN will output +thread errors to stderr and eventually fail the test run. + +If you're having threading issues and TSAN does not provide enough information +to sort it out, you can also try running the test with helgrind, which +is valgrind's thread checker (`./run-test.sh helgrind`). + + +**NOTE**: ASAN, TSAN and valgrind are mutually exclusive. + + +### Resource usage thresholds (experimental) + +**NOTE**: This is an experimental feature, some form of system-specific + calibration will be needed. + +If the `-R` option is passed to the `test-runner`, or the `make rusage` +target is used, the test framework will monitor each test's resource usage +and fail the test if the default or test-specific thresholds are exceeded. + +Per-test thresholds are specified in test.c using the `_THRES()` macro. + +Currently monitored resources are: + * `utime` - User CPU time in seconds (default 1.0s) + * `stime` - System/Kernel CPU time in seconds (default 0.5s). + * `rss` - RSS (memory) usage (default 10.0 MB) + * `ctxsw` - Number of voluntary context switches, e.g. syscalls (default 10000). + +Upon successful test completion a log line will be emitted with a resource +usage summary, e.g.: + + Test resource usage summary: 20.161s (32.3%) User CPU time, 12.976s (20.8%) Sys CPU time, 0.000MB RSS memory increase, 4980 Voluntary context switches + +The User and Sys CPU thresholds are based on observations running the +test suite on an Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz (8 cores) +which define the base line system. + +Since no two development environments are identical a manual CPU calibration +value can be passed as `-R<C>`, where `C` is the CPU calibration for +the local system compared to the base line system. +The CPU threshold will be multiplied by the CPU calibration value (default 1.0), +thus a value less than 1.0 means the local system is faster than the +base line system, and a value larger than 1.0 means the local system is +slower than the base line system. +I.e., if you are on an i5 system, pass `-R2.0` to allow higher CPU usages, +or `-R0.8` if your system is faster than the base line system. +The the CPU calibration value may also be set with the +`TEST_CPU_CALIBRATION=1.5` environment variable. + +In an ideal future, the test suite would be able to auto-calibrate. + + +**NOTE**: The resource usage threshold checks will run tests in sequence, + not parallell, to be able to effectively measure per-test usage. + + +# PR and release verification + +Prior to pushing your PR you must verify that your code change has not +introduced any regression or new issues, this requires running the test +suite in multiple different modes: + + * PLAINTEXT, SSL transports + * All SASL mechanisms (PLAIN, GSSAPI, SCRAM, OAUTHBEARER) + * Idempotence enabled for all tests + * With memory checking + * With thread checking + * Compatibility with older broker versions + +These tests must also be run for each release candidate that is created. + + $ make release-test + +This will take approximately 30 minutes. + +**NOTE**: Run this on Linux (for ASAN and Kerberos tests to work properly), not OSX. + + +# Test mode specifics + +The following sections rely on trivup being installed. + + +### Compatbility tests with multiple broker versions + +To ensure compatibility across all supported broker versions the entire +test suite is run in a trivup based cluster, one test run for each +relevant broker version. + + $ ./broker_version_tests.py + + +### SASL tests + +Testing SASL requires a bit of configuration on the brokers, to automate +this the entire test suite is run on trivup based clusters. + + $ ./sasl_tests.py + + + +### Full test suite(s) run + +To run all tests, including the broker version and SASL tests, etc, use + + $ make full + +**NOTE**: `make full` is a sub-set of the more complete `make release-test` target. + + +### Idempotent Producer tests + +To run the entire test suite with `enable.idempotence=true` enabled, use +`make idempotent_seq` or `make idempotent_par` for sequencial or +parallel testing. +Some tests are skipped or slightly modified when idempotence is enabled. + + +## Manual testing notes + +The following manual tests are currently performed manually, they should be +implemented as automatic tests. + +### LZ4 interop + + $ ./interactive_broker_version.py -c ./lz4_manual_test.py 0.8.2.2 0.9.0.1 2.3.0 + +Check the output and follow the instructions. + + + + +## Test numbers + +Automated tests: 0000-0999 +Manual tests: 8000-8999 |