diff --git a/fluent-bit/lib/librdkafka-2.1.0/tests/README.md b/fluent-bit/lib/librdkafka-2.1.0/tests/README.md
new file mode 100644
index 000000000..b0d99b0bb
--- /dev/null
+++ b/fluent-bit/lib/librdkafka-2.1.0/tests/README.md
@@ -0,0 +1,505 @@
+# Automated regression tests for librdkafka
+
+
+## Supported test environments
+
+While the standard test suite works well on OSX and Windows,
+the full test suite (which must be run for PRs and releases) will
+only run on recent Linux distros due to its use of ASAN, Kerberos, etc.
+
+
+## Automated broker cluster setup using trivup
+
+A local broker cluster can be set up using
+[trivup](https://github.com/edenhill/trivup), which is a Python package
+available on PyPI.
+These self-contained clusters are used to run the librdkafka test suite
+on a number of different broker versions or with specific broker configs.
+
+trivup will download the specified Kafka version into its root directory,
+which is also used for cluster instances, where Kafka will write messages,
+logs, etc.
+The trivup root directory defaults to `tmp` in the current directory, but
+an alternate directory may be specified by setting the `TRIVUP_ROOT`
+environment variable, e.g., `TRIVUP_ROOT=$HOME/trivup make full`.
+
+First install required Python packages (trivup with friends):
+
+ $ python3 -m pip install -U -r requirements.txt
+
+Bring up a Kafka cluster (with the specified version) and start an interactive
+shell; when the shell is exited the cluster is brought down and deleted.
+
+ $ python3 -m trivup.clusters.KafkaCluster 2.3.0 # Broker version
+ # You can also try adding:
+ # --ssl To enable SSL listeners
+ # --sasl <mechanism> To enable SASL authentication
+ # --sr To provide a Schema-Registry instance
+ # .. and so on, see --help for more.
+
+In the trivup shell, run the test suite:
+
+ $ make
+
+
+If you'd rather use an existing cluster, you may omit trivup and
+provide a `test.conf` file that specifies the brokers and possibly other
+librdkafka configuration properties:
+
+ $ cp test.conf.example test.conf
+ $ $EDITOR test.conf
+
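As a sketch, a minimal `test.conf` for an existing cluster could look like the fragment below. The broker list key is a standard librdkafka property; the exact set of supported keys is defined by `test.conf.example`, so check your copy:

```
# Brokers of the existing cluster (standard librdkafka property).
metadata.broker.list=localhost:9092
# Optional: raise client debugging for all tests.
#debug=broker,protocol
```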
+
+
+## Run specific tests
+
+To run tests:
+
+ # Run tests in parallel (quicker, but harder to troubleshoot)
+ $ make
+
+ # Run a condensed test suite (quickest)
+ # This is what is run on CI builds.
+ $ make quick
+
+ # Run tests in sequence
+ $ make run_seq
+
+ # Run specific test
+ $ TESTS=0004 make
+
+ # Run test(s) with helgrind, valgrind, gdb
+ $ TESTS=0009 ./run-test.sh valgrind|helgrind|gdb
+
+
+All tests in the 0000-0999 series are run automatically with `make`.
+
+Tests 1000-1999 require specific non-standard setups or broker
+configuration; these tests are run with `TESTS=1nnn make`.
+See the comments in each test's source file for specific requirements.
+
+To insert test results into an SQLite database, make sure the `sqlite3`
+utility is installed, then add this to `test.conf`:
+
+ test.sql.command=sqlite3 rdktests
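The recorded results can then be inspected with the `sqlite3` CLI or from Python. The table layout is owned by the test suite, so a reasonable first step is to list the tables (a sketch; `rdktests` is the database name configured above):

```python
import sqlite3

# Open the results database written via test.sql.command above.
db = sqlite3.connect("rdktests")

# The test suite defines the schema; discover the tables first,
# then SELECT from whichever table holds the test runs.
tables = [row[0] for row in
          db.execute("SELECT name FROM sqlite_master WHERE type='table'")]
print(tables)
```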
+
+
+
+## Adding a new test
+
+The simplest way to add a new test is to copy one of the recent
+(higher `0nnn-..` number) tests to the next free
+`0nnn-<what-is-tested>` file.
+
+If possible and practical, try to use the C++ API in your test, as that will
+cover both the C and C++ APIs and thus provide better test coverage.
+Do note that the C++ test framework is not as feature rich as the C one,
+so if you need message verification, etc., you're better off with a C test.
+
+After creating your test file it needs to be added in a couple of places:
+
+ * Add to [tests/CMakeLists.txt](tests/CMakeLists.txt)
+ * Add to [win32/tests/tests.vcxproj](win32/tests/tests.vcxproj)
+ * Add to both locations in [tests/test.c](tests/test.c) - search for an
+ existing test number to see what needs to be done.
+
+You don't need to add the test to the Makefile; it is picked up automatically.
+
+Some additional guidelines:
+ * If your test depends on a minimum broker version, make sure to specify it
+ in test.c using `TEST_BRKVER()` (see 0091 as an example).
+ * If your test can run without an active cluster, flag the test
+ with `TEST_F_LOCAL`.
+ * If your test runs for a long time or produces/consumes a lot of messages
+   it might not be suitable for running on CI (which should run quickly
+   and is bound by both time and resources). In this case it is preferred
+   that you modify your test to run quicker and/or with fewer messages
+   when the `test_quick` variable is true.
+ * There are plenty of helper wrappers in test.c for common librdkafka
+   functions that make tests easier to write by not having to deal with
+   errors, etc.
+ * Fail fast; use `TEST_ASSERT()` et al. The sooner an error is detected,
+   the better, since it makes troubleshooting easier.
+ * Use `TEST_SAY()` et al. to inform the developer what your test is doing,
+   making it easier to troubleshoot upon failure. But try to keep output
+   down to reasonable levels. There is a `TEST_LEVEL` environment variable
+   that can be used with `TEST_SAYL()` to only emit certain printouts
+   if the test level is increased. The default test level is 2.
+ * The test runner will automatically adjust timeouts (that it knows about)
+   if running under valgrind, on CI, or in a similar environment where the
+   execution speed may be slower.
+   To make sure your test remains sturdy in these types of environments, make
+   sure to use the `tmout_multip(milliseconds)` macro when passing timeout
+   values to non-test functions, e.g., `rd_kafka_poll(rk, tmout_multip(3000))`.
+ * If your test file contains multiple separate sub-tests, use the
+   `SUB_TEST()`, `SUB_TEST_QUICK()` and `SUB_TEST_PASS()` macros from inside
+   the test functions to help differentiate test failures.
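To illustrate the `tmout_multip()` guideline above: conceptually, the macro just scales a base timeout by an environment-derived multiplier. The sketch below is Python, not the real C macro, and the multiplier value is chosen by the test runner at run time, not fixed at 3:

```python
# Sketch of the tmout_multip() idea: scale a base timeout (in ms) by a
# multiplier the test runner derives from the environment (valgrind,
# CI, etc.). The default of 3.0 below is illustrative only.
def tmout_multip(msecs: int, multiplier: float = 3.0) -> int:
    return int(msecs * multiplier)

# A hardcoded 3000 ms poll timeout becomes 9000 ms under a 3x multiplier.
print(tmout_multip(3000))
```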
+
+
+## Test scenarios
+
+A test scenario defines the cluster configuration used by tests.
+The majority of tests use the "default" scenario which matches the
+Apache Kafka default broker configuration (topic auto creation enabled, etc).
+
+If a test relies on cluster configuration that is mutually exclusive with
+the default configuration, an alternate scenario must be defined in
+`scenarios/<scenario>.json`, a configuration object that
+is passed to [trivup](https://github.com/edenhill/trivup).
+
+Try to reuse an existing test scenario as far as possible to speed up
+test times, since each new scenario will require a new cluster incarnation.
+
+
+## A guide to testing, verifying, and troubleshooting librdkafka
+
+
+### Creating a development build
+
+The [dev-conf.sh](../dev-conf.sh) script configures and builds librdkafka and
+the test suite for development use, enabling extra runtime
+checks (`ENABLE_DEVEL`, `rd_dassert()`, etc.), disabling optimization
+(to get accurate stack traces and line numbers), enabling ASAN, etc.
+
+ # Reconfigure librdkafka for development use and rebuild.
+ $ ./dev-conf.sh
+
+**NOTE**: Performance tests and benchmarks should not use a development build.
+
+
+### Controlling the test framework
+
+A test run may be dynamically set up using a number of environment variables.
+These environment variables work for all the different ways of invoking the
+tests, be it `make`, `run-test.sh`, `until-fail.sh`, etc.
+
+ * `TESTS=0nnn` - only run a single test identified by its full number, e.g.
+ `TESTS=0102 make`. (Yes, the var should have been called TEST)
+ * `SUBTESTS=...` - only run sub-tests (tests that are using `SUB_TEST()`)
+   that contain this string.
+ * `TESTS_SKIP=...` - skip these tests.
+ * `TEST_DEBUG=...` - this will automatically set the `debug` config property
+   of all instantiated clients to the given value.
+   E.g., `TEST_DEBUG=broker,protocol TESTS=0001 make`
+ * `TEST_LEVEL=n` - controls the `TEST_SAY()` output level, a higher number
+ yields more test output. Default level is 2.
+ * `RD_UT_TEST=name` - only run unit tests whose name contains `name`;
+   should be used with `TESTS=0000`.
+   See [../src/rdunittest.c](../src/rdunittest.c) for
+   unit test names.
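For instance, `TEST_DEBUG` is simply copied into each instantiated client's configuration. Conceptually (a Python sketch of the behavior described above, not the framework's actual code):

```python
import os

# Sketch: the framework applies TEST_DEBUG as the `debug` property on
# every client configuration it instantiates.
def apply_test_debug(conf: dict) -> dict:
    dbg = os.environ.get("TEST_DEBUG")
    if dbg:
        conf["debug"] = dbg
    return conf

os.environ["TEST_DEBUG"] = "broker,protocol"
print(apply_test_debug({"bootstrap.servers": "localhost:9092"}))
```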
+
+
+Let's say that you run the full test suite and get a failure in test 0061,
+which is a consumer test. You want to quickly reproduce the issue
+and figure out what is wrong, so limit the tests to just 0061 and provide
+the relevant debug options (typically `cgrp,fetch` for consumers):
+
+ $ TESTS=0061 TEST_DEBUG=cgrp,fetch make
+
+If the test did not fail, you've found an intermittent issue; this is where
+[until-fail.sh](until-fail.sh) comes into play. Run the test until it fails:
+
+ # bare means to run the test without valgrind
+ $ TESTS=0061 TEST_DEBUG=cgrp,fetch ./until-fail.sh bare
+
+
+### How to run tests
+
+The standard way to run the test suite is firing up a trivup cluster
+in an interactive shell:
+
+ $ ./interactive_broker_version.py 2.3.0 # Broker version
+
+
+And then running the test suite in parallel:
+
+ $ make
+
+
+Run one test at a time:
+
+ $ make run_seq
+
+
+Run a single test:
+
+ $ TESTS=0034 make
+
+
+Run test suite with valgrind (see instructions below):
+
+ $ ./run-test.sh valgrind # memory checking
+
+or with helgrind (the valgrind thread checker):
+
+ $ ./run-test.sh helgrind # thread checking
+
+
+To run the tests in gdb:
+
+**NOTE**: gdb support is flaky on OSX due to signing issues.
+
+ $ ./run-test.sh gdb
+ (gdb) run
+
+ # wait for test to crash, or interrupt with Ctrl-C
+
+ # backtrace of current thread
+ (gdb) bt
+ # move up or down a stack frame
+ (gdb) up
+ (gdb) down
+ # select specific stack frame
+ (gdb) frame 3
+ # show code at location
+ (gdb) list
+
+ # print variable content
+ (gdb) p rk.rk_conf.group_id
+ (gdb) p *rkb
+
+ # continue execution (if interrupted)
+ (gdb) cont
+
+    # single-step one source line
+ (gdb) step
+
+ # restart
+ (gdb) run
+
+ # see all threads
+ (gdb) info threads
+
+ # see backtraces of all threads
+ (gdb) thread apply all bt
+
+ # exit gdb
+    (gdb) quit
+
+
+If a test crashes and produces a core file (make sure your shell has
+`ulimit -c unlimited` set!), do:
+
+    # On Linux
+ $ LD_LIBRARY_PATH=../src:../src-cpp gdb ./test-runner <core-file>
+ (gdb) bt
+
+ # On OSX
+ $ DYLD_LIBRARY_PATH=../src:../src-cpp gdb ./test-runner /cores/core.<pid>
+ (gdb) bt
+
+
+To run all tests repeatedly until one fails (a good way of finding
+intermittent failures, race conditions, etc.):
+
+ $ ./until-fail.sh bare # bare is to run the test without valgrind,
+ # may also be one or more of the modes supported
+ # by run-test.sh:
+ # bare valgrind helgrind gdb, etc..
+
+To run a single test repeatedly with valgrind until failure:
+
+ $ TESTS=0103 ./until-fail.sh valgrind
+
+
+
+### Finding memory leaks, memory corruption, etc.
+
+There are two ways to verify that there are no memory leaks, out-of-bounds
+memory accesses, use-after-free, etc.: ASAN or valgrind.
+
+#### ASAN - AddressSanitizer
+
+The first option is AddressSanitizer, build-time instrumentation
+provided by clang and gcc that inserts memory checks into the built library.
+
+To enable AddressSanitizer (ASAN), run `./dev-conf.sh asan` from the
+librdkafka root directory.
+This script will rebuild librdkafka and the test suite with ASAN enabled.
+
+Then run tests as usual. Memory access issues will be reported on stderr
+in real time as they happen (and the test will fail eventually), while
+memory leaks will be reported on stderr when the test run exits successfully,
+i.e., no tests failed.
+
+Test failures will typically cause the current test to exit hard without
+cleaning up, in which case there will be a large number of reported memory
+leaks; these should be ignored. The memory leak report is only relevant
+when the test suite passes.
+
+**NOTE**: The OSX version of ASAN does not provide memory leak detection;
+          you will need to run the test suite on Linux (native or in Docker).
+
+**NOTE**: ASAN, TSAN and valgrind are mutually exclusive.
+
+
+#### Valgrind - memory checker
+
+Valgrind is a dynamic binary instrumentation framework that intercepts all
+memory accesses of an unmodified program, reporting memory access violations,
+use-after-free, memory leaks, etc.
+
+Valgrind provides additional checks over ASAN and is mostly useful
+for troubleshooting crashes, memory issues and leaks when ASAN falls short.
+
+To use valgrind, make sure librdkafka and the test suite are built without
+ASAN or TSAN (it must be a clean build without any other instrumentation),
+then simply run:
+
+ $ ./run-test.sh valgrind
+
+Valgrind will report to stderr, just like ASAN.
+
+
+**NOTE**: Valgrind only runs on Linux.
+
+**NOTE**: ASAN, TSAN and valgrind are mutually exclusive.
+
+
+### TSAN - Thread and locking issues
+
+librdkafka uses a number of internal threads which communicate and share state
+through op queues, condition variables, mutexes and atomics.
+
+While the docstrings in the librdkafka source code specify what locking is
+required, it is very hard to manually verify that the correct locks
+are acquired, and in the correct order (to avoid deadlocks).
+
+TSAN, ThreadSanitizer, is of great help here. As with ASAN, TSAN is a
+build-time option: run `./dev-conf.sh tsan` to rebuild with TSAN.
+
+Run the test suite as usual, preferably in parallel. TSAN will output
+thread errors to stderr and eventually fail the test run.
+
+If you're having threading issues and TSAN does not provide enough information
+to sort it out, you can also try running the test with helgrind, which
+is valgrind's thread checker (`./run-test.sh helgrind`).
+
+
+**NOTE**: ASAN, TSAN and valgrind are mutually exclusive.
+
+
+### Resource usage thresholds (experimental)
+
+**NOTE**: This is an experimental feature, some form of system-specific
+ calibration will be needed.
+
+If the `-R` option is passed to the `test-runner`, or the `make rusage`
+target is used, the test framework will monitor each test's resource usage
+and fail the test if the default or test-specific thresholds are exceeded.
+
+Per-test thresholds are specified in test.c using the `_THRES()` macro.
+
+Currently monitored resources are:
+ * `utime` - User CPU time in seconds (default 1.0s).
+ * `stime` - System/kernel CPU time in seconds (default 0.5s).
+ * `rss` - RSS (memory) usage (default 10.0 MB).
+ * `ctxsw` - Number of voluntary context switches, e.g. due to blocking
+   syscalls (default 10000).
+
+Upon successful test completion a log line will be emitted with a resource
+usage summary, e.g.:
+
+ Test resource usage summary: 20.161s (32.3%) User CPU time, 12.976s (20.8%) Sys CPU time, 0.000MB RSS memory increase, 4980 Voluntary context switches
+
+The User and Sys CPU thresholds are based on observations from running the
+test suite on an Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz (8 cores),
+which defines the baseline system.
+
+Since no two development environments are identical, a manual CPU calibration
+value can be passed as `-R<C>`, where `C` is the CPU calibration of
+the local system compared to the baseline system.
+The CPU thresholds will be multiplied by the CPU calibration value
+(default 1.0); thus a value less than 1.0 means the local system is faster
+than the baseline system, and a value larger than 1.0 means the local system
+is slower than the baseline system.
+I.e., if you are on an i5 system, pass `-R2.0` to allow higher CPU usage,
+or `-R0.8` if your system is faster than the baseline system.
+The CPU calibration value may also be set with the
+`TEST_CPU_CALIBRATION=1.5` environment variable.
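The arithmetic is straightforward (a sketch, with the default values taken from the threshold list above):

```python
# Sketch of CPU threshold calibration: each CPU threshold is multiplied
# by the calibration value given via -R<C> or TEST_CPU_CALIBRATION.
DEFAULT_THRESHOLDS = {"utime": 1.0, "stime": 0.5}  # seconds, defaults above

def effective_thresholds(calibration: float = 1.0) -> dict:
    return {name: secs * calibration
            for name, secs in DEFAULT_THRESHOLDS.items()}

# On a slower machine, -R2.0 doubles the allowed CPU time:
print(effective_thresholds(2.0))
```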
+
+In an ideal future, the test suite would be able to auto-calibrate.
+
+
+**NOTE**: The resource usage threshold checks will run tests in sequence,
+          not in parallel, to be able to effectively measure per-test usage.
+
+
+## PR and release verification
+
+Prior to pushing your PR you must verify that your code change has not
+introduced any regressions or new issues; this requires running the test
+suite in multiple different modes:
+
+ * PLAINTEXT, SSL transports
+ * All SASL mechanisms (PLAIN, GSSAPI, SCRAM, OAUTHBEARER)
+ * Idempotence enabled for all tests
+ * With memory checking
+ * With thread checking
+ * Compatibility with older broker versions
+
+These tests must also be run for each release candidate that is created.
+
+ $ make release-test
+
+This will take approximately 30 minutes.
+
+**NOTE**: Run this on Linux (for ASAN and Kerberos tests to work properly), not OSX.
+
+
+## Test mode specifics
+
+The following sections rely on trivup being installed.
+
+
+### Compatibility tests with multiple broker versions
+
+To ensure compatibility across all supported broker versions the entire
+test suite is run in a trivup based cluster, one test run for each
+relevant broker version.
+
+ $ ./broker_version_tests.py
+
+
+### SASL tests
+
+Testing SASL requires a bit of configuration on the brokers; to automate
+this, the entire test suite is run on trivup based clusters.
+
+ $ ./sasl_tests.py
+
+
+
+### Full test suite(s) run
+
+To run all tests, including the broker version and SASL tests, etc., use
+
+ $ make full
+
+**NOTE**: `make full` is a subset of the more complete `make release-test` target.
+
+
+### Idempotent Producer tests
+
+To run the entire test suite with `enable.idempotence=true`, use
+`make idempotent_seq` or `make idempotent_par` for sequential or
+parallel testing, respectively.
+Some tests are skipped or slightly modified when idempotence is enabled.
+
+
+## Manual testing notes
+
+The following tests are currently performed manually; they should be
+implemented as automated tests.
+
+### LZ4 interop
+
+ $ ./interactive_broker_version.py -c ./lz4_manual_test.py 0.8.2.2 0.9.0.1 2.3.0
+
+Check the output and follow the instructions.
+
+
+
+
+## Test numbers
+
+Automated tests: 0000-0999
+Manual tests: 8000-8999