path: root/src/arrow/docs/source/developers/cpp/development.rst
Diffstat (limited to 'src/arrow/docs/source/developers/cpp/development.rst')
-rw-r--r-- src/arrow/docs/source/developers/cpp/development.rst | 294
1 files changed, 294 insertions, 0 deletions
diff --git a/src/arrow/docs/source/developers/cpp/development.rst b/src/arrow/docs/source/developers/cpp/development.rst
new file mode 100644
index 000000000..4098f1c4e
--- /dev/null
+++ b/src/arrow/docs/source/developers/cpp/development.rst
@@ -0,0 +1,294 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+======================
+Development Guidelines
+======================
+
+This section provides information for developers who wish to contribute to the
+C++ codebase.
+
+.. note::
+
+ Since most of the project's developers work on Linux or macOS, not all
+ features or developer tools are uniformly supported on Windows. If you are
+ on Windows, have a look at :ref:`developers-cpp-windows`.
+
+Compiler warning levels
+=======================
+
+The ``BUILD_WARNING_LEVEL`` CMake option switches between sets of predetermined
+compiler warning levels that we use for code tidiness. For release builds, the
+default warning level is ``PRODUCTION``, while for debug builds the default is
+``CHECKIN``.
+
+When using ``CHECKIN`` for debug builds, ``-Werror`` is added for gcc and
+clang, so that any warning causes a build failure; with MSVC, ``/WX`` is set
+and has the same effect.
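+
+As a minimal sketch (assuming an out-of-source build directory directly below
+``cpp``, which is not prescribed above), the warning level can also be set
+explicitly when configuring a debug build:
+
+.. code-block:: shell
+
+ # Configure a debug build with the strict CHECKIN warning set;
+ # ".." assumes the build directory is a direct child of cpp/
+ cmake .. -DCMAKE_BUILD_TYPE=Debug -DBUILD_WARNING_LEVEL=CHECKIN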
+
+Running unit tests
+==================
+
+The ``-DARROW_BUILD_TESTS=ON`` CMake option enables building of unit test
+executables. You can then either run them individually, by launching the
+desired executable, or run them all at once by launching the ``ctest``
+executable (which is part of the CMake suite).
+
+A possible invocation is something like::
+
+ $ ctest -j16 --output-on-failure
+
+where the ``-j16`` option runs up to 16 tests in parallel, taking advantage
+of multiple CPU cores and hardware threads.
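+
+To narrow a run to a subset of tests, ``ctest`` also accepts a regular
+expression via ``-R``. The sketch below assumes that a test named
+``arrow-array-test`` has been built and that test executables are placed in a
+``debug`` subdirectory of the build directory; adjust both to your setup:
+
+.. code-block:: shell
+
+ # Run only the tests whose name matches the pattern
+ ctest -R arrow-array-test --output-on-failure
+
+ # Or launch the (assumed) test executable directly
+ ./debug/arrow-array-test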
+
+Running benchmarks
+==================
+
+The ``-DARROW_BUILD_BENCHMARKS=ON`` CMake option enables building of benchmark
+executables. You can then run benchmarks individually by launching the
+corresponding executable from the command line, e.g.::
+
+ $ ./build/release/arrow-builder-benchmark
+
+.. note::
+ For meaningful benchmark numbers, it is very strongly recommended to build
+ in ``Release`` mode, so as to enable compiler optimizations.
+
+Code Style, Linting, and CI
+===========================
+
+This project follows `Google's C++ Style Guide
+<https://google.github.io/styleguide/cppguide.html>`_ with minor exceptions:
+
+* We relax the line length restriction to 90 characters.
+* We use the ``NULLPTR`` macro in header files (instead of ``nullptr``) defined
+ in ``src/arrow/util/macros.h`` to support building C++/CLI (ARROW-1134)
+* We relax the guide's rules regarding structs. For public headers we should
+ use struct only for objects that are principally simple data containers where
+ it is OK to expose all the internal members and any methods are primarily
+ conveniences. For private headers the rules are relaxed further and structs
+ can be used where convenient for types that do not need access control even
+ though they may not be simple data containers.
+
+Our continuous integration builds on GitHub Actions run the unit test
+suites on a variety of platforms and configurations, including using
+Address Sanitizer and Undefined Behavior Sanitizer to check for various
+patterns of misbehaviour such as memory leaks. In addition, the
+codebase is subjected to a number of code style and code cleanliness checks.
+
+In order to have a passing CI build, your modified git branch must pass the
+following checks:
+
+* C++ builds with the project's active version of ``clang`` without
+ compiler warnings with ``-DBUILD_WARNING_LEVEL=CHECKIN``. Note that
+ there are classes of warnings (such as ``-Wdocumentation``, see more
+ on this below) that are not caught by ``gcc``.
+* CMake files pass style checks. Violations can be fixed by running
+ ``archery lint --cmake-format --fix``. This requires Python
+ 3 and `cmake_format <https://github.com/cheshirekow/cmake_format>`_ (note:
+ this currently does not work on Windows).
+* The code passes various C++ (and other) style checks, run with the ``lint``
+ subcommand of :ref:`Archery <archery>`. Violations can also be fixed locally
+ by running ``archery lint --cpplint --fix``.
+
+In order to account for variations in the behavior of ``clang-format`` between
+major versions of LLVM, we pin the version of ``clang-format`` used (currently
+LLVM 8).
+
+Depending on how you installed clang-format, the build system may not be able
+to find it. You can provide an explicit path to your LLVM installation (or the
+root path for the clang tools) with the environment variable
+``$CLANG_TOOLS_PATH`` or by passing ``-DClangTools_PATH=$PATH_TO_CLANG_TOOLS`` when
+invoking CMake.
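+
+For instance, with a hypothetical LLVM 8 installation under ``/opt/llvm-8``
+(the path is an assumption; whether to point at the installation root or at
+the directory that contains the ``clang-format`` binary depends on your
+layout), either of the following could be used:
+
+.. code-block:: shell
+
+ # Option 1: set the environment variable before invoking CMake
+ export CLANG_TOOLS_PATH=/opt/llvm-8
+
+ # Option 2: pass the location directly to CMake
+ cmake .. -DClangTools_PATH=/opt/llvm-8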
+
+To make linting more reproducible for everyone, we provide a ``docker-compose``
+target that is executable from the root of the repository:
+
+.. code-block:: shell
+
+ docker-compose run ubuntu-lint
+
+Cleaning includes with include-what-you-use (IWYU)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We occasionally use Google's `include-what-you-use
+<https://github.com/include-what-you-use/include-what-you-use>`_ tool, also
+known as IWYU, to remove unnecessary includes.
+
+To begin using IWYU, you must first build it by following the instructions in
+the project's documentation. Once the ``include-what-you-use`` executable is in
+your ``$PATH``, you must run CMake with ``-DCMAKE_EXPORT_COMPILE_COMMANDS=ON``
+in a new out-of-source CMake build directory like so:
+
+.. code-block:: shell
+
+ mkdir -p $ARROW_ROOT/cpp/iwyu
+ cd $ARROW_ROOT/cpp/iwyu
+ cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
+ -DARROW_PYTHON=ON \
+ -DARROW_PARQUET=ON \
+ -DARROW_FLIGHT=ON \
+ -DARROW_PLASMA=ON \
+ -DARROW_GANDIVA=ON \
+ -DARROW_BUILD_BENCHMARKS=ON \
+ -DARROW_BUILD_BENCHMARKS_REFERENCE=ON \
+ -DARROW_BUILD_TESTS=ON \
+ -DARROW_BUILD_UTILITIES=ON \
+ -DARROW_S3=ON \
+ -DARROW_WITH_BROTLI=ON \
+ -DARROW_WITH_BZ2=ON \
+ -DARROW_WITH_LZ4=ON \
+ -DARROW_WITH_SNAPPY=ON \
+ -DARROW_WITH_ZLIB=ON \
+ -DARROW_WITH_ZSTD=ON ..
+
+In order for IWYU to run on the desired component in the codebase, it must be
+enabled by the CMake configuration flags. Once this is done, you can run IWYU
+on the whole codebase by running a helper ``iwyu.sh`` script:
+
+.. code-block:: shell
+
+ IWYU_SH=$ARROW_ROOT/cpp/build-support/iwyu/iwyu.sh
+ $IWYU_SH
+
+Since this is very time consuming, you can check a subset of files matching
+some string pattern with the special "match" option:
+
+.. code-block:: shell
+
+ $IWYU_SH match $PATTERN
+
+For example, if you wanted to do IWYU checks on all files in
+``src/arrow/array``, you could run:
+
+.. code-block:: shell
+
+ $IWYU_SH match arrow/array
+
+Checking for ABI and API stability
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To build ABI compliance reports, you need to install the two tools
+``abi-dumper`` and ``abi-compliance-checker``.
+
+Build Arrow C++ in Debug mode. Alternatively, you could use ``-Og``, which also
+builds with the necessary symbols but includes a bit of code optimization.
+Once the build has finished, you can generate ABI reports using:
+
+.. code-block:: shell
+
+ abi-dumper -lver 9 debug/libarrow.so -o ABI-9.dump
+
+The version number above can be chosen freely. Since we want to compare
+versions, you should now ``git checkout`` the version you want to compare
+against and re-run the above command using a different version number. Once
+both reports are generated, you can build a comparison report using:
+
+.. code-block:: shell
+
+ abi-compliance-checker -l libarrow -d1 ABI-9.dump -d2 ABI-10.dump
+
+The report is then generated in ``compat_reports/libarrow`` as an HTML file.
+
+API Documentation
+=================
+
+We use Doxygen style comments (``///``) in header files for comments
+that we wish to show up in API documentation for classes and
+functions.
+
+When using ``clang`` and building with
+``-DBUILD_WARNING_LEVEL=CHECKIN``, the ``-Wdocumentation`` flag is
+used which checks for some common documentation inconsistencies, like
+documenting some, but not all, function parameters with ``\param``. See
+the `LLVM documentation warnings section
+<https://releases.llvm.org/7.0.1/tools/clang/docs/DiagnosticsReference.html#wdocumentation>`_
+for more about this.
+
+While we publish the API documentation as part of the main Sphinx-based
+documentation site, you can also build the C++ API documentation anytime using
+Doxygen. Run the following command from the ``cpp/apidoc`` directory:
+
+.. code-block:: shell
+
+ doxygen Doxyfile
+
+This requires `Doxygen <https://www.doxygen.org>`_ to be installed.
+
+Apache Parquet Development
+==========================
+
+To build the C++ libraries for Apache Parquet, add the flag
+``-DARROW_PARQUET=ON`` when invoking CMake.
+To build Apache Parquet with encryption support, add the flag
+``-DPARQUET_REQUIRE_ENCRYPTION=ON`` when invoking CMake. The Parquet libraries and unit tests
+can be built with the ``parquet`` make target:
+
+.. code-block:: shell
+
+ make parquet
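+
+Put together, a configure-and-build sequence might look like the following
+sketch, which assumes it is run from a build directory directly below ``cpp``
+and that ``make`` is the build tool in use:
+
+.. code-block:: shell
+
+ # Enable Parquet with encryption support, plus the unit tests
+ cmake .. -DARROW_PARQUET=ON -DPARQUET_REQUIRE_ENCRYPTION=ON -DARROW_BUILD_TESTS=ON
+ # Build only the Parquet libraries and unit tests
+ make parquet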
+
+On Linux and macOS, if you do not have Apache Thrift installed on your system,
+or you are building with ``-DThrift_SOURCE=BUNDLED``, you must install the
+``bison`` and ``flex`` packages. On Windows we handle these build dependencies
+automatically when building Thrift from source.
+
+Running ``ctest -L unittest`` will run all built C++ unit tests, while
+``ctest -L parquet`` will run only the Parquet unit tests. The unit tests
+depend on the environment variable ``PARQUET_TEST_DATA``, which must point to
+the data directory of the git submodule for the
+https://github.com/apache/parquet-testing repository:
+
+.. code-block:: shell
+
+ git submodule update --init
+ export PARQUET_TEST_DATA=$ARROW_ROOT/cpp/submodules/parquet-testing/data
+
+Here ``$ARROW_ROOT`` is the absolute path to the Arrow codebase.
+
+Arrow Flight RPC
+================
+
+In addition to the Arrow dependencies, Flight requires:
+
+* gRPC (>= 1.14, roughly)
+* Protobuf (>= 3.6, earlier versions may work)
+* c-ares (used by gRPC)
+
+By default, Arrow will try to download and build these dependencies
+when building Flight.
+
+The optional ``flight`` libraries and tests can be built by passing
+``-DARROW_FLIGHT=ON``.
+
+.. code-block:: shell
+
+ cmake .. -DARROW_FLIGHT=ON -DARROW_BUILD_TESTS=ON
+ make
+
+You can also use existing installations of the extra dependencies.
+When building, set the environment variables ``gRPC_ROOT`` and/or
+``Protobuf_ROOT`` and/or ``c-ares_ROOT``.
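+
+As an illustration, assuming gRPC and Protobuf were installed under the
+hypothetical prefixes ``$HOME/local/grpc`` and ``$HOME/local/protobuf``:
+
+.. code-block:: shell
+
+ # Point the build at existing installations (paths are placeholders)
+ export gRPC_ROOT=$HOME/local/grpc
+ export Protobuf_ROOT=$HOME/local/protobuf
+ cmake .. -DARROW_FLIGHT=ON -DARROW_BUILD_TESTS=ON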
+
+We are developing against recent versions of gRPC. The
+``grpc-cpp`` package available from https://conda-forge.org/ is one reliable
+way to obtain gRPC in a cross-platform way. You may try using system libraries
+for gRPC and Protobuf, but these are likely to be too old. On macOS, you can
+try `Homebrew <https://brew.sh/>`_:
+
+.. code-block:: shell
+
+ brew install grpc