From e6918187568dbd01842d8d1d2c808ce16a894239 Mon Sep 17 00:00:00 2001
From: Daniel Baumann
Date: Sun, 21 Apr 2024 13:54:28 +0200
Subject: Adding upstream version 18.2.2.

Signed-off-by: Daniel Baumann
---
 src/arrow/dev/conbench_envs/README.md | 214 ++++++++++++++++++++++++++++++++++
 1 file changed, 214 insertions(+)
 create mode 100644 src/arrow/dev/conbench_envs/README.md

(limited to 'src/arrow/dev/conbench_envs/README.md')

diff --git a/src/arrow/dev/conbench_envs/README.md b/src/arrow/dev/conbench_envs/README.md
new file mode 100644
index 000000000..5a4eb58b2
--- /dev/null
+++ b/src/arrow/dev/conbench_envs/README.md
@@ -0,0 +1,214 @@
+
+# Benchmark Builds Env and Hooks
+This directory contains:
+- [benchmarks.env](benchmarks.env) - list of env vars used for building Arrow C++/Python/R/Java/JavaScript and running benchmarks using [conbench](https://ursalabs.org/blog/announcing-conbench/).
+- [hooks.sh](hooks.sh) - hooks used by `@ursabot` benchmark builds that are triggered by `@ursabot please benchmark` PR comments.
+
+## How to add or update Arrow build and run env vars used by `@ursabot` benchmark builds
+1. Create an `apache/arrow` PR
+2. Update or add an env var value in [benchmarks.env](../../dev/conbench_envs/benchmarks.env)
+3. Add an `@ursabot please benchmark` comment to the PR
+4. Once benchmark builds are done, benchmark results can be viewed via the compare/runs links in the PR comment, where:
+- baseline = PR base HEAD commit with unaltered `/dev/conbench_envs/benchmarks.env`
+- contender = PR branch HEAD commit with overridden `/dev/conbench_envs/benchmarks.env`
+
+## Why do `@ursabot` benchmark builds need `hooks.sh`?
+`@ursabot` benchmark builds are maintained in Ursa's private repo.
+Benchmark builds use `hooks.sh` functions as hooks to create a conda env with Arrow dependencies and build Arrow C++/Python/R/Java/JavaScript from source for a specific Arrow repo commit.
+
+Defining hooks in the Arrow repo allows benchmark builds for a specific commit to be
+compatible with the files/scripts *in that commit* which are used for installing Arrow
+dependencies and building Arrow. This allows Arrow contributors to assess the performance
+implications of different build options, dependency versions, etc. by updating
+`hooks.sh`.
+
+## Can other repos and services use `benchmarks.env` and `hooks.sh`?
+
+Yes, other repos and services are welcome to use `benchmarks.env` and `hooks.sh` as long as
+- existing hooks are not removed or renamed.
+- function definitions for existing hooks are only updated in the Arrow commit where Arrow build scripts or files with dependencies have been renamed, moved or added.
+- benchmark builds are run using the `@ursabot please benchmark` PR comment to confirm that function definition updates do not break benchmark builds.
+
+## How can other repos and services use `benchmarks.env` and `hooks.sh` to set up a benchmark env?
+Here are the steps `@ursabot` benchmark builds follow to set up a benchmarking env on Ubuntu using `benchmarks.env` and `hooks.sh`:
+
+### 1. Install Arrow dependencies
+    sudo su
+    apt-get update -y -q && \
+        apt-get install -y -q --no-install-recommends \
+            autoconf \
+            ca-certificates \
+            ccache \
+            cmake \
+            g++ \
+            gcc \
+            gdb \
+            git \
+            libbenchmark-dev \
+            libboost-filesystem-dev \
+            libboost-regex-dev \
+            libboost-system-dev \
+            libbrotli-dev \
+            libbz2-dev \
+            libgflags-dev \
+            libcurl4-openssl-dev \
+            libgoogle-glog-dev \
+            liblz4-dev \
+            libprotobuf-dev \
+            libprotoc-dev \
+            libre2-dev \
+            libsnappy-dev \
+            libssl-dev \
+            libthrift-dev \
+            libutf8proc-dev \
+            libzstd-dev \
+            make \
+            ninja-build \
+            pkg-config \
+            protobuf-compiler \
+            rapidjson-dev \
+            tzdata \
+            wget && \
+        apt-get clean && \
+        rm -rf /var/lib/apt/lists/*
+
+    apt-get update -y -q && \
+        apt-get install -y -q \
+            python3 \
+            python3-pip \
+            python3-dev && \
+        apt-get clean && \
+        rm -rf /var/lib/apt/lists/*
+
+### 2. Install Arrow dependencies for Java
+    sudo su
+    apt-get install openjdk-8-jdk
+    apt-get install maven
+
+Verify that you have at least these versions of `java`, `javac` and `mvn`:
+
+    # java -version
+    openjdk version "1.8.0_292"
+    ...
+    # javac -version
+    javac 1.8.0_292
+    ...
+    # mvn -version
+    Apache Maven 3.6.3
+    ...
+
+### 3. Install Arrow dependencies for JavaScript
+    sudo apt update
+    sudo apt -y upgrade
+    sudo apt update
+    sudo apt -y install curl dirmngr apt-transport-https lsb-release ca-certificates
+    curl -fsSL https://deb.nodesource.com/setup_14.x | sudo -E bash -
+    sudo apt-get install -y nodejs
+    sudo apt -y install yarn
+    sudo apt -y install gcc g++ make
+
+Verify that you have at least these versions of `node` and `yarn`:
+
+    # node --version
+    v14.17.2
+    ...
+    # yarn --version
+    1.22.5
+    ...
+
+### 4. Install Conda
+    sudo apt install curl
+    curl -LO https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
+    sudo bash Miniconda3-latest-Linux-x86_64.sh
+
+### 5. Set env vars:
+    export ARROW_REPO=https://github.com/apache/arrow.git
+    export BENCHMARKABLE=e6e9e6ea52b7a8f2682ffc4160168c936ca1d3e6
+    export BENCHMARKABLE_TYPE=arrow-commit
+    export PYTHON_VERSION=3.8
+    export CONBENCH_EMAIL=...
+    export CONBENCH_URL="https://conbench.ursa.dev"
+    export CONBENCH_PASSWORD=...
+    export MACHINE=...
+
+### 6. Use `create_conda_env_with_arrow_python` hook to create conda env and build Arrow C++ and Arrow Python
+    git clone "${ARROW_REPO}"
+    pushd arrow
+    git fetch -v --prune -- origin "${BENCHMARKABLE}"
+    git checkout -f "${BENCHMARKABLE}"
+    source dev/conbench_envs/hooks.sh create_conda_env_with_arrow_python
+    popd
+
+### 7. Install conbench
+    git clone https://github.com/ursacomputing/conbench.git
+    pushd conbench
+    pip install -r requirements-cli.txt
+    pip install -U PyYAML
+    python setup.py install
+    popd
+
+### 8. Set up benchmarks repo
+    git clone https://github.com/ursacomputing/benchmarks.git
+    pushd benchmarks
+    python setup.py develop
+    popd
+
+### 9. Set up conbench credentials
+    pushd benchmarks
+    touch .conbench
+    echo "url: $CONBENCH_URL" >> .conbench
+    echo "email: $CONBENCH_EMAIL" >> .conbench
+    echo "password: $CONBENCH_PASSWORD" >> .conbench
+    echo "host_name: $MACHINE" >> .conbench
+    popd
+
+### 10. Run Python benchmarks
+    cd benchmarks
+    conbench file-read ALL --iterations=3 --all=true --drop-caches=true
+
+### 11. Use `install_archery` hook to set up archery and run C++ benchmarks
+    pushd arrow
+    source dev/conbench_envs/hooks.sh install_archery
+    popd
+    cd benchmarks
+    conbench cpp-micro --iterations=1
+
+### 12. Use `build_arrow_r` hook to build Arrow R and run R benchmarks
+    pushd arrow
+    source dev/conbench_envs/hooks.sh build_arrow_r
+    popd
+    R -e "remotes::install_github('ursacomputing/arrowbench')"
+    cd benchmarks
+    conbench dataframe-to-table ALL --iterations=3 --drop-caches=true --language=R
+
+### 13. Use `build_arrow_java` and `install_archery` hooks to build Arrow Java and run Java benchmarks
+    pushd arrow
+    source dev/conbench_envs/hooks.sh build_arrow_java
+    source dev/conbench_envs/hooks.sh install_archery
+    popd
+    cd benchmarks
+    conbench java-micro --iterations=1
+
+### 14. Use `install_java_script_project_dependencies` hook to install JavaScript dependencies and run JavaScript benchmarks
+    pushd arrow
+    source dev/conbench_envs/hooks.sh install_java_script_project_dependencies
+    popd
+    cd benchmarks
+    conbench js-micro
--
cgit v1.2.3