<!---
  Licensed to the Apache Software Foundation (ASF) under one
  or more contributor license agreements.  See the NOTICE file
  distributed with this work for additional information
  regarding copyright ownership.  The ASF licenses this file
  to you under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  with the License.  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing,
  software distributed under the License is distributed on an
  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  KIND, either express or implied.  See the License for the
  specific language governing permissions and limitations
  under the License.
-->
# Benchmark Builds Env and Hooks
This directory contains:
- [benchmarks.env](benchmarks.env) - env vars used to build Arrow C++/Python/R/Java/JavaScript and to run benchmarks with [conbench](https://ursalabs.org/blog/announcing-conbench/).
- [hooks.sh](hooks.sh) - hooks used by `@ursabot` benchmark builds, which are triggered by `@ursabot please benchmark` PR comments.
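
As a minimal sketch, assuming `benchmarks.env` holds plain `KEY=VALUE` lines that are valid shell assignments, a build script could export every variable in the file by turning on `set -a` while sourcing it. The function name `load_benchmark_env` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: export every KEY=VALUE assignment found in an env file.
# Assumes the file contains only shell-compatible assignments and comments.
load_benchmark_env() {
  local env_file="$1"
  if [[ ! -f "$env_file" ]]; then
    echo "env file not found: $env_file" >&2
    return 1
  fi
  set -a              # auto-export every variable assigned while sourcing
  source "$env_file"
  set +a              # stop auto-exporting
}
```

For example, `load_benchmark_env dev/conbench_envs/benchmarks.env` run from the root of an Arrow checkout.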

## How to add or update Arrow build and run env vars used by `@ursabot` benchmark builds
1. Create an `apache/arrow` PR.
2. Update or add the env var value in [benchmarks.env](../../dev/conbench_envs/benchmarks.env).
3. Add an `@ursabot please benchmark` comment to the PR.
4. Once the benchmark builds are done, the benchmark results can be viewed via the compare/runs links in the PR comment, where:
   - baseline = PR base HEAD commit with the unaltered `/dev/conbench_envs/benchmarks.env`
   - contender = PR branch HEAD commit with the overridden `/dev/conbench_envs/benchmarks.env`
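
For step 2, a small helper could update an existing `KEY=VALUE` line in place, or append it when the key is not there yet. This is a sketch that assumes one plain assignment per line (no `export` prefix); `set_env_var` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set KEY=VALUE in an env file. An existing line that
# starts with "KEY=" is replaced; otherwise the assignment is appended.
set_env_var() {
  local env_file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)" || return 1
  # Pass key/value through the environment so awk does not reinterpret
  # backslashes in them (as `awk -v` would).
  K="$key" V="$value" awk '
    BEGIN { key = ENVIRON["K"]; value = ENVIRON["V"]; found = 0 }
    index($0, key "=") == 1 { print key "=" value; found = 1; next }
    { print }
    END { if (!found) print key "=" value }
  ' "$env_file" > "$tmp" \
    && cat "$tmp" > "$env_file"   # overwrite in place, keeping file permissions
  local status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example, `set_env_var dev/conbench_envs/benchmarks.env SOME_ARROW_OPTION ON`, where `SOME_ARROW_OPTION` stands in for whichever variable the PR changes.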

## Why do `@ursabot` benchmark builds need `hooks.sh`?
`@ursabot` benchmark builds are maintained in Ursa's private repo.
Benchmark builds use the functions in `hooks.sh` as hooks to create a conda env with Arrow dependencies and to build Arrow C++/Python/R/Java/JavaScript from source for a specific commit of the Arrow repo.

Defining the hooks in the Arrow repo keeps benchmark builds for a specific commit
compatible with the files and scripts *in that commit* that are used to install Arrow
dependencies and build Arrow. It also lets Arrow contributors assess the performance
implications of different build options, dependency versions, etc. by updating
`hooks.sh`.
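
The `source dev/conbench_envs/hooks.sh <hook_name>` calls in the steps below rely on the script running the function named by its first argument. Here is a minimal sketch of that dispatch pattern, assuming the script ends by executing `"$@"`; the file and hook names are hypothetical and this is not the content of the real `hooks.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical hooks file (NOT the real hooks.sh): each hook is a function and
# the last line runs the function named by the first argument, if any.
hooks_dir="$(mktemp -d)"
cat > "$hooks_dir/example_hooks.sh" <<'EOF'
greet() {
  echo "hello from ${1:-nobody}"
}

set_example_var() {
  export EXAMPLE_HOOK_VAR="set by hook"
}

"$@"
EOF

# `source FILE ARGS...` sets $1, $2, ... while FILE runs, so its final "$@"
# calls the named hook. Because the file is sourced, not executed, a hook can
# change the caller's environment.
source "$hooks_dir/example_hooks.sh" greet benchmarks   # prints: hello from benchmarks
source "$hooks_dir/example_hooks.sh" set_example_var
echo "$EXAMPLE_HOOK_VAR"                                # prints: set by hook
```

Sourcing such a file with no arguments only defines the functions, since `"$@"` then expands to nothing.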

## Can other repos and services use `benchmarks.env` and `hooks.sh`?

Yes, other repos and services are welcome to use `benchmarks.env` and `hooks.sh` as long as:
- existing hooks are not removed or renamed;
- function definitions for existing hooks are only updated in the Arrow commit where Arrow build scripts or dependency files are renamed, moved or added;
- benchmark builds are run using an `@ursabot please benchmark` PR comment to confirm that function definition updates do not break benchmark builds.
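
Before relying on a given Arrow commit, another repo or service could confirm that the hooks it calls are still defined. A sketch, assuming the hooks file only defines functions when sourced with no arguments (a `"$@"`-style dispatcher); `check_hooks_defined` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: source a hooks file in a subshell (so nothing leaks into
# the caller) and report every expected hook function that is not defined.
check_hooks_defined() {
  local hooks_file="$1"
  shift
  local -a expected=("$@")
  (
    set --                        # clear args so a "$@" dispatcher runs no hook
    source "$hooks_file" || exit 1
    missing=0
    for hook in "${expected[@]}"; do
      if ! declare -F "$hook" > /dev/null; then
        echo "missing hook: $hook" >&2
        missing=1
      fi
    done
    exit "$missing"
  )
}
```

For example: `check_hooks_defined dev/conbench_envs/hooks.sh create_conda_env_with_arrow_python install_archery build_arrow_r build_arrow_java install_java_script_project_dependencies`.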

## How can other repos and services use `benchmarks.env` and `hooks.sh` to set up a benchmark env?
These are the steps `@ursabot` benchmark builds follow to set up a benchmarking env on Ubuntu with `benchmarks.env` and `hooks.sh`:

### 1. Install Arrow dependencies
    sudo su
    apt-get update -y -q && \
        apt-get install -y -q --no-install-recommends \
            autoconf \
            ca-certificates \
            ccache \
            cmake \
            g++ \
            gcc \
            gdb \
            git \
            libbenchmark-dev \
            libboost-filesystem-dev \
            libboost-regex-dev \
            libboost-system-dev \
            libbrotli-dev \
            libbz2-dev \
            libgflags-dev \
            libcurl4-openssl-dev \
            libgoogle-glog-dev \
            liblz4-dev \
            libprotobuf-dev \
            libprotoc-dev \
            libre2-dev \
            libsnappy-dev \
            libssl-dev \
            libthrift-dev \
            libutf8proc-dev \
            libzstd-dev \
            make \
            ninja-build \
            pkg-config \
            protobuf-compiler \
            rapidjson-dev \
            tzdata \
            wget && \
        apt-get clean && \
        rm -rf /var/lib/apt/lists/*

    apt-get update -y -q && \
        apt-get install -y -q \
            python3 \
            python3-pip \
            python3-dev && \
        apt-get clean && \
        rm -rf /var/lib/apt/lists/*
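
After step 1, a quick check that the main build tools are on `PATH` can catch a failed package install before a long build. A minimal sketch; `check_commands` is a hypothetical helper and the command list is a sample drawn from the packages above (`ninja-build` provides `ninja`, `python3-pip` provides `pip3`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each command that cannot be found on PATH and
# return non-zero if any are missing.
check_commands() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" > /dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: `check_commands cmake gcc g++ git make ninja pkg-config python3 pip3 wget ccache`.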

### 2. Install Arrow dependencies for Java
    sudo su
    apt-get install openjdk-8-jdk
    apt-get install maven
    
Verify that you have at least these versions of `java`, `javac` and `maven`:
    
    # java -version
    openjdk version "1.8.0_292"
    ...
    # javac -version
    javac 1.8.0_292
    ...
    # mvn -version
    Apache Maven 3.6.3
    ...
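
`java -version` prints to stderr, and Java 8 reports versions in the legacy `1.8.0_292` form while Java 9 and later report `9`, `11.0.11` and so on. A minimal sketch for extracting the major version in a script; `java_major_version` is a hypothetical helper that reads the `java -version` text on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `java -version` output on stdin and print the
# major version (8 for "1.8.0_292", 11 for "11.0.11", and so on).
java_major_version() {
  local line version
  # The first line looks like: openjdk version "1.8.0_292"
  IFS= read -r line || return 1
  version="$(sed -n 's/.*version "\([^"]*\)".*/\1/p' <<< "$line")"
  [[ -n "$version" ]] || return 1
  if [[ "$version" == 1.* ]]; then
    version="${version#1.}"     # legacy scheme: 1.8.0_292 -> 8.0_292
  fi
  echo "${version%%[._-]*}"     # keep the digits before the first separator
}
```

For example, `java -version 2>&1 | java_major_version` prints `8` for the output shown above.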

### 3. Install Arrow dependencies for JavaScript
    sudo apt update
    sudo apt -y upgrade
    sudo apt update
    sudo apt -y install curl dirmngr apt-transport-https lsb-release ca-certificates
    curl -fsSL https://deb.nodesource.com/setup_14.x | sudo -E bash -
    sudo apt-get install -y nodejs
    sudo apt -y install yarn
    sudo apt -y install gcc g++ make

Verify that you have at least these versions of `node` and `yarn`:

    # node --version
    v14.17.2
    ...
    # yarn --version
    1.22.5
    ...
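
To script the "at least these versions" check, one option is GNU `sort -V` (version sort). A sketch, assuming GNU coreutils; `version_at_least` is a hypothetical helper that ignores a leading `v` such as the one in `node --version` output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if version $1 >= version $2, using GNU sort -V.
version_at_least() {
  local actual="${1#v}" required="${2#v}"   # drop a leading "v" (v14.17.2)
  # The lower version sorts first, so $1 passes when $2 comes out on top.
  [[ "$(printf '%s\n%s\n' "$required" "$actual" | sort -V | head -n 1)" == "$required" ]]
}
```

For example: `version_at_least "$(node --version)" v14.17.2 && version_at_least "$(yarn --version)" 1.22.5`.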
    
### 4. Install Conda
    sudo apt install curl
    curl -LO https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    sudo bash Miniconda3-latest-Linux-x86_64.sh
    
### 5. Set env vars
    export ARROW_REPO=https://github.com/apache/arrow.git
    export BENCHMARKABLE=e6e9e6ea52b7a8f2682ffc4160168c936ca1d3e6
    export BENCHMARKABLE_TYPE=arrow-commit
    export PYTHON_VERSION=3.8
    export CONBENCH_EMAIL=...
    export CONBENCH_URL="https://conbench.ursa.dev"
    export CONBENCH_PASSWORD=...
    export MACHINE=...
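
Several values above are left as `...` and must be filled in. A sketch of a guard that stops early when a required variable is unset, empty, or still the `...` placeholder; `require_env` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail if any named variable is unset, empty, or still
# holds the "..." placeholder used in this README.
require_env() {
  local name value status=0
  for name in "$@"; do
    value="${!name:-}"          # indirect expansion: the value of $name
    if [[ -z "$value" || "$value" == "..." ]]; then
      echo "env var not set: $name" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example: `require_env ARROW_REPO BENCHMARKABLE BENCHMARKABLE_TYPE PYTHON_VERSION CONBENCH_URL CONBENCH_EMAIL CONBENCH_PASSWORD MACHINE`.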

### 6. Use the `create_conda_env_with_arrow_python` hook to create a conda env and build Arrow C++ and Arrow Python
    git clone "${ARROW_REPO}"
    pushd arrow
    git fetch -v --prune -- origin "${BENCHMARKABLE}"
    git checkout -f "${BENCHMARKABLE}"
    source dev/conbench_envs/hooks.sh create_conda_env_with_arrow_python
    popd
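
Step 6 fetches `${BENCHMARKABLE}` by commit hash. Fetching a commit by hash generally needs the full 40-character SHA-1 rather than an abbreviation, so a cheap pre-check can catch a truncated value before cloning. A minimal sketch; `is_full_sha1` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only for a full 40-character lowercase hex SHA-1.
is_full_sha1() {
  [[ "$1" =~ ^[0-9a-f]{40}$ ]]
}
```

For example: `is_full_sha1 "${BENCHMARKABLE}" || echo "BENCHMARKABLE must be a full commit SHA" >&2`.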
    
### 7. Install conbench
    git clone https://github.com/ursacomputing/conbench.git
    pushd conbench
    pip install -r requirements-cli.txt
    pip install -U PyYAML
    python setup.py install
    popd

### 8. Set up the benchmarks repo
    git clone https://github.com/ursacomputing/benchmarks.git
    pushd benchmarks
    python setup.py develop
    popd
    
### 9. Set up conbench credentials
    pushd benchmarks
    touch .conbench
    echo "url: $CONBENCH_URL" >> .conbench
    echo "email: $CONBENCH_EMAIL" >> .conbench
    echo "password: $CONBENCH_PASSWORD" >> .conbench
    echo "host_name: $MACHINE" >> .conbench
    popd
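
Because `.conbench` stores a password in plain text, one option is to write it in a single step and restrict it to the current user. This sketch uses the same four keys as above, overwrites rather than appends (so re-running does not duplicate entries), and takes the target directory as an argument; `write_conbench_config` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the conbench credentials file into directory $1
# with owner-only permissions, using the same keys as the step above.
write_conbench_config() {
  local dir="$1"
  (
    umask 077   # files created in this subshell are readable by the owner only
    cat > "$dir/.conbench" <<EOF
url: ${CONBENCH_URL}
email: ${CONBENCH_EMAIL}
password: ${CONBENCH_PASSWORD}
host_name: ${MACHINE}
EOF
  )
  chmod 600 "$dir/.conbench"   # also tighten a file that already existed
}
```

For example, `write_conbench_config benchmarks` from the directory that contains the `benchmarks` clone.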
 
### 10. Run Python benchmarks
    pushd benchmarks
    conbench file-read ALL --iterations=3 --all=true --drop-caches=true
    popd

### 11. Use the `install_archery` hook to set up archery and run C++ benchmarks
    pushd arrow
    source dev/conbench_envs/hooks.sh install_archery
    popd
    pushd benchmarks
    conbench cpp-micro --iterations=1
    popd

### 12. Use the `build_arrow_r` hook to build Arrow R and run R benchmarks
    pushd arrow
    source dev/conbench_envs/hooks.sh build_arrow_r
    popd
    R -e "remotes::install_github('ursacomputing/arrowbench')"
    pushd benchmarks
    conbench dataframe-to-table ALL --iterations=3 --drop-caches=true --language=R
    popd

### 13. Use the `build_arrow_java` and `install_archery` hooks to build Arrow Java and run Java benchmarks
    pushd arrow
    source dev/conbench_envs/hooks.sh build_arrow_java
    source dev/conbench_envs/hooks.sh install_archery
    popd
    pushd benchmarks
    conbench java-micro --iterations=1
    popd

### 14. Use the `install_java_script_project_dependencies` hook to install JavaScript dependencies and run JavaScript benchmarks
    pushd arrow
    source dev/conbench_envs/hooks.sh install_java_script_project_dependencies
    popd
    pushd benchmarks
    conbench js-micro
    popd
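
Steps 10 to 14 each change directory before running `conbench`. If the steps are scripted end to end, one way to keep a failing suite from stopping the rest, and to keep directory changes from leaking between steps, is to run every step in its own subshell and collect the failures. A sketch; `run_step`, `failed_steps` and the log format are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical runner: execute each step in a subshell so `cd`/`pushd` inside
# a step cannot affect the next one, and record which steps failed.
failed_steps=()

run_step() {
  local name="$1"
  shift
  echo "==> $name"
  if ( "$@" ); then
    echo "<== $name: ok"
  else
    echo "<== $name: FAILED (exit $?)" >&2
    failed_steps+=("$name")
  fi
}
```

For example, define `python_benchmarks() { cd benchmarks && conbench file-read ALL --iterations=3 --all=true --drop-caches=true; }`, call `run_step "Python benchmarks" python_benchmarks`, and check `${#failed_steps[@]}` once all steps have run.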