1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
|
<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
## Using Arrow's HDFS (Apache Hadoop Distributed File System) interface
### Build requirements
To build the integration, pass the following option to CMake
```shell
-DARROW_HDFS=on
```
For convenience, we have bundled `hdfs.h` for libhdfs from Apache Hadoop in
Arrow's thirdparty. If you wish to build against the `hdfs.h` in your installed
Hadoop distribution, set the `$HADOOP_HOME` environment variable.
### Runtime requirements
By default, the HDFS client C++ class in `libarrow_io` uses the libhdfs JNI
interface to the Java Hadoop client. This library is loaded **at runtime**
(rather than at link / library load time, since the library may not be in your
LD_LIBRARY_PATH), and relies on some environment variables.
* `HADOOP_HOME`: the root of your installed Hadoop distribution. Often has
`lib/native/libhdfs.so`.
* `JAVA_HOME`: the location of your Java SDK installation.
* `CLASSPATH`: must contain the Hadoop jars. You can set these using:
```shell
export CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath --glob`
```
* `ARROW_LIBHDFS_DIR` (optional): explicit location of `libhdfs.so` if it is
installed somewhere other than `$HADOOP_HOME/lib/native`.
To accommodate distribution-specific nuances, the `JAVA_HOME` variable may be
set to the root path for the Java SDK, the JRE path itself, or to the directory
containing the `libjvm` library.
### Mac Specifics
The installed location of Java on OS X can vary, however the following snippet
will set it automatically for you:
```shell
export JAVA_HOME=$(/usr/libexec/java_home)
```
Homebrew's Hadoop does not have native libs. Apache doesn't build these, so
users must build Hadoop to get the native libs. See this Stack Overflow
answer for details:
http://stackoverflow.com/a/40051353/478288
Be sure to include the path to the native libs in `JAVA_LIBRARY_PATH`:
```shell
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
```
If you get an error about needing to install Java 6, then add *BundledApp* and
*JNI* to the `JVMCapabilities` in `$JAVA_HOME/../Info.plist`. See
https://oliverdowling.com.au/2015/10/09/oracles-jre-8-on-mac-os-x-el-capitan/
https://derflounder.wordpress.com/2015/08/08/modifying-oracles-java-sdk-to-run-java-applications-on-os-x/
|