## Using Arrow's HDFS (Apache Hadoop Distributed File System) interface

### Build requirements

To build the integration, pass the following option to CMake:

```shell
-DARROW_HDFS=on
```

For convenience, we have bundled `hdfs.h` for libhdfs from Apache Hadoop in
Arrow's thirdparty. If you wish to build against the `hdfs.h` in your installed
Hadoop distribution, set the `$HADOOP_HOME` environment variable.

### Runtime requirements

By default, the HDFS client C++ class in `libarrow_io` uses the libhdfs JNI
interface to the Java Hadoop client. This library is loaded **at runtime**
(rather than at link / library load time, since the library may not be in your
`LD_LIBRARY_PATH`), and relies on some environment variables.

* `HADOOP_HOME`: the root of your installed Hadoop distribution. Often has
  `lib/native/libhdfs.so`.
* `JAVA_HOME`: the location of your Java SDK installation.
* `CLASSPATH`: must contain the Hadoop jars. You can set it using:

  ```shell
  export CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath --glob`
  ```

* `ARROW_LIBHDFS_DIR` (optional): explicit location of `libhdfs.so` if it is
  installed somewhere other than `$HADOOP_HOME/lib/native`.

To accommodate distribution-specific nuances, the `JAVA_HOME` variable may be
set to the root path for the Java SDK, the JRE path itself, or to the directory
containing the `libjvm` library.

### Mac Specifics

The installed location of Java on OS X can vary; however, the following snippet
will set it automatically for you:

```shell
export JAVA_HOME=$(/usr/libexec/java_home)
```

Homebrew's Hadoop does not include native libs. Apache doesn't build these, so
you must build Hadoop yourself to get them. See this Stack Overflow answer for
details:

http://stackoverflow.com/a/40051353/478288

Be sure to include the path to the native libs in `JAVA_LIBRARY_PATH`:

```shell
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
```

If you get an error about needing to install Java 6, then add *BundledApp* and
*JNI* to the `JVMCapabilities` in `$JAVA_HOME/../Info.plist`. See:

* https://oliverdowling.com.au/2015/10/09/oracles-jre-8-on-mac-os-x-el-capitan/
* https://derflounder.wordpress.com/2015/08/08/modifying-oracles-java-sdk-to-run-java-applications-on-os-x/
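
### Checking the runtime environment

Because libhdfs is loaded at runtime, a missing variable usually shows up only when the HDFS client is first used. The following is a minimal sketch of a pre-flight check for the variables listed under "Runtime requirements". The function name `check_arrow_hdfs_env` is hypothetical and not part of Arrow, and the check assumes the Linux library name `libhdfs.so`; it only mirrors the conventions described in this document rather than Arrow's actual lookup logic.

```shell
# Hypothetical helper (not part of Arrow): verify the environment variables
# described in this document before starting a program that uses Arrow's
# HDFS client. The body runs in a subshell so no variables leak to the caller.
check_arrow_hdfs_env() (
  status=0

  # HADOOP_HOME, JAVA_HOME, and CLASSPATH are the required variables.
  for var in HADOOP_HOME JAVA_HOME CLASSPATH; do
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "missing: $var is not set"
      status=1
    fi
  done

  # Prefer ARROW_LIBHDFS_DIR when it is set; otherwise fall back to the
  # conventional $HADOOP_HOME/lib/native location.
  # Assumption: the library is named libhdfs.so (Linux naming).
  libdir="${ARROW_LIBHDFS_DIR:-${HADOOP_HOME:-}/lib/native}"
  if [ -e "$libdir/libhdfs.so" ]; then
    echo "found: $libdir/libhdfs.so"
  else
    echo "missing: libhdfs.so not found in $libdir"
    status=1
  fi

  exit "$status"
)
```

A launcher script could call `check_arrow_hdfs_env || exit 1` before starting the application, so that configuration problems are reported up front.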