Diffstat (limited to 'src/arrow/cpp/apidoc/HDFS.md')
-rw-r--r-- src/arrow/cpp/apidoc/HDFS.md 83
1 files changed, 83 insertions, 0 deletions
diff --git a/src/arrow/cpp/apidoc/HDFS.md b/src/arrow/cpp/apidoc/HDFS.md
new file mode 100644
index 000000000..d3671fb76
--- /dev/null
+++ b/src/arrow/cpp/apidoc/HDFS.md
@@ -0,0 +1,83 @@
+<!---
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
+## Using Arrow's HDFS (Apache Hadoop Distributed File System) interface
+
+### Build requirements
+
+To build the integration, pass the following option to CMake:
+
+```shell
+-DARROW_HDFS=on
+```
+
+For convenience, we have bundled the libhdfs header `hdfs.h` from Apache Hadoop
+in Arrow's `thirdparty` directory. If you wish to build against the `hdfs.h` in
+your installed Hadoop distribution, set the `$HADOOP_HOME` environment variable.
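+
+As a rough sketch of how the option fits into a full configure step, assume an
+out-of-source build directory named `build` under Arrow's `cpp` directory and a
+Hadoop installation at the hypothetical path `/opt/hadoop`. Neither is
+prescribed by Arrow, so adjust both to your setup:
+
+```shell
+# Optional: build against the hdfs.h of an installed Hadoop distribution.
+# /opt/hadoop is a placeholder; use your own install root.
+export HADOOP_HOME=/opt/hadoop
+
+# Configure from a separate build directory with HDFS support enabled.
+mkdir -p build
+cd build
+cmake .. -DARROW_HDFS=on
+```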
+
+### Runtime requirements
+
+By default, the HDFS client C++ class in `libarrow_io` uses the libhdfs JNI
+interface to the Java Hadoop client. This library is loaded **at runtime**
+(rather than at link / library load time, since the library may not be in your
+`LD_LIBRARY_PATH`), and relies on the following environment variables:
+
+* `HADOOP_HOME`: the root of your installed Hadoop distribution. `libhdfs.so`
+is often found under its `lib/native/` directory.
+* `JAVA_HOME`: the location of your Java SDK installation.
+* `CLASSPATH`: must contain the Hadoop jars. You can set it using:
+
+```shell
+export CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath --glob`
+```
+
+* `ARROW_LIBHDFS_DIR` (optional): explicit location of `libhdfs.so` if it is
+installed somewhere other than `$HADOOP_HOME/lib/native`.
+
+To accommodate distribution-specific nuances, the `JAVA_HOME` variable may be
+set to the root path for the Java SDK, the JRE path itself, or to the directory
+containing the `libjvm` library.
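+
+Putting the variables together, a shell profile fragment might look like the
+following. This is only a sketch: both paths are placeholders for your own
+installation, and the final `ls` is a convenience check, not the lookup that
+Arrow itself performs.
+
+```shell
+# Placeholder paths; adjust to your own installation.
+export HADOOP_HOME=/opt/hadoop
+export JAVA_HOME=/usr/lib/jvm/default-java
+
+# Hadoop jars, as reported by the hadoop launcher script.
+export CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath --glob`
+
+# Optional: only needed when libhdfs.so lives outside $HADOOP_HOME/lib/native.
+# export ARROW_LIBHDFS_DIR=/path/to/directory/containing/libhdfs
+
+# Convenience check that libhdfs.so is where the variables say it is.
+ls "${ARROW_LIBHDFS_DIR:-$HADOOP_HOME/lib/native}/libhdfs.so"
+```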
+
+### Mac Specifics
+
+The installed location of Java on OS X can vary; the following snippet sets
+`JAVA_HOME` automatically:
+
+```shell
+export JAVA_HOME=$(/usr/libexec/java_home)
+```
+
+Homebrew's Hadoop does not include the native libraries, and Apache does not
+build them either, so you must build Hadoop yourself to get them. See this
+Stack Overflow answer for details:
+
+http://stackoverflow.com/a/40051353/478288
+
+Be sure to include the path to the native libs in `JAVA_LIBRARY_PATH`:
+
+```shell
+export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
+```
+
+If you get an error about needing to install Java 6, add *BundledApp* and
+*JNI* to the `JVMCapabilities` in `$JAVA_HOME/../Info.plist`. See the following
+posts for details:
+
+https://oliverdowling.com.au/2015/10/09/oracles-jre-8-on-mac-os-x-el-capitan/
+
+https://derflounder.wordpress.com/2015/08/08/modifying-oracles-java-sdk-to-run-java-applications-on-os-x/