Diffstat (limited to 'src/arrow/docs/source/python/filesystems_deprecated.rst')
-rw-r--r-- src/arrow/docs/source/python/filesystems_deprecated.rst | 95
1 files changed, 95 insertions, 0 deletions
diff --git a/src/arrow/docs/source/python/filesystems_deprecated.rst b/src/arrow/docs/source/python/filesystems_deprecated.rst
new file mode 100644
index 000000000..04887e977
--- /dev/null
+++ b/src/arrow/docs/source/python/filesystems_deprecated.rst
@@ -0,0 +1,95 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+Filesystem Interface (legacy)
+=============================
+
+.. warning::
+ This section documents the deprecated filesystem layer. You should
+ use the :ref:`new filesystem layer <filesystem>` instead.
+
+.. _hdfs:
+
+Hadoop File System (HDFS)
+-------------------------
+
+PyArrow comes with bindings to a C++-based interface to the Hadoop File
+System. Connect with ``pyarrow.hdfs.connect`` and open files on the returned
+filesystem object:
+
+.. code-block:: python
+
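+ import pyarrow as pa
+
+ # host, port, user, ticket_cache_path and path are placeholders for your cluster.
+ fs = pa.hdfs.connect(host, port, user=user, kerb_ticket=ticket_cache_path)
+ with fs.open(path, 'rb') as f:
+     data = f.read()  # Do something with f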
+
+By default, ``pyarrow.hdfs.HadoopFileSystem`` uses libhdfs, a JNI-based
+interface to the Java Hadoop client. This library is loaded **at runtime**
+(rather than at link / library load time, since the library may not be in your
+``LD_LIBRARY_PATH``), and relies on the following environment variables:
+
+* ``HADOOP_HOME``: the root of your installed Hadoop distribution. It often
+  contains ``lib/native/libhdfs.so``.
+
+* ``JAVA_HOME``: the location of your Java SDK installation.
+
+* ``ARROW_LIBHDFS_DIR`` (optional): explicit location of ``libhdfs.so`` if it is
+ installed somewhere other than ``$HADOOP_HOME/lib/native``.
+
+* ``CLASSPATH``: must contain the Hadoop jars. You can set it using:
+
+.. code-block:: shell
+
+ export CLASSPATH=`$HADOOP_HOME/bin/hdfs classpath --glob`
+
+If ``CLASSPATH`` is not set, it is set automatically, provided that the
+``hadoop`` executable is in your system path or ``HADOOP_HOME`` is set.
+
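The variables listed above can be checked before connecting. The following is a minimal sketch, not part of PyArrow: ``describe_libhdfs_env`` is a hypothetical helper that only restates the rules given in the text (``ARROW_LIBHDFS_DIR`` overrides ``$HADOOP_HOME/lib/native``), and it does not reproduce PyArrow's actual library lookup.

```python
import os


def describe_libhdfs_env(env=None):
    """Summarize the libhdfs-related environment variables.

    Hypothetical helper for illustration only; it is not a PyArrow API
    and does not load or locate libhdfs itself.
    """
    env = os.environ if env is None else env

    # ARROW_LIBHDFS_DIR, when set, names the directory holding libhdfs.so;
    # otherwise the documented default is $HADOOP_HOME/lib/native.
    if env.get("ARROW_LIBHDFS_DIR"):
        libhdfs_dir = env["ARROW_LIBHDFS_DIR"]
    elif env.get("HADOOP_HOME"):
        libhdfs_dir = os.path.join(env["HADOOP_HOME"], "lib", "native")
    else:
        libhdfs_dir = None

    # Variables the text lists without marking them optional.
    missing = [name for name in ("HADOOP_HOME", "JAVA_HOME") if not env.get(name)]

    return {
        "libhdfs_dir": libhdfs_dir,
        "missing": missing,
        "classpath_set": bool(env.get("CLASSPATH")),
    }
```

Calling ``describe_libhdfs_env()`` with no argument inspects the current process environment. Passing a plain ``dict`` instead makes the check easy to try without touching real settings.
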
+You can also use libhdfs3, a third-party C++ library for HDFS from Pivotal Labs:
+
+.. code-block:: python
+
+ fs = pa.hdfs.connect(host, port, user=user, kerb_ticket=ticket_cache_path,
+                      driver='libhdfs3')
+
+HDFS API
+~~~~~~~~
+
+.. currentmodule:: pyarrow
+
+.. autosummary::
+ :toctree: generated/
+
+ hdfs.connect
+ HadoopFileSystem.cat
+ HadoopFileSystem.chmod
+ HadoopFileSystem.chown
+ HadoopFileSystem.delete
+ HadoopFileSystem.df
+ HadoopFileSystem.disk_usage
+ HadoopFileSystem.download
+ HadoopFileSystem.exists
+ HadoopFileSystem.get_capacity
+ HadoopFileSystem.get_space_used
+ HadoopFileSystem.info
+ HadoopFileSystem.ls
+ HadoopFileSystem.mkdir
+ HadoopFileSystem.open
+ HadoopFileSystem.rename
+ HadoopFileSystem.rm
+ HadoopFileSystem.upload
+ HdfsFile