summaryrefslogtreecommitdiffstats
path: root/third_party/python/scandir/README.rst
diff options
context:
space:
mode:
Diffstat (limited to 'third_party/python/scandir/README.rst')
-rw-r--r--third_party/python/scandir/README.rst211
1 files changed, 211 insertions, 0 deletions
diff --git a/third_party/python/scandir/README.rst b/third_party/python/scandir/README.rst
new file mode 100644
index 0000000000..a5537517dd
--- /dev/null
+++ b/third_party/python/scandir/README.rst
@@ -0,0 +1,211 @@
+
+scandir, a better directory iterator and faster os.walk()
+=========================================================
+
+.. image:: https://img.shields.io/pypi/v/scandir.svg
+ :target: https://pypi.python.org/pypi/scandir
+ :alt: scandir on PyPI (Python Package Index)
+
+.. image:: https://travis-ci.org/benhoyt/scandir.svg?branch=master
+ :target: https://travis-ci.org/benhoyt/scandir
+ :alt: Travis CI tests (Linux)
+
+.. image:: https://ci.appveyor.com/api/projects/status/github/benhoyt/scandir?branch=master&svg=true
+ :target: https://ci.appveyor.com/project/benhoyt/scandir
+ :alt: Appveyor tests (Windows)
+
+
+``scandir()`` is a directory iteration function like ``os.listdir()``,
+except that instead of returning a list of bare filenames, it yields
+``DirEntry`` objects that include file type and stat information along
+with the name. Using ``scandir()`` increases the speed of ``os.walk()``
+by 2-20 times (depending on the platform and file system) by avoiding
+unnecessary calls to ``os.stat()`` in most cases.
+
+
+Now included in a Python near you!
+----------------------------------
+
+``scandir`` has been included in the Python 3.5 standard library as
+``os.scandir()``, and the related performance improvements to
+``os.walk()`` have also been included. So if you're lucky enough to be
+using Python 3.5 (release date September 13, 2015) you get the benefit
+immediately, otherwise just
+`download this module from PyPI <https://pypi.python.org/pypi/scandir>`_,
+install it with ``pip install scandir``, and then do something like
+this in your code:
+
+.. code-block:: python
+
+ # Use the built-in version of scandir/walk if possible, otherwise
+ # use the scandir module version
+ try:
+ from os import scandir, walk
+ except ImportError:
+ from scandir import scandir, walk
+
+`PEP 471 <https://www.python.org/dev/peps/pep-0471/>`_, which is the
+PEP that proposes including ``scandir`` in the Python standard library,
+was `accepted <https://mail.python.org/pipermail/python-dev/2014-July/135561.html>`_
+in July 2014 by Victor Stinner, the BDFL-delegate for the PEP.
+
+This ``scandir`` module is intended to work on Python 2.6+ and Python
+3.2+ (and it has been tested on those versions).
+
+
+Background
+----------
+
+Python's built-in ``os.walk()`` is significantly slower than it needs to be,
+because -- in addition to calling ``listdir()`` on each directory -- it calls
+``stat()`` on each file to determine whether the filename is a directory or not.
+But both ``FindFirstFile`` / ``FindNextFile`` on Windows and ``readdir`` on Linux/OS
+X already tell you whether the files returned are directories or not, so
+no further ``stat`` system calls are needed. In short, you can reduce the number
+of system calls from about 2N to N, where N is the total number of files and
+directories in the tree.
+
+In practice, removing all those extra system calls makes ``os.walk()`` about
+**7-50 times as fast on Windows, and about 3-10 times as fast on Linux and Mac OS
+X.** So we're not talking about micro-optimizations. See more benchmarks
+in the "Benchmarks" section below.
+
+Somewhat relatedly, many people have also asked for a version of
+``os.listdir()`` that yields filenames as it iterates instead of returning them
+as one big list. This improves memory efficiency for iterating very large
+directories.
+
+So as well as a faster ``walk()``, scandir adds a new ``scandir()`` function.
+They're pretty easy to use, but see "The API" below for the full docs.
+
+
+Benchmarks
+----------
+
+Below are results showing how many times as fast ``scandir.walk()`` is than
+``os.walk()`` on various systems, found by running ``benchmark.py`` with no
+arguments:
+
+==================== ============== =============
+System version Python version Times as fast
+==================== ============== =============
+Windows 7 64-bit 2.7.7 64-bit 10.4
+Windows 7 64-bit SSD 2.7.7 64-bit 10.3
+Windows 7 64-bit NFS 2.7.6 64-bit 36.8
+Windows 7 64-bit SSD 3.4.1 64-bit 9.9
+Windows 7 64-bit SSD 3.5.0 64-bit 9.5
+CentOS 6.2 64-bit 2.6.6 64-bit 3.9
+Ubuntu 14.04 64-bit 2.7.6 64-bit 5.8
+Mac OS X 10.9.3 2.7.5 64-bit 3.8
+==================== ============== =============
+
+All of the above tests were done using the fast C version of scandir
+(source code in ``_scandir.c``).
+
+Note that the gains are less than the above on smaller directories and greater
+on larger directories. This is why ``benchmark.py`` creates a test directory
+tree with a standardized size.
+
+
+The API
+-------
+
+walk()
+~~~~~~
+
+The API for ``scandir.walk()`` is exactly the same as ``os.walk()``, so just
+`read the Python docs <https://docs.python.org/3.5/library/os.html#os.walk>`_.
+
+scandir()
+~~~~~~~~~
+
+The full docs for ``scandir()`` and the ``DirEntry`` objects it yields are
+available in the `Python documentation here <https://docs.python.org/3.5/library/os.html#os.scandir>`_.
+But below is a brief summary as well.
+
+ scandir(path='.') -> iterator of DirEntry objects for given path
+
+Like ``listdir``, ``scandir`` calls the operating system's directory
+iteration system calls to get the names of the files in the given
+``path``, but it's different from ``listdir`` in two ways:
+
+* Instead of returning bare filename strings, it returns lightweight
+ ``DirEntry`` objects that hold the filename string and provide
+ simple methods that allow access to the additional data the
+ operating system may have returned.
+
+* It returns a generator instead of a list, so that ``scandir`` acts
+ as a true iterator instead of returning the full list immediately.
+
+``scandir()`` yields a ``DirEntry`` object for each file and
+sub-directory in ``path``. Just like ``listdir``, the ``'.'``
+and ``'..'`` pseudo-directories are skipped, and the entries are
+yielded in system-dependent order. Each ``DirEntry`` object has the
+following attributes and methods:
+
+* ``name``: the entry's filename, relative to the scandir ``path``
+ argument (corresponds to the return values of ``os.listdir``)
+
+* ``path``: the entry's full path name (not necessarily an absolute
+ path) -- the equivalent of ``os.path.join(scandir_path, entry.name)``
+
+* ``is_dir(*, follow_symlinks=True)``: similar to
+ ``pathlib.Path.is_dir()``, but the return value is cached on the
+ ``DirEntry`` object; doesn't require a system call in most cases;
+ don't follow symbolic links if ``follow_symlinks`` is False
+
+* ``is_file(*, follow_symlinks=True)``: similar to
+ ``pathlib.Path.is_file()``, but the return value is cached on the
+ ``DirEntry`` object; doesn't require a system call in most cases;
+ don't follow symbolic links if ``follow_symlinks`` is False
+
+* ``is_symlink()``: similar to ``pathlib.Path.is_symlink()``, but the
+ return value is cached on the ``DirEntry`` object; doesn't require a
+ system call in most cases
+
+* ``stat(*, follow_symlinks=True)``: like ``os.stat()``, but the
+ return value is cached on the ``DirEntry`` object; does not require a
+ system call on Windows (except for symlinks); don't follow symbolic links
+ (like ``os.lstat()``) if ``follow_symlinks`` is False
+
+* ``inode()``: return the inode number of the entry; the return value
+ is cached on the ``DirEntry`` object
+
+Here's a very simple example of ``scandir()`` showing use of the
+``DirEntry.name`` attribute and the ``DirEntry.is_dir()`` method:
+
+.. code-block:: python
+
+ def subdirs(path):
+ """Yield directory names not starting with '.' under given path."""
+ for entry in os.scandir(path):
+ if not entry.name.startswith('.') and entry.is_dir():
+ yield entry.name
+
+This ``subdirs()`` function will be significantly faster with scandir
+than ``os.listdir()`` and ``os.path.isdir()`` on both Windows and POSIX
+systems, especially on medium-sized or large directories.
+
+
+Further reading
+---------------
+
+* `The Python docs for scandir <https://docs.python.org/3.5/library/os.html#os.scandir>`_
+* `PEP 471 <https://www.python.org/dev/peps/pep-0471/>`_, the
+ (now-accepted) Python Enhancement Proposal that proposed adding
+ ``scandir`` to the standard library -- a lot of details here,
+ including rejected ideas and previous discussion
+
+
+Flames, comments, bug reports
+-----------------------------
+
+Please send flames, comments, and questions about scandir to Ben Hoyt:
+
+http://benhoyt.com/
+
+File bug reports for the version in the Python 3.5 standard library
+`here <https://docs.python.org/3.5/bugs.html>`_, or file bug reports
+or feature requests for this module at the GitHub project page:
+
+https://github.com/benhoyt/scandir