diff options
Diffstat (limited to 'third_party/python/diskcache/README.rst')
-rw-r--r-- | third_party/python/diskcache/README.rst | 404 |
1 files changed, 404 insertions, 0 deletions
diff --git a/third_party/python/diskcache/README.rst b/third_party/python/diskcache/README.rst new file mode 100644 index 0000000000..57b7a2e21d --- /dev/null +++ b/third_party/python/diskcache/README.rst @@ -0,0 +1,404 @@ +DiskCache: Disk Backed Cache +============================ + +`DiskCache`_ is an Apache2 licensed disk and file backed cache library, written +in pure-Python, and compatible with Django. + +The cloud-based computing of 2019 puts a premium on memory. Gigabytes of empty +space is left on disks as processes vie for memory. Among these processes is +Memcached (and sometimes Redis) which is used as a cache. Wouldn't it be nice +to leverage empty disk space for caching? + +Django is Python's most popular web framework and ships with several caching +backends. Unfortunately the file-based cache in Django is essentially +broken. The culling method is random and large caches repeatedly scan a cache +directory which slows linearly with growth. Can you really allow it to take +sixty milliseconds to store a key in a cache with a thousand items? + +In Python, we can do better. And we can do it in pure-Python! + +:: + + In [1]: import pylibmc + In [2]: client = pylibmc.Client(['127.0.0.1'], binary=True) + In [3]: client[b'key'] = b'value' + In [4]: %timeit client[b'key'] + + 10000 loops, best of 3: 25.4 µs per loop + + In [5]: import diskcache as dc + In [6]: cache = dc.Cache('tmp') + In [7]: cache[b'key'] = b'value' + In [8]: %timeit cache[b'key'] + + 100000 loops, best of 3: 11.8 µs per loop + +**Note:** Micro-benchmarks have their place but are not a substitute for real +measurements. DiskCache offers cache benchmarks to defend its performance +claims. Micro-optimizations are avoided but your mileage may vary. + +DiskCache efficiently makes gigabytes of storage space available for +caching. By leveraging rock-solid database libraries and memory-mapped files, +cache performance can match and exceed industry-standard solutions. There's no +need for a C compiler or running another process. Performance is a feature and +testing has 100% coverage with unit tests and hours of stress. + +Testimonials +------------ + +`Daren Hasenkamp`_, Founder -- + + "It's a useful, simple API, just like I love about Redis. It has reduced + the amount of queries hitting my Elasticsearch cluster by over 25% for a + website that gets over a million users/day (100+ hits/second)." + +`Mathias Petermann`_, Senior Linux System Engineer -- + + "I implemented it into a wrapper for our Ansible lookup modules and we were + able to speed up some Ansible runs by almost 3 times. DiskCache is saving + us a ton of time." + +Does your company or website use `DiskCache`_? Send us a `message +<contact@grantjenks.com>`_ and let us know. + +.. _`Daren Hasenkamp`: https://www.linkedin.com/in/daren-hasenkamp-93006438/ +.. _`Mathias Petermann`: https://www.linkedin.com/in/mathias-petermann-a8aa273b/ + +Features +-------- + +- Pure-Python +- Fully Documented +- Benchmark comparisons (alternatives, Django cache backends) +- 100% test coverage +- Hours of stress testing +- Performance matters +- Django compatible API +- Thread-safe and process-safe +- Supports multiple eviction policies (LRU and LFU included) +- Keys support "tag" metadata and eviction +- Developed on Python 3.7 +- Tested on CPython 2.7, 3.4, 3.5, 3.6, 3.7 and PyPy +- Tested on Linux, Mac OS X, and Windows +- Tested using Travis CI and AppVeyor CI + +.. image:: https://api.travis-ci.org/grantjenks/python-diskcache.svg?branch=master + :target: http://www.grantjenks.com/docs/diskcache/ + +.. image:: https://ci.appveyor.com/api/projects/status/github/grantjenks/python-diskcache?branch=master&svg=true + :target: http://www.grantjenks.com/docs/diskcache/ + +Quickstart +---------- + +Installing `DiskCache`_ is simple with `pip <http://www.pip-installer.org/>`_:: + + $ pip install diskcache + +You can access documentation in the interpreter with Python's built-in help +function:: + + >>> import diskcache + >>> help(diskcache) + +The core of `DiskCache`_ is three data types intended for caching. `Cache`_ +objects manage a SQLite database and filesystem directory to store key and +value pairs. `FanoutCache`_ provides a sharding layer to utilize multiple +caches and `DjangoCache`_ integrates that with `Django`_:: + + >>> from diskcache import Cache, FanoutCache, DjangoCache + >>> help(Cache) + >>> help(FanoutCache) + >>> help(DjangoCache) + +Built atop the caching data types, are `Deque`_ and `Index`_ which work as a +cross-process, persistent replacements for Python's ``collections.deque`` and +``dict``. These implement the sequence and mapping container base classes:: + + >>> from diskcache import Deque, Index + >>> help(Deque) + >>> help(Index) + +Finally, a number of `recipes`_ for cross-process synchronization are provided +using an underlying cache. Features like memoization with cache stampede +prevention, cross-process locking, and cross-process throttling are available:: + + >>> from diskcache import memoize_stampede, Lock, throttle + >>> help(memoize_stampede) + >>> help(Lock) + >>> help(throttle) + +Python's docstrings are a quick way to get started but not intended as a +replacement for the `DiskCache Tutorial`_ and `DiskCache API Reference`_. + +.. _`Cache`: http://www.grantjenks.com/docs/diskcache/tutorial.html#cache +.. _`FanoutCache`: http://www.grantjenks.com/docs/diskcache/tutorial.html#fanoutcache +.. _`DjangoCache`: http://www.grantjenks.com/docs/diskcache/tutorial.html#djangocache +.. _`Django`: https://www.djangoproject.com/ +.. _`Deque`: http://www.grantjenks.com/docs/diskcache/tutorial.html#deque +.. _`Index`: http://www.grantjenks.com/docs/diskcache/tutorial.html#index +.. _`recipes`: http://www.grantjenks.com/docs/diskcache/tutorial.html#recipes + +User Guide +---------- + +For those wanting more details, this part of the documentation describes +tutorial, benchmarks, API, and development. + +* `DiskCache Tutorial`_ +* `DiskCache Cache Benchmarks`_ +* `DiskCache DjangoCache Benchmarks`_ +* `Case Study: Web Crawler`_ +* `Case Study: Landing Page Caching`_ +* `Talk: All Things Cached - SF Python 2017 Meetup`_ +* `DiskCache API Reference`_ +* `DiskCache Development`_ + +.. _`DiskCache Tutorial`: http://www.grantjenks.com/docs/diskcache/tutorial.html +.. _`DiskCache Cache Benchmarks`: http://www.grantjenks.com/docs/diskcache/cache-benchmarks.html +.. _`DiskCache DjangoCache Benchmarks`: http://www.grantjenks.com/docs/diskcache/djangocache-benchmarks.html +.. _`Talk: All Things Cached - SF Python 2017 Meetup`: http://www.grantjenks.com/docs/diskcache/sf-python-2017-meetup-talk.html +.. _`Case Study: Web Crawler`: http://www.grantjenks.com/docs/diskcache/case-study-web-crawler.html +.. _`Case Study: Landing Page Caching`: http://www.grantjenks.com/docs/diskcache/case-study-landing-page-caching.html +.. _`DiskCache API Reference`: http://www.grantjenks.com/docs/diskcache/api.html +.. _`DiskCache Development`: http://www.grantjenks.com/docs/diskcache/development.html + +Comparisons +----------- + +Comparisons to popular projects related to `DiskCache`_. + +Key-Value Stores +................ + +`DiskCache`_ is mostly a simple key-value store. Feature comparisons with four +other projects are shown in the tables below. + +* `dbm`_ is part of Python's standard library and implements a generic + interface to variants of the DBM database — dbm.gnu or dbm.ndbm. If none of + these modules is installed, the slow-but-simple dbm.dumb is used. +* `shelve`_ is part of Python's standard library and implements a “shelf” as a + persistent, dictionary-like object. The difference with “dbm” databases is + that the values can be anything that the pickle module can handle. +* `sqlitedict`_ is a lightweight wrapper around Python's sqlite3 database with + a simple, Pythonic dict-like interface and support for multi-thread + access. Keys are arbitrary strings, values arbitrary pickle-able objects. +* `pickleDB`_ is a lightweight and simple key-value store. It is built upon + Python's simplejson module and was inspired by Redis. It is licensed with the + BSD three-caluse license. + +.. _`dbm`: https://docs.python.org/3/library/dbm.html +.. _`shelve`: https://docs.python.org/3/library/shelve.html +.. _`sqlitedict`: https://github.com/RaRe-Technologies/sqlitedict +.. _`pickleDB`: https://pythonhosted.org/pickleDB/ + +**Features** + +================ ============= ========= ========= ============ ============ +Feature diskcache dbm shelve sqlitedict pickleDB +================ ============= ========= ========= ============ ============ +Atomic? Always Maybe Maybe Maybe No +Persistent? Yes Yes Yes Yes Yes +Thread-safe? Yes No No Yes No +Process-safe? Yes No No Maybe No +Backend? SQLite DBM DBM SQLite File +Serialization? Customizable None Pickle Customizable JSON +Data Types? Mapping/Deque Mapping Mapping Mapping Mapping +Ordering? Insert/Sorted None None None None +Eviction? LRU/LFU/more None None None None +Vacuum? Automatic Maybe Maybe Manual Automatic +Transactions? Yes No No Maybe No +Multiprocessing? Yes No No No No +Forkable? Yes No No No No +Metadata? Yes No No No No +================ ============= ========= ========= ============ ============ + +**Quality** + +================ ============= ========= ========= ============ ============ +Project diskcache dbm shelve sqlitedict pickleDB +================ ============= ========= ========= ============ ============ +Tests? Yes Yes Yes Yes Yes +Coverage? Yes Yes Yes Yes No +Stress? Yes No No No No +CI Tests? Linux/Windows Yes Yes Linux No +Python? 2/3/PyPy All All 2/3 2/3 +License? Apache2 Python Python Apache2 3-Clause BSD +Docs? Extensive Summary Summary Readme Summary +Benchmarks? Yes No No No No +Sources? GitHub GitHub GitHub GitHub GitHub +Pure-Python? Yes Yes Yes Yes Yes +Server? No No No No No +Integrations? Django None None None None +================ ============= ========= ========= ============ ============ + +**Timings** + +These are rough measurements. See `DiskCache Cache Benchmarks`_ for more +rigorous data. + +================ ============= ========= ========= ============ ============ +Project diskcache dbm shelve sqlitedict pickleDB +================ ============= ========= ========= ============ ============ +get 25 µs 36 µs 41 µs 513 µs 92 µs +set 198 µs 900 µs 928 µs 697 µs 1,020 µs +delete 248 µs 740 µs 702 µs 1,717 µs 1,020 µs +================ ============= ========= ========= ============ ============ + +Caching Libraries +................. + +* `joblib.Memory`_ provides caching functions and works by explicitly saving + the inputs and outputs to files. It is designed to work with non-hashable and + potentially large input and output data types such as numpy arrays. +* `klepto`_ extends Python’s `lru_cache` to utilize different keymaps and + alternate caching algorithms, such as `lfu_cache` and `mru_cache`. Klepto + uses a simple dictionary-sytle interface for all caches and archives. + +.. _`klepto`: https://pypi.org/project/klepto/ +.. _`joblib.Memory`: https://joblib.readthedocs.io/en/latest/memory.html + +Data Structures +............... + +* `dict`_ is a mapping object that maps hashable keys to arbitrary + values. Mappings are mutable objects. There is currently only one standard + Python mapping type, the dictionary. +* `pandas`_ is a Python package providing fast, flexible, and expressive data + structures designed to make working with “relational” or “labeled” data both + easy and intuitive. +* `Sorted Containers`_ is an Apache2 licensed sorted collections library, + written in pure-Python, and fast as C-extensions. Sorted Containers + implements sorted list, sorted dictionary, and sorted set data types. + +.. _`dict`: https://docs.python.org/3/library/stdtypes.html#typesmapping +.. _`pandas`: https://pandas.pydata.org/ +.. _`Sorted Containers`: http://www.grantjenks.com/docs/sortedcontainers/ + +Pure-Python Databases +..................... + +* `ZODB`_ supports an isomorphic interface for database operations which means + there's little impact on your code to make objects persistent and there's no + database mapper that partially hides the datbase. +* `CodernityDB`_ is an open source, pure-Python, multi-platform, schema-less, + NoSQL database and includes an HTTP server version, and a Python client + library that aims to be 100% compatible with the embedded version. +* `TinyDB`_ is a tiny, document oriented database optimized for your + happiness. If you need a simple database with a clean API that just works + without lots of configuration, TinyDB might be the right choice for you. + +.. _`ZODB`: http://www.zodb.org/ +.. _`CodernityDB`: https://pypi.org/project/CodernityDB/ +.. _`TinyDB`: https://tinydb.readthedocs.io/ + +Object Relational Mappings (ORM) +................................ + +* `Django ORM`_ provides models that are the single, definitive source of + information about data and contains the essential fields and behaviors of the + stored data. Generally, each model maps to a single SQL database table. +* `SQLAlchemy`_ is the Python SQL toolkit and Object Relational Mapper that + gives application developers the full power and flexibility of SQL. It + provides a full suite of well known enterprise-level persistence patterns. +* `Peewee`_ is a simple and small ORM. It has few (but expressive) concepts, + making it easy to learn and intuitive to use. Peewee supports Sqlite, MySQL, + and PostgreSQL with tons of extensions. +* `SQLObject`_ is a popular Object Relational Manager for providing an object + interface to your database, with tables as classes, rows as instances, and + columns as attributes. +* `Pony ORM`_ is a Python ORM with beautiful query syntax. Use Python syntax + for interacting with the database. Pony translates such queries into SQL and + executes them in the database in the most efficient way. + +.. _`Django ORM`: https://docs.djangoproject.com/en/dev/topics/db/ +.. _`SQLAlchemy`: https://www.sqlalchemy.org/ +.. _`Peewee`: http://docs.peewee-orm.com/ +.. _`dataset`: https://dataset.readthedocs.io/ +.. _`SQLObject`: http://sqlobject.org/ +.. _`Pony ORM`: https://ponyorm.com/ + +SQL Databases +............. + +* `SQLite`_ is part of Python's standard library and provides a lightweight + disk-based database that doesn’t require a separate server process and allows + accessing the database using a nonstandard variant of the SQL query language. +* `MySQL`_ is one of the world’s most popular open source databases and has + become a leading database choice for web-based applications. MySQL includes a + standardized database driver for Python platforms and development. +* `PostgreSQL`_ is a powerful, open source object-relational database system + with over 30 years of active development. Psycopg is the most popular + PostgreSQL adapter for the Python programming language. +* `Oracle DB`_ is a relational database management system (RDBMS) from the + Oracle Corporation. Originally developed in 1977, Oracle DB is one of the + most trusted and widely used enterprise relational database engines. +* `Microsoft SQL Server`_ is a relational database management system developed + by Microsoft. As a database server, it stores and retrieves data as requested + by other software applications. + +.. _`SQLite`: https://docs.python.org/3/library/sqlite3.html +.. _`MySQL`: https://dev.mysql.com/downloads/connector/python/ +.. _`PostgreSQL`: http://initd.org/psycopg/ +.. _`Oracle DB`: https://pypi.org/project/cx_Oracle/ +.. _`Microsoft SQL Server`: https://pypi.org/project/pyodbc/ + +Other Databases +............... + +* `Memcached`_ is free and open source, high-performance, distributed memory + object caching system, generic in nature, but intended for use in speeding up + dynamic web applications by alleviating database load. +* `Redis`_ is an open source, in-memory data structure store, used as a + database, cache and message broker. It supports data structures such as + strings, hashes, lists, sets, sorted sets with range queries, and more. +* `MongoDB`_ is a cross-platform document-oriented database program. Classified + as a NoSQL database program, MongoDB uses JSON-like documents with + schema. PyMongo is the recommended way to work with MongoDB from Python. +* `LMDB`_ is a lightning-fast, memory-mapped database. With memory-mapped + files, it has the read performance of a pure in-memory database while + retaining the persistence of standard disk-based databases. +* `BerkeleyDB`_ is a software library intended to provide a high-performance + embedded database for key/value data. Berkeley DB is a programmatic toolkit + that provides built-in database support for desktop and server applications. +* `LevelDB`_ is a fast key-value storage library written at Google that + provides an ordered mapping from string keys to string values. Data is stored + sorted by key and users can provide a custom comparison function. + +.. _`Memcached`: https://pypi.org/project/python-memcached/ +.. _`MongoDB`: https://api.mongodb.com/python/current/ +.. _`Redis`: https://redis.io/clients#python +.. _`LMDB`: https://lmdb.readthedocs.io/ +.. _`BerkeleyDB`: https://pypi.org/project/bsddb3/ +.. _`LevelDB`: https://plyvel.readthedocs.io/ + +Reference +--------- + +* `DiskCache Documentation`_ +* `DiskCache at PyPI`_ +* `DiskCache at GitHub`_ +* `DiskCache Issue Tracker`_ + +.. _`DiskCache Documentation`: http://www.grantjenks.com/docs/diskcache/ +.. _`DiskCache at PyPI`: https://pypi.python.org/pypi/diskcache/ +.. _`DiskCache at GitHub`: https://github.com/grantjenks/python-diskcache/ +.. _`DiskCache Issue Tracker`: https://github.com/grantjenks/python-diskcache/issues/ + +License +------- + +Copyright 2016-2019 Grant Jenks + +Licensed under the Apache License, Version 2.0 (the "License"); you may not use +this file except in compliance with the License. You may obtain a copy of the +License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software distributed +under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR +CONDITIONS OF ANY KIND, either express or implied. See the License for the +specific language governing permissions and limitations under the License. + +.. _`DiskCache`: http://www.grantjenks.com/docs/diskcache/ |