summaryrefslogtreecommitdiffstats
path: root/Documentation/filesystems/caching
diff options
context:
space:
mode:
Diffstat (limited to '')
-rw-r--r--Documentation/filesystems/caching/backend-api.rst479
-rw-r--r--Documentation/filesystems/caching/cachefiles.rst662
-rw-r--r--Documentation/filesystems/caching/fscache.rst348
-rw-r--r--Documentation/filesystems/caching/index.rst12
-rw-r--r--Documentation/filesystems/caching/netfs-api.rst452
5 files changed, 1953 insertions, 0 deletions
diff --git a/Documentation/filesystems/caching/backend-api.rst b/Documentation/filesystems/caching/backend-api.rst
new file mode 100644
index 000000000..3a199fc50
--- /dev/null
+++ b/Documentation/filesystems/caching/backend-api.rst
@@ -0,0 +1,479 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=================
+Cache Backend API
+=================
+
+The FS-Cache system provides an API by which actual caches can be supplied to
+FS-Cache for it to then serve out to network filesystems and other interested
+parties. This API is used by::
+
+ #include <linux/fscache-cache.h>.
+
+
+Overview
+========
+
+Interaction with the API is handled on three levels: cache, volume and data
+storage, and each level has its own type of cookie object:
+
+ ======================= =======================
+ COOKIE C TYPE
+ ======================= =======================
+ Cache cookie struct fscache_cache
+ Volume cookie struct fscache_volume
+ Data storage cookie struct fscache_cookie
+ ======================= =======================
+
+Cookies are used to provide some filesystem data to the cache, manage state and
+pin the cache during access in addition to acting as reference points for the
+API functions. Each cookie has a debugging ID that is included in trace points
+to make it easier to correlate traces. Note, though, that debugging IDs are
+simply allocated from incrementing counters and will eventually wrap.
+
+The cache backend and the network filesystem can both ask for cache cookies -
+and if they ask for one of the same name, they'll get the same cookie. Volume
+and data cookies, however, are created at the behest of the filesystem only.
+
+
+Cache Cookies
+=============
+
+Caches are represented in the API by cache cookies. These are objects of
+type::
+
+ struct fscache_cache {
+ void *cache_priv;
+ unsigned int debug_id;
+ char *name;
+ ...
+ };
+
+There are a few fields that the cache backend might be interested in. The
+``debug_id`` can be used in tracing to match lines referring to the same cache
+and ``name`` is the name the cache was registered with. The ``cache_priv``
+member is private data provided by the cache when it is brought online. The
+other fields are for internal use.
+
+
+Registering a Cache
+===================
+
+When a cache backend wants to bring a cache online, it should first register
+the cache name and that will get it a cache cookie. This is done with::
+
+ struct fscache_cache *fscache_acquire_cache(const char *name);
+
+This will look up and potentially create a cache cookie. The cache cookie may
+have already been created by a network filesystem looking for it, in which case
+that cache cookie will be used. If the cache cookie is not in use by another
+cache, it will be moved into the preparing state, otherwise it will return
+busy.
+
+If successful, the cache backend can then start setting up the cache. In the
+event that the initialisation fails, the cache backend should call::
+
+ void fscache_relinquish_cache(struct fscache_cache *cache);
+
+to reset and discard the cookie.
+
+
+Bringing a Cache Online
+=======================
+
+Once the cache is set up, it can be brought online by calling::
+
+ int fscache_add_cache(struct fscache_cache *cache,
+ const struct fscache_cache_ops *ops,
+ void *cache_priv);
+
+This stores the cache operations table pointer and cache private data into the
+cache cookie and moves the cache to the active state, thereby allowing accesses
+to take place.
+
+
+Withdrawing a Cache From Service
+================================
+
+The cache backend can withdraw a cache from service by calling this function::
+
+ void fscache_withdraw_cache(struct fscache_cache *cache);
+
+This moves the cache to the withdrawn state to prevent new cache- and
+volume-level accesses from starting and then waits for outstanding cache-level
+accesses to complete.
+
+The cache must then go through the data storage objects it has and tell fscache
+to withdraw them, calling::
+
+ void fscache_withdraw_cookie(struct fscache_cookie *cookie);
+
+on the cookie that each object belongs to. This schedules the specified cookie
+for withdrawal. This gets offloaded to a workqueue. The cache backend can
+wait for completion by calling::
+
+ void fscache_wait_for_objects(struct fscache_cache *cache);
+
+Once all the cookies are withdrawn, a cache backend can withdraw all the
+volumes, calling::
+
+ void fscache_withdraw_volume(struct fscache_volume *volume);
+
+to tell fscache that a volume has been withdrawn. This waits for all
+outstanding accesses on the volume to complete before returning.
+
+When the cache is completely withdrawn, fscache should be notified by
+calling::
+
+ void fscache_relinquish_cache(struct fscache_cache *cache);
+
+to clear fields in the cookie and discard the caller's ref on it.
+
+
+Volume Cookies
+==============
+
+Within a cache, the data storage objects are organised into logical volumes.
+These are represented in the API as objects of type::
+
+ struct fscache_volume {
+ struct fscache_cache *cache;
+ void *cache_priv;
+ unsigned int debug_id;
+ char *key;
+ unsigned int key_hash;
+ ...
+ u8 coherency_len;
+ u8 coherency[];
+ };
+
+There are a number of fields here that are of interest to the caching backend:
+
+ * ``cache`` - The parent cache cookie.
+
+ * ``cache_priv`` - A place for the cache to stash private data.
+
+ * ``debug_id`` - A debugging ID for logging in tracepoints.
+
+ * ``key`` - A printable string with no '/' characters in it that represents
+ the index key for the volume. The key is NUL-terminated and padded out to
+ a multiple of 4 bytes.
+
+ * ``key_hash`` - A hash of the index key. This should work out the same, no
+ matter the cpu arch and endianness.
+
+ * ``coherency`` - A piece of coherency data that should be checked when the
+ volume is bound to in the cache.
+
+ * ``coherency_len`` - The amount of data in the coherency buffer.
+
+
+Data Storage Cookies
+====================
+
+A volume is a logical group of data storage objects, each of which is
+represented to the network filesystem by a cookie. Cookies are represented in
+the API as objects of type::
+
+ struct fscache_cookie {
+ struct fscache_volume *volume;
+ void *cache_priv;
+ unsigned long flags;
+ unsigned int debug_id;
+ unsigned int inval_counter;
+ loff_t object_size;
+ u8 advice;
+ u32 key_hash;
+ u8 key_len;
+ u8 aux_len;
+ ...
+ };
+
+The fields in the cookie that are of interest to the cache backend are:
+
+ * ``volume`` - The parent volume cookie.
+
+ * ``cache_priv`` - A place for the cache to stash private data.
+
+ * ``flags`` - A collection of bit flags, including:
+
+ * FSCACHE_COOKIE_NO_DATA_TO_READ - There is no data available in the
+ cache to be read as the cookie has been created or invalidated.
+
+ * FSCACHE_COOKIE_NEEDS_UPDATE - The coherency data and/or object size has
+ been changed and needs committing.
+
+ * FSCACHE_COOKIE_LOCAL_WRITE - The netfs's data has been modified
+ locally, so the cache object may be in an incoherent state with respect
+ to the server.
+
+ * FSCACHE_COOKIE_HAVE_DATA - The backend should set this if it
+ successfully stores data into the cache.
+
+ * FSCACHE_COOKIE_RETIRED - The cookie was invalidated when it was
+ relinquished and the cached data should be discarded.
+
+ * ``debug_id`` - A debugging ID for logging in tracepoints.
+
+ * ``inval_counter`` - The number of invalidations done on the cookie.
+
+ * ``advice`` - Information about how the cookie is to be used.
+
+ * ``key_hash`` - A hash of the index key. This should work out the same, no
+ matter the cpu arch and endianness.
+
+ * ``key_len`` - The length of the index key.
+
+ * ``aux_len`` - The length of the coherency data buffer.
+
+Each cookie has an index key, which may be stored inline to the cookie or
+elsewhere. A pointer to this can be obtained by calling::
+
+ void *fscache_get_key(struct fscache_cookie *cookie);
+
+The index key is a binary blob, the storage for which is padded out to a
+multiple of 4 bytes.
+
+Each cookie also has a buffer for coherency data. This may also be inline or
+detached from the cookie and a pointer is obtained by calling::
+
+ void *fscache_get_aux(struct fscache_cookie *cookie);
+
+
+
+Cookie Accounting
+=================
+
+Data storage cookies are counted and this is used to block cache withdrawal
+completion until all objects have been destroyed. The following functions are
+provided to the cache to deal with that::
+
+ void fscache_count_object(struct fscache_cache *cache);
+ void fscache_uncount_object(struct fscache_cache *cache);
+ void fscache_wait_for_objects(struct fscache_cache *cache);
+
+The count function records the allocation of an object in a cache and the
+uncount function records its destruction. Warning: by the time the uncount
+function returns, the cache may have been destroyed.
+
+The wait function can be used during the withdrawal procedure to wait for
+fscache to finish withdrawing all the objects in the cache. When it completes,
+there will be no remaining objects referring to the cache object or any volume
+objects.
+
+
+Cache Management API
+====================
+
+The cache backend implements the cache management API by providing a table of
+operations that fscache can use to manage various aspects of the cache. These
+are held in a structure of type::
+
+ struct fscache_cache_ops {
+ const char *name;
+ ...
+ };
+
+This contains a printable name for the cache backend driver plus a number of
+pointers to methods to allow fscache to request management of the cache:
+
+ * Set up a volume cookie [optional]::
+
+ void (*acquire_volume)(struct fscache_volume *volume);
+
+ This method is called when a volume cookie is being created. The caller
+ holds a cache-level access pin to prevent the cache from going away for
+ the duration. This method should set up the resources to access a volume
+ in the cache and should not return until it has done so.
+
+ If successful, it can set ``cache_priv`` to its own data.
+
+
+ * Clean up volume cookie [optional]::
+
+ void (*free_volume)(struct fscache_volume *volume);
+
+ This method is called when a volume cookie is being released if
+ ``cache_priv`` is set.
+
+
+ * Look up a cookie in the cache [mandatory]::
+
+ bool (*lookup_cookie)(struct fscache_cookie *cookie);
+
+ This method is called to look up/create the resources needed to access the
+ data storage for a cookie. It is called from a worker thread with a
+ volume-level access pin in the cache to prevent it from being withdrawn.
+
+ True should be returned if successful and false otherwise. If false is
+ returned, the withdraw_cookie op (see below) will be called.
+
+ If lookup fails, but the object could still be created (e.g. it hasn't
+ been cached before), then::
+
+ void fscache_cookie_lookup_negative(
+ struct fscache_cookie *cookie);
+
+ can be called to let the network filesystem proceed and start downloading
+ stuff whilst the cache backend gets on with the job of creating things.
+
+ If successful, ``cookie->cache_priv`` can be set.
+
+
+ * Withdraw an object without any cookie access counts held [mandatory]::
+
+ void (*withdraw_cookie)(struct fscache_cookie *cookie);
+
+ This method is called to withdraw a cookie from service. It will be
+ called when the cookie is relinquished by the netfs, withdrawn or culled
+ by the cache backend or closed after a period of non-use by fscache.
+
+ The caller doesn't hold any access pins, but it is called from a
+ non-reentrant work item to manage races between the various ways
+ withdrawal can occur.
+
+ The cookie will have the ``FSCACHE_COOKIE_RETIRED`` flag set on it if the
+ associated data is to be removed from the cache.
+
+
+ * Change the size of a data storage object [mandatory]::
+
+ void (*resize_cookie)(struct netfs_cache_resources *cres,
+ loff_t new_size);
+
+ This method is called to inform the cache backend of a change in size of
+ the netfs file due to local truncation. The cache backend should make all
+ of the changes it needs to make before returning as this is done under the
+ netfs inode mutex.
+
+ The caller holds a cookie-level access pin to prevent a race with
+ withdrawal and the netfs must have the cookie marked in-use to prevent
+ garbage collection or culling from removing any resources.
+
+
+ * Invalidate a data storage object [mandatory]::
+
+ bool (*invalidate_cookie)(struct fscache_cookie *cookie);
+
+ This is called when the network filesystem detects a third-party
+ modification or when an O_DIRECT write is made locally. This requests
+ that the cache backend should throw away all the data in the cache for
+ this object and start afresh. It should return true if successful and
+ false otherwise.
+
+ On entry, new I O/operations are blocked. Once the cache is in a position
+ to accept I/O again, the backend should release the block by calling::
+
+ void fscache_resume_after_invalidation(struct fscache_cookie *cookie);
+
+ If the method returns false, caching will be withdrawn for this cookie.
+
+
+ * Prepare to make local modifications to the cache [mandatory]::
+
+ void (*prepare_to_write)(struct fscache_cookie *cookie);
+
+ This method is called when the network filesystem finds that it is going
+ to need to modify the contents of the cache due to local writes or
+ truncations. This gives the cache a chance to note that a cache object
+ may be incoherent with respect to the server and may need writing back
+ later. This may also cause the cached data to be scrapped on later
+ rebinding if not properly committed.
+
+
+ * Begin an operation for the netfs lib [mandatory]::
+
+ bool (*begin_operation)(struct netfs_cache_resources *cres,
+ enum fscache_want_state want_state);
+
+ This method is called when an I/O operation is being set up (read, write
+ or resize). The caller holds an access pin on the cookie and must have
+ marked the cookie as in-use.
+
+ If it can, the backend should attach any resources it needs to keep around
+ to the netfs_cache_resources object and return true.
+
+ If it can't complete the setup, it should return false.
+
+ The want_state parameter indicates the state the caller needs the cache
+ object to be in and what it wants to do during the operation:
+
+ * ``FSCACHE_WANT_PARAMS`` - The caller just wants to access cache
+ object parameters; it doesn't need to do data I/O yet.
+
+ * ``FSCACHE_WANT_READ`` - The caller wants to read data.
+
+ * ``FSCACHE_WANT_WRITE`` - The caller wants to write to or resize the
+ cache object.
+
+ Note that there won't necessarily be anything attached to the cookie's
+ cache_priv yet if the cookie is still being created.
+
+
+Data I/O API
+============
+
+A cache backend provides a data I/O API by through the netfs library's ``struct
+netfs_cache_ops`` attached to a ``struct netfs_cache_resources`` by the
+``begin_operation`` method described above.
+
+See the Documentation/filesystems/netfs_library.rst for a description.
+
+
+Miscellaneous Functions
+=======================
+
+FS-Cache provides some utilities that a cache backend may make use of:
+
+ * Note occurrence of an I/O error in a cache::
+
+ void fscache_io_error(struct fscache_cache *cache);
+
+ This tells FS-Cache that an I/O error occurred in the cache. This
+ prevents any new I/O from being started on the cache.
+
+ This does not actually withdraw the cache. That must be done separately.
+
+ * Note cessation of caching on a cookie due to failure::
+
+ void fscache_caching_failed(struct fscache_cookie *cookie);
+
+ This notes that a the caching that was being done on a cookie failed in
+ some way, for instance the backing storage failed to be created or
+ invalidation failed and that no further I/O operations should take place
+ on it until the cache is reset.
+
+ * Count I/O requests::
+
+ void fscache_count_read(void);
+ void fscache_count_write(void);
+
+ These record reads and writes from/to the cache. The numbers are
+ displayed in /proc/fs/fscache/stats.
+
+ * Count out-of-space errors::
+
+ void fscache_count_no_write_space(void);
+ void fscache_count_no_create_space(void);
+
+ These record ENOSPC errors in the cache, divided into failures of data
+ writes and failures of filesystem object creations (e.g. mkdir).
+
+ * Count objects culled::
+
+ void fscache_count_culled(void);
+
+ This records the culling of an object.
+
+ * Get the cookie from a set of cache resources::
+
+ struct fscache_cookie *fscache_cres_cookie(struct netfs_cache_resources *cres)
+
+ Pull a pointer to the cookie from the cache resources. This may return a
+ NULL cookie if no cookie was set.
+
+
+API Function Reference
+======================
+
+.. kernel-doc:: include/linux/fscache-cache.h
diff --git a/Documentation/filesystems/caching/cachefiles.rst b/Documentation/filesystems/caching/cachefiles.rst
new file mode 100644
index 000000000..fc7abf712
--- /dev/null
+++ b/Documentation/filesystems/caching/cachefiles.rst
@@ -0,0 +1,662 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================
+Cache on Already Mounted Filesystem
+===================================
+
+.. Contents:
+
+ (*) Overview.
+
+ (*) Requirements.
+
+ (*) Configuration.
+
+ (*) Starting the cache.
+
+ (*) Things to avoid.
+
+ (*) Cache culling.
+
+ (*) Cache structure.
+
+ (*) Security model and SELinux.
+
+ (*) A note on security.
+
+ (*) Statistical information.
+
+ (*) Debugging.
+
+ (*) On-demand Read.
+
+
+Overview
+========
+
+CacheFiles is a caching backend that's meant to use as a cache a directory on
+an already mounted filesystem of a local type (such as Ext3).
+
+CacheFiles uses a userspace daemon to do some of the cache management - such as
+reaping stale nodes and culling. This is called cachefilesd and lives in
+/sbin.
+
+The filesystem and data integrity of the cache are only as good as those of the
+filesystem providing the backing services. Note that CacheFiles does not
+attempt to journal anything since the journalling interfaces of the various
+filesystems are very specific in nature.
+
+CacheFiles creates a misc character device - "/dev/cachefiles" - that is used
+to communication with the daemon. Only one thing may have this open at once,
+and while it is open, a cache is at least partially in existence. The daemon
+opens this and sends commands down it to control the cache.
+
+CacheFiles is currently limited to a single cache.
+
+CacheFiles attempts to maintain at least a certain percentage of free space on
+the filesystem, shrinking the cache by culling the objects it contains to make
+space if necessary - see the "Cache Culling" section. This means it can be
+placed on the same medium as a live set of data, and will expand to make use of
+spare space and automatically contract when the set of data requires more
+space.
+
+
+
+Requirements
+============
+
+The use of CacheFiles and its daemon requires the following features to be
+available in the system and in the cache filesystem:
+
+ - dnotify.
+
+ - extended attributes (xattrs).
+
+ - openat() and friends.
+
+ - bmap() support on files in the filesystem (FIBMAP ioctl).
+
+ - The use of bmap() to detect a partial page at the end of the file.
+
+It is strongly recommended that the "dir_index" option is enabled on Ext3
+filesystems being used as a cache.
+
+
+Configuration
+=============
+
+The cache is configured by a script in /etc/cachefilesd.conf. These commands
+set up cache ready for use. The following script commands are available:
+
+ brun <N>%, bcull <N>%, bstop <N>%, frun <N>%, fcull <N>%, fstop <N>%
+ Configure the culling limits. Optional. See the section on culling
+ The defaults are 7% (run), 5% (cull) and 1% (stop) respectively.
+
+ The commands beginning with a 'b' are file space (block) limits, those
+ beginning with an 'f' are file count limits.
+
+ dir <path>
+ Specify the directory containing the root of the cache. Mandatory.
+
+ tag <name>
+ Specify a tag to FS-Cache to use in distinguishing multiple caches.
+ Optional. The default is "CacheFiles".
+
+ debug <mask>
+ Specify a numeric bitmask to control debugging in the kernel module.
+ Optional. The default is zero (all off). The following values can be
+ OR'd into the mask to collect various information:
+
+ == =================================================
+ 1 Turn on trace of function entry (_enter() macros)
+ 2 Turn on trace of function exit (_leave() macros)
+ 4 Turn on trace of internal debug points (_debug())
+ == =================================================
+
+ This mask can also be set through sysfs, eg::
+
+ echo 5 >/sys/modules/cachefiles/parameters/debug
+
+
+Starting the Cache
+==================
+
+The cache is started by running the daemon. The daemon opens the cache device,
+configures the cache and tells it to begin caching. At that point the cache
+binds to fscache and the cache becomes live.
+
+The daemon is run as follows::
+
+ /sbin/cachefilesd [-d]* [-s] [-n] [-f <configfile>]
+
+The flags are:
+
+ ``-d``
+ Increase the debugging level. This can be specified multiple times and
+ is cumulative with itself.
+
+ ``-s``
+ Send messages to stderr instead of syslog.
+
+ ``-n``
+ Don't daemonise and go into background.
+
+ ``-f <configfile>``
+ Use an alternative configuration file rather than the default one.
+
+
+Things to Avoid
+===============
+
+Do not mount other things within the cache as this will cause problems. The
+kernel module contains its own very cut-down path walking facility that ignores
+mountpoints, but the daemon can't avoid them.
+
+Do not create, rename or unlink files and directories in the cache while the
+cache is active, as this may cause the state to become uncertain.
+
+Renaming files in the cache might make objects appear to be other objects (the
+filename is part of the lookup key).
+
+Do not change or remove the extended attributes attached to cache files by the
+cache as this will cause the cache state management to get confused.
+
+Do not create files or directories in the cache, lest the cache get confused or
+serve incorrect data.
+
+Do not chmod files in the cache. The module creates things with minimal
+permissions to prevent random users being able to access them directly.
+
+
+Cache Culling
+=============
+
+The cache may need culling occasionally to make space. This involves
+discarding objects from the cache that have been used less recently than
+anything else. Culling is based on the access time of data objects. Empty
+directories are culled if not in use.
+
+Cache culling is done on the basis of the percentage of blocks and the
+percentage of files available in the underlying filesystem. There are six
+"limits":
+
+ brun, frun
+ If the amount of free space and the number of available files in the cache
+ rises above both these limits, then culling is turned off.
+
+ bcull, fcull
+ If the amount of available space or the number of available files in the
+ cache falls below either of these limits, then culling is started.
+
+ bstop, fstop
+ If the amount of available space or the number of available files in the
+ cache falls below either of these limits, then no further allocation of
+ disk space or files is permitted until culling has raised things above
+ these limits again.
+
+These must be configured thusly::
+
+ 0 <= bstop < bcull < brun < 100
+ 0 <= fstop < fcull < frun < 100
+
+Note that these are percentages of available space and available files, and do
+_not_ appear as 100 minus the percentage displayed by the "df" program.
+
+The userspace daemon scans the cache to build up a table of cullable objects.
+These are then culled in least recently used order. A new scan of the cache is
+started as soon as space is made in the table. Objects will be skipped if
+their atimes have changed or if the kernel module says it is still using them.
+
+
+Cache Structure
+===============
+
+The CacheFiles module will create two directories in the directory it was
+given:
+
+ * cache/
+ * graveyard/
+
+The active cache objects all reside in the first directory. The CacheFiles
+kernel module moves any retired or culled objects that it can't simply unlink
+to the graveyard from which the daemon will actually delete them.
+
+The daemon uses dnotify to monitor the graveyard directory, and will delete
+anything that appears therein.
+
+
+The module represents index objects as directories with the filename "I..." or
+"J...". Note that the "cache/" directory is itself a special index.
+
+Data objects are represented as files if they have no children, or directories
+if they do. Their filenames all begin "D..." or "E...". If represented as a
+directory, data objects will have a file in the directory called "data" that
+actually holds the data.
+
+Special objects are similar to data objects, except their filenames begin
+"S..." or "T...".
+
+
+If an object has children, then it will be represented as a directory.
+Immediately in the representative directory are a collection of directories
+named for hash values of the child object keys with an '@' prepended. Into
+this directory, if possible, will be placed the representations of the child
+objects::
+
+ /INDEX /INDEX /INDEX /DATA FILES
+ /=========/==========/=================================/================
+ cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400
+ cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...DB1ry
+ cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...N22ry
+ cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...FP1ry
+
+
+If the key is so long that it exceeds NAME_MAX with the decorations added on to
+it, then it will be cut into pieces, the first few of which will be used to
+make a nest of directories, and the last one of which will be the objects
+inside the last directory. The names of the intermediate directories will have
+'+' prepended::
+
+ J1223/@23/+xy...z/+kl...m/Epqr
+
+
+Note that keys are raw data, and not only may they exceed NAME_MAX in size,
+they may also contain things like '/' and NUL characters, and so they may not
+be suitable for turning directly into a filename.
+
+To handle this, CacheFiles will use a suitably printable filename directly and
+"base-64" encode ones that aren't directly suitable. The two versions of
+object filenames indicate the encoding:
+
+ =============== =============== ===============
+ OBJECT TYPE PRINTABLE ENCODED
+ =============== =============== ===============
+ Index "I..." "J..."
+ Data "D..." "E..."
+ Special "S..." "T..."
+ =============== =============== ===============
+
+Intermediate directories are always "@" or "+" as appropriate.
+
+
+Each object in the cache has an extended attribute label that holds the object
+type ID (required to distinguish special objects) and the auxiliary data from
+the netfs. The latter is used to detect stale objects in the cache and update
+or retire them.
+
+
+Note that CacheFiles will erase from the cache any file it doesn't recognise or
+any file of an incorrect type (such as a FIFO file or a device file).
+
+
+Security Model and SELinux
+==========================
+
+CacheFiles is implemented to deal properly with the LSM security features of
+the Linux kernel and the SELinux facility.
+
+One of the problems that CacheFiles faces is that it is generally acting on
+behalf of a process, and running in that process's context, and that includes a
+security context that is not appropriate for accessing the cache - either
+because the files in the cache are inaccessible to that process, or because if
+the process creates a file in the cache, that file may be inaccessible to other
+processes.
+
+The way CacheFiles works is to temporarily change the security context (fsuid,
+fsgid and actor security label) that the process acts as - without changing the
+security context of the process when it the target of an operation performed by
+some other process (so signalling and suchlike still work correctly).
+
+
+When the CacheFiles module is asked to bind to its cache, it:
+
+ (1) Finds the security label attached to the root cache directory and uses
+ that as the security label with which it will create files. By default,
+ this is::
+
+ cachefiles_var_t
+
+ (2) Finds the security label of the process which issued the bind request
+ (presumed to be the cachefilesd daemon), which by default will be::
+
+ cachefilesd_t
+
+ and asks LSM to supply a security ID as which it should act given the
+ daemon's label. By default, this will be::
+
+ cachefiles_kernel_t
+
+ SELinux transitions the daemon's security ID to the module's security ID
+ based on a rule of this form in the policy::
+
+ type_transition <daemon's-ID> kernel_t : process <module's-ID>;
+
+ For instance::
+
+ type_transition cachefilesd_t kernel_t : process cachefiles_kernel_t;
+
+
+The module's security ID gives it permission to create, move and remove files
+and directories in the cache, to find and access directories and files in the
+cache, to set and access extended attributes on cache objects, and to read and
+write files in the cache.
+
+The daemon's security ID gives it only a very restricted set of permissions: it
+may scan directories, stat files and erase files and directories. It may
+not read or write files in the cache, and so it is precluded from accessing the
+data cached therein; nor is it permitted to create new files in the cache.
+
+
+There are policy source files available in:
+
+ https://people.redhat.com/~dhowells/fscache/cachefilesd-0.8.tar.bz2
+
+and later versions. In that tarball, see the files::
+
+ cachefilesd.te
+ cachefilesd.fc
+ cachefilesd.if
+
+They are built and installed directly by the RPM.
+
+If a non-RPM based system is being used, then copy the above files to their own
+directory and run::
+
+ make -f /usr/share/selinux/devel/Makefile
+ semodule -i cachefilesd.pp
+
+You will need checkpolicy and selinux-policy-devel installed prior to the
+build.
+
+
+By default, the cache is located in /var/fscache, but if it is desirable that
+it should be elsewhere, than either the above policy files must be altered, or
+an auxiliary policy must be installed to label the alternate location of the
+cache.
+
+For instructions on how to add an auxiliary policy to enable the cache to be
+located elsewhere when SELinux is in enforcing mode, please see::
+
+ /usr/share/doc/cachefilesd-*/move-cache.txt
+
+When the cachefilesd rpm is installed; alternatively, the document can be found
+in the sources.
+
+
+A Note on Security
+==================
+
+CacheFiles makes use of the split security in the task_struct. It allocates
+its own task_security structure, and redirects current->cred to point to it
+when it acts on behalf of another process, in that process's context.
+
+The reason it does this is that it calls vfs_mkdir() and suchlike rather than
+bypassing security and calling inode ops directly. Therefore the VFS and LSM
+may deny the CacheFiles access to the cache data because under some
+circumstances the caching code is running in the security context of whatever
+process issued the original syscall on the netfs.
+
+Furthermore, should CacheFiles create a file or directory, the security
+parameters with that object is created (UID, GID, security label) would be
+derived from that process that issued the system call, thus potentially
+preventing other processes from accessing the cache - including CacheFiles's
+cache management daemon (cachefilesd).
+
+What is required is to temporarily override the security of the process that
+issued the system call. We can't, however, just do an in-place change of the
+security data as that affects the process as an object, not just as a subject.
+This means it may lose signals or ptrace events for example, and affects what
+the process looks like in /proc.
+
+So CacheFiles makes use of a logical split in the security between the
+objective security (task->real_cred) and the subjective security (task->cred).
+The objective security holds the intrinsic security properties of a process and
+is never overridden. This is what appears in /proc, and is what is used when a
+process is the target of an operation by some other process (SIGKILL for
+example).
+
+The subjective security holds the active security properties of a process, and
+may be overridden. This is not seen externally, and is used whan a process
+acts upon another object, for example SIGKILLing another process or opening a
+file.
+
+LSM hooks exist that allow SELinux (or Smack or whatever) to reject a request
+for CacheFiles to run in a context of a specific security label, or to create
+files and directories with another security label.
+
+
+Statistical Information
+=======================
+
+If FS-Cache is compiled with the following option enabled::
+
+ CONFIG_CACHEFILES_HISTOGRAM=y
+
+then it will gather certain statistics and display them through a proc file.
+
+ /proc/fs/cachefiles/histogram
+
+ ::
+
+ cat /proc/fs/cachefiles/histogram
+ JIFS SECS LOOKUPS MKDIRS CREATES
+ ===== ===== ========= ========= =========
+
+ This shows the breakdown of the number of times each amount of time
+ between 0 jiffies and HZ-1 jiffies a variety of tasks took to run. The
+ columns are as follows:
+
+ ======= =======================================================
+ COLUMN TIME MEASUREMENT
+ ======= =======================================================
+ LOOKUPS Length of time to perform a lookup on the backing fs
+ MKDIRS Length of time to perform a mkdir on the backing fs
+ CREATES Length of time to perform a create on the backing fs
+ ======= =======================================================
+
+ Each row shows the number of events that took a particular range of times.
+ Each step is 1 jiffy in size. The JIFS column indicates the particular
+ jiffy range covered, and the SECS field the equivalent number of seconds.
+
+
+Debugging
+=========
+
+If CONFIG_CACHEFILES_DEBUG is enabled, the CacheFiles facility can have runtime
+debugging enabled by adjusting the value in::
+
+ /sys/module/cachefiles/parameters/debug
+
+This is a bitmask of debugging streams to enable:
+
+ ======= ======= =============================== =======================
+ BIT VALUE STREAM POINT
+ ======= ======= =============================== =======================
+ 0 1 General Function entry trace
+ 1 2 Function exit trace
+ 2 4 General
+ ======= ======= =============================== =======================
+
+The appropriate set of values should be OR'd together and the result written to
+the control file. For example::
+
+ echo $((1|4|8)) >/sys/module/cachefiles/parameters/debug
+
+will turn on all function entry debugging.
+
+
+On-demand Read
+==============
+
+When working in its original mode, CacheFiles serves as a local cache for a
+remote networking fs - while in on-demand read mode, CacheFiles can boost the
+scenario where on-demand read semantics are needed, e.g. container image
+distribution.
+
+The essential difference between these two modes is seen when a cache miss
+occurs: In the original mode, the netfs will fetch the data from the remote
+server and then write it to the cache file; in on-demand read mode, fetching
+the data and writing it into the cache is delegated to a user daemon.
+
+``CONFIG_CACHEFILES_ONDEMAND`` should be enabled to support on-demand read mode.
+
+
+Protocol Communication
+----------------------
+
+The on-demand read mode uses a simple protocol for communication between kernel
+and user daemon. The protocol can be modeled as::
+
+ kernel --[request]--> user daemon --[reply]--> kernel
+
+CacheFiles will send requests to the user daemon when needed. The user daemon
+should poll the devnode ('/dev/cachefiles') to check if there's a pending
+request to be processed. A POLLIN event will be returned when there's a pending
+request.
+
+The user daemon then reads the devnode to fetch a request to process. It should
+be noted that each read only gets one request. When it has finished processing
+the request, the user daemon should write the reply to the devnode.
+
+Each request starts with a message header of the form::
+
+ struct cachefiles_msg {
+ __u32 msg_id;
+ __u32 opcode;
+ __u32 len;
+ __u32 object_id;
+ __u8 data[];
+ };
+
+where:
+
+ * ``msg_id`` is a unique ID identifying this request among all pending
+ requests.
+
+ * ``opcode`` indicates the type of this request.
+
+ * ``object_id`` is a unique ID identifying the cache file operated on.
+
+ * ``data`` indicates the payload of this request.
+
+ * ``len`` indicates the whole length of this request, including the
+ header and following type-specific payload.
+
+
+Turning on On-demand Mode
+-------------------------
+
+An optional parameter becomes available to the "bind" command::
+
+ bind [ondemand]
+
+When the "bind" command is given no argument, it defaults to the original mode.
+When it is given the "ondemand" argument, i.e. "bind ondemand", on-demand read
+mode will be enabled.
+
+
+The OPEN Request
+----------------
+
+When the netfs opens a cache file for the first time, a request with the
+CACHEFILES_OP_OPEN opcode, a.k.a an OPEN request will be sent to the user
+daemon. The payload format is of the form::
+
+ struct cachefiles_open {
+ __u32 volume_key_size;
+ __u32 cookie_key_size;
+ __u32 fd;
+ __u32 flags;
+ __u8 data[];
+ };
+
+where:
+
+ * ``data`` contains the volume_key followed directly by the cookie_key.
+ The volume key is a NUL-terminated string; the cookie key is binary
+ data.
+
+ * ``volume_key_size`` indicates the size of the volume key in bytes.
+
+ * ``cookie_key_size`` indicates the size of the cookie key in bytes.
+
+ * ``fd`` indicates an anonymous fd referring to the cache file, through
+ which the user daemon can perform write/llseek file operations on the
+ cache file.
+
+
+The user daemon can use the given (volume_key, cookie_key) pair to distinguish
+the requested cache file. With the given anonymous fd, the user daemon can
+fetch the data and write it to the cache file in the background, even when
+kernel has not triggered a cache miss yet.
+
+Be noted that each cache file has a unique object_id, while it may have multiple
+anonymous fds. The user daemon may duplicate anonymous fds from the initial
+anonymous fd indicated by the @fd field through dup(). Thus each object_id can
+be mapped to multiple anonymous fds, while the usr daemon itself needs to
+maintain the mapping.
+
+When implementing a user daemon, please be careful of RLIMIT_NOFILE,
+``/proc/sys/fs/nr_open`` and ``/proc/sys/fs/file-max``. Typically these needn't
+be huge since they're related to the number of open device blobs rather than
+open files of each individual filesystem.
+
+The user daemon should reply the OPEN request by issuing a "copen" (complete
+open) command on the devnode::
+
+ copen <msg_id>,<cache_size>
+
+where:
+
+ * ``msg_id`` must match the msg_id field of the OPEN request.
+
+ * When >= 0, ``cache_size`` indicates the size of the cache file;
+ when < 0, ``cache_size`` indicates any error code encountered by the
+ user daemon.
+
+
+The CLOSE Request
+-----------------
+
+When a cookie withdrawn, a CLOSE request (opcode CACHEFILES_OP_CLOSE) will be
+sent to the user daemon. This tells the user daemon to close all anonymous fds
+associated with the given object_id. The CLOSE request has no extra payload,
+and shouldn't be replied.
+
+
+The READ Request
+----------------
+
+When a cache miss is encountered in on-demand read mode, CacheFiles will send a
+READ request (opcode CACHEFILES_OP_READ) to the user daemon. This tells the user
+daemon to fetch the contents of the requested file range. The payload is of the
+form::
+
+ struct cachefiles_read {
+ __u64 off;
+ __u64 len;
+ };
+
+where:
+
+ * ``off`` indicates the starting offset of the requested file range.
+
+ * ``len`` indicates the length of the requested file range.
+
+
+When it receives a READ request, the user daemon should fetch the requested data
+and write it to the cache file identified by object_id.
+
+When it has finished processing the READ request, the user daemon should reply
+by using the CACHEFILES_IOC_READ_COMPLETE ioctl on one of the anonymous fds
+associated with the object_id given in the READ request. The ioctl is of the
+form::
+
+ ioctl(fd, CACHEFILES_IOC_READ_COMPLETE, msg_id);
+
+where:
+
+ * ``fd`` is one of the anonymous fds associated with the object_id
+ given.
+
+ * ``msg_id`` must match the msg_id field of the READ request.
diff --git a/Documentation/filesystems/caching/fscache.rst b/Documentation/filesystems/caching/fscache.rst
new file mode 100644
index 000000000..a74d7b052
--- /dev/null
+++ b/Documentation/filesystems/caching/fscache.rst
@@ -0,0 +1,348 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========================
+General Filesystem Caching
+==========================
+
+Overview
+========
+
+This facility is a general purpose cache for network filesystems, though it
+could be used for caching other things such as ISO9660 filesystems too.
+
+FS-Cache mediates between cache backends (such as CacheFiles) and network
+filesystems::
+
+ +---------+
+ | | +--------------+
+ | NFS |--+ | |
+ | | | +-->| CacheFS |
+ +---------+ | +----------+ | | /dev/hda5 |
+ | | | | +--------------+
+ +---------+ +-------------->| | |
+ | | +-------+ | |--+
+ | AFS |----->| | | FS-Cache |
+ | | | netfs |-->| |--+
+ +---------+ +-->| lib | | | |
+ | | | | | | +--------------+
+ +---------+ | +-------+ +----------+ | | |
+ | | | +-->| CacheFiles |
+ | 9P |--+ | /var/cache |
+ | | +--------------+
+ +---------+
+
+Or to look at it another way, FS-Cache is a module that provides a caching
+facility to a network filesystem such that the cache is transparent to the
+user::
+
+ +---------+
+ | |
+ | Server |
+ | |
+ +---------+
+ | NETWORK
+ ~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ |
+ | +----------+
+ V | |
+ +---------+ | |
+ | | | |
+ | NFS |----->| FS-Cache |
+ | | | |--+
+ +---------+ | | | +--------------+ +--------------+
+ | | | | | | | |
+ V +----------+ +-->| CacheFiles |-->| Ext3 |
+ +---------+ | /var/cache | | /dev/sda6 |
+ | | +--------------+ +--------------+
+ | VFS | ^ ^
+ | | | |
+ +---------+ +--------------+ |
+ | KERNEL SPACE | |
+ ~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|~~~~~~|~~~~
+ | USER SPACE | |
+ V | |
+ +---------+ +--------------+
+ | | | |
+ | Process | | cachefilesd |
+ | | | |
+ +---------+ +--------------+
+
+
+FS-Cache does not follow the idea of completely loading every netfs file
+opened in its entirety into a cache before permitting it to be accessed and
+then serving the pages out of that cache rather than the netfs inode because:
+
+ (1) It must be practical to operate without a cache.
+
+ (2) The size of any accessible file must not be limited to the size of the
+ cache.
+
+ (3) The combined size of all opened files (this includes mapped libraries)
+ must not be limited to the size of the cache.
+
+ (4) The user should not be forced to download an entire file just to do a
+ one-off access of a small portion of it (such as might be done with the
+ "file" program).
+
+It instead serves the cache out in chunks as and when requested by the netfs
+using it.
+
+
+FS-Cache provides the following facilities:
+
+ * More than one cache can be used at once. Caches can be selected
+ explicitly by use of tags.
+
+ * Caches can be added / removed at any time, even whilst being accessed.
+
+ * The netfs is provided with an interface that allows either party to
+ withdraw caching facilities from a file (required for (2)).
+
+ * The interface to the netfs returns as few errors as possible, preferring
+ rather to let the netfs remain oblivious.
+
+ * There are three types of cookie: cache, volume and data file cookies.
+ Cache cookies represent the cache as a whole and are not normally visible
+ to the netfs; the netfs gets a volume cookie to represent a collection of
+ files (typically something that a netfs would get for a superblock); and
+ data file cookies are used to cache data (something that would be got for
+ an inode).
+
+ * Volumes are matched using a key. This is a printable string that is used
+ to encode all the information that might be needed to distinguish one
+ superblock, say, from another. This would be a compound of things like
+ cell name or server address, volume name or share path. It must be a
+ valid pathname.
+
+ * Cookies are matched using a key. This is a binary blob and is used to
+ represent the object within a volume (so the volume key need not form
+ part of the blob). This might include things like an inode number and
+ uniquifier or a file handle.
+
+ * Cookie resources are set up and pinned by marking the cookie in-use.
+ This prevents the backing resources from being culled. Timed garbage
+ collection is employed to eliminate cookies that haven't been used for a
+ short while, thereby reducing resource overload. This is intended to be
+ used when a file is opened or closed.
+
+ A cookie can be marked in-use multiple times simultaneously; each mark
+ must be unused.
+
+ * Begin/end access functions are provided to delay cache withdrawal for the
+ duration of an operation and prevent structs from being freed whilst
+ we're looking at them.
+
+ * Data I/O is done by asynchronous DIO to/from a buffer described by the
+ netfs using an iov_iter.
+
+ * An invalidation facility is available to discard data from the cache and
+ to deal with I/O that's in progress that is accessing old data.
+
+ * Cookies can be "retired" upon release, thereby causing the object to be
+ removed from the cache.
+
+
+The netfs API to FS-Cache can be found in:
+
+ Documentation/filesystems/caching/netfs-api.rst
+
+The cache backend API to FS-Cache can be found in:
+
+ Documentation/filesystems/caching/backend-api.rst
+
+
+Statistical Information
+=======================
+
+If FS-Cache is compiled with the following options enabled::
+
+ CONFIG_FSCACHE_STATS=y
+
+then it will gather certain statistics and display them through:
+
+ /proc/fs/fscache/stats
+
+This shows counts of a number of events that can happen in FS-Cache:
+
++--------------+-------+-------------------------------------------------------+
+|CLASS |EVENT |MEANING |
++==============+=======+=======================================================+
+|Cookies |n=N |Number of data storage cookies allocated |
++ +-------+-------------------------------------------------------+
+| |v=N |Number of volume index cookies allocated |
++ +-------+-------------------------------------------------------+
+| |vcol=N |Number of volume index key collisions |
++ +-------+-------------------------------------------------------+
+| |voom=N |Number of OOM events when allocating volume cookies |
++--------------+-------+-------------------------------------------------------+
+|Acquire |n=N |Number of acquire cookie requests seen |
++ +-------+-------------------------------------------------------+
+| |ok=N |Number of acq reqs succeeded |
++ +-------+-------------------------------------------------------+
+| |oom=N |Number of acq reqs failed on ENOMEM |
++--------------+-------+-------------------------------------------------------+
+|LRU |n=N |Number of cookies currently on the LRU |
++ +-------+-------------------------------------------------------+
+| |exp=N |Number of cookies expired off of the LRU |
++ +-------+-------------------------------------------------------+
+| |rmv=N |Number of cookies removed from the LRU |
++ +-------+-------------------------------------------------------+
+| |drp=N |Number of LRU'd cookies relinquished/withdrawn |
++ +-------+-------------------------------------------------------+
+| |at=N |Time till next LRU cull (jiffies) |
++--------------+-------+-------------------------------------------------------+
+|Invals |n=N |Number of invalidations |
++--------------+-------+-------------------------------------------------------+
+|Updates |n=N |Number of update cookie requests seen |
++ +-------+-------------------------------------------------------+
+| |rsz=N |Number of resize requests |
++ +-------+-------------------------------------------------------+
+| |rsn=N |Number of skipped resize requests |
++--------------+-------+-------------------------------------------------------+
+|Relinqs |n=N |Number of relinquish cookie requests seen |
++ +-------+-------------------------------------------------------+
+| |rtr=N |Number of rlq reqs with retire=true |
++ +-------+-------------------------------------------------------+
+| |drop=N |Number of cookies no longer blocking re-acquisition |
++--------------+-------+-------------------------------------------------------+
+|NoSpace |nwr=N |Number of write requests refused due to lack of space |
++ +-------+-------------------------------------------------------+
+| |ncr=N |Number of create requests refused due to lack of space |
++ +-------+-------------------------------------------------------+
+| |cull=N |Number of objects culled to make space |
++--------------+-------+-------------------------------------------------------+
+|IO |rd=N |Number of read operations in the cache |
++ +-------+-------------------------------------------------------+
+| |wr=N |Number of write operations in the cache |
++--------------+-------+-------------------------------------------------------+
+
+Netfslib will also add some stats counters of its own.
+
+
+Cache List
+==========
+
+FS-Cache provides a list of cache cookies:
+
+ /proc/fs/fscache/cookies
+
+This will look something like::
+
+ # cat /proc/fs/fscache/caches
+ CACHE REF VOLS OBJS ACCES S NAME
+ ======== ===== ===== ===== ===== = ===============
+ 00000001 2 1 2123 1 A default
+
+where the columns are:
+
+ ======= ===============================================================
+ COLUMN DESCRIPTION
+ ======= ===============================================================
+ CACHE Cache cookie debug ID (also appears in traces)
+ REF Number of references on the cache cookie
+ VOLS Number of volumes cookies in this cache
+ OBJS Number of cache objects in use
+ ACCES Number of accesses pinning the cache
+ S State
+ NAME Name of the cache.
+ ======= ===============================================================
+
+The state can be (-) Inactive, (P)reparing, (A)ctive, (E)rror or (W)ithdrawing.
+
+
+Volume List
+===========
+
+FS-Cache provides a list of volume cookies:
+
+ /proc/fs/fscache/volumes
+
+This will look something like::
+
+ VOLUME REF nCOOK ACC FL CACHE KEY
+ ======== ===== ===== === == =============== ================
+ 00000001 55 54 1 00 default afs,example.com,100058
+
+where the columns are:
+
+ ======= ===============================================================
+ COLUMN DESCRIPTION
+ ======= ===============================================================
+ VOLUME The volume cookie debug ID (also appears in traces)
+ REF Number of references on the volume cookie
+ nCOOK Number of cookies in the volume
+ ACC Number of accesses pinning the cache
+ FL Flags on the volume cookie
+ CACHE Name of the cache or "-"
+ KEY The indexing key for the volume
+ ======= ===============================================================
+
+
+Cookie List
+===========
+
+FS-Cache provides a list of cookies:
+
+ /proc/fs/fscache/cookies
+
+This will look something like::
+
+ # head /proc/fs/fscache/cookies
+ COOKIE VOLUME REF ACT ACC S FL DEF
+ ======== ======== === === === = == ================
+ 00000435 00000001 1 0 -1 - 08 0000000201d080070000000000000000, 0000000000000000
+ 00000436 00000001 1 0 -1 - 00 0000005601d080080000000000000000, 0000000000000051
+ 00000437 00000001 1 0 -1 - 08 00023b3001d0823f0000000000000000, 0000000000000000
+ 00000438 00000001 1 0 -1 - 08 0000005801d0807b0000000000000000, 0000000000000000
+ 00000439 00000001 1 0 -1 - 08 00023b3201d080a10000000000000000, 0000000000000000
+ 0000043a 00000001 1 0 -1 - 08 00023b3401d080a30000000000000000, 0000000000000000
+ 0000043b 00000001 1 0 -1 - 08 00023b3601d080b30000000000000000, 0000000000000000
+ 0000043c 00000001 1 0 -1 - 08 00023b3801d080b40000000000000000, 0000000000000000
+
+where the columns are:
+
+ ======= ===============================================================
+ COLUMN DESCRIPTION
+ ======= ===============================================================
+ COOKIE The cookie debug ID (also appears in traces)
+ VOLUME The parent volume cookie debug ID
+ REF Number of references on the volume cookie
+ ACT Number of times the cookie is marked for in use
+ ACC Number of access pins in the cookie
+ S State of the cookie
+ FL Flags on the cookie
+ DEF Key, auxiliary data
+ ======= ===============================================================
+
+
+Debugging
+=========
+
+If CONFIG_FSCACHE_DEBUG is enabled, the FS-Cache facility can have runtime
+debugging enabled by adjusting the value in::
+
+ /sys/module/fscache/parameters/debug
+
+This is a bitmask of debugging streams to enable:
+
+ ======= ======= =============================== =======================
+ BIT VALUE STREAM POINT
+ ======= ======= =============================== =======================
+ 0 1 Cache management Function entry trace
+ 1 2 Function exit trace
+ 2 4 General
+ 3 8 Cookie management Function entry trace
+ 4 16 Function exit trace
+ 5 32 General
+ 6-8 (Not used)
+ 9 512 I/O operation management Function entry trace
+ 10 1024 Function exit trace
+ 11 2048 General
+ ======= ======= =============================== =======================
+
+The appropriate set of values should be OR'd together and the result written to
+the control file. For example::
+
+ echo $((1|8|512)) >/sys/module/fscache/parameters/debug
+
+will turn on all function entry debugging.
diff --git a/Documentation/filesystems/caching/index.rst b/Documentation/filesystems/caching/index.rst
new file mode 100644
index 000000000..df4307124
--- /dev/null
+++ b/Documentation/filesystems/caching/index.rst
@@ -0,0 +1,12 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Filesystem Caching
+==================
+
+.. toctree::
+ :maxdepth: 2
+
+ fscache
+ netfs-api
+ backend-api
+ cachefiles
diff --git a/Documentation/filesystems/caching/netfs-api.rst b/Documentation/filesystems/caching/netfs-api.rst
new file mode 100644
index 000000000..1d18e9def
--- /dev/null
+++ b/Documentation/filesystems/caching/netfs-api.rst
@@ -0,0 +1,452 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============================
+Network Filesystem Caching API
+==============================
+
+Fscache provides an API by which a network filesystem can make use of local
+caching facilities. The API is arranged around a number of principles:
+
+ (1) A cache is logically organised into volumes and data storage objects
+ within those volumes.
+
+ (2) Volumes and data storage objects are represented by various types of
+ cookie.
+
+ (3) Cookies have keys that distinguish them from their peers.
+
+ (4) Cookies have coherency data that allows a cache to determine if the
+ cached data is still valid.
+
+ (5) I/O is done asynchronously where possible.
+
+This API is used by::
+
+ #include <linux/fscache.h>.
+
+.. This document contains the following sections:
+
+ (1) Overview
+ (2) Volume registration
+ (3) Data file registration
+ (4) Declaring a cookie to be in use
+ (5) Resizing a data file (truncation)
+ (6) Data I/O API
+ (7) Data file coherency
+ (8) Data file invalidation
+ (9) Write back resource management
+ (10) Caching of local modifications
+ (11) Page release and invalidation
+
+
+Overview
+========
+
+The fscache hierarchy is organised on two levels from a network filesystem's
+point of view. The upper level represents "volumes" and the lower level
+represents "data storage objects". These are represented by two types of
+cookie, hereafter referred to as "volume cookies" and "cookies".
+
+A network filesystem acquires a volume cookie for a volume using a volume key,
+which represents all the information that defines that volume (e.g. cell name
+or server address, volume ID or share name). This must be rendered as a
+printable string that can be used as a directory name (ie. no '/' characters
+and shouldn't begin with a '.'). The maximum name length is one less than the
+maximum size of a filename component (allowing the cache backend one char for
+its own purposes).
+
+A filesystem would typically have a volume cookie for each superblock.
+
+The filesystem then acquires a cookie for each file within that volume using an
+object key. Object keys are binary blobs and only need to be unique within
+their parent volume. The cache backend is reponsible for rendering the binary
+blob into something it can use and may employ hash tables, trees or whatever to
+improve its ability to find an object. This is transparent to the network
+filesystem.
+
+A filesystem would typically have a cookie for each inode, and would acquire it
+in iget and relinquish it when evicting the cookie.
+
+Once it has a cookie, the filesystem needs to mark the cookie as being in use.
+This causes fscache to send the cache backend off to look up/create resources
+for the cookie in the background, to check its coherency and, if necessary, to
+mark the object as being under modification.
+
+A filesystem would typically "use" the cookie in its file open routine and
+unuse it in file release and it needs to use the cookie around calls to
+truncate the cookie locally. It *also* needs to use the cookie when the
+pagecache becomes dirty and unuse it when writeback is complete. This is
+slightly tricky, and provision is made for it.
+
+When performing a read, write or resize on a cookie, the filesystem must first
+begin an operation. This copies the resources into a holding struct and puts
+extra pins into the cache to stop cache withdrawal from tearing down the
+structures being used. The actual operation can then be issued and conflicting
+invalidations can be detected upon completion.
+
+The filesystem is expected to use netfslib to access the cache, but that's not
+actually required and it can use the fscache I/O API directly.
+
+
+Volume Registration
+===================
+
+The first step for a network filsystem is to acquire a volume cookie for the
+volume it wants to access::
+
+ struct fscache_volume *
+ fscache_acquire_volume(const char *volume_key,
+ const char *cache_name,
+ const void *coherency_data,
+ size_t coherency_len);
+
+This function creates a volume cookie with the specified volume key as its name
+and notes the coherency data.
+
+The volume key must be a printable string with no '/' characters in it. It
+should begin with the name of the filesystem and should be no longer than 254
+characters. It should uniquely represent the volume and will be matched with
+what's stored in the cache.
+
+The caller may also specify the name of the cache to use. If specified,
+fscache will look up or create a cache cookie of that name and will use a cache
+of that name if it is online or comes online. If no cache name is specified,
+it will use the first cache that comes to hand and set the name to that.
+
+The specified coherency data is stored in the cookie and will be matched
+against coherency data stored on disk. The data pointer may be NULL if no data
+is provided. If the coherency data doesn't match, the entire cache volume will
+be invalidated.
+
+This function can return errors such as EBUSY if the volume key is already in
+use by an acquired volume or ENOMEM if an allocation failure occured. It may
+also return a NULL volume cookie if fscache is not enabled. It is safe to
+pass a NULL cookie to any function that takes a volume cookie. This will
+cause that function to do nothing.
+
+
+When the network filesystem has finished with a volume, it should relinquish it
+by calling::
+
+ void fscache_relinquish_volume(struct fscache_volume *volume,
+ const void *coherency_data,
+ bool invalidate);
+
+This will cause the volume to be committed or removed, and if sealed the
+coherency data will be set to the value supplied. The amount of coherency data
+must match the length specified when the volume was acquired. Note that all
+data cookies obtained in this volume must be relinquished before the volume is
+relinquished.
+
+
+Data File Registration
+======================
+
+Once it has a volume cookie, a network filesystem can use it to acquire a
+cookie for data storage::
+
+ struct fscache_cookie *
+ fscache_acquire_cookie(struct fscache_volume *volume,
+ u8 advice,
+ const void *index_key,
+ size_t index_key_len,
+ const void *aux_data,
+ size_t aux_data_len,
+ loff_t object_size)
+
+This creates the cookie in the volume using the specified index key. The index
+key is a binary blob of the given length and must be unique for the volume.
+This is saved into the cookie. There are no restrictions on the content, but
+its length shouldn't exceed about three quarters of the maximum filename length
+to allow for encoding.
+
+The caller should also pass in a piece of coherency data in aux_data. A buffer
+of size aux_data_len will be allocated and the coherency data copied in. It is
+assumed that the size is invariant over time. The coherency data is used to
+check the validity of data in the cache. Functions are provided by which the
+coherency data can be updated.
+
+The file size of the object being cached should also be provided. This may be
+used to trim the data and will be stored with the coherency data.
+
+This function never returns an error, though it may return a NULL cookie on
+allocation failure or if fscache is not enabled. It is safe to pass in a NULL
+volume cookie and pass the NULL cookie returned to any function that takes it.
+This will cause that function to do nothing.
+
+
+When the network filesystem has finished with a cookie, it should relinquish it
+by calling::
+
+ void fscache_relinquish_cookie(struct fscache_cookie *cookie,
+ bool retire);
+
+This will cause fscache to either commit the storage backing the cookie or
+delete it.
+
+
+Marking A Cookie In-Use
+=======================
+
+Once a cookie has been acquired by a network filesystem, the filesystem should
+tell fscache when it intends to use the cookie (typically done on file open)
+and should say when it has finished with it (typically on file close)::
+
+ void fscache_use_cookie(struct fscache_cookie *cookie,
+ bool will_modify);
+ void fscache_unuse_cookie(struct fscache_cookie *cookie,
+ const void *aux_data,
+ const loff_t *object_size);
+
+The *use* function tells fscache that it will use the cookie and, additionally,
+indicate if the user is intending to modify the contents locally. If not yet
+done, this will trigger the cache backend to go and gather the resources it
+needs to access/store data in the cache. This is done in the background, and
+so may not be complete by the time the function returns.
+
+The *unuse* function indicates that a filesystem has finished using a cookie.
+It optionally updates the stored coherency data and object size and then
+decreases the in-use counter. When the last user unuses the cookie, it is
+scheduled for garbage collection. If not reused within a short time, the
+resources will be released to reduce system resource consumption.
+
+A cookie must be marked in-use before it can be accessed for read, write or
+resize - and an in-use mark must be kept whilst there is dirty data in the
+pagecache in order to avoid an oops due to trying to open a file during process
+exit.
+
+Note that in-use marks are cumulative. For each time a cookie is marked
+in-use, it must be unused.
+
+
+Resizing A Data File (Truncation)
+=================================
+
+If a network filesystem file is resized locally by truncation, the following
+should be called to notify the cache::
+
+ void fscache_resize_cookie(struct fscache_cookie *cookie,
+ loff_t new_size);
+
+The caller must have first marked the cookie in-use. The cookie and the new
+size are passed in and the cache is synchronously resized. This is expected to
+be called from ``->setattr()`` inode operation under the inode lock.
+
+
+Data I/O API
+============
+
+To do data I/O operations directly through a cookie, the following functions
+are available::
+
+ int fscache_begin_read_operation(struct netfs_cache_resources *cres,
+ struct fscache_cookie *cookie);
+ int fscache_read(struct netfs_cache_resources *cres,
+ loff_t start_pos,
+ struct iov_iter *iter,
+ enum netfs_read_from_hole read_hole,
+ netfs_io_terminated_t term_func,
+ void *term_func_priv);
+ int fscache_write(struct netfs_cache_resources *cres,
+ loff_t start_pos,
+ struct iov_iter *iter,
+ netfs_io_terminated_t term_func,
+ void *term_func_priv);
+
+The *begin* function sets up an operation, attaching the resources required to
+the cache resources block from the cookie. Assuming it doesn't return an error
+(for instance, it will return -ENOBUFS if given a NULL cookie, but otherwise do
+nothing), then one of the other two functions can be issued.
+
+The *read* and *write* functions initiate a direct-IO operation. Both take the
+previously set up cache resources block, an indication of the start file
+position, and an I/O iterator that describes buffer and indicates the amount of
+data.
+
+The read function also takes a parameter to indicate how it should handle a
+partially populated region (a hole) in the disk content. This may be to ignore
+it, skip over an initial hole and place zeros in the buffer or give an error.
+
+The read and write functions can be given an optional termination function that
+will be run on completion::
+
+ typedef
+ void (*netfs_io_terminated_t)(void *priv, ssize_t transferred_or_error,
+ bool was_async);
+
+If a termination function is given, the operation will be run asynchronously
+and the termination function will be called upon completion. If not given, the
+operation will be run synchronously. Note that in the asynchronous case, it is
+possible for the operation to complete before the function returns.
+
+Both the read and write functions end the operation when they complete,
+detaching any pinned resources.
+
+The read operation will fail with ESTALE if invalidation occurred whilst the
+operation was ongoing.
+
+
+Data File Coherency
+===================
+
+To request an update of the coherency data and file size on a cookie, the
+following should be called::
+
+ void fscache_update_cookie(struct fscache_cookie *cookie,
+ const void *aux_data,
+ const loff_t *object_size);
+
+This will update the cookie's coherency data and/or file size.
+
+
+Data File Invalidation
+======================
+
+Sometimes it will be necessary to invalidate an object that contains data.
+Typically this will be necessary when the server informs the network filesystem
+of a remote third-party change - at which point the filesystem has to throw
+away the state and cached data that it had for an file and reload from the
+server.
+
+To indicate that a cache object should be invalidated, the following should be
+called::
+
+ void fscache_invalidate(struct fscache_cookie *cookie,
+ const void *aux_data,
+ loff_t size,
+ unsigned int flags);
+
+This increases the invalidation counter in the cookie to cause outstanding
+reads to fail with -ESTALE, sets the coherency data and file size from the
+information supplied, blocks new I/O on the cookie and dispatches the cache to
+go and get rid of the old data.
+
+Invalidation runs asynchronously in a worker thread so that it doesn't block
+too much.
+
+
+Write-Back Resource Management
+==============================
+
+To write data to the cache from network filesystem writeback, the cache
+resources required need to be pinned at the point the modification is made (for
+instance when the page is marked dirty) as it's not possible to open a file in
+a thread that's exiting.
+
+The following facilities are provided to manage this:
+
+ * An inode flag, ``I_PINNING_FSCACHE_WB``, is provided to indicate that an
+ in-use is held on the cookie for this inode. It can only be changed if the
+ the inode lock is held.
+
+ * A flag, ``unpinned_fscache_wb`` is placed in the ``writeback_control``
+ struct that gets set if ``__writeback_single_inode()`` clears
+ ``I_PINNING_FSCACHE_WB`` because all the dirty pages were cleared.
+
+To support this, the following functions are provided::
+
+ bool fscache_dirty_folio(struct address_space *mapping,
+ struct folio *folio,
+ struct fscache_cookie *cookie);
+ void fscache_unpin_writeback(struct writeback_control *wbc,
+ struct fscache_cookie *cookie);
+ void fscache_clear_inode_writeback(struct fscache_cookie *cookie,
+ struct inode *inode,
+ const void *aux);
+
+The *set* function is intended to be called from the filesystem's
+``dirty_folio`` address space operation. If ``I_PINNING_FSCACHE_WB`` is not
+set, it sets that flag and increments the use count on the cookie (the caller
+must already have called ``fscache_use_cookie()``).
+
+The *unpin* function is intended to be called from the filesystem's
+``write_inode`` superblock operation. It cleans up after writing by unusing
+the cookie if unpinned_fscache_wb is set in the writeback_control struct.
+
+The *clear* function is intended to be called from the netfs's ``evict_inode``
+superblock operation. It must be called *after*
+``truncate_inode_pages_final()``, but *before* ``clear_inode()``. This cleans
+up any hanging ``I_PINNING_FSCACHE_WB``. It also allows the coherency data to
+be updated.
+
+
+Caching of Local Modifications
+==============================
+
+If a network filesystem has locally modified data that it wants to write to the
+cache, it needs to mark the pages to indicate that a write is in progress, and
+if the mark is already present, it needs to wait for it to be removed first
+(presumably due to an already in-progress operation). This prevents multiple
+competing DIO writes to the same storage in the cache.
+
+Firstly, the netfs should determine if caching is available by doing something
+like::
+
+ bool caching = fscache_cookie_enabled(cookie);
+
+If caching is to be attempted, pages should be waited for and then marked using
+the following functions provided by the netfs helper library::
+
+ void set_page_fscache(struct page *page);
+ void wait_on_page_fscache(struct page *page);
+ int wait_on_page_fscache_killable(struct page *page);
+
+Once all the pages in the span are marked, the netfs can ask fscache to
+schedule a write of that region::
+
+ void fscache_write_to_cache(struct fscache_cookie *cookie,
+ struct address_space *mapping,
+ loff_t start, size_t len, loff_t i_size,
+ netfs_io_terminated_t term_func,
+ void *term_func_priv,
+ bool caching)
+
+And if an error occurs before that point is reached, the marks can be removed
+by calling::
+
+ void fscache_clear_page_bits(struct address_space *mapping,
+ loff_t start, size_t len,
+ bool caching)
+
+In these functions, a pointer to the mapping to which the source pages are
+attached is passed in and start and len indicate the size of the region that's
+going to be written (it doesn't have to align to page boundaries necessarily,
+but it does have to align to DIO boundaries on the backing filesystem). The
+caching parameter indicates if caching should be skipped, and if false, the
+functions do nothing.
+
+The write function takes some additional parameters: the cookie representing
+the cache object to be written to, i_size indicates the size of the netfs file
+and term_func indicates an optional completion function, to which
+term_func_priv will be passed, along with the error or amount written.
+
+Note that the write function will always run asynchronously and will unmark all
+the pages upon completion before calling term_func.
+
+
+Page Release and Invalidation
+=============================
+
+Fscache keeps track of whether we have any data in the cache yet for a cache
+object we've just created. It knows it doesn't have to do any reading until it
+has done a write and then the page it wrote from has been released by the VM,
+after which it *has* to look in the cache.
+
+To inform fscache that a page might now be in the cache, the following function
+should be called from the ``release_folio`` address space op::
+
+ void fscache_note_page_release(struct fscache_cookie *cookie);
+
+if the page has been released (ie. release_folio returned true).
+
+Page release and page invalidation should also wait for any mark left on the
+page to say that a DIO write is underway from that page::
+
+ void wait_on_page_fscache(struct page *page);
+ int wait_on_page_fscache_killable(struct page *page);
+
+
+API Function Reference
+======================
+
+.. kernel-doc:: include/linux/fscache.h