author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-11 08:27:49 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-11 08:27:49 +0000 |
commit | ace9429bb58fd418f0c81d4c2835699bddf6bde6 (patch) | |
tree | b2d64bc10158fdd5497876388cd68142ca374ed3 /Documentation/filesystems/caching | |
parent | Initial commit. (diff) | |
download | linux-ace9429bb58fd418f0c81d4c2835699bddf6bde6.tar.xz linux-ace9429bb58fd418f0c81d4c2835699bddf6bde6.zip |
Adding upstream version 6.6.15. (upstream/6.6.15)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'Documentation/filesystems/caching')
-rw-r--r-- | Documentation/filesystems/caching/backend-api.rst | 479 |
-rw-r--r-- | Documentation/filesystems/caching/cachefiles.rst | 662 |
-rw-r--r-- | Documentation/filesystems/caching/fscache.rst | 348 |
-rw-r--r-- | Documentation/filesystems/caching/index.rst | 12 |
-rw-r--r-- | Documentation/filesystems/caching/netfs-api.rst | 452 |
5 files changed, 1953 insertions, 0 deletions
diff --git a/Documentation/filesystems/caching/backend-api.rst b/Documentation/filesystems/caching/backend-api.rst new file mode 100644 index 0000000000..3a199fc508 --- /dev/null +++ b/Documentation/filesystems/caching/backend-api.rst @@ -0,0 +1,479 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================= +Cache Backend API +================= + +The FS-Cache system provides an API by which actual caches can be supplied to +FS-Cache for it to then serve out to network filesystems and other interested +parties. This API is used by:: + + #include <linux/fscache-cache.h>. + + +Overview +======== + +Interaction with the API is handled on three levels: cache, volume and data +storage, and each level has its own type of cookie object: + + ======================= ======================= + COOKIE C TYPE + ======================= ======================= + Cache cookie struct fscache_cache + Volume cookie struct fscache_volume + Data storage cookie struct fscache_cookie + ======================= ======================= + +Cookies are used to provide some filesystem data to the cache, manage state and +pin the cache during access in addition to acting as reference points for the +API functions. Each cookie has a debugging ID that is included in trace points +to make it easier to correlate traces. Note, though, that debugging IDs are +simply allocated from incrementing counters and will eventually wrap. + +The cache backend and the network filesystem can both ask for cache cookies - +and if they ask for one of the same name, they'll get the same cookie. Volume +and data cookies, however, are created at the behest of the filesystem only. + + +Cache Cookies +============= + +Caches are represented in the API by cache cookies. These are objects of +type:: + + struct fscache_cache { + void *cache_priv; + unsigned int debug_id; + char *name; + ... + }; + +There are a few fields that the cache backend might be interested in. 
The +``debug_id`` can be used in tracing to match lines referring to the same cache +and ``name`` is the name the cache was registered with. The ``cache_priv`` +member is private data provided by the cache when it is brought online. The +other fields are for internal use. + + +Registering a Cache +=================== + +When a cache backend wants to bring a cache online, it should first register +the cache name and that will get it a cache cookie. This is done with:: + + struct fscache_cache *fscache_acquire_cache(const char *name); + +This will look up and potentially create a cache cookie. The cache cookie may +have already been created by a network filesystem looking for it, in which case +that cache cookie will be used. If the cache cookie is not in use by another +cache, it will be moved into the preparing state, otherwise it will return +busy. + +If successful, the cache backend can then start setting up the cache. In the +event that the initialisation fails, the cache backend should call:: + + void fscache_relinquish_cache(struct fscache_cache *cache); + +to reset and discard the cookie. + + +Bringing a Cache Online +======================= + +Once the cache is set up, it can be brought online by calling:: + + int fscache_add_cache(struct fscache_cache *cache, + const struct fscache_cache_ops *ops, + void *cache_priv); + +This stores the cache operations table pointer and cache private data into the +cache cookie and moves the cache to the active state, thereby allowing accesses +to take place. + + +Withdrawing a Cache From Service +================================ + +The cache backend can withdraw a cache from service by calling this function:: + + void fscache_withdraw_cache(struct fscache_cache *cache); + +This moves the cache to the withdrawn state to prevent new cache- and +volume-level accesses from starting and then waits for outstanding cache-level +accesses to complete. 
+ +The cache must then go through the data storage objects it has and tell fscache +to withdraw them, calling:: + + void fscache_withdraw_cookie(struct fscache_cookie *cookie); + +on the cookie that each object belongs to. This schedules the specified cookie +for withdrawal. This gets offloaded to a workqueue. The cache backend can +wait for completion by calling:: + + void fscache_wait_for_objects(struct fscache_cache *cache); + +Once all the cookies are withdrawn, a cache backend can withdraw all the +volumes, calling:: + + void fscache_withdraw_volume(struct fscache_volume *volume); + +to tell fscache that a volume has been withdrawn. This waits for all +outstanding accesses on the volume to complete before returning. + +When the cache is completely withdrawn, fscache should be notified by +calling:: + + void fscache_relinquish_cache(struct fscache_cache *cache); + +to clear fields in the cookie and discard the caller's ref on it. + + +Volume Cookies +============== + +Within a cache, the data storage objects are organised into logical volumes. +These are represented in the API as objects of type:: + + struct fscache_volume { + struct fscache_cache *cache; + void *cache_priv; + unsigned int debug_id; + char *key; + unsigned int key_hash; + ... + u8 coherency_len; + u8 coherency[]; + }; + +There are a number of fields here that are of interest to the caching backend: + + * ``cache`` - The parent cache cookie. + + * ``cache_priv`` - A place for the cache to stash private data. + + * ``debug_id`` - A debugging ID for logging in tracepoints. + + * ``key`` - A printable string with no '/' characters in it that represents + the index key for the volume. The key is NUL-terminated and padded out to + a multiple of 4 bytes. + + * ``key_hash`` - A hash of the index key. This should work out the same, no + matter the cpu arch and endianness. + + * ``coherency`` - A piece of coherency data that should be checked when the + volume is bound to in the cache. 
+ + * ``coherency_len`` - The amount of data in the coherency buffer. + + +Data Storage Cookies +==================== + +A volume is a logical group of data storage objects, each of which is +represented to the network filesystem by a cookie. Cookies are represented in +the API as objects of type:: + + struct fscache_cookie { + struct fscache_volume *volume; + void *cache_priv; + unsigned long flags; + unsigned int debug_id; + unsigned int inval_counter; + loff_t object_size; + u8 advice; + u32 key_hash; + u8 key_len; + u8 aux_len; + ... + }; + +The fields in the cookie that are of interest to the cache backend are: + + * ``volume`` - The parent volume cookie. + + * ``cache_priv`` - A place for the cache to stash private data. + + * ``flags`` - A collection of bit flags, including: + + * FSCACHE_COOKIE_NO_DATA_TO_READ - There is no data available in the + cache to be read as the cookie has been created or invalidated. + + * FSCACHE_COOKIE_NEEDS_UPDATE - The coherency data and/or object size has + been changed and needs committing. + + * FSCACHE_COOKIE_LOCAL_WRITE - The netfs's data has been modified + locally, so the cache object may be in an incoherent state with respect + to the server. + + * FSCACHE_COOKIE_HAVE_DATA - The backend should set this if it + successfully stores data into the cache. + + * FSCACHE_COOKIE_RETIRED - The cookie was invalidated when it was + relinquished and the cached data should be discarded. + + * ``debug_id`` - A debugging ID for logging in tracepoints. + + * ``inval_counter`` - The number of invalidations done on the cookie. + + * ``advice`` - Information about how the cookie is to be used. + + * ``key_hash`` - A hash of the index key. This should work out the same, no + matter the cpu arch and endianness. + + * ``key_len`` - The length of the index key. + + * ``aux_len`` - The length of the coherency data buffer. + +Each cookie has an index key, which may be stored inline to the cookie or +elsewhere. 
A pointer to this can be obtained by calling:: + + void *fscache_get_key(struct fscache_cookie *cookie); + +The index key is a binary blob, the storage for which is padded out to a +multiple of 4 bytes. + +Each cookie also has a buffer for coherency data. This may also be inline or +detached from the cookie and a pointer is obtained by calling:: + + void *fscache_get_aux(struct fscache_cookie *cookie); + + + +Cookie Accounting +================= + +Data storage cookies are counted and this is used to block cache withdrawal +completion until all objects have been destroyed. The following functions are +provided to the cache to deal with that:: + + void fscache_count_object(struct fscache_cache *cache); + void fscache_uncount_object(struct fscache_cache *cache); + void fscache_wait_for_objects(struct fscache_cache *cache); + +The count function records the allocation of an object in a cache and the +uncount function records its destruction. Warning: by the time the uncount +function returns, the cache may have been destroyed. + +The wait function can be used during the withdrawal procedure to wait for +fscache to finish withdrawing all the objects in the cache. When it completes, +there will be no remaining objects referring to the cache object or any volume +objects. + + +Cache Management API +==================== + +The cache backend implements the cache management API by providing a table of +operations that fscache can use to manage various aspects of the cache. These +are held in a structure of type:: + + struct fscache_cache_ops { + const char *name; + ... + }; + +This contains a printable name for the cache backend driver plus a number of +pointers to methods to allow fscache to request management of the cache: + + * Set up a volume cookie [optional]:: + + void (*acquire_volume)(struct fscache_volume *volume); + + This method is called when a volume cookie is being created. 
The caller + holds a cache-level access pin to prevent the cache from going away for + the duration. This method should set up the resources to access a volume + in the cache and should not return until it has done so. + + If successful, it can set ``cache_priv`` to its own data. + + + * Clean up volume cookie [optional]:: + + void (*free_volume)(struct fscache_volume *volume); + + This method is called when a volume cookie is being released if + ``cache_priv`` is set. + + + * Look up a cookie in the cache [mandatory]:: + + bool (*lookup_cookie)(struct fscache_cookie *cookie); + + This method is called to look up/create the resources needed to access the + data storage for a cookie. It is called from a worker thread with a + volume-level access pin in the cache to prevent it from being withdrawn. + + True should be returned if successful and false otherwise. If false is + returned, the withdraw_cookie op (see below) will be called. + + If lookup fails, but the object could still be created (e.g. it hasn't + been cached before), then:: + + void fscache_cookie_lookup_negative( + struct fscache_cookie *cookie); + + can be called to let the network filesystem proceed and start downloading + stuff whilst the cache backend gets on with the job of creating things. + + If successful, ``cookie->cache_priv`` can be set. + + + * Withdraw an object without any cookie access counts held [mandatory]:: + + void (*withdraw_cookie)(struct fscache_cookie *cookie); + + This method is called to withdraw a cookie from service. It will be + called when the cookie is relinquished by the netfs, withdrawn or culled + by the cache backend or closed after a period of non-use by fscache. + + The caller doesn't hold any access pins, but it is called from a + non-reentrant work item to manage races between the various ways + withdrawal can occur. + + The cookie will have the ``FSCACHE_COOKIE_RETIRED`` flag set on it if the + associated data is to be removed from the cache. 
+ + + * Change the size of a data storage object [mandatory]:: + + void (*resize_cookie)(struct netfs_cache_resources *cres, + loff_t new_size); + + This method is called to inform the cache backend of a change in size of + the netfs file due to local truncation. The cache backend should make all + of the changes it needs to make before returning as this is done under the + netfs inode mutex. + + The caller holds a cookie-level access pin to prevent a race with + withdrawal and the netfs must have the cookie marked in-use to prevent + garbage collection or culling from removing any resources. + + + * Invalidate a data storage object [mandatory]:: + + bool (*invalidate_cookie)(struct fscache_cookie *cookie); + + This is called when the network filesystem detects a third-party + modification or when an O_DIRECT write is made locally. This requests + that the cache backend should throw away all the data in the cache for + this object and start afresh. It should return true if successful and + false otherwise. + + On entry, new I/O operations are blocked. Once the cache is in a position + to accept I/O again, the backend should release the block by calling:: + + void fscache_resume_after_invalidation(struct fscache_cookie *cookie); + + If the method returns false, caching will be withdrawn for this cookie. + + + * Prepare to make local modifications to the cache [mandatory]:: + + void (*prepare_to_write)(struct fscache_cookie *cookie); + + This method is called when the network filesystem finds that it is going + to need to modify the contents of the cache due to local writes or + truncations. This gives the cache a chance to note that a cache object + may be incoherent with respect to the server and may need writing back + later. This may also cause the cached data to be scrapped on later + rebinding if not properly committed.
+ + + * Begin an operation for the netfs lib [mandatory]:: + + bool (*begin_operation)(struct netfs_cache_resources *cres, + enum fscache_want_state want_state); + + This method is called when an I/O operation is being set up (read, write + or resize). The caller holds an access pin on the cookie and must have + marked the cookie as in-use. + + If it can, the backend should attach any resources it needs to keep around + to the netfs_cache_resources object and return true. + + If it can't complete the setup, it should return false. + + The want_state parameter indicates the state the caller needs the cache + object to be in and what it wants to do during the operation: + + * ``FSCACHE_WANT_PARAMS`` - The caller just wants to access cache + object parameters; it doesn't need to do data I/O yet. + + * ``FSCACHE_WANT_READ`` - The caller wants to read data. + + * ``FSCACHE_WANT_WRITE`` - The caller wants to write to or resize the + cache object. + + Note that there won't necessarily be anything attached to the cookie's + cache_priv yet if the cookie is still being created. + + +Data I/O API +============ + +A cache backend provides a data I/O API through the netfs library's ``struct +netfs_cache_ops`` attached to a ``struct netfs_cache_resources`` by the +``begin_operation`` method described above. + +See Documentation/filesystems/netfs_library.rst for a description. + + +Miscellaneous Functions +======================= + +FS-Cache provides some utilities that a cache backend may make use of: + + * Note occurrence of an I/O error in a cache:: + + void fscache_io_error(struct fscache_cache *cache); + + This tells FS-Cache that an I/O error occurred in the cache. This + prevents any new I/O from being started on the cache. + + This does not actually withdraw the cache. That must be done separately.
+ + * Note cessation of caching on a cookie due to failure:: + + void fscache_caching_failed(struct fscache_cookie *cookie); + + This notes that the caching that was being done on a cookie failed in + some way, for instance the backing storage failed to be created or + invalidation failed and that no further I/O operations should take place + on it until the cache is reset. + + * Count I/O requests:: + + void fscache_count_read(void); + void fscache_count_write(void); + + These record reads and writes from/to the cache. The numbers are + displayed in /proc/fs/fscache/stats. + + * Count out-of-space errors:: + + void fscache_count_no_write_space(void); + void fscache_count_no_create_space(void); + + These record ENOSPC errors in the cache, divided into failures of data + writes and failures of filesystem object creations (e.g. mkdir). + + * Count objects culled:: + + void fscache_count_culled(void); + + This records the culling of an object. + + * Get the cookie from a set of cache resources:: + + struct fscache_cookie *fscache_cres_cookie(struct netfs_cache_resources *cres); + + Pull a pointer to the cookie from the cache resources. This may return a + NULL cookie if no cookie was set. + + +API Function Reference +====================== + +.. kernel-doc:: include/linux/fscache-cache.h diff --git a/Documentation/filesystems/caching/cachefiles.rst b/Documentation/filesystems/caching/cachefiles.rst new file mode 100644 index 0000000000..e04a27bdbe --- /dev/null +++ b/Documentation/filesystems/caching/cachefiles.rst @@ -0,0 +1,662 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=================================== +Cache on Already Mounted Filesystem +=================================== + +.. Contents: + + (*) Overview. + + (*) Requirements. + + (*) Configuration. + + (*) Starting the cache. + + (*) Things to avoid. + + (*) Cache culling. + + (*) Cache structure. + + (*) Security model and SELinux. + + (*) A note on security. + + (*) Statistical information.
+ + (*) Debugging. + + (*) On-demand Read. + + +Overview +======== + +CacheFiles is a caching backend that's meant to use as a cache a directory on +an already mounted filesystem of a local type (such as Ext3). + +CacheFiles uses a userspace daemon to do some of the cache management - such as +reaping stale nodes and culling. This is called cachefilesd and lives in +/sbin. + +The filesystem and data integrity of the cache are only as good as those of the +filesystem providing the backing services. Note that CacheFiles does not +attempt to journal anything since the journalling interfaces of the various +filesystems are very specific in nature. + +CacheFiles creates a misc character device - "/dev/cachefiles" - that is used +to communicate with the daemon. Only one thing may have this open at once, +and while it is open, a cache is at least partially in existence. The daemon +opens this and sends commands down it to control the cache. + +CacheFiles is currently limited to a single cache. + +CacheFiles attempts to maintain at least a certain percentage of free space on +the filesystem, shrinking the cache by culling the objects it contains to make +space if necessary - see the "Cache Culling" section. This means it can be +placed on the same medium as a live set of data, and will expand to make use of +spare space and automatically contract when the set of data requires more +space. + + + +Requirements +============ + +The use of CacheFiles and its daemon requires the following features to be +available in the system and in the cache filesystem: + + - dnotify. + + - extended attributes (xattrs). + + - openat() and friends. + + - bmap() support on files in the filesystem (FIBMAP ioctl). + + - The use of bmap() to detect a partial page at the end of the file. + +It is strongly recommended that the "dir_index" option is enabled on Ext3 +filesystems being used as a cache. + + +Configuration +============= + +The cache is configured by a script in /etc/cachefilesd.conf.
These commands +set up the cache ready for use. The following script commands are available: + + brun <N>%, bcull <N>%, bstop <N>%, frun <N>%, fcull <N>%, fstop <N>% + Configure the culling limits. Optional. See the section on culling. + The defaults are 7% (run), 5% (cull) and 1% (stop) respectively. + + The commands beginning with a 'b' are file space (block) limits, those + beginning with an 'f' are file count limits. + + dir <path> + Specify the directory containing the root of the cache. Mandatory. + + tag <name> + Specify a tag to FS-Cache to use in distinguishing multiple caches. + Optional. The default is "CacheFiles". + + debug <mask> + Specify a numeric bitmask to control debugging in the kernel module. + Optional. The default is zero (all off). The following values can be + OR'd into the mask to collect various information: + + == ================================================= + 1 Turn on trace of function entry (_enter() macros) + 2 Turn on trace of function exit (_leave() macros) + 4 Turn on trace of internal debug points (_debug()) + == ================================================= + + This mask can also be set through sysfs, e.g.:: + + echo 5 >/sys/module/cachefiles/parameters/debug + + +Starting the Cache +================== + +The cache is started by running the daemon. The daemon opens the cache device, +configures the cache and tells it to begin caching. At that point the cache +binds to fscache and the cache becomes live. + +The daemon is run as follows:: + + /sbin/cachefilesd [-d]* [-s] [-n] [-f <configfile>] + +The flags are: + + ``-d`` + Increase the debugging level. This can be specified multiple times and + is cumulative with itself. + + ``-s`` + Send messages to stderr instead of syslog. + + ``-n`` + Don't daemonise and go into background. + + ``-f <configfile>`` + Use an alternative configuration file rather than the default one.
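The culling directives listed in the Configuration section all share the form ``<name> <N>%``. As a minimal sketch of how such a line could be recognised in userspace, the helper below parses one directive. The function name and its return convention are hypothetical and are not taken from cachefilesd; the accepted range of 0 to 99 is an assumption based on the values being percentages.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of cachefilesd): parse one culling
 * directive of the form "<name> <N>%", e.g. "brun 10%".
 * Returns 0 and fills name/percent on success, -1 otherwise.
 */
static int parse_cull_directive(const char *line, char name[8], int *percent)
{
	static const char *const known[] = {
		"brun", "bcull", "bstop", "frun", "fcull", "fstop",
	};
	char pct;
	int value;
	size_t i;

	/* %7s bounds the name; the trailing %c must pick up the '%' sign. */
	if (sscanf(line, "%7s %d%c", name, &value, &pct) != 3 || pct != '%')
		return -1;

	/* Assumption: limits are whole percentages in the range 0..99. */
	if (value < 0 || value >= 100)
		return -1;

	/* Only the six culling directives are accepted here. */
	for (i = 0; i < sizeof(known) / sizeof(known[0]); i++) {
		if (strcmp(name, known[i]) == 0) {
			*percent = value;
			return 0;
		}
	}
	return -1;
}
```

Other directives such as ``dir``, ``tag`` and ``debug`` take different arguments and would need their own handling.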
+ + +Things to Avoid +=============== + +Do not mount other things within the cache as this will cause problems. The +kernel module contains its own very cut-down path walking facility that ignores +mountpoints, but the daemon can't avoid them. + +Do not create, rename or unlink files and directories in the cache while the +cache is active, as this may cause the state to become uncertain. + +Renaming files in the cache might make objects appear to be other objects (the +filename is part of the lookup key). + +Do not change or remove the extended attributes attached to cache files by the +cache as this will cause the cache state management to get confused. + +Do not create files or directories in the cache, lest the cache get confused or +serve incorrect data. + +Do not chmod files in the cache. The module creates things with minimal +permissions to prevent random users being able to access them directly. + + +Cache Culling +============= + +The cache may need culling occasionally to make space. This involves +discarding objects from the cache that have been used less recently than +anything else. Culling is based on the access time of data objects. Empty +directories are culled if not in use. + +Cache culling is done on the basis of the percentage of blocks and the +percentage of files available in the underlying filesystem. There are six +"limits": + + brun, frun + If the amount of free space and the number of available files in the cache + rises above both these limits, then culling is turned off. + + bcull, fcull + If the amount of available space or the number of available files in the + cache falls below either of these limits, then culling is started. + + bstop, fstop + If the amount of available space or the number of available files in the + cache falls below either of these limits, then no further allocation of + disk space or files is permitted until culling has raised things above + these limits again. 
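The effect of the six limits described above can be sketched as a small state update with hysteresis. This is an illustration only: the structure and function names are assumptions and do not correspond to the kernel module or the daemon, and the handling of values exactly equal to a limit is a guess, since the text only speaks of rising above and falling below.

```c
#include <stdbool.h>

/* Illustrative types; the names are assumptions, not kernel or daemon API. */
struct cull_limits {
	int brun, bcull, bstop;	/* available block percentages */
	int frun, fcull, fstop;	/* available file percentages */
};

struct cull_state {
	bool culling;	/* culling is currently active */
	bool alloc_ok;	/* new disk space and files may be allocated */
};

/*
 * Given the currently available percentages of blocks (bavail) and files
 * (favail), and whether culling was already running, work out the new state.
 */
static struct cull_state cull_update(const struct cull_limits *lim,
				     int bavail, int favail, bool was_culling)
{
	struct cull_state st = { .culling = was_culling, .alloc_ok = true };

	/* Either resource below its cull limit starts culling... */
	if (bavail < lim->bcull || favail < lim->fcull)
		st.culling = true;
	/* ...and only both resources above their run limits stop it. */
	else if (bavail > lim->brun && favail > lim->frun)
		st.culling = false;

	/* Either resource below its stop limit blocks further allocation. */
	if (bavail < lim->bstop || favail < lim->fstop)
		st.alloc_ok = false;

	return st;
}
```

Between the cull and run limits the previous culling state is kept, which is what stops culling from switching on and off rapidly around a single threshold.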
+ +These must be configured thusly:: + + 0 <= bstop < bcull < brun < 100 + 0 <= fstop < fcull < frun < 100 + +Note that these are percentages of available space and available files, and do +_not_ appear as 100 minus the percentage displayed by the "df" program. + +The userspace daemon scans the cache to build up a table of cullable objects. +These are then culled in least recently used order. A new scan of the cache is +started as soon as space is made in the table. Objects will be skipped if +their atimes have changed or if the kernel module says it is still using them. + + +Cache Structure +=============== + +The CacheFiles module will create two directories in the directory it was +given: + + * cache/ + * graveyard/ + +The active cache objects all reside in the first directory. The CacheFiles +kernel module moves any retired or culled objects that it can't simply unlink +to the graveyard from which the daemon will actually delete them. + +The daemon uses dnotify to monitor the graveyard directory, and will delete +anything that appears therein. + + +The module represents index objects as directories with the filename "I..." or +"J...". Note that the "cache/" directory is itself a special index. + +Data objects are represented as files if they have no children, or directories +if they do. Their filenames all begin "D..." or "E...". If represented as a +directory, data objects will have a file in the directory called "data" that +actually holds the data. + +Special objects are similar to data objects, except their filenames begin +"S..." or "T...". + + +If an object has children, then it will be represented as a directory. +Immediately in the representative directory are a collection of directories +named for hash values of the child object keys with an '@' prepended. 
Into +this directory, if possible, will be placed the representations of the child +objects:: + + /INDEX /INDEX /INDEX /DATA FILES + /=========/==========/=================================/================ + cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400 + cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...DB1ry + cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...N22ry + cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...FP1ry + + +If the key is so long that it exceeds NAME_MAX with the decorations added on to +it, then it will be cut into pieces, the first few of which will be used to +make a nest of directories, and the last one of which will be the objects +inside the last directory. The names of the intermediate directories will have +'+' prepended:: + + J1223/@23/+xy...z/+kl...m/Epqr + + +Note that keys are raw data, and not only may they exceed NAME_MAX in size, +they may also contain things like '/' and NUL characters, and so they may not +be suitable for turning directly into a filename. + +To handle this, CacheFiles will use a suitably printable filename directly and +"base-64" encode ones that aren't directly suitable. The two versions of +object filenames indicate the encoding: + + =============== =============== =============== + OBJECT TYPE PRINTABLE ENCODED + =============== =============== =============== + Index "I..." "J..." + Data "D..." "E..." + Special "S..." "T..." + =============== =============== =============== + +Intermediate directories are always "@" or "+" as appropriate. + + +Each object in the cache has an extended attribute label that holds the object +type ID (required to distinguish special objects) and the auxiliary data from +the netfs. The latter is used to detect stale objects in the cache and update +or retire them. + + +Note that CacheFiles will erase from the cache any file it doesn't recognise or +any file of an incorrect type (such as a FIFO file or a device file). 
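The naming scheme in the table above can be summarised by looking at the first character of a directory entry. The following is a minimal sketch under that reading; the enum, structure and function names are hypothetical and do not appear in the CacheFiles sources.

```c
/* Illustrative classification; all names here are assumptions. */
enum obj_kind {
	OBJ_INDEX,		/* "I..." or "J..." */
	OBJ_DATA,		/* "D..." or "E..." */
	OBJ_SPECIAL,		/* "S..." or "T..." */
	OBJ_HASH_DIR,		/* "@..." hash-value directory */
	OBJ_KEY_FRAGMENT,	/* "+..." piece of an over-long key */
	OBJ_UNKNOWN,
};

struct obj_class {
	enum obj_kind kind;
	int encoded;	/* 1 if the name uses the encoded form, else 0 */
};

/* Classify a cache entry name by its leading character. */
static struct obj_class classify_name(const char *name)
{
	struct obj_class c = { OBJ_UNKNOWN, 0 };

	switch (name[0]) {
	case 'I': c.kind = OBJ_INDEX; break;
	case 'J': c.kind = OBJ_INDEX; c.encoded = 1; break;
	case 'D': c.kind = OBJ_DATA; break;
	case 'E': c.kind = OBJ_DATA; c.encoded = 1; break;
	case 'S': c.kind = OBJ_SPECIAL; break;
	case 'T': c.kind = OBJ_SPECIAL; c.encoded = 1; break;
	case '@': c.kind = OBJ_HASH_DIR; break;
	case '+': c.kind = OBJ_KEY_FRAGMENT; break;
	default: break;	/* not a name CacheFiles would recognise */
	}
	return c;
}
```

A name that falls into the unknown case corresponds to the kind of entry the text says CacheFiles will erase from the cache.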
+ + +Security Model and SELinux +========================== + +CacheFiles is implemented to deal properly with the LSM security features of +the Linux kernel and the SELinux facility. + +One of the problems that CacheFiles faces is that it is generally acting on +behalf of a process, and running in that process's context, and that includes a +security context that is not appropriate for accessing the cache - either +because the files in the cache are inaccessible to that process, or because if +the process creates a file in the cache, that file may be inaccessible to other +processes. + +The way CacheFiles works is to temporarily change the security context (fsuid, +fsgid and actor security label) that the process acts as - without changing the +security context of the process when it is the target of an operation performed by +some other process (so signalling and suchlike still work correctly). + + +When the CacheFiles module is asked to bind to its cache, it: + + (1) Finds the security label attached to the root cache directory and uses + that as the security label with which it will create files. By default, + this is:: + + cachefiles_var_t + + (2) Finds the security label of the process which issued the bind request + (presumed to be the cachefilesd daemon), which by default will be:: + + cachefilesd_t + + and asks LSM to supply a security ID as which it should act given the + daemon's label.
By default, this will be:: + + cachefiles_kernel_t + + SELinux transitions the daemon's security ID to the module's security ID + based on a rule of this form in the policy:: + + type_transition <daemon's-ID> kernel_t : process <module's-ID>; + + For instance:: + + type_transition cachefilesd_t kernel_t : process cachefiles_kernel_t; + + +The module's security ID gives it permission to create, move and remove files +and directories in the cache, to find and access directories and files in the +cache, to set and access extended attributes on cache objects, and to read and +write files in the cache. + +The daemon's security ID gives it only a very restricted set of permissions: it +may scan directories, stat files and erase files and directories. It may +not read or write files in the cache, and so it is precluded from accessing the +data cached therein; nor is it permitted to create new files in the cache. + + +There are policy source files available in: + + https://people.redhat.com/~dhowells/fscache/cachefilesd-0.8.tar.bz2 + +and later versions. In that tarball, see the files:: + + cachefilesd.te + cachefilesd.fc + cachefilesd.if + +They are built and installed directly by the RPM. + +If a non-RPM based system is being used, then copy the above files to their own +directory and run:: + + make -f /usr/share/selinux/devel/Makefile + semodule -i cachefilesd.pp + +You will need checkpolicy and selinux-policy-devel installed prior to the +build. + + +By default, the cache is located in /var/fscache, but if it is desirable that +it should be elsewhere, then either the above policy files must be altered, or +an auxiliary policy must be installed to label the alternate location of the +cache.
+ +For instructions on how to add an auxiliary policy to enable the cache to be +located elsewhere when SELinux is in enforcing mode, please see:: + + /usr/share/doc/cachefilesd-*/move-cache.txt + +when the cachefilesd rpm is installed; alternatively, the document can be found +in the sources. + + +A Note on Security +================== + +CacheFiles makes use of the split security in the task_struct. It allocates +its own task_security structure, and redirects current->cred to point to it +when it acts on behalf of another process, in that process's context. + +The reason it does this is that it calls vfs_mkdir() and suchlike rather than +bypassing security and calling inode ops directly. Therefore the VFS and LSM +may deny the CacheFiles access to the cache data because under some +circumstances the caching code is running in the security context of whatever +process issued the original syscall on the netfs. + +Furthermore, should CacheFiles create a file or directory, the security +parameters with which that object is created (UID, GID, security label) would be +derived from that process that issued the system call, thus potentially +preventing other processes from accessing the cache - including CacheFiles's +cache management daemon (cachefilesd). + +What is required is to temporarily override the security of the process that +issued the system call. We can't, however, just do an in-place change of the +security data as that affects the process as an object, not just as a subject. +This means it may lose signals or ptrace events for example, and affects what +the process looks like in /proc. + +So CacheFiles makes use of a logical split in the security between the +objective security (task->real_cred) and the subjective security (task->cred). +The objective security holds the intrinsic security properties of a process and +is never overridden.
This is what appears in /proc, and is what is used when a +process is the target of an operation by some other process (SIGKILL for +example). + +The subjective security holds the active security properties of a process, and +may be overridden. This is not seen externally, and is used when a process +acts upon another object, for example SIGKILLing another process or opening a +file. + +LSM hooks exist that allow SELinux (or Smack or whatever) to reject a request +for CacheFiles to run in a context of a specific security label, or to create +files and directories with another security label. + + +Statistical Information +======================= + +If FS-Cache is compiled with the following option enabled:: + + CONFIG_CACHEFILES_HISTOGRAM=y + +then it will gather certain statistics and display them through a proc file. + + /proc/fs/cachefiles/histogram + + :: + + cat /proc/fs/cachefiles/histogram + JIFS SECS LOOKUPS MKDIRS CREATES + ===== ===== ========= ========= ========= + + This shows the breakdown of the number of times each amount of time + between 0 jiffies and HZ-1 jiffies a variety of tasks took to run. The + columns are as follows: + + ======= ======================================================= + COLUMN TIME MEASUREMENT + ======= ======================================================= + LOOKUPS Length of time to perform a lookup on the backing fs + MKDIRS Length of time to perform a mkdir on the backing fs + CREATES Length of time to perform a create on the backing fs + ======= ======================================================= + + Each row shows the number of events that took a particular range of times. + Each step is 1 jiffy in size. The JIFS column indicates the particular + jiffy range covered, and the SECS field the equivalent number of seconds. 
+ + +Debugging +========= + +If CONFIG_CACHEFILES_DEBUG is enabled, the CacheFiles facility can have runtime +debugging enabled by adjusting the value in:: + + /sys/module/cachefiles/parameters/debug + +This is a bitmask of debugging streams to enable: + + ======= ======= =============================== ======================= + BIT VALUE STREAM POINT + ======= ======= =============================== ======================= + 0 1 General Function entry trace + 1 2 Function exit trace + 2 4 General + ======= ======= =============================== ======================= + +The appropriate set of values should be OR'd together and the result written to +the control file. For example:: + + echo $((1|2)) >/sys/module/cachefiles/parameters/debug + +will turn on function entry and exit tracing. + + +On-demand Read +============== + +When working in its original mode, CacheFiles serves as a local cache for a +remote networking fs - while in on-demand read mode, CacheFiles can boost the +scenario where on-demand read semantics are needed, e.g. container image +distribution. + +The essential difference between these two modes is seen when a cache miss +occurs: In the original mode, the netfs will fetch the data from the remote +server and then write it to the cache file; in on-demand read mode, fetching +the data and writing it into the cache is delegated to a user daemon. + +``CONFIG_CACHEFILES_ONDEMAND`` should be enabled to support on-demand read mode. + + +Protocol Communication +---------------------- + +The on-demand read mode uses a simple protocol for communication between kernel +and user daemon. The protocol can be modeled as:: + + kernel --[request]--> user daemon --[reply]--> kernel + +CacheFiles will send requests to the user daemon when needed. The user daemon +should poll the devnode ('/dev/cachefiles') to check if there's a pending +request to be processed. A POLLIN event will be returned when there's a pending +request.
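The polling step described above can be sketched in userspace C as follows. This is a minimal illustration only: ``wait_for_request()`` is a hypothetical helper name, and the descriptor passed in is assumed to be one the daemon has already opened on ``/dev/cachefiles``.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Wait until the devnode reports a pending request.
 *
 * devnode_fd is assumed to be an open descriptor for /dev/cachefiles.
 * Returns 1 if POLLIN was reported, 0 on timeout and -1 on error
 * (including the case where only an error/hangup event was reported).
 */
int wait_for_request(int devnode_fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = devnode_fd, .events = POLLIN };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret < 0)
		return -1;	/* poll() itself failed; errno is set */
	if (ret == 0)
		return 0;	/* nothing pending within the timeout */
	return (pfd.revents & POLLIN) ? 1 : -1;
}
```

A daemon would typically call this in a loop and issue one read on the devnode each time it returns 1.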
+ +The user daemon then reads the devnode to fetch a request to process. It should +be noted that each read only gets one request. When it has finished processing +the request, the user daemon should write the reply to the devnode. + +Each request starts with a message header of the form:: + + struct cachefiles_msg { + __u32 msg_id; + __u32 opcode; + __u32 len; + __u32 object_id; + __u8 data[]; + }; + +where: + + * ``msg_id`` is a unique ID identifying this request among all pending + requests. + + * ``opcode`` indicates the type of this request. + + * ``object_id`` is a unique ID identifying the cache file operated on. + + * ``data`` indicates the payload of this request. + + * ``len`` indicates the whole length of this request, including the + header and following type-specific payload. + + +Turning on On-demand Mode +------------------------- + +An optional parameter becomes available to the "bind" command:: + + bind [ondemand] + +When the "bind" command is given no argument, it defaults to the original mode. +When it is given the "ondemand" argument, i.e. "bind ondemand", on-demand read +mode will be enabled. + + +The OPEN Request +---------------- + +When the netfs opens a cache file for the first time, a request with the +CACHEFILES_OP_OPEN opcode, a.k.a an OPEN request will be sent to the user +daemon. The payload format is of the form:: + + struct cachefiles_open { + __u32 volume_key_size; + __u32 cookie_key_size; + __u32 fd; + __u32 flags; + __u8 data[]; + }; + +where: + + * ``data`` contains the volume_key followed directly by the cookie_key. + The volume key is a NUL-terminated string; the cookie key is binary + data. + + * ``volume_key_size`` indicates the size of the volume key in bytes. + + * ``cookie_key_size`` indicates the size of the cookie key in bytes. + + * ``fd`` indicates an anonymous fd referring to the cache file, through + which the user daemon can perform write/llseek file operations on the + cache file. 
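As a rough sketch of how a user daemon might decode the message header and OPEN payload laid out above, the following userspace C mirrors the two structures locally and validates the sizes before handing out pointers. The names ``msg_hdr``, ``open_hdr``, ``open_req`` and ``parse_open()`` are hypothetical. The numeric opcode is passed in by the caller rather than hard-coded, and the code assumes that ``volume_key_size`` counts the terminating NUL of the volume key.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the fixed parts of the two layouts shown above. */
struct msg_hdr  { uint32_t msg_id, opcode, len, object_id; };
struct open_hdr { uint32_t volume_key_size, cookie_key_size, fd, flags; };

/* Hypothetical parsed view; the pointers refer into the caller's buffer. */
struct open_req {
	uint32_t msg_id, object_id, fd, flags;
	const char *volume_key;		/* NUL-terminated string */
	const uint8_t *cookie_key;	/* binary data */
	uint32_t cookie_key_size;
};

/*
 * Parse one request of n bytes read from the devnode.  open_opcode is the
 * value of CACHEFILES_OP_OPEN as supplied by the caller.
 * Returns 0 on success, -1 if the buffer is not a well-formed OPEN request.
 */
int parse_open(const void *buf, size_t n, uint32_t open_opcode,
	       struct open_req *out)
{
	const uint8_t *p = buf;
	struct msg_hdr h;
	struct open_hdr o;
	size_t keys;

	if (n < sizeof(h) + sizeof(o))
		return -1;
	memcpy(&h, p, sizeof(h));
	memcpy(&o, p + sizeof(h), sizeof(o));

	/* len covers the header plus the type-specific payload. */
	if (h.opcode != open_opcode || h.len > n ||
	    h.len < sizeof(h) + sizeof(o))
		return -1;

	/* Both keys must fit in what remains after the fixed fields. */
	keys = h.len - sizeof(h) - sizeof(o);
	if (o.volume_key_size == 0 || o.volume_key_size > keys ||
	    o.cookie_key_size > keys - o.volume_key_size)
		return -1;

	p += sizeof(h) + sizeof(o);
	/* Assumption: the size includes the volume key's terminating NUL. */
	if (p[o.volume_key_size - 1] != '\0')
		return -1;

	out->msg_id = h.msg_id;
	out->object_id = h.object_id;
	out->fd = o.fd;
	out->flags = o.flags;
	out->volume_key = (const char *)p;
	out->cookie_key = p + o.volume_key_size;
	out->cookie_key_size = o.cookie_key_size;
	return 0;
}
```

Copying the fixed fields out with ``memcpy()`` avoids depending on the alignment of the read buffer; a real daemon would more likely take the structure definitions from the kernel's UAPI header than redeclare them.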
+ + +The user daemon can use the given (volume_key, cookie_key) pair to distinguish +the requested cache file. With the given anonymous fd, the user daemon can +fetch the data and write it to the cache file in the background, even when +the kernel has not triggered a cache miss yet. + +Note that each cache file has a unique object_id, while it may have multiple +anonymous fds. The user daemon may duplicate anonymous fds from the initial +anonymous fd indicated by the @fd field through dup(). Thus each object_id can +be mapped to multiple anonymous fds, while the user daemon itself needs to +maintain the mapping. + +When implementing a user daemon, please be careful of RLIMIT_NOFILE, +``/proc/sys/fs/nr_open`` and ``/proc/sys/fs/file-max``. Typically these needn't +be huge since they're related to the number of open device blobs rather than +open files of each individual filesystem. + +The user daemon should reply to the OPEN request by issuing a "copen" (complete +open) command on the devnode:: + + copen <msg_id>,<cache_size> + +where: + + * ``msg_id`` must match the msg_id field of the OPEN request. + + * When >= 0, ``cache_size`` indicates the size of the cache file; + when < 0, ``cache_size`` indicates any error code encountered by the + user daemon. + + +The CLOSE Request +----------------- + +When a cookie is withdrawn, a CLOSE request (opcode CACHEFILES_OP_CLOSE) will be +sent to the user daemon. This tells the user daemon to close all anonymous fds +associated with the given object_id. The CLOSE request has no extra payload, +and shouldn't be replied to. + + +The READ Request +---------------- + +When a cache miss is encountered in on-demand read mode, CacheFiles will send a +READ request (opcode CACHEFILES_OP_READ) to the user daemon. This tells the user +daemon to fetch the contents of the requested file range.
The payload is of the +form:: + + struct cachefiles_read { + __u64 off; + __u64 len; + }; + +where: + + * ``off`` indicates the starting offset of the requested file range. + + * ``len`` indicates the length of the requested file range. + + +When it receives a READ request, the user daemon should fetch the requested data +and write it to the cache file identified by object_id. + +When it has finished processing the READ request, the user daemon should reply +by using the CACHEFILES_IOC_READ_COMPLETE ioctl on one of the anonymous fds +associated with the object_id given in the READ request. The ioctl is of the +form:: + + ioctl(fd, CACHEFILES_IOC_READ_COMPLETE, msg_id); + +where: + + * ``fd`` is one of the anonymous fds associated with the object_id + given. + + * ``msg_id`` must match the msg_id field of the READ request. diff --git a/Documentation/filesystems/caching/fscache.rst b/Documentation/filesystems/caching/fscache.rst new file mode 100644 index 0000000000..a74d7b052d --- /dev/null +++ b/Documentation/filesystems/caching/fscache.rst @@ -0,0 +1,348 @@ +.. SPDX-License-Identifier: GPL-2.0 + +========================== +General Filesystem Caching +========================== + +Overview +======== + +This facility is a general purpose cache for network filesystems, though it +could be used for caching other things such as ISO9660 filesystems too. 
+ +FS-Cache mediates between cache backends (such as CacheFiles) and network +filesystems:: + + +---------+ + | | +--------------+ + | NFS |--+ | | + | | | +-->| CacheFS | + +---------+ | +----------+ | | /dev/hda5 | + | | | | +--------------+ + +---------+ +-------------->| | | + | | +-------+ | |--+ + | AFS |----->| | | FS-Cache | + | | | netfs |-->| |--+ + +---------+ +-->| lib | | | | + | | | | | | +--------------+ + +---------+ | +-------+ +----------+ | | | + | | | +-->| CacheFiles | + | 9P |--+ | /var/cache | + | | +--------------+ + +---------+ + +Or to look at it another way, FS-Cache is a module that provides a caching +facility to a network filesystem such that the cache is transparent to the +user:: + + +---------+ + | | + | Server | + | | + +---------+ + | NETWORK + ~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + | + | +----------+ + V | | + +---------+ | | + | | | | + | NFS |----->| FS-Cache | + | | | |--+ + +---------+ | | | +--------------+ +--------------+ + | | | | | | | | + V +----------+ +-->| CacheFiles |-->| Ext3 | + +---------+ | /var/cache | | /dev/sda6 | + | | +--------------+ +--------------+ + | VFS | ^ ^ + | | | | + +---------+ +--------------+ | + | KERNEL SPACE | | + ~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|~~~~~~|~~~~ + | USER SPACE | | + V | | + +---------+ +--------------+ + | | | | + | Process | | cachefilesd | + | | | | + +---------+ +--------------+ + + +FS-Cache does not follow the idea of completely loading every netfs file +opened in its entirety into a cache before permitting it to be accessed and +then serving the pages out of that cache rather than the netfs inode because: + + (1) It must be practical to operate without a cache. + + (2) The size of any accessible file must not be limited to the size of the + cache. + + (3) The combined size of all opened files (this includes mapped libraries) + must not be limited to the size of the cache. 
+ + (4) The user should not be forced to download an entire file just to do a + one-off access of a small portion of it (such as might be done with the + "file" program). + +It instead serves the cache out in chunks as and when requested by the netfs +using it. + + +FS-Cache provides the following facilities: + + * More than one cache can be used at once. Caches can be selected + explicitly by use of tags. + + * Caches can be added / removed at any time, even whilst being accessed. + + * The netfs is provided with an interface that allows either party to + withdraw caching facilities from a file (required for (2)). + + * The interface to the netfs returns as few errors as possible, preferring + rather to let the netfs remain oblivious. + + * There are three types of cookie: cache, volume and data file cookies. + Cache cookies represent the cache as a whole and are not normally visible + to the netfs; the netfs gets a volume cookie to represent a collection of + files (typically something that a netfs would get for a superblock); and + data file cookies are used to cache data (something that would be got for + an inode). + + * Volumes are matched using a key. This is a printable string that is used + to encode all the information that might be needed to distinguish one + superblock, say, from another. This would be a compound of things like + cell name or server address, volume name or share path. It must be a + valid pathname. + + * Cookies are matched using a key. This is a binary blob and is used to + represent the object within a volume (so the volume key need not form + part of the blob). This might include things like an inode number and + uniquifier or a file handle. + + * Cookie resources are set up and pinned by marking the cookie in-use. + This prevents the backing resources from being culled. Timed garbage + collection is employed to eliminate cookies that haven't been used for a + short while, thereby reducing resource overload. 
This is intended to be + used when a file is opened or closed. + + A cookie can be marked in-use multiple times simultaneously; each mark + must be unused. + + * Begin/end access functions are provided to delay cache withdrawal for the + duration of an operation and prevent structs from being freed whilst + we're looking at them. + + * Data I/O is done by asynchronous DIO to/from a buffer described by the + netfs using an iov_iter. + + * An invalidation facility is available to discard data from the cache and + to deal with I/O that's in progress that is accessing old data. + + * Cookies can be "retired" upon release, thereby causing the object to be + removed from the cache. + + +The netfs API to FS-Cache can be found in: + + Documentation/filesystems/caching/netfs-api.rst + +The cache backend API to FS-Cache can be found in: + + Documentation/filesystems/caching/backend-api.rst + + +Statistical Information +======================= + +If FS-Cache is compiled with the following options enabled:: + + CONFIG_FSCACHE_STATS=y + +then it will gather certain statistics and display them through: + + /proc/fs/fscache/stats + +This shows counts of a number of events that can happen in FS-Cache: + ++--------------+-------+-------------------------------------------------------+ +|CLASS |EVENT |MEANING | ++==============+=======+=======================================================+ +|Cookies |n=N |Number of data storage cookies allocated | ++ +-------+-------------------------------------------------------+ +| |v=N |Number of volume index cookies allocated | ++ +-------+-------------------------------------------------------+ +| |vcol=N |Number of volume index key collisions | ++ +-------+-------------------------------------------------------+ +| |voom=N |Number of OOM events when allocating volume cookies | ++--------------+-------+-------------------------------------------------------+ +|Acquire |n=N |Number of acquire cookie requests seen | ++ 
+-------+-------------------------------------------------------+ +| |ok=N |Number of acq reqs succeeded | ++ +-------+-------------------------------------------------------+ +| |oom=N |Number of acq reqs failed on ENOMEM | ++--------------+-------+-------------------------------------------------------+ +|LRU |n=N |Number of cookies currently on the LRU | ++ +-------+-------------------------------------------------------+ +| |exp=N |Number of cookies expired off of the LRU | ++ +-------+-------------------------------------------------------+ +| |rmv=N |Number of cookies removed from the LRU | ++ +-------+-------------------------------------------------------+ +| |drp=N |Number of LRU'd cookies relinquished/withdrawn | ++ +-------+-------------------------------------------------------+ +| |at=N |Time till next LRU cull (jiffies) | ++--------------+-------+-------------------------------------------------------+ +|Invals |n=N |Number of invalidations | ++--------------+-------+-------------------------------------------------------+ +|Updates |n=N |Number of update cookie requests seen | ++ +-------+-------------------------------------------------------+ +| |rsz=N |Number of resize requests | ++ +-------+-------------------------------------------------------+ +| |rsn=N |Number of skipped resize requests | ++--------------+-------+-------------------------------------------------------+ +|Relinqs |n=N |Number of relinquish cookie requests seen | ++ +-------+-------------------------------------------------------+ +| |rtr=N |Number of rlq reqs with retire=true | ++ +-------+-------------------------------------------------------+ +| |drop=N |Number of cookies no longer blocking re-acquisition | ++--------------+-------+-------------------------------------------------------+ +|NoSpace |nwr=N |Number of write requests refused due to lack of space | ++ +-------+-------------------------------------------------------+ +| |ncr=N |Number of create requests refused 
due to lack of space | ++ +-------+-------------------------------------------------------+ +| |cull=N |Number of objects culled to make space | ++--------------+-------+-------------------------------------------------------+ +|IO |rd=N |Number of read operations in the cache | ++ +-------+-------------------------------------------------------+ +| |wr=N |Number of write operations in the cache | ++--------------+-------+-------------------------------------------------------+ + +Netfslib will also add some stats counters of its own. + + +Cache List +========== + +FS-Cache provides a list of cache cookies: + + /proc/fs/fscache/caches + +This will look something like:: + + # cat /proc/fs/fscache/caches + CACHE REF VOLS OBJS ACCES S NAME + ======== ===== ===== ===== ===== = =============== + 00000001 2 1 2123 1 A default + +where the columns are: + + ======= =============================================================== + COLUMN DESCRIPTION + ======= =============================================================== + CACHE Cache cookie debug ID (also appears in traces) + REF Number of references on the cache cookie + VOLS Number of volume cookies in this cache + OBJS Number of cache objects in use + ACCES Number of accesses pinning the cache + S State + NAME Name of the cache. + ======= =============================================================== + +The state can be (-) Inactive, (P)reparing, (A)ctive, (E)rror or (W)ithdrawing.
+ + +Volume List +=========== + +FS-Cache provides a list of volume cookies: + + /proc/fs/fscache/volumes + +This will look something like:: + + VOLUME REF nCOOK ACC FL CACHE KEY + ======== ===== ===== === == =============== ================ + 00000001 55 54 1 00 default afs,example.com,100058 + +where the columns are: + + ======= =============================================================== + COLUMN DESCRIPTION + ======= =============================================================== + VOLUME The volume cookie debug ID (also appears in traces) + REF Number of references on the volume cookie + nCOOK Number of cookies in the volume + ACC Number of accesses pinning the cache + FL Flags on the volume cookie + CACHE Name of the cache or "-" + KEY The indexing key for the volume + ======= =============================================================== + + +Cookie List +=========== + +FS-Cache provides a list of cookies: + + /proc/fs/fscache/cookies + +This will look something like:: + + # head /proc/fs/fscache/cookies + COOKIE VOLUME REF ACT ACC S FL DEF + ======== ======== === === === = == ================ + 00000435 00000001 1 0 -1 - 08 0000000201d080070000000000000000, 0000000000000000 + 00000436 00000001 1 0 -1 - 00 0000005601d080080000000000000000, 0000000000000051 + 00000437 00000001 1 0 -1 - 08 00023b3001d0823f0000000000000000, 0000000000000000 + 00000438 00000001 1 0 -1 - 08 0000005801d0807b0000000000000000, 0000000000000000 + 00000439 00000001 1 0 -1 - 08 00023b3201d080a10000000000000000, 0000000000000000 + 0000043a 00000001 1 0 -1 - 08 00023b3401d080a30000000000000000, 0000000000000000 + 0000043b 00000001 1 0 -1 - 08 00023b3601d080b30000000000000000, 0000000000000000 + 0000043c 00000001 1 0 -1 - 08 00023b3801d080b40000000000000000, 0000000000000000 + +where the columns are: + + ======= =============================================================== + COLUMN DESCRIPTION + ======= =============================================================== + COOKIE The 
cookie debug ID (also appears in traces) + VOLUME The parent volume cookie debug ID + REF Number of references on the volume cookie + ACT Number of times the cookie is marked for in use + ACC Number of access pins in the cookie + S State of the cookie + FL Flags on the cookie + DEF Key, auxiliary data + ======= =============================================================== + + +Debugging +========= + +If CONFIG_FSCACHE_DEBUG is enabled, the FS-Cache facility can have runtime +debugging enabled by adjusting the value in:: + + /sys/module/fscache/parameters/debug + +This is a bitmask of debugging streams to enable: + + ======= ======= =============================== ======================= + BIT VALUE STREAM POINT + ======= ======= =============================== ======================= + 0 1 Cache management Function entry trace + 1 2 Function exit trace + 2 4 General + 3 8 Cookie management Function entry trace + 4 16 Function exit trace + 5 32 General + 6-8 (Not used) + 9 512 I/O operation management Function entry trace + 10 1024 Function exit trace + 11 2048 General + ======= ======= =============================== ======================= + +The appropriate set of values should be OR'd together and the result written to +the control file. For example:: + + echo $((1|8|512)) >/sys/module/fscache/parameters/debug + +will turn on all function entry debugging. diff --git a/Documentation/filesystems/caching/index.rst b/Documentation/filesystems/caching/index.rst new file mode 100644 index 0000000000..df4307124b --- /dev/null +++ b/Documentation/filesystems/caching/index.rst @@ -0,0 +1,12 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Filesystem Caching +================== + +.. 
toctree:: + :maxdepth: 2 + + fscache + netfs-api + backend-api + cachefiles diff --git a/Documentation/filesystems/caching/netfs-api.rst b/Documentation/filesystems/caching/netfs-api.rst new file mode 100644 index 0000000000..665b27f155 --- /dev/null +++ b/Documentation/filesystems/caching/netfs-api.rst @@ -0,0 +1,452 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============================== +Network Filesystem Caching API +============================== + +Fscache provides an API by which a network filesystem can make use of local +caching facilities. The API is arranged around a number of principles: + + (1) A cache is logically organised into volumes and data storage objects + within those volumes. + + (2) Volumes and data storage objects are represented by various types of + cookie. + + (3) Cookies have keys that distinguish them from their peers. + + (4) Cookies have coherency data that allows a cache to determine if the + cached data is still valid. + + (5) I/O is done asynchronously where possible. + +This API is used by:: + + #include <linux/fscache.h>. + +.. This document contains the following sections: + + (1) Overview + (2) Volume registration + (3) Data file registration + (4) Declaring a cookie to be in use + (5) Resizing a data file (truncation) + (6) Data I/O API + (7) Data file coherency + (8) Data file invalidation + (9) Write back resource management + (10) Caching of local modifications + (11) Page release and invalidation + + +Overview +======== + +The fscache hierarchy is organised on two levels from a network filesystem's +point of view. The upper level represents "volumes" and the lower level +represents "data storage objects". These are represented by two types of +cookie, hereafter referred to as "volume cookies" and "cookies". + +A network filesystem acquires a volume cookie for a volume using a volume key, +which represents all the information that defines that volume (e.g. cell name +or server address, volume ID or share name). 
This must be rendered as a +printable string that can be used as a directory name (ie. no '/' characters +and shouldn't begin with a '.'). The maximum name length is one less than the +maximum size of a filename component (allowing the cache backend one char for +its own purposes). + +A filesystem would typically have a volume cookie for each superblock. + +The filesystem then acquires a cookie for each file within that volume using an +object key. Object keys are binary blobs and only need to be unique within +their parent volume. The cache backend is responsible for rendering the binary +blob into something it can use and may employ hash tables, trees or whatever to +improve its ability to find an object. This is transparent to the network +filesystem. + +A filesystem would typically have a cookie for each inode, and would acquire it +in iget and relinquish it when evicting the cookie. + +Once it has a cookie, the filesystem needs to mark the cookie as being in use. +This causes fscache to send the cache backend off to look up/create resources +for the cookie in the background, to check its coherency and, if necessary, to +mark the object as being under modification. + +A filesystem would typically "use" the cookie in its file open routine and +unuse it in file release and it needs to use the cookie around calls to +truncate the cookie locally. It *also* needs to use the cookie when the +pagecache becomes dirty and unuse it when writeback is complete. This is +slightly tricky, and provision is made for it. + +When performing a read, write or resize on a cookie, the filesystem must first +begin an operation. This copies the resources into a holding struct and puts +extra pins into the cache to stop cache withdrawal from tearing down the +structures being used. The actual operation can then be issued and conflicting +invalidations can be detected upon completion. 
+ +The filesystem is expected to use netfslib to access the cache, but that's not +actually required and it can use the fscache I/O API directly. + + +Volume Registration +=================== + +The first step for a network filesystem is to acquire a volume cookie for the +volume it wants to access:: + + struct fscache_volume * + fscache_acquire_volume(const char *volume_key, + const char *cache_name, + const void *coherency_data, + size_t coherency_len); + +This function creates a volume cookie with the specified volume key as its name +and notes the coherency data. + +The volume key must be a printable string with no '/' characters in it. It +should begin with the name of the filesystem and should be no longer than 254 +characters. It should uniquely represent the volume and will be matched with +what's stored in the cache. + +The caller may also specify the name of the cache to use. If specified, +fscache will look up or create a cache cookie of that name and will use a cache +of that name if it is online or comes online. If no cache name is specified, +it will use the first cache that comes to hand and set the name to that. + +The specified coherency data is stored in the cookie and will be matched +against coherency data stored on disk. The data pointer may be NULL if no data +is provided. If the coherency data doesn't match, the entire cache volume will +be invalidated. + +This function can return errors such as EBUSY if the volume key is already in +use by an acquired volume or ENOMEM if an allocation failure occurred. It may +also return a NULL volume cookie if fscache is not enabled. It is safe to +pass a NULL cookie to any function that takes a volume cookie. This will +cause that function to do nothing. 
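The constraints on volume keys described above (a printable string, no '/' characters, no leading '.', at most 254 characters) can be illustrated with a small userspace check. This is only a sketch of the stated rules, not the check that fscache itself performs; ``volume_key_looks_valid()`` and ``VOLUME_KEY_MAX`` are hypothetical names, and treating an empty key as invalid is an assumption.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define VOLUME_KEY_MAX 254	/* maximum length quoted in the text above */

/*
 * Return true if key satisfies the documented volume key rules:
 * non-empty (assumption), not too long, no leading '.', and made up
 * only of printable characters other than '/'.
 */
bool volume_key_looks_valid(const char *key)
{
	size_t len = strlen(key);

	if (len == 0 || len > VOLUME_KEY_MAX)
		return false;
	if (key[0] == '.')
		return false;
	for (size_t i = 0; i < len; i++) {
		unsigned char c = (unsigned char)key[i];

		if (c == '/' || !isprint(c))
			return false;
	}
	return true;
}
```

For instance, a key such as ``afs,example.com,100058`` passes, whereas ``nfs/export`` and ``.hidden`` do not.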
+ + +When the network filesystem has finished with a volume, it should relinquish it +by calling:: + + void fscache_relinquish_volume(struct fscache_volume *volume, + const void *coherency_data, + bool invalidate); + +This will cause the volume to be committed or removed, and if sealed the +coherency data will be set to the value supplied. The amount of coherency data +must match the length specified when the volume was acquired. Note that all +data cookies obtained in this volume must be relinquished before the volume is +relinquished. + + +Data File Registration +====================== + +Once it has a volume cookie, a network filesystem can use it to acquire a +cookie for data storage:: + + struct fscache_cookie * + fscache_acquire_cookie(struct fscache_volume *volume, + u8 advice, + const void *index_key, + size_t index_key_len, + const void *aux_data, + size_t aux_data_len, + loff_t object_size) + +This creates the cookie in the volume using the specified index key. The index +key is a binary blob of the given length and must be unique for the volume. +This is saved into the cookie. There are no restrictions on the content, but +its length shouldn't exceed about three quarters of the maximum filename length +to allow for encoding. + +The caller should also pass in a piece of coherency data in aux_data. A buffer +of size aux_data_len will be allocated and the coherency data copied in. It is +assumed that the size is invariant over time. The coherency data is used to +check the validity of data in the cache. Functions are provided by which the +coherency data can be updated. + +The file size of the object being cached should also be provided. This may be +used to trim the data and will be stored with the coherency data. + +This function never returns an error, though it may return a NULL cookie on +allocation failure or if fscache is not enabled. It is safe to pass in a NULL +volume cookie and pass the NULL cookie returned to any function that takes it. 
+This will cause that function to do nothing. + + +When the network filesystem has finished with a cookie, it should relinquish it +by calling:: + + void fscache_relinquish_cookie(struct fscache_cookie *cookie, + bool retire); + +This will cause fscache to either commit the storage backing the cookie or +delete it. + + +Marking A Cookie In-Use +======================= + +Once a cookie has been acquired by a network filesystem, the filesystem should +tell fscache when it intends to use the cookie (typically done on file open) +and should say when it has finished with it (typically on file close):: + + void fscache_use_cookie(struct fscache_cookie *cookie, + bool will_modify); + void fscache_unuse_cookie(struct fscache_cookie *cookie, + const void *aux_data, + const loff_t *object_size); + +The *use* function tells fscache that it will use the cookie and, additionally, +indicate if the user is intending to modify the contents locally. If not yet +done, this will trigger the cache backend to go and gather the resources it +needs to access/store data in the cache. This is done in the background, and +so may not be complete by the time the function returns. + +The *unuse* function indicates that a filesystem has finished using a cookie. +It optionally updates the stored coherency data and object size and then +decreases the in-use counter. When the last user unuses the cookie, it is +scheduled for garbage collection. If not reused within a short time, the +resources will be released to reduce system resource consumption. + +A cookie must be marked in-use before it can be accessed for read, write or +resize - and an in-use mark must be kept whilst there is dirty data in the +pagecache in order to avoid an oops due to trying to open a file during process +exit. + +Note that in-use marks are cumulative. For each time a cookie is marked +in-use, it must be unused. 
+ + +Resizing A Data File (Truncation) +================================= + +If a network filesystem file is resized locally by truncation, the following +should be called to notify the cache:: + + void fscache_resize_cookie(struct fscache_cookie *cookie, + loff_t new_size); + +The caller must have first marked the cookie in-use. The cookie and the new +size are passed in and the cache is synchronously resized. This is expected to +be called from ``->setattr()`` inode operation under the inode lock. + + +Data I/O API +============ + +To do data I/O operations directly through a cookie, the following functions +are available:: + + int fscache_begin_read_operation(struct netfs_cache_resources *cres, + struct fscache_cookie *cookie); + int fscache_read(struct netfs_cache_resources *cres, + loff_t start_pos, + struct iov_iter *iter, + enum netfs_read_from_hole read_hole, + netfs_io_terminated_t term_func, + void *term_func_priv); + int fscache_write(struct netfs_cache_resources *cres, + loff_t start_pos, + struct iov_iter *iter, + netfs_io_terminated_t term_func, + void *term_func_priv); + +The *begin* function sets up an operation, attaching the resources required to +the cache resources block from the cookie. Assuming it doesn't return an error +(for instance, it will return -ENOBUFS if given a NULL cookie, but otherwise do +nothing), then one of the other two functions can be issued. + +The *read* and *write* functions initiate a direct-IO operation. Both take the +previously set up cache resources block, an indication of the start file +position, and an I/O iterator that describes buffer and indicates the amount of +data. + +The read function also takes a parameter to indicate how it should handle a +partially populated region (a hole) in the disk content. This may be to ignore +it, skip over an initial hole and place zeros in the buffer or give an error. 
+
+The read and write functions can be given an optional termination function that
+will be run on completion::
+
+	typedef
+	void (*netfs_io_terminated_t)(void *priv, ssize_t transferred_or_error,
+	                              bool was_async);
+
+If a termination function is given, the operation will be run asynchronously
+and the termination function will be called upon completion.  If not given, the
+operation will be run synchronously.  Note that in the asynchronous case, it is
+possible for the operation to complete before the read or write function
+returns.
+
+Both the read and write functions end the operation when they complete,
+detaching any pinned resources.
+
+The read operation will fail with -ESTALE if invalidation occurred whilst the
+operation was ongoing.
+
+
+Data File Coherency
+===================
+
+To request an update of the coherency data and file size on a cookie, the
+following should be called::
+
+	void fscache_update_cookie(struct fscache_cookie *cookie,
+	                           const void *aux_data,
+	                           const loff_t *object_size);
+
+This will update the cookie's coherency data and/or file size.
+
+
+Data File Invalidation
+======================
+
+Sometimes it will be necessary to invalidate an object that contains data.
+Typically this will be necessary when the server informs the network filesystem
+of a remote third-party change - at which point the filesystem has to throw
+away the state and cached data that it had for a file and reload from the
+server.
+
+To indicate that a cache object should be invalidated, the following should be
+called::
+
+	void fscache_invalidate(struct fscache_cookie *cookie,
+	                        const void *aux_data,
+	                        loff_t size,
+	                        unsigned int flags);
+
+This increases the invalidation counter in the cookie to cause outstanding
+reads to fail with -ESTALE, sets the coherency data and file size from the
+information supplied, blocks new I/O on the cookie and dispatches the cache to
+go and get rid of the old data.
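As a rough illustration of how an invalidation counter can make an outstanding
read fail, the following user-space sketch samples the counter when a read
begins and compares it on completion.  All names here (``mock_inval_cookie``,
``mock_begin_read()`` and so on) are hypothetical, and the actual fscache and
cache backend code may differ in detail.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the cookie; only the counter is modeled. */
struct mock_inval_cookie {
	unsigned int inval_counter;
};

/* Hypothetical per-operation state, loosely analogous to a resources block. */
struct mock_read_op {
	const struct mock_inval_cookie *cookie;
	unsigned int inval_at_begin;	/* counter value seen at begin time */
};

/* Begin a read: remember the invalidation counter at this point. */
void mock_begin_read(struct mock_read_op *op,
		     const struct mock_inval_cookie *cookie)
{
	op->cookie = cookie;
	op->inval_at_begin = cookie->inval_counter;
}

/* Invalidate: bump the counter so that reads already in flight turn stale. */
void mock_invalidate(struct mock_inval_cookie *cookie)
{
	cookie->inval_counter++;
}

/* Complete a read: return the byte count, or -ESTALE if invalidated. */
long mock_end_read(const struct mock_read_op *op, long transferred)
{
	assert(op->cookie != NULL);
	if (op->inval_at_begin != op->cookie->inval_counter)
		return -ESTALE;
	return transferred;
}
```

A read that begins after the invalidation sees the new counter value and so
completes normally in this model.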
+
+Invalidation runs asynchronously in a worker thread so that it doesn't block
+the caller for long.
+
+
+Write-Back Resource Management
+==============================
+
+To write data to the cache from network filesystem writeback, the cache
+resources required need to be pinned at the point the modification is made (for
+instance when the page is marked dirty) as it's not possible to open a file in
+a thread that's exiting.
+
+The following facilities are provided to manage this:
+
+ * An inode flag, ``I_PINNING_FSCACHE_WB``, is provided to indicate that an
+   in-use mark is held on the cookie for this inode.  It can only be changed
+   if the inode lock is held.
+
+ * A flag, ``unpinned_fscache_wb``, is placed in the ``writeback_control``
+   struct that gets set if ``__writeback_single_inode()`` clears
+   ``I_PINNING_FSCACHE_WB`` because all the dirty pages were cleared.
+
+To support this, the following functions are provided::
+
+	bool fscache_dirty_folio(struct address_space *mapping,
+	                         struct folio *folio,
+	                         struct fscache_cookie *cookie);
+	void fscache_unpin_writeback(struct writeback_control *wbc,
+	                             struct fscache_cookie *cookie);
+	void fscache_clear_inode_writeback(struct fscache_cookie *cookie,
+	                                   struct inode *inode,
+	                                   const void *aux);
+
+The *dirty* function is intended to be called from the filesystem's
+``dirty_folio`` address space operation.  If ``I_PINNING_FSCACHE_WB`` is not
+set, it sets that flag and increments the use count on the cookie (the caller
+must already have called ``fscache_use_cookie()``).
+
+The *unpin* function is intended to be called from the filesystem's
+``write_inode`` superblock operation.  It cleans up after writing by unusing
+the cookie if ``unpinned_fscache_wb`` is set in the ``writeback_control``
+struct.
+
+The *clear* function is intended to be called from the netfs's ``evict_inode``
+superblock operation.  It must be called *after*
+``truncate_inode_pages_final()``, but *before* ``clear_inode()``.
+This cleans up any lingering ``I_PINNING_FSCACHE_WB`` flag.  It also allows
+the coherency data to be updated.
+
+
+Caching of Local Modifications
+==============================
+
+If a network filesystem has locally modified data that it wants to write to the
+cache, it needs to mark the pages to indicate that a write is in progress.  If
+the mark is already present (presumably due to an already in-progress
+operation), it needs to wait for it to be removed first.  This prevents
+multiple competing DIO writes to the same storage in the cache.
+
+Firstly, the netfs should determine if caching is available by doing something
+like::
+
+	bool caching = fscache_cookie_enabled(cookie);
+
+If caching is to be attempted, pages should be waited for and then marked using
+the following functions provided by the netfs helper library::
+
+	void set_page_fscache(struct page *page);
+	void wait_on_page_fscache(struct page *page);
+	int wait_on_page_fscache_killable(struct page *page);
+
+Once all the pages in the span are marked, the netfs can ask fscache to
+schedule a write of that region::
+
+	void fscache_write_to_cache(struct fscache_cookie *cookie,
+	                            struct address_space *mapping,
+	                            loff_t start, size_t len, loff_t i_size,
+	                            netfs_io_terminated_t term_func,
+	                            void *term_func_priv,
+	                            bool caching);
+
+If an error occurs before that point is reached, the marks can be removed by
+calling::
+
+	void fscache_clear_page_bits(struct address_space *mapping,
+	                             loff_t start, size_t len,
+	                             bool caching);
+
+In these functions, a pointer to the mapping to which the source pages are
+attached is passed in, and start and len indicate the position and size of the
+region that's going to be written (it doesn't have to align to page boundaries
+necessarily, but it does have to align to DIO boundaries on the backing
+filesystem).  The caching parameter indicates whether caching should be
+attempted; if it is false, the functions do nothing.
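Because the region has to align to DIO boundaries on the backing filesystem, a
caller may need to widen a byte range before passing it in.  The sketch below
shows one way to do that arithmetic in plain C, assuming a hypothetical block
size of 4096 bytes.  ``EXAMPLE_DIO_BLOCK``, ``example_region`` and
``example_expand_to_dio()`` are invented for illustration and are not fscache
interfaces.

```c
#include <assert.h>

/* Hypothetical DIO block size; the real value depends on the backing fs. */
#define EXAMPLE_DIO_BLOCK 4096ULL

/* A byte range expressed as a start offset and a length. */
struct example_region {
	unsigned long long start;
	unsigned long long len;
};

/*
 * Widen [start, start + len) so that both ends fall on a block boundary:
 * the start is rounded down and the end is rounded up.
 */
struct example_region example_expand_to_dio(unsigned long long start,
					    unsigned long long len)
{
	unsigned long long end = start + len;
	struct example_region r;

	assert(end >= start);	/* guard against wraparound */

	r.start = start - (start % EXAMPLE_DIO_BLOCK);
	end = (end + EXAMPLE_DIO_BLOCK - 1) / EXAMPLE_DIO_BLOCK
		* EXAMPLE_DIO_BLOCK;
	r.len = end - r.start;
	return r;
}
```

For instance, a 100-byte range starting at offset 5000 would be widened in this
sketch to the single 4096-byte block that starts at offset 4096.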
+
+The write function takes some additional parameters: the cookie representing
+the cache object to be written to; i_size, which indicates the size of the
+netfs file; and term_func, an optional completion function to which
+term_func_priv will be passed, along with the error or amount written.
+
+Note that the write function will always run asynchronously and will unmark all
+the pages upon completion before calling term_func.
+
+
+Page Release and Invalidation
+=============================
+
+Fscache keeps track of whether we have any data in the cache yet for a cache
+object we've just created.  It knows it doesn't have to do any reading until it
+has done a write and then the page it wrote from has been released by the VM,
+after which it *has* to look in the cache.
+
+To inform fscache that a page might now be in the cache, the following function
+should be called from the ``release_folio`` address space op::
+
+	void fscache_note_page_release(struct fscache_cookie *cookie);
+
+if the page has been released (i.e. ``release_folio`` returned true).
+
+Page release and page invalidation should also wait for any mark left on the
+page to say that a DIO write is underway from that page::
+
+	void wait_on_page_fscache(struct page *page);
+	int wait_on_page_fscache_killable(struct page *page);
+
+
+API Function Reference
+======================
+
+.. kernel-doc:: include/linux/fscache.h