diff options
Diffstat (limited to '')
-rw-r--r-- | doc/cephfs/app-best-practices.rst | 86 |
1 files changed, 86 insertions, 0 deletions
diff --git a/doc/cephfs/app-best-practices.rst b/doc/cephfs/app-best-practices.rst new file mode 100644 index 00000000..d916e184 --- /dev/null +++ b/doc/cephfs/app-best-practices.rst @@ -0,0 +1,86 @@ + +Application best practices for distributed filesystems +====================================================== + +CephFS is POSIX compatible, and therefore should work with any existing +applications that expect a POSIX filesystem. However, because it is a +network filesystem (unlike e.g. XFS) and it is highly consistent (unlike +e.g. NFS), there are some consequences that application authors may +benefit from knowing about. + +The following sections describe some areas where distributed filesystems +may have noticeably different performance behaviours compared with +local filesystems. + + +ls -l +----- + +When you run "ls -l", the ``ls`` program +is first doing a directory listing, and then calling ``stat`` on every +file in the directory. + +This is usually far in excess of what an application really needs, and +it can be slow for large directories. If you don't really need all +this metadata for each file, then use a plain ``ls``. + +ls/stat on files being extended +------------------------------- + +If another client is currently extending files in the listed directory, +then an ``ls -l`` may take an exceptionally long time to complete, as +the lister must wait for the writer to flush data in order to do a valid +read of the every file's size. So unless you *really* need to know the +exact size of every file in the directory, just don't do it! + +This would also apply to any application code that was directly +issuing ``stat`` system calls on files being appended from +another node. + +Very large directories +---------------------- + +Do you really need that 10,000,000 file directory? While directory +fragmentation enables CephFS to handle it, it is always going to be +less efficient than splitting your files into more modest-sized directories. + +Even standard userspace tools can become quite slow when operating on very +large directories. For example, the default behaviour of ``ls`` +is to give an alphabetically ordered result, but ``readdir`` system +calls do not give an ordered result (this is true in general, not just +with CephFS). So when you ``ls`` on a million file directory, it is +loading a list of a million names into memory, sorting the list, then writing +it out to the display. + +Hard links +---------- + +Hard links have an intrinsic cost in terms of the internal housekeeping +that a filesystem has to do to keep two references to the same data. In +CephFS there is a particular performance cost, because with normal files +the inode is embedded in the directory (i.e. there is no extra fetch of +the inode after looking up the path). + +Working set size +---------------- + +The MDS acts as a cache for the metadata stored in RADOS. Metadata +performance is very different for workloads whose metadata fits within +that cache. + +If your workload has more files than fit in your cache (configured using +``mds_cache_memory_limit`` or ``mds_cache_size`` settings), then +make sure you test it appropriately: don't test your system with a small +number of files and then expect equivalent performance when you move +to a much larger number of files. + +Do you need a filesystem? +------------------------- + +Remember that Ceph also includes an object storage interface. If your +application needs to store huge flat collections of files where you just +read and write whole files at once, then you might well be better off +using the :ref:`Object Gateway <object-gateway>` + + + |