Diffstat (limited to 'man7/capabilities.7')
-rw-r--r-- man7/capabilities.7 1872
1 files changed, 1872 insertions, 0 deletions
diff --git a/man7/capabilities.7 b/man7/capabilities.7
new file mode 100644
index 0000000..c8766d2
--- /dev/null
+++ b/man7/capabilities.7
@@ -0,0 +1,1872 @@
+.\" Copyright (c) 2002 by Michael Kerrisk <mtk.manpages@gmail.com>
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.\" 6 Aug 2002 - Initial Creation
+.\" Modified 2003-05-23, Michael Kerrisk, <mtk.manpages@gmail.com>
+.\" Modified 2004-05-27, Michael Kerrisk, <mtk.manpages@gmail.com>
+.\" 2004-12-08, mtk Added O_NOATIME for CAP_FOWNER
+.\" 2005-08-16, mtk, Added CAP_AUDIT_CONTROL and CAP_AUDIT_WRITE
+.\" 2008-07-15, Serge Hallyn <serue@us.ibm.com>
+.\" Document file capabilities, per-process capability
+.\" bounding set, changed semantics for CAP_SETPCAP,
+.\" and other changes in Linux 2.6.2[45].
+.\" Add CAP_MAC_ADMIN, CAP_MAC_OVERRIDE, CAP_SETFCAP.
+.\" 2008-07-15, mtk
+.\" Add text describing circumstances in which CAP_SETPCAP
+.\" (theoretically) permits a thread to change the
+.\" capability sets of another thread.
+.\" Add section describing rules for programmatically
+.\" adjusting thread capability sets.
+.\" Describe rationale for capability bounding set.
+.\" Document "securebits" flags.
+.\" Add text noting that if we set the effective flag for one file
+.\" capability, then we must also set the effective flag for all
+.\" other capabilities where the permitted or inheritable bit is set.
+.\" 2011-09-07, mtk/Serge Hallyn: Add CAP_SYSLOG
+.\"
+.TH Capabilities 7 2023-05-03 "Linux man-pages 6.05.01"
+.SH NAME
+capabilities \- overview of Linux capabilities
+.SH DESCRIPTION
+For the purpose of performing permission checks,
+traditional UNIX implementations distinguish two categories of processes:
+.I privileged
+processes (whose effective user ID is 0, referred to as superuser or root),
+and
+.I unprivileged
+processes (whose effective UID is nonzero).
+Privileged processes bypass all kernel permission checks,
+while unprivileged processes are subject to full permission
+checking based on the process's credentials
+(usually: effective UID, effective GID, and supplementary group list).
+.PP
+Starting with Linux 2.2, Linux divides the privileges traditionally
+associated with superuser into distinct units, known as
+.IR capabilities ,
+which can be independently enabled and disabled.
+Capabilities are a per-thread attribute.
+.\"
+.SS Capabilities list
+The following list shows the capabilities implemented on Linux,
+and the operations or behaviors that each capability permits:
+.TP
+.BR CAP_AUDIT_CONTROL " (since Linux 2.6.11)"
+Enable and disable kernel auditing; change auditing filter rules;
+retrieve auditing status and filtering rules.
+.TP
+.BR CAP_AUDIT_READ " (since Linux 3.16)"
+.\" commit a29b694aa1739f9d76538e34ae25524f9c549d59
+.\" commit 3a101b8de0d39403b2c7e5c23fd0b005668acf48
+Allow reading the audit log via a multicast netlink socket.
+.TP
+.BR CAP_AUDIT_WRITE " (since Linux 2.6.11)"
+Write records to the kernel auditing log.
+.\" FIXME Add FAN_ENABLE_AUDIT
+.TP
+.BR CAP_BLOCK_SUSPEND " (since Linux 3.5)"
+Employ features that can block system suspend
+.RB ( epoll (7)
+.BR EPOLLWAKEUP ,
+.IR /sys/power/wake_lock ).
+.TP
+.BR CAP_BPF " (since Linux 5.8)"
+Employ privileged BPF operations; see
+.BR bpf (2)
+and
+.BR bpf\-helpers (7).
+.IP
+This capability was added in Linux 5.8 to separate out
+BPF functionality from the overloaded
+.B CAP_SYS_ADMIN
+capability.
+.TP
+.BR CAP_CHECKPOINT_RESTORE " (since Linux 5.9)"
+.\" commit 124ea650d3072b005457faed69909221c2905a1f
+.PD 0
+.RS
+.IP \[bu] 3
+Update
+.I /proc/sys/kernel/ns_last_pid
+(see
+.BR pid_namespaces (7));
+.IP \[bu]
+employ the
+.I set_tid
+feature of
+.BR clone3 (2);
+.\" FIXME There is also some use case relating to
+.\" prctl_set_mm_exe_file(); in the 5.9 sources, see
+.\" prctl_set_mm_map().
+.IP \[bu]
+read the contents of the symbolic links in
+.IR /proc/ pid /map_files
+for other processes.
+.RE
+.PD
+.IP
+This capability was added in Linux 5.9 to separate out
+checkpoint/restore functionality from the overloaded
+.B CAP_SYS_ADMIN
+capability.
+.TP
+.B CAP_CHOWN
+Make arbitrary changes to file UIDs and GIDs (see
+.BR chown (2)).
+.TP
+.B CAP_DAC_OVERRIDE
+Bypass file read, write, and execute permission checks.
+(DAC is an abbreviation of "discretionary access control".)
+.TP
+.B CAP_DAC_READ_SEARCH
+.PD 0
+.RS
+.IP \[bu] 3
+Bypass file read permission checks and
+directory read and execute permission checks;
+.IP \[bu]
+invoke
+.BR open_by_handle_at (2);
+.IP \[bu]
+use the
+.BR linkat (2)
+.B AT_EMPTY_PATH
+flag to create a link to a file referred to by a file descriptor.
+.RE
+.PD
+.TP
+.B CAP_FOWNER
+.PD 0
+.RS
+.IP \[bu] 3
+Bypass permission checks on operations that normally
+require the filesystem UID of the process to match the UID of
+the file (e.g.,
+.BR chmod (2),
+.BR utime (2)),
+excluding those operations covered by
+.B CAP_DAC_OVERRIDE
+and
+.BR CAP_DAC_READ_SEARCH ;
+.IP \[bu]
+set inode flags (see
+.BR ioctl_iflags (2))
+on arbitrary files;
+.IP \[bu]
+set Access Control Lists (ACLs) on arbitrary files;
+.IP \[bu]
+ignore directory sticky bit on file deletion;
+.IP \[bu]
+modify
+.I user
+extended attributes on a sticky directory owned by any user;
+.IP \[bu]
+specify
+.B O_NOATIME
+for arbitrary files in
+.BR open (2)
+and
+.BR fcntl (2).
+.RE
+.PD
+.TP
+.B CAP_FSETID
+.PD 0
+.RS
+.IP \[bu] 3
+Don't clear set-user-ID and set-group-ID mode
+bits when a file is modified;
+.IP \[bu]
+set the set-group-ID bit for a file whose GID does not match
+the filesystem GID or any of the supplementary GIDs of the calling process.
+.RE
+.PD
+.TP
+.B CAP_IPC_LOCK
+.\" FIXME . As at Linux 3.2, there are some strange uses of this capability
+.\" in other places; they probably should be replaced with something else.
+.PD 0
+.RS
+.IP \[bu] 3
+Lock memory
+.RB ( mlock (2),
+.BR mlockall (2),
+.BR mmap (2),
+.BR shmctl (2));
+.IP \[bu]
+allocate memory using huge pages
+.RB ( memfd_create (2),
+.BR mmap (2),
+.BR shmctl (2)).
+.RE
+.PD
+.TP
+.B CAP_IPC_OWNER
+Bypass permission checks for operations on System V IPC objects.
+.TP
+.B CAP_KILL
+Bypass permission checks for sending signals (see
+.BR kill (2)).
+This includes use of the
+.BR ioctl (2)
+.B KDSIGACCEPT
+operation.
+.\" FIXME . CAP_KILL also has an effect for threads + setting child
+.\" termination signal to other than SIGCHLD: without this
+.\" capability, the termination signal reverts to SIGCHLD
+.\" if the child does an exec(). What is the rationale
+.\" for this?
+.TP
+.BR CAP_LEASE " (since Linux 2.4)"
+Establish leases on arbitrary files (see
+.BR fcntl (2)).
+.TP
+.B CAP_LINUX_IMMUTABLE
+Set the
+.B FS_APPEND_FL
+and
+.B FS_IMMUTABLE_FL
+inode flags (see
+.BR ioctl_iflags (2)).
+.TP
+.BR CAP_MAC_ADMIN " (since Linux 2.6.25)"
+Allow MAC configuration or state changes.
+Implemented for the Smack Linux Security Module (LSM).
+.TP
+.BR CAP_MAC_OVERRIDE " (since Linux 2.6.25)"
+Override Mandatory Access Control (MAC).
+Implemented for the Smack LSM.
+.TP
+.BR CAP_MKNOD " (since Linux 2.4)"
+Create special files using
+.BR mknod (2).
+.TP
+.B CAP_NET_ADMIN
+Perform various network-related operations:
+.PD 0
+.RS
+.IP \[bu] 3
+interface configuration;
+.IP \[bu]
+administration of IP firewall, masquerading, and accounting;
+.IP \[bu]
+modify routing tables;
+.IP \[bu]
+bind to any address for transparent proxying;
+.IP \[bu]
+set type-of-service (TOS);
+.IP \[bu]
+clear driver statistics;
+.IP \[bu]
+set promiscuous mode;
+.IP \[bu]
+enable multicasting;
+.IP \[bu]
+use
+.BR setsockopt (2)
+to set the following socket options:
+.BR SO_DEBUG ,
+.BR SO_MARK ,
+.B SO_PRIORITY
+(for a priority outside the range 0 to 6),
+.BR SO_RCVBUFFORCE ,
+and
+.BR SO_SNDBUFFORCE .
+.RE
+.PD
+.TP
+.B CAP_NET_BIND_SERVICE
+Bind a socket to Internet domain privileged ports
+(port numbers less than 1024).
+.TP
+.B CAP_NET_BROADCAST
+(Unused) Make socket broadcasts, and listen to multicasts.
+.\" FIXME Since Linux 4.2, there are use cases for netlink sockets
+.\" commit 59324cf35aba5336b611074028777838a963d03b
+.TP
+.B CAP_NET_RAW
+.PD 0
+.RS
+.IP \[bu] 3
+Use RAW and PACKET sockets;
+.IP \[bu]
+bind to any address for transparent proxying.
+.RE
+.PD
+.\" Also various IP options and setsockopt(SO_BINDTODEVICE)
+.TP
+.BR CAP_PERFMON " (since Linux 5.8)"
+Employ various performance-monitoring mechanisms, including:
+.RS
+.IP \[bu] 3
+.PD 0
+call
+.BR perf_event_open (2);
+.IP \[bu]
+employ various BPF operations that have performance implications.
+.RE
+.PD
+.IP
+This capability was added in Linux 5.8 to separate out
+performance monitoring functionality from the overloaded
+.B CAP_SYS_ADMIN
+capability.
+See also the kernel source file
+.IR Documentation/admin\-guide/perf\-security.rst .
+.TP
+.B CAP_SETGID
+.RS
+.PD 0
+.IP \[bu] 3
+Make arbitrary manipulations of process GIDs and supplementary GID list;
+.IP \[bu]
+forge GID when passing socket credentials via UNIX domain sockets;
+.IP \[bu]
+write a group ID mapping in a user namespace (see
+.BR user_namespaces (7)).
+.PD
+.RE
+.TP
+.BR CAP_SETFCAP " (since Linux 2.6.24)"
+Set arbitrary capabilities on a file.
+.IP
+.\" commit db2e718a47984b9d71ed890eb2ea36ecf150de18
+Since Linux 5.12, this capability is
+also needed to map user ID 0 in a new user namespace; see
+.BR user_namespaces (7)
+for details.
+.TP
+.B CAP_SETPCAP
+If file capabilities are supported (i.e., since Linux 2.6.24):
+add any capability from the calling thread's bounding set
+to its inheritable set;
+drop capabilities from the bounding set (via
+.BR prctl (2)
+.BR PR_CAPBSET_DROP );
+make changes to the
+.I securebits
+flags.
+.IP
+If file capabilities are not supported (i.e., before Linux 2.6.24):
+grant or remove any capability in the
+caller's permitted capability set to or from any other process.
+(This property of
+.B CAP_SETPCAP
+is not available when the kernel is configured to support
+file capabilities, since
+.B CAP_SETPCAP
+has entirely different semantics for such kernels.)
+.TP
+.B CAP_SETUID
+.RS
+.PD 0
+.IP \[bu] 3
+Make arbitrary manipulations of process UIDs
+.RB ( setuid (2),
+.BR setreuid (2),
+.BR setresuid (2),
+.BR setfsuid (2));
+.IP \[bu]
+forge UID when passing socket credentials via UNIX domain sockets;
+.IP \[bu]
+write a user ID mapping in a user namespace (see
+.BR user_namespaces (7)).
+.PD
+.RE
+.\" FIXME CAP_SETUID also an effect in exec(); document this.
+.TP
+.B CAP_SYS_ADMIN
+.IR Note :
+this capability is overloaded; see
+.I Notes to kernel developers
+below.
+.IP
+.PD 0
+.RS
+.IP \[bu] 3
+Perform a range of system administration operations including:
+.BR quotactl (2),
+.BR mount (2),
+.BR umount (2),
+.BR pivot_root (2),
+.BR swapon (2),
+.BR swapoff (2),
+.BR sethostname (2),
+and
+.BR setdomainname (2);
+.IP \[bu]
+perform privileged
+.BR syslog (2)
+operations (since Linux 2.6.37,
+.B CAP_SYSLOG
+should be used to permit such operations);
+.IP \[bu]
+perform
+.B VM86_REQUEST_IRQ
+.BR vm86 (2)
+command;
+.IP \[bu]
+access the same checkpoint/restore functionality that is governed by
+.B CAP_CHECKPOINT_RESTORE
+(but the latter, weaker capability is preferred for accessing
+that functionality);
+.IP \[bu]
+perform the same BPF operations as are governed by
+.B CAP_BPF
+(but the latter, weaker capability is preferred for accessing
+that functionality);
+.IP \[bu]
+employ the same performance monitoring mechanisms as are governed by
+.B CAP_PERFMON
+(but the latter, weaker capability is preferred for accessing
+that functionality);
+.IP \[bu]
+perform
+.B IPC_SET
+and
+.B IPC_RMID
+operations on arbitrary System V IPC objects;
+.IP \[bu]
+override
+.B RLIMIT_NPROC
+resource limit;
+.IP \[bu]
+perform operations on
+.I trusted
+and
+.I security
+extended attributes (see
+.BR xattr (7));
+.IP \[bu]
+use
+.BR lookup_dcookie (2);
+.IP \[bu]
+use
+.BR ioprio_set (2)
+to assign
+.B IOPRIO_CLASS_RT
+and (before Linux 2.6.25)
+.B IOPRIO_CLASS_IDLE
+I/O scheduling classes;
+.IP \[bu]
+forge PID when passing socket credentials via UNIX domain sockets;
+.IP \[bu]
+exceed
+.IR /proc/sys/fs/file\-max ,
+the system-wide limit on the number of open files,
+in system calls that open files (e.g.,
+.BR accept (2),
+.BR execve (2),
+.BR open (2),
+.BR pipe (2));
+.IP \[bu]
+employ
+.B CLONE_*
+flags that create new namespaces with
+.BR clone (2)
+and
+.BR unshare (2)
+(but, since Linux 3.8,
+creating user namespaces does not require any capability);
+.IP \[bu]
+access privileged
+.I perf
+event information;
+.IP \[bu]
+call
+.BR setns (2)
+(requires
+.B CAP_SYS_ADMIN
+in the
+.I target
+namespace);
+.IP \[bu]
+call
+.BR fanotify_init (2);
+.IP \[bu]
+perform privileged
+.B KEYCTL_CHOWN
+and
+.B KEYCTL_SETPERM
+.BR keyctl (2)
+operations;
+.IP \[bu]
+perform
+.BR madvise (2)
+.B MADV_HWPOISON
+operation;
+.IP \[bu]
+employ the
+.B TIOCSTI
+.BR ioctl (2)
+to insert characters into the input queue of a terminal other than
+the caller's controlling terminal;
+.IP \[bu]
+employ the obsolete
+.BR nfsservctl (2)
+system call;
+.IP \[bu]
+employ the obsolete
+.BR bdflush (2)
+system call;
+.IP \[bu]
+perform various privileged block-device
+.BR ioctl (2)
+operations;
+.IP \[bu]
+perform various privileged filesystem
+.BR ioctl (2)
+operations;
+.IP \[bu]
+perform privileged
+.BR ioctl (2)
+operations on the
+.I /dev/random
+device (see
+.BR random (4));
+.IP \[bu]
+install a
+.BR seccomp (2)
+filter without first having to set the
+.I no_new_privs
+thread attribute;
+.IP \[bu]
+modify allow/deny rules for device control groups;
+.IP \[bu]
+employ the
+.BR ptrace (2)
+.B PTRACE_SECCOMP_GET_FILTER
+operation to dump tracee's seccomp filters;
+.IP \[bu]
+employ the
+.BR ptrace (2)
+.B PTRACE_SETOPTIONS
+operation to suspend the tracee's seccomp protections (i.e., the
+.B PTRACE_O_SUSPEND_SECCOMP
+flag);
+.IP \[bu]
+perform administrative operations on many device drivers;
+.IP \[bu]
+modify autogroup nice values by writing to
+.IR /proc/ pid /autogroup
+(see
+.BR sched (7)).
+.RE
+.PD
+.TP
+.B CAP_SYS_BOOT
+Use
+.BR reboot (2)
+and
+.BR kexec_load (2).
+.TP
+.B CAP_SYS_CHROOT
+.RS
+.PD 0
+.IP \[bu] 3
+Use
+.BR chroot (2);
+.IP \[bu]
+change mount namespaces using
+.BR setns (2).
+.PD
+.RE
+.TP
+.B CAP_SYS_MODULE
+.RS
+.PD 0
+.IP \[bu] 3
+Load and unload kernel modules
+(see
+.BR init_module (2)
+and
+.BR delete_module (2));
+.IP \[bu]
+before Linux 2.6.25:
+drop capabilities from the system-wide capability bounding set.
+.PD
+.RE
+.TP
+.B CAP_SYS_NICE
+.PD 0
+.RS
+.IP \[bu] 3
+Lower the process nice value
+.RB ( nice (2),
+.BR setpriority (2))
+and change the nice value for arbitrary processes;
+.IP \[bu]
+set real-time scheduling policies for the calling process,
+and set scheduling policies and priorities for arbitrary processes
+.RB ( sched_setscheduler (2),
+.BR sched_setparam (2),
+.BR sched_setattr (2));
+.IP \[bu]
+set CPU affinity for arbitrary processes
+.RB ( sched_setaffinity (2));
+.IP \[bu]
+set I/O scheduling class and priority for arbitrary processes
+.RB ( ioprio_set (2));
+.IP \[bu]
+apply
+.BR migrate_pages (2)
+to arbitrary processes and allow processes
+to be migrated to arbitrary nodes;
+.\" FIXME CAP_SYS_NICE also has the following effect for
+.\" migrate_pages(2):
+.\" do_migrate_pages(mm, &old, &new,
+.\" capable(CAP_SYS_NICE) ? MPOL_MF_MOVE_ALL : MPOL_MF_MOVE);
+.\"
+.\" Document this.
+.IP \[bu]
+apply
+.BR move_pages (2)
+to arbitrary processes;
+.IP \[bu]
+use the
+.B MPOL_MF_MOVE_ALL
+flag with
+.BR mbind (2)
+and
+.BR move_pages (2).
+.RE
+.PD
+.TP
+.B CAP_SYS_PACCT
+Use
+.BR acct (2).
+.TP
+.B CAP_SYS_PTRACE
+.PD 0
+.RS
+.IP \[bu] 3
+Trace arbitrary processes using
+.BR ptrace (2);
+.IP \[bu]
+apply
+.BR get_robust_list (2)
+to arbitrary processes;
+.IP \[bu]
+transfer data to or from the memory of arbitrary processes using
+.BR process_vm_readv (2)
+and
+.BR process_vm_writev (2);
+.IP \[bu]
+inspect processes using
+.BR kcmp (2).
+.RE
+.PD
+.TP
+.B CAP_SYS_RAWIO
+.PD 0
+.RS
+.IP \[bu] 3
+Perform I/O port operations
+.RB ( iopl (2)
+and
+.BR ioperm (2));
+.IP \[bu]
+access
+.IR /proc/kcore ;
+.IP \[bu]
+employ the
+.B FIBMAP
+.BR ioctl (2)
+operation;
+.IP \[bu]
+open devices for accessing x86 model-specific registers (MSRs, see
+.BR msr (4));
+.IP \[bu]
+update
+.IR /proc/sys/vm/mmap_min_addr ;
+.IP \[bu]
+create memory mappings at addresses below the value specified by
+.IR /proc/sys/vm/mmap_min_addr ;
+.IP \[bu]
+map files in
+.IR /proc/bus/pci ;
+.IP \[bu]
+open
+.I /dev/mem
+and
+.IR /dev/kmem ;
+.IP \[bu]
+perform various SCSI device commands;
+.IP \[bu]
+perform certain operations on
+.BR hpsa (4)
+and
+.BR cciss (4)
+devices;
+.IP \[bu]
+perform a range of device-specific operations on other devices.
+.RE
+.PD
+.TP
+.B CAP_SYS_RESOURCE
+.PD 0
+.RS
+.IP \[bu] 3
+Use reserved space on ext2 filesystems;
+.IP \[bu]
+make
+.BR ioctl (2)
+calls controlling ext3 journaling;
+.IP \[bu]
+override disk quota limits;
+.IP \[bu]
+increase resource limits (see
+.BR setrlimit (2));
+.IP \[bu]
+override
+.B RLIMIT_NPROC
+resource limit;
+.IP \[bu]
+override maximum number of consoles on console allocation;
+.IP \[bu]
+override maximum number of keymaps;
+.IP \[bu]
+allow more than 64 Hz interrupts from the real-time clock;
+.IP \[bu]
+raise
+.I msg_qbytes
+limit for a System V message queue above the limit in
+.I /proc/sys/kernel/msgmnb
+(see
+.BR msgop (2)
+and
+.BR msgctl (2));
+.IP \[bu]
+allow the
+.B RLIMIT_NOFILE
+resource limit on the number of "in-flight" file descriptors
+to be bypassed when passing file descriptors to another process
+via a UNIX domain socket (see
+.BR unix (7));
+.IP \[bu]
+override the
+.I /proc/sys/fs/pipe\-max\-size
+limit when increasing the capacity of a pipe using the
+.B F_SETPIPE_SZ
+.BR fcntl (2)
+command;
+.IP \[bu]
+override
+.IR /proc/sys/fs/mqueue/queues_max ,
+.IR /proc/sys/fs/mqueue/msg_max ,
+and
+.I /proc/sys/fs/mqueue/msgsize_max
+limits when creating POSIX message queues (see
+.BR mq_overview (7));
+.IP \[bu]
+employ the
+.BR prctl (2)
+.B PR_SET_MM
+operation;
+.IP \[bu]
+set
+.IR /proc/ pid /oom_score_adj
+to a value lower than the value last set by a process with
+.BR CAP_SYS_RESOURCE .
+.RE
+.PD
+.TP
+.B CAP_SYS_TIME
+Set system clock
+.RB ( settimeofday (2),
+.BR stime (2),
+.BR adjtimex (2));
+set real-time (hardware) clock.
+.TP
+.B CAP_SYS_TTY_CONFIG
+Use
+.BR vhangup (2);
+employ various privileged
+.BR ioctl (2)
+operations on virtual terminals.
+.TP
+.BR CAP_SYSLOG " (since Linux 2.6.37)"
+.RS
+.PD 0
+.IP \[bu] 3
+Perform privileged
+.BR syslog (2)
+operations.
+See
+.BR syslog (2)
+for information on which operations require privilege.
+.IP \[bu]
+View kernel addresses exposed via
+.I /proc
+and other interfaces when
+.I /proc/sys/kernel/kptr_restrict
+has the value 1.
+(See the discussion of
+.I kptr_restrict
+in
+.BR proc (5).)
+.PD
+.RE
+.TP
+.BR CAP_WAKE_ALARM " (since Linux 3.0)"
+Trigger something that will wake up the system (set
+.B CLOCK_REALTIME_ALARM
+and
+.B CLOCK_BOOTTIME_ALARM
+timers).
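+.PP
+Capability sets are bit masks with one bit per capability.
+The following Python sketch decodes such a mask into names from the list above.
+The bit numbers are an assumption taken from the kernel header
+.IR linux/capability.h ;
+they are not given in this page, and
+.I decode_caps
+is a hypothetical helper name.

```python
# Capability names indexed by bit number. The numbering is assumed to
# follow the kernel header linux/capability.h (CAP_CHOWN = 0 ...
# CAP_CHECKPOINT_RESTORE = 40); it is not stated in the list above.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID",
    "CAP_SETPCAP", "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE",
    "CAP_NET_BROADCAST", "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK",
    "CAP_IPC_OWNER", "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT",
    "CAP_SYS_PTRACE", "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT",
    "CAP_SYS_NICE", "CAP_SYS_RESOURCE", "CAP_SYS_TIME",
    "CAP_SYS_TTY_CONFIG", "CAP_MKNOD", "CAP_LEASE", "CAP_AUDIT_WRITE",
    "CAP_AUDIT_CONTROL", "CAP_SETFCAP", "CAP_MAC_OVERRIDE",
    "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM", "CAP_BLOCK_SUSPEND",
    "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF", "CAP_CHECKPOINT_RESTORE",
]


def decode_caps(mask):
    """Return the names of the capabilities whose bits are set in mask."""
    names = []
    bit = 0
    while mask:
        if mask & 1:
            # Bits beyond the table are reported by number only.
            if bit < len(CAP_NAMES):
                names.append(CAP_NAMES[bit])
            else:
                names.append("cap_%d" % bit)
        mask >>= 1
        bit += 1
    return names


if __name__ == "__main__":
    # Bits 0 and 10 set.
    print(decode_caps((1 << 0) | (1 << 10)))
```

+.PP
+Unknown high bits are reported by number rather than rejected,
+since a newer kernel may define capabilities that this table lacks.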
+.\"
+.SS Past and current implementation
+A full implementation of capabilities requires that:
+.IP \[bu] 3
+For all privileged operations,
+the kernel must check whether the thread has the required
+capability in its effective set.
+.IP \[bu]
+The kernel must provide system calls allowing a thread's capability sets to
+be changed and retrieved.
+.IP \[bu]
+The filesystem must support attaching capabilities to an executable file,
+so that a process gains those capabilities when the file is executed.
+.PP
+Before Linux 2.6.24, only the first two of these requirements were met;
+since Linux 2.6.24, all three requirements are met.
+.\"
+.SS Notes to kernel developers
+When adding a new kernel feature that should be governed by a capability,
+consider the following points.
+.IP \[bu] 3
+The goal of capabilities is to divide the power of superuser into pieces,
+such that if a program that has one or more capabilities is compromised,
+its power to do damage to the system would be less than the same program
+running with root privilege.
+.IP \[bu]
+You have the choice of either creating a new capability for your new feature,
+or associating the feature with one of the existing capabilities.
+In order to keep the set of capabilities to a manageable size,
+the latter option is preferable,
+unless there are compelling reasons to take the former option.
+(There is also a technical limit:
+the size of capability sets is currently limited to 64 bits.)
+.IP \[bu]
+To determine which existing capability might best be associated
+with your new feature, review the list of capabilities above in order
+to find a "silo" into which your new feature best fits.
+One approach to take is to determine if there are other features
+requiring capabilities that will always be used along with the new feature.
+If the new feature is useless without these other features,
+you should use the same capability as the other features.
+.IP \[bu]
+.I Don't
+choose
+.B CAP_SYS_ADMIN
+if you can possibly avoid it!
+A vast proportion of existing capability checks are associated
+with this capability (see the partial list above).
+It can plausibly be called "the new root",
+since on the one hand, it confers a wide range of powers,
+and on the other hand,
+its broad scope means that this is the capability
+that is required by many privileged programs.
+Don't make the problem worse.
+The only new features that should be associated with
+.B CAP_SYS_ADMIN
+are ones that
+.I closely
+match existing uses in that silo.
+.IP \[bu]
+If you have determined that it really is necessary to create
+a new capability for your feature,
+don't make or name it as a "single-use" capability.
+Thus, for example, the addition of the highly specific
+.B CAP_SYS_PACCT
+was probably a mistake.
+Instead, try to identify and name your new capability as a broader
+silo into which other related future use cases might fit.
+.\"
+.SS Thread capability sets
+Each thread has the following capability sets containing zero or more
+of the above capabilities:
+.TP
+.I Permitted
+This is a limiting superset for the effective
+capabilities that the thread may assume.
+It is also a limiting superset for the capabilities that
+may be added to the inheritable set by a thread that does not have the
+.B CAP_SETPCAP
+capability in its effective set.
+.IP
+If a thread drops a capability from its permitted set,
+it can never reacquire that capability (unless it
+.BR execve (2)s
+either a set-user-ID-root program, or
+a program whose associated file capabilities grant that capability).
+.TP
+.I Inheritable
+This is a set of capabilities preserved across an
+.BR execve (2).
+Inheritable capabilities remain inheritable when executing any program,
+and inheritable capabilities are added to the permitted set when executing
+a program that has the corresponding bits set in the file inheritable set.
+.IP
+Because inheritable capabilities are not generally preserved across
+.BR execve (2)
+when running as a non-root user, applications that wish to run helper
+programs with elevated capabilities should consider using
+ambient capabilities, described below.
+.TP
+.I Effective
+This is the set of capabilities used by the kernel to
+perform permission checks for the thread.
+.TP
+.IR Bounding " (per-thread since Linux 2.6.25)"
+The capability bounding set is a mechanism that can be used
+to limit the capabilities that are gained during
+.BR execve (2).
+.IP
+Since Linux 2.6.25, this is a per-thread capability set.
+In older kernels, the capability bounding set was a system-wide attribute
+shared by all threads on the system.
+.IP
+For more details, see
+.I Capability bounding set
+below.
+.TP
+.IR Ambient " (since Linux 4.3)"
+.\" commit 58319057b7847667f0c9585b9de0e8932b0fdb08
+This is a set of capabilities that are preserved across an
+.BR execve (2)
+of a program that is not privileged.
+The ambient capability set obeys the invariant that no capability
+can ever be ambient if it is not both permitted and inheritable.
+.IP
+The ambient capability set can be directly modified using
+.BR prctl (2).
+Ambient capabilities are automatically lowered if either of
+the corresponding permitted or inheritable capabilities is lowered.
+.IP
+Executing a program that changes UID or GID due to the
+set-user-ID or set-group-ID bits or executing a program that has
+any file capabilities set will clear the ambient set.
+Ambient capabilities are added to the permitted set and
+assigned to the effective set when
+.BR execve (2)
+is called.
+If ambient capabilities cause a process's permitted and effective
+capabilities to increase during an
+.BR execve (2),
+this does not trigger the secure-execution mode described in
+.BR ld.so (8).
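+.PP
+The ambient-set invariant described above can be modeled with plain bit masks.
+This is only a sketch of the rule, not of the kernel interface:
+the function names are hypothetical, and the use of
+.I PermissionError
+to signal a refused request is an assumption of the model.

```python
def raise_ambient(permitted, inheritable, ambient, cap_bit):
    """Model raising one ambient capability; return the new ambient mask.

    The request is refused unless the capability is both permitted
    and inheritable, which is the invariant stated in the text.
    """
    bit = 1 << cap_bit
    if not (permitted & bit and inheritable & bit):
        raise PermissionError("capability is not both permitted and inheritable")
    return ambient | bit


def lower_to_invariant(permitted, inheritable, ambient):
    """Drop ambient bits whose permitted or inheritable bit has been lowered."""
    return ambient & permitted & inheritable


if __name__ == "__main__":
    CAP_NET_BIND_SERVICE = 10  # bit number assumed from linux/capability.h
    bit = 1 << CAP_NET_BIND_SERVICE
    amb = raise_ambient(bit, bit, 0, CAP_NET_BIND_SERVICE)
    print(hex(amb))
    # Lowering the inheritable bit automatically lowers the ambient bit.
    print(hex(lower_to_invariant(bit, 0, amb)))
```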
+.PP
+A child created via
+.BR fork (2)
+inherits copies of its parent's capability sets.
+For details on how
+.BR execve (2)
+affects capabilities, see
+.I Transformation of capabilities during execve()
+below.
+.PP
+Using
+.BR capset (2),
+a thread may manipulate its own capability sets; see
+.I Programmatically adjusting capability sets
+below.
+.PP
+Since Linux 3.2, the file
+.I /proc/sys/kernel/cap_last_cap
+.\" commit 73efc0394e148d0e15583e13712637831f926720
+exposes the numerical value of the highest capability
+supported by the running kernel;
+this can be used to determine the highest bit
+that may be set in a capability set.
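+.PP
+A minimal Python sketch of that use follows.
+It reads the file and derives a mask with every supported capability bit set.
+Both function names are hypothetical, and returning
+.I None
+when the file cannot be read (for example, before Linux 3.2) is a choice made for this example.

```python
def read_last_cap(path="/proc/sys/kernel/cap_last_cap"):
    """Return the highest supported capability number, or None if unavailable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def full_capability_mask(last_cap):
    """Return a mask with bits 0..last_cap (inclusive) set."""
    return (1 << (last_cap + 1)) - 1


if __name__ == "__main__":
    last = read_last_cap()
    if last is None:
        print("cap_last_cap is not available on this system")
    else:
        print("highest capability:", last)
        print("full mask:", hex(full_capability_mask(last)))
```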
+.\"
+.SS File capabilities
+Since Linux 2.6.24, the kernel supports
+associating capability sets with an executable file using
+.BR setcap (8).
+The file capability sets are stored in an extended attribute (see
+.BR setxattr (2)
+and
+.BR xattr (7))
+named
+.IR "security.capability" .
+Writing to this extended attribute requires the
+.B CAP_SETFCAP
+capability.
+The file capability sets,
+in conjunction with the capability sets of the thread,
+determine the capabilities of a thread after an
+.BR execve (2).
+.PP
+The three file capability sets are:
+.TP
+.IR Permitted " (formerly known as " forced ):
+These capabilities are automatically permitted to the thread,
+regardless of the thread's inheritable capabilities.
+.TP
+.IR Inheritable " (formerly known as " allowed ):
+This set is ANDed with the thread's inheritable set to determine which
+inheritable capabilities are enabled in the permitted set of
+the thread after the
+.BR execve (2).
+.TP
+.IR Effective :
+This is not a set, but rather just a single bit.
+If this bit is set, then during an
+.BR execve (2)
+all of the new permitted capabilities for the thread are
+also raised in the effective set.
+If this bit is not set, then after an
+.BR execve (2),
+none of the new permitted capabilities is in the new effective set.
+.IP
+Enabling the file effective capability bit means that
+whenever a file permitted or inheritable capability causes a
+thread to acquire the corresponding permitted capability during an
+.BR execve (2)
+(see
+.I Transformation of capabilities during execve()
+below), the thread also acquires that
+capability in its effective set.
+Therefore, when assigning capabilities to a file
+.RB ( setcap (8),
+.BR cap_set_file (3),
+.BR cap_set_fd (3)),
+if we specify the effective flag as being enabled for any capability,
+then the effective flag must also be specified as enabled
+for all other capabilities for which the corresponding permitted or
+inheritable flag is enabled.
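+.PP
+The all-or-nothing rule for the effective flag can be checked mechanically.
+The sketch below assumes a hypothetical representation in which each
+capability name maps to a set of the flag letters "p", "i", and "e";
+it does not parse the textual format accepted by
+.BR setcap (8).

```python
def effective_flag_consistent(caps):
    """Check the effective-flag rule for a proposed file capability spec.

    caps maps a capability name to a set of flags drawn from
    {"p", "i", "e"} (permitted, inheritable, effective).
    If any capability has "e", every capability with "p" or "i"
    must have "e" as well.
    """
    any_effective = any("e" in flags for flags in caps.values())
    if not any_effective:
        return True
    return all("e" in flags
               for flags in caps.values()
               if flags & {"p", "i"})


if __name__ == "__main__":
    ok = {"cap_net_raw": {"p", "e"}, "cap_net_admin": {"p", "e"}}
    bad = {"cap_net_raw": {"p", "e"}, "cap_net_admin": {"p"}}
    print(effective_flag_consistent(ok))
    print(effective_flag_consistent(bad))
```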
+.\"
+.SS File capability extended attribute versioning
+To allow extensibility,
+the kernel supports a scheme to encode a version number inside the
+.I security.capability
+extended attribute that is used to implement file capabilities.
+These version numbers are internal to the implementation,
+and not directly visible to user-space applications.
+To date, the following versions are supported:
+.TP
+.B VFS_CAP_REVISION_1
+This was the original file capability implementation,
+which supported 32-bit masks for file capabilities.
+.TP
+.BR VFS_CAP_REVISION_2 " (since Linux 2.6.25)"
+.\" commit e338d263a76af78fe8f38a72131188b58fceb591
+This version allows for file capability masks that are 64 bits in size,
+and was necessary as the number of supported capabilities grew beyond 32.
+The kernel transparently continues to support the execution of files
+that have 32-bit version 1 capability masks,
+but when adding capabilities to files that did not previously
+have capabilities, or modifying the capabilities of existing files,
+it automatically uses the version 2 scheme
+(or possibly the version 3 scheme, as described below).
+.TP
+.BR VFS_CAP_REVISION_3 " (since Linux 4.14)"
+.\" commit 8db6c34f1dbc8e06aa016a9b829b06902c3e1340
+Version 3 file capabilities are provided
+to support namespaced file capabilities (described below).
+.IP
+As with version 2 file capabilities,
+version 3 capability masks are 64 bits in size.
+But in addition, the root user ID of the namespace is encoded in the
+.I security.capability
+extended attribute.
+(A namespace's root user ID is the value that user ID 0
+inside that namespace maps to in the initial user namespace.)
+.IP
+Version 3 file capabilities are designed to coexist
+with version 2 capabilities;
+that is, on a modern Linux system,
+there may be some files with version 2 capabilities
+while others have version 3 capabilities.
+.PP
+Before Linux 4.14,
+the only kind of file capability extended attribute
+that could be attached to a file was a
+.B VFS_CAP_REVISION_2
+attribute.
+Since Linux 4.14,
+the version of the
+.I security.capability
+extended attribute that is attached to a file
+depends on the circumstances in which the attribute was created.
+.PP
+Starting with Linux 4.14, a
+.I security.capability
+extended attribute is automatically created as (or converted to)
+a version 3
+.RB ( VFS_CAP_REVISION_3 )
+attribute if both of the following are true:
+.IP \[bu] 3
+The thread writing the attribute resides in a noninitial user namespace.
+(More precisely: the thread resides in a user namespace other
+than the one from which the underlying filesystem was mounted.)
+.IP \[bu]
+The thread has the
+.B CAP_SETFCAP
+capability over the file inode,
+meaning that (a) the thread has the
+.B CAP_SETFCAP
+capability in its own user namespace;
+and (b) the UID and GID of the file inode have mappings in
+the writer's user namespace.
+.PP
+When a
+.B VFS_CAP_REVISION_3
+.I security.capability
+extended attribute is created, the root user ID of the creating thread's
+user namespace is saved in the extended attribute.
+.PP
+By contrast, creating or modifying a
+.I security.capability
+extended attribute from a privileged
+.RB ( CAP_SETFCAP )
+thread that resides in the
+namespace where the underlying filesystem was mounted
+(this normally means the initial user namespace)
+automatically results in the creation of a version 2
+.RB ( VFS_CAP_REVISION_2 )
+attribute.
+.PP
+Note that the creation of a version 3
+.I security.capability
+extended attribute is automatic.
+That is to say, when a user-space application writes
+.RB ( setxattr (2))
+a
+.I security.capability
+attribute in the version 2 format,
+the kernel will automatically create a version 3 attribute
+if the attribute is created in the circumstances described above.
+Correspondingly, when a version 3
+.I security.capability
+attribute is retrieved
+.RB ( getxattr (2))
+by a process that resides inside a user namespace that was created by the
+root user ID (or a descendant of that user namespace),
+the returned attribute is (automatically)
+simplified to appear as a version 2 attribute
+(i.e., the returned value is the size of a version 2 attribute and does
+not include the root user ID).
+These automatic translations mean that no changes are required to
+user-space tools (e.g.,
+.BR setcap (8)
+and
+.BR getcap (8))
+in order for those tools to be used to create and retrieve version 3
+.I security.capability
+attributes.
+.PP
+Note that a file can have either a version 2 or a version 3
+.I security.capability
+extended attribute associated with it, but not both:
+creation or modification of the
+.I security.capability
+extended attribute will automatically modify the version
+according to the circumstances in which the extended attribute is
+created or modified.
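+.PP
+As an illustration of the difference between the two formats,
+the following sketch classifies the raw bytes of a
+.I security.capability
+value as returned by
+.BR getxattr (2).
+It is not taken from the kernel or from
+.IR libcap ;
+the function name
+.I fcap_xattr_version
+is hypothetical, and the layout it assumes
+(a little-endian 32-bit header whose top byte holds the revision,
+with the root user ID appended only in version 3)
+comes from the definitions in
+.I <linux/capability.h>
+rather than from the text above.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/capability.h>

/*
 * Hypothetical helper: classify a raw security.capability value.
 * Returns 2 or 3 for a well-formed version 2 or version 3 attribute,
 * and 0 for anything else.
 */
int
fcap_xattr_version(const unsigned char *buf, size_t len)
{
    uint32_t magic_etc;

    if (buf == NULL || len < sizeof(magic_etc))
        return 0;

    /* The header word is stored in little-endian byte order. */
    magic_etc = (uint32_t) buf[0] | (uint32_t) buf[1] << 8 |
                (uint32_t) buf[2] << 16 | (uint32_t) buf[3] << 24;

    switch (magic_etc & VFS_CAP_REVISION_MASK) {
    case VFS_CAP_REVISION_2:
        return (len == XATTR_CAPS_SZ_2) ? 2 : 0;
    case VFS_CAP_REVISION_3:
        /* Same masks as version 2, plus the trailing root user ID. */
        return (len == XATTR_CAPS_SZ_3) ? 3 : 0;
    default:
        return 0;
    }
}
```

+.PP
+Because of the automatic translation described above,
+a caller inside the relevant user namespace would expect this helper
+to report 2 even when the attribute stored on disk is version 3.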
+.\"
+.SS Transformation of capabilities during execve()
+During an
+.BR execve (2),
+the kernel calculates the new capabilities of
+the process using the following algorithm:
+.PP
+.in +4n
+.EX
+P'(ambient) = (file is privileged) ? 0 : P(ambient)
+\&
+P'(permitted) = (P(inheritable) & F(inheritable)) |
+ (F(permitted) & P(bounding)) | P'(ambient)
+\&
+P'(effective) = F(effective) ? P'(permitted) : P'(ambient)
+\&
+P'(inheritable) = P(inheritable) [i.e., unchanged]
+\&
+P'(bounding) = P(bounding) [i.e., unchanged]
+.EE
+.in
+.PP
+where:
+.RS 4
+.TP
+P()
+denotes the value of a thread capability set before the
+.BR execve (2)
+.TP
+P'()
+denotes the value of a thread capability set after the
+.BR execve (2)
+.TP
+F()
+denotes a file capability set
+.RE
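+.PP
+The rules above can be modeled with ordinary bit masks.
+The following C fragment is a minimal sketch for illustration only;
+it is not the kernel's implementation, and the names
+.IR capmask_t ,
+.IR "struct thread_caps" ,
+.IR "struct file_caps" ,
+and
+.I execve_transform
+are hypothetical.
+It assumes that each capability occupies one bit of a 64-bit mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: bit N of a mask stands for capability number N. */
typedef uint64_t capmask_t;

struct thread_caps {
    capmask_t permitted, effective, inheritable, bounding, ambient;
};

struct file_caps {
    capmask_t permitted, inheritable;
    bool effective;     /* F(effective) is a single bit, not a set */
    bool privileged;    /* the "file is privileged" condition */
};

/* Apply the transformation rules shown above and return P'(). */
struct thread_caps
execve_transform(struct thread_caps p, struct file_caps f)
{
    struct thread_caps n;

    n.ambient = f.privileged ? 0 : p.ambient;
    n.permitted = (p.inheritable & f.inheritable) |
                  (f.permitted & p.bounding) | n.ambient;
    n.effective = f.effective ? n.permitted : n.ambient;
    n.inheritable = p.inheritable;      /* unchanged */
    n.bounding = p.bounding;            /* unchanged */
    return n;
}
```

+.PP
+For example, with P(inheritable) = 0x4, P(bounding) = 0x3,
+F(inheritable) = 0x4, F(permitted) = 0x6, and F(effective) set,
+the sketch yields P'(permitted) = P'(effective) = 0x6:
+bit 0x4 arrives through the inheritable term and bit 0x2 through
+the bounding-set term.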
+.PP
+Note the following details relating to the above capability
+transformation rules:
+.IP \[bu] 3
+The ambient capability set is present only since Linux 4.3.
+When determining the transformation of the ambient set during
+.BR execve (2),
+a privileged file is one that has capabilities or
+has the set-user-ID or set-group-ID bit set.
+.IP \[bu]
+Prior to Linux 2.6.25,
+the bounding set was a system-wide attribute shared by all threads.
+That system-wide value was employed to calculate the new permitted set during
+.BR execve (2)
+in the same manner as shown above for
+.IR P(bounding) .
+.PP
+.IR Note :
+during the capability transitions described above,
+file capabilities may be ignored (treated as empty) for the same reasons
+that the set-user-ID and set-group-ID bits are ignored; see
+.BR execve (2).
+File capabilities are similarly ignored if the kernel was booted with the
+.I no_file_caps
+option.
+.PP
+.IR Note :
+according to the rules above,
+if a process with nonzero user IDs performs an
+.BR execve (2)
+then any capabilities that are present in
+its permitted and effective sets will be cleared.
+For the treatment of capabilities when a process with a
+user ID of zero performs an
+.BR execve (2),
+see
+.I Capabilities and execution of programs by root
+below.
+.\"
+.SS Safety checking for capability-dumb binaries
+A capability-dumb binary is an application that has been
+marked to have file capabilities, but has not been converted to use the
+.BR libcap (3)
+API to manipulate its capabilities.
+(In other words, this is a traditional set-user-ID-root program
+that has been switched to use file capabilities,
+but whose code has not been modified to understand capabilities.)
+For such applications,
+the effective capability bit is set on the file,
+so that the file permitted capabilities are automatically
+enabled in the process effective set when executing the file.
+The kernel recognizes a file which has the effective capability bit set
+as capability-dumb for the purpose of the check described here.
+.PP
+When executing a capability-dumb binary,
+the kernel checks if the process obtained all permitted capabilities
+that were specified in the file permitted set,
+after the capability transformations described above have been performed.
+(The typical reason why this might
+.I not
+occur is that the capability bounding set masked out some
+of the capabilities in the file permitted set.)
+If the process did not obtain the full set of
+file permitted capabilities, then
+.BR execve (2)
+fails with the error
+.BR EPERM .
+This prevents possible security risks that could arise when
+a capability-dumb application is executed with less privilege than it needs.
+Note that, by definition,
+the application could not itself recognize this problem,
+since it does not employ the
+.BR libcap (3)
+API.
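+.PP
+The check described above amounts to a subset test.
+The following sketch shows one way to express it;
+the function name
+.I capability_dumb_check
+and its parameters are hypothetical,
+and the code is a model of the documented behavior,
+not an excerpt from the kernel.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the check. new_permitted is P'(permitted) as
 * computed by the transformation rules; f_permitted and f_effective
 * describe the file. Returns 0 if the execve() may proceed, or -EPERM.
 */
int
capability_dumb_check(uint64_t new_permitted, uint64_t f_permitted,
                      bool f_effective)
{
    /* Only files with the effective bit set are treated as
       capability-dumb for the purpose of this check. */
    if (!f_effective)
        return 0;

    /* Did the process fail to obtain some file permitted capability? */
    if (f_permitted & ~new_permitted)
        return -EPERM;

    return 0;
}
```

+.PP
+With a file permitted set of 0x6 and a resulting process permitted set
+of 0x2 (for instance because the bounding set masked out bit 0x4),
+the sketch returns
+.RB \- EPERM
+when the file effective bit is set, and 0 when it is clear.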
+.\"
+.SS Capabilities and execution of programs by root
+.\" See cap_bprm_set_creds(), bprm_caps_from_vfs_cap() and
+.\" handle_privileged_root() in security/commoncap.c (Linux 5.0 source)
+In order to mirror traditional UNIX semantics,
+the kernel performs special treatment of file capabilities when
+a process with UID 0 (root) executes a program and
+when a set-user-ID-root program is executed.
+.PP
+After having performed any changes to the process effective ID that
+were triggered by the set-user-ID mode bit of the binary\[em]e.g.,
+switching the effective user ID to 0 (root) because
+a set-user-ID-root program was executed\[em]the
+kernel calculates the file capability sets as follows:
+.IP (1) 5
+If the real or effective user ID of the process is 0 (root),
+then the file inheritable and permitted sets are ignored;
+instead they are notionally considered to be all ones
+(i.e., all capabilities enabled).
+(There is one exception to this behavior, described in
+.I Set-user-ID-root programs that have file capabilities
+below.)
+.IP (2)
+If the effective user ID of the process is 0 (root) or
+the file effective bit is in fact enabled,
+then the file effective bit is notionally defined to be one (enabled).
+.PP
+These notional values for the file's capability sets are then used
+as described above to calculate the transformation of the process's
+capabilities during
+.BR execve (2).
+.PP
+Thus, when a process with nonzero UIDs
+.BR execve (2)s
+a set-user-ID-root program that does not have capabilities attached,
+or when a process whose real and effective UIDs are zero
+.BR execve (2)s
+a program, the calculation of the process's new
+permitted capabilities simplifies to:
+.PP
+.in +4n
+.EX
+P'(permitted) = P(inheritable) | P(bounding)
+\&
+P'(effective) = P'(permitted)
+.EE
+.in
+.PP
+Consequently, the process gains all capabilities in its permitted and
+effective capability sets,
+except those masked out by the capability bounding set.
+(In the calculation of P'(permitted),
+the P'(ambient) term can be simplified away because it is by
+definition a subset of P(inheritable).)
+.PP
+The special treatments of user ID 0 (root) described in this subsection
+can be disabled using the securebits mechanism described below.
+.\"
+.\"
+.SS Set-user-ID-root programs that have file capabilities
+There is one exception to the behavior described in
+.I Capabilities and execution of programs by root
+above.
+If (a) the binary that is being executed has capabilities attached and
+(b) the real user ID of the process is
+.I not
+0 (root) and
+(c) the effective user ID of the process
+.I is
+0 (root), then the file capability bits are honored
+(i.e., they are not notionally considered to be all ones).
+The usual way in which this situation can arise is when executing
+a set-UID-root program that also has file capabilities.
+When such a program is executed,
+the process gains just the capabilities granted by the program
+(i.e., not all capabilities,
+as would occur when executing a set-user-ID-root program
+that does not have any associated file capabilities).
+.PP
+Note that one can assign empty capability sets to a program file,
+and thus it is possible to create a set-user-ID-root program that
+changes the effective and saved set-user-ID of the process
+that executes the program to 0,
+but confers no capabilities to that process.
+.\"
+.SS Capability bounding set
+The capability bounding set is a security mechanism that can be used
+to limit the capabilities that can be gained during an
+.BR execve (2).
+The bounding set is used in the following ways:
+.IP \[bu] 3
+During an
+.BR execve (2),
+the capability bounding set is ANDed with the file permitted
+capability set, and the result of this operation is one of the terms
+that make up the thread's new permitted capability set.
+The capability bounding set thus places a limit on the permitted
+capabilities that may be granted by an executable file.
+.IP \[bu]
+(Since Linux 2.6.25)
+The capability bounding set acts as a limiting superset for
+the capabilities that a thread can add to its inheritable set using
+.BR capset (2).
+This means that if a capability is not in the bounding set,
+then a thread can't add this capability to its
+inheritable set, even if it was in its permitted capabilities,
+and thereby cannot have this capability preserved in its
+permitted set when it
+.BR execve (2)s
+a file that has the capability in its inheritable set.
+.PP
+Note that the bounding set masks the file permitted capabilities,
+but not the inheritable capabilities.
+If a thread maintains a capability in its inheritable set
+that is not in its bounding set,
+then it can still gain that capability in its permitted set
+by executing a file that has the capability in its inheritable set.
+.PP
+Depending on the kernel version, the capability bounding set is either
+a system-wide attribute, or a per-thread attribute.
+.PP
+.B "Capability bounding set from Linux 2.6.25 onward"
+.PP
+From Linux 2.6.25, the
+.I "capability bounding set"
+is a per-thread attribute.
+(The system-wide capability bounding set described below no longer exists.)
+.PP
+The bounding set is inherited at
+.BR fork (2)
+from the thread's parent, and is preserved across an
+.BR execve (2).
+.PP
+A thread may remove capabilities from its capability bounding set using the
+.BR prctl (2)
+.B PR_CAPBSET_DROP
+operation, provided it has the
+.B CAP_SETPCAP
+capability.
+Once a capability has been dropped from the bounding set,
+it cannot be restored to that set.
+A thread can determine if a capability is in its bounding set using the
+.BR prctl (2)
+.B PR_CAPBSET_READ
+operation.
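+.PP
+A minimal sketch of querying the bounding set with
+.B PR_CAPBSET_READ
+follows.
+The helper names
+.I in_bounding_set
+and
+.I count_bounding_set
+are hypothetical;
+the sketch assumes that
+.B CAP_LAST_CAP
+from
+.I <linux/capability.h>
+is a reasonable upper limit for the loop,
+and it silently skips any capability number that the running kernel rejects.

```c
#include <sys/prctl.h>
#include <linux/capability.h>

/*
 * Returns 1 if capability number cap is in the calling thread's
 * bounding set, 0 if it is not, and -1 (with errno set) on error.
 */
int
in_bounding_set(int cap)
{
    return prctl(PR_CAPBSET_READ, (unsigned long) cap, 0UL, 0UL, 0UL);
}

/* Count the capabilities from 0 through CAP_LAST_CAP that remain. */
int
count_bounding_set(void)
{
    int count = 0;

    for (int cap = 0; cap <= CAP_LAST_CAP; cap++)
        if (in_bounding_set(cap) == 1)
            count++;
    return count;
}
```

+.PP
+Since a dropped capability cannot be restored,
+the value returned by the hypothetical
+.I count_bounding_set
+can only stay the same or decrease over the lifetime of a thread.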
+.PP
+Removing capabilities from the bounding set is supported only if file
+capabilities are compiled into the kernel.
+Before Linux 2.6.33,
+file capabilities were an optional feature configurable via the
+.B CONFIG_SECURITY_FILE_CAPABILITIES
+option.
+Since Linux 2.6.33,
+.\" commit b3a222e52e4d4be77cc4520a57af1a4a0d8222d1
+the configuration option has been removed
+and file capabilities are always part of the kernel.
+When file capabilities are compiled into the kernel, the
+.B init
+process (the ancestor of all processes) begins with a full bounding set.
+If file capabilities are not compiled into the kernel, then
+.B init
+begins with a full bounding set minus
+.BR CAP_SETPCAP ,
+because this capability has a different meaning when there are
+no file capabilities.
+.PP
+Removing a capability from the bounding set does not remove it
+from the thread's inheritable set.
+However it does prevent the capability from being added
+back into the thread's inheritable set in the future.
+.PP
+.B "Capability bounding set prior to Linux 2.6.25"
+.PP
+Before Linux 2.6.25, the capability bounding set is a system-wide
+attribute that affects all threads on the system.
+The bounding set is accessible via the file
+.IR /proc/sys/kernel/cap\-bound .
+(Confusingly, this bit mask parameter is expressed as a
+signed decimal number in
+.IR /proc/sys/kernel/cap\-bound .)
+.PP
+Only the
+.B init
+process may set capabilities in the capability bounding set;
+other than that, the superuser (more precisely: a process with the
+.B CAP_SYS_MODULE
+capability) may only clear capabilities from this set.
+.PP
+On a standard system the capability bounding set always masks out the
+.B CAP_SETPCAP
+capability.
+To remove this restriction (dangerous!), modify the definition of
+.B CAP_INIT_EFF_SET
+in
+.I include/linux/capability.h
+and rebuild the kernel.
+.PP
+The system-wide capability bounding set feature was added
+to Linux 2.2.11.
+.\"
+.\"
+.\"
+.SS Effect of user ID changes on capabilities
+To preserve the traditional semantics for transitions between
+0 and nonzero user IDs,
+the kernel makes the following changes to a thread's capability
+sets on changes to the thread's real, effective, saved set,
+and filesystem user IDs (using
+.BR setuid (2),
+.BR setresuid (2),
+or similar):
+.IP \[bu] 3
+If one or more of the real, effective, or saved set user IDs
+was previously 0, and as a result of the UID changes all of these IDs
+have a nonzero value,
+then all capabilities are cleared from the permitted, effective, and ambient
+capability sets.
+.IP \[bu]
+If the effective user ID is changed from 0 to nonzero,
+then all capabilities are cleared from the effective set.
+.IP \[bu]
+If the effective user ID is changed from nonzero to 0,
+then the permitted set is copied to the effective set.
+.IP \[bu]
+If the filesystem user ID is changed from 0 to nonzero (see
+.BR setfsuid (2)),
+then the following capabilities are cleared from the effective set:
+.BR CAP_CHOWN ,
+.BR CAP_DAC_OVERRIDE ,
+.BR CAP_DAC_READ_SEARCH ,
+.BR CAP_FOWNER ,
+.BR CAP_FSETID ,
+.B CAP_LINUX_IMMUTABLE
+(since Linux 2.6.30),
+.BR CAP_MAC_OVERRIDE ,
+and
+.B CAP_MKNOD
+(since Linux 2.6.30).
+If the filesystem UID is changed from nonzero to 0,
+then any of these capabilities that are enabled in the permitted set
+are enabled in the effective set.
+.PP
+If a thread that has a 0 value for one or more of its user IDs wants
+to prevent its permitted capability set being cleared when it resets
+all of its user IDs to nonzero values, it can do so using the
+.B SECBIT_KEEP_CAPS
+securebits flag described below.
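+.PP
+The four rules listed above can be sketched as a function on bit masks.
+This is an illustrative model only, not kernel code:
+the names
+.IR CAP_BIT ,
+.IR FS_CAPS ,
+.IR "struct ids" ,
+.IR "struct caps" ,
+and
+.I uid_change_fixup
+are hypothetical,
+and the sketch deliberately ignores the
+.B SECBIT_KEEP_CAPS
+and
+.B SECBIT_NO_SETUID_FIXUP
+flags.

```c
#include <stdint.h>
#include <linux/capability.h>

#define CAP_BIT(c) (UINT64_C(1) << (c))

/* Capabilities affected by changes to the filesystem user ID. */
#define FS_CAPS (CAP_BIT(CAP_CHOWN) | CAP_BIT(CAP_DAC_OVERRIDE) | \
                 CAP_BIT(CAP_DAC_READ_SEARCH) | CAP_BIT(CAP_FOWNER) | \
                 CAP_BIT(CAP_FSETID) | CAP_BIT(CAP_LINUX_IMMUTABLE) | \
                 CAP_BIT(CAP_MAC_OVERRIDE) | CAP_BIT(CAP_MKNOD))

struct ids { unsigned int ruid, euid, suid, fsuid; };
struct caps { uint64_t permitted, effective, ambient; };

/* Model the effect on c of changing user IDs from "o" to "n". */
struct caps
uid_change_fixup(struct ids o, struct ids n, struct caps c)
{
    int had_zero = (o.ruid == 0 || o.euid == 0 || o.suid == 0);
    int all_nonzero = (n.ruid != 0 && n.euid != 0 && n.suid != 0);

    /* Rule 1: the last zero UID among real/effective/saved is gone. */
    if (had_zero && all_nonzero)
        c.permitted = c.effective = c.ambient = 0;

    /* Rules 2 and 3: effective UID leaves or reaches 0. */
    if (o.euid == 0 && n.euid != 0)
        c.effective = 0;
    if (o.euid != 0 && n.euid == 0)
        c.effective = c.permitted;

    /* Rule 4: filesystem UID leaves or reaches 0. */
    if (o.fsuid == 0 && n.fsuid != 0)
        c.effective &= ~FS_CAPS;
    if (o.fsuid != 0 && n.fsuid == 0)
        c.effective |= c.permitted & FS_CAPS;

    return c;
}
```

+.PP
+In this model, a thread whose user IDs all change from 0 to 1000
+ends up with empty permitted, effective, and ambient sets,
+whereas a change of only the filesystem user ID from 0 to 1000
+leaves the permitted set untouched and clears just the listed
+capabilities from the effective set.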
+.\"
+.SS Programmatically adjusting capability sets
+A thread can retrieve and change its permitted, effective, and inheritable
+capability sets using the
+.BR capget (2)
+and
+.BR capset (2)
+system calls.
+However, the use of
+.BR cap_get_proc (3)
+and
+.BR cap_set_proc (3),
+both provided in the
+.I libcap
+package,
+is preferred for this purpose.
+The following rules govern changes to the thread capability sets:
+.IP \[bu] 3
+If the caller does not have the
+.B CAP_SETPCAP
+capability,
+the new inheritable set must be a subset of the combination
+of the existing inheritable and permitted sets.
+.IP \[bu]
+(Since Linux 2.6.25)
+The new inheritable set must be a subset of the combination of the
+existing inheritable set and the capability bounding set.
+.IP \[bu]
+The new permitted set must be a subset of the existing permitted set
+(i.e., it is not possible to acquire permitted capabilities
+that the thread does not currently have).
+.IP \[bu]
+The new effective set must be a subset of the new permitted set.
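+.PP
+The rules above can be sketched as a validation function.
+The following is a minimal model under the assumption that each set
+is a 64-bit mask; the names
+.IR "struct capsets" ,
+.IR is_subset ,
+and
+.I capset_allowed
+are hypothetical and are not part of
+.BR capset (2)
+or
+.IR libcap .

```c
#include <stdbool.h>
#include <stdint.h>

struct capsets { uint64_t permitted, effective, inheritable; };

/* True if every bit set in "sub" is also set in "super". */
static bool
is_subset(uint64_t sub, uint64_t super)
{
    return (sub & ~super) == 0;
}

/* Would a change from "cur" to "want" satisfy the rules listed above? */
bool
capset_allowed(struct capsets cur, struct capsets want,
               uint64_t bounding, bool has_setpcap)
{
    if (!has_setpcap &&
        !is_subset(want.inheritable, cur.inheritable | cur.permitted))
        return false;
    if (!is_subset(want.inheritable, cur.inheritable | bounding))
        return false;
    if (!is_subset(want.permitted, cur.permitted))
        return false;
    return is_subset(want.effective, want.permitted);
}
```

+.PP
+For instance, a thread with permitted set 0x3 and an empty inheritable set
+may add bit 0x2 to its inheritable set if the bounding set contains that bit,
+but not if the bounding set is 0x1.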
+.SS The securebits flags: establishing a capabilities-only environment
+.\" For some background:
+.\" see http://lwn.net/Articles/280279/ and
+.\" http://article.gmane.org/gmane.linux.kernel.lsm/5476/
+Starting with Linux 2.6.26,
+and with a kernel in which file capabilities are enabled,
+Linux implements a set of per-thread
+.I securebits
+flags that can be used to disable special handling of capabilities for UID 0
+.RI ( root ).
+These flags are as follows:
+.TP
+.B SECBIT_KEEP_CAPS
+Setting this flag allows a thread that has one or more 0 UIDs to retain
+capabilities in its permitted set
+when it switches all of its UIDs to nonzero values.
+If this flag is not set,
+then such a UID switch causes the thread to lose all permitted capabilities.
+This flag is always cleared on an
+.BR execve (2).
+.IP
+Note that even with the
+.B SECBIT_KEEP_CAPS
+flag set, the effective capabilities of a thread are cleared when it
+switches its effective UID to a nonzero value.
+However,
+if the thread has set this flag and its effective UID is already nonzero,
+and the thread subsequently switches all other UIDs to nonzero values,
+then the effective capabilities will not be cleared.
+.IP
+The setting of the
+.B SECBIT_KEEP_CAPS
+flag is ignored if the
+.B SECBIT_NO_SETUID_FIXUP
+flag is set.
+(The latter flag provides a superset of the effect of the former flag.)
+.IP
+This flag provides the same functionality as the older
+.BR prctl (2)
+.B PR_SET_KEEPCAPS
+operation.
+.TP
+.B SECBIT_NO_SETUID_FIXUP
+Setting this flag stops the kernel from adjusting the process's
+permitted, effective, and ambient capability sets when
+the thread's effective and filesystem UIDs are switched between
+zero and nonzero values.
+See
+.I Effect of user ID changes on capabilities
+above.
+.TP
+.B SECBIT_NOROOT
+If this bit is set, then the kernel does not grant capabilities
+when a set-user-ID-root program is executed, or when a process with
+an effective or real UID of 0 calls
+.BR execve (2).
+(See
+.I Capabilities and execution of programs by root
+above.)
+.TP
+.B SECBIT_NO_CAP_AMBIENT_RAISE
+Setting this flag disallows raising ambient capabilities via the
+.BR prctl (2)
+.B PR_CAP_AMBIENT_RAISE
+operation.
+.PP
+Each of the above "base" flags has a companion "locked" flag.
+Setting any of the "locked" flags is irreversible,
+and has the effect of preventing further changes to the
+corresponding "base" flag.
+The locked flags are:
+.BR SECBIT_KEEP_CAPS_LOCKED ,
+.BR SECBIT_NO_SETUID_FIXUP_LOCKED ,
+.BR SECBIT_NOROOT_LOCKED ,
+and
+.BR SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED .
+.PP
+The
+.I securebits
+flags can be modified and retrieved using the
+.BR prctl (2)
+.B PR_SET_SECUREBITS
+and
+.B PR_GET_SECUREBITS
+operations.
+The
+.B CAP_SETPCAP
+capability is required to modify the flags.
+Note that the
+.B SECBIT_*
+constants are available only after including the
+.I <linux/securebits.h>
+header file.
+.PP
+The
+.I securebits
+flags are inherited by child processes.
+During an
+.BR execve (2),
+all of the flags are preserved, except
+.B SECBIT_KEEP_CAPS
+which is always cleared.
+.PP
+An application can use the following call to lock itself,
+and all of its descendants,
+into an environment where the only way of gaining capabilities
+is by executing a program with associated file capabilities:
+.PP
+.in +4n
+.EX
+prctl(PR_SET_SECUREBITS,
+ /* SECBIT_KEEP_CAPS off */
+ SECBIT_KEEP_CAPS_LOCKED |
+ SECBIT_NO_SETUID_FIXUP |
+ SECBIT_NO_SETUID_FIXUP_LOCKED |
+ SECBIT_NOROOT |
+ SECBIT_NOROOT_LOCKED);
+ /* Setting/locking SECBIT_NO_CAP_AMBIENT_RAISE
+ is not required */
+.EE
+.in
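+.PP
+A program that wants to verify that it is already running in such an
+environment could inspect the value returned by
+.BR PR_GET_SECUREBITS .
+The following is a sketch under that assumption;
+the names
+.IR CAPS_ONLY_BITS ,
+.IR is_caps_only ,
+and
+.I get_securebits
+are hypothetical.

```c
#include <stdbool.h>
#include <sys/prctl.h>
#include <linux/securebits.h>

/* The flag combination passed to PR_SET_SECUREBITS in the call above. */
#define CAPS_ONLY_BITS (SECBIT_KEEP_CAPS_LOCKED | \
                        SECBIT_NO_SETUID_FIXUP | \
                        SECBIT_NO_SETUID_FIXUP_LOCKED | \
                        SECBIT_NOROOT | \
                        SECBIT_NOROOT_LOCKED)

/* True if all of CAPS_ONLY_BITS are set and SECBIT_KEEP_CAPS is clear. */
bool
is_caps_only(int bits)
{
    return (bits & (CAPS_ONLY_BITS | SECBIT_KEEP_CAPS)) == CAPS_ONLY_BITS;
}

/* Fetch the calling thread's securebits; returns -1 on error. */
int
get_securebits(void)
{
    return prctl(PR_GET_SECUREBITS, 0UL, 0UL, 0UL, 0UL);
}
```

+.PP
+A caller would check that the hypothetical
+.I get_securebits
+returned a nonnegative value before passing the result to
+.IR is_caps_only .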
+.\"
+.\"
+.SS Per-user-namespace """set-user-ID-root""" programs
+A set-user-ID program whose UID matches the UID that
+created a user namespace will confer capabilities
+in the process's permitted and effective sets
+when executed by any process inside that namespace
+or any descendant user namespace.
+.PP
+The rules about the transformation of the process's capabilities during the
+.BR execve (2)
+are exactly as described in
+.I Transformation of capabilities during execve()
+and
+.I Capabilities and execution of programs by root
+above,
+with the difference that, in the latter subsection, "root"
+is the UID of the creator of the user namespace.
+.\"
+.\"
+.SS Namespaced file capabilities
+.\" commit 8db6c34f1dbc8e06aa016a9b829b06902c3e1340
+Traditional (i.e., version 2) file capabilities associate
+only a set of capability masks with a binary executable file.
+When a process executes a binary with such capabilities,
+it gains the associated capabilities (within its user namespace)
+as per the rules described in
+.I Transformation of capabilities during execve()
+above.
+.PP
+Because version 2 file capabilities confer capabilities to
+the executing process regardless of which user namespace it resides in,
+only privileged processes are permitted to associate capabilities with a file.
+Here, "privileged" means a process that has the
+.B CAP_SETFCAP
+capability in the user namespace where the filesystem was mounted
+(normally the initial user namespace).
+This limitation renders file capabilities useless for certain use cases.
+For example, in user-namespaced containers,
+it can be desirable to be able to create a binary that
+confers capabilities only to processes executed inside that container,
+but not to processes that are executed outside the container.
+.PP
+Linux 4.14 added so-called namespaced file capabilities
+to support such use cases.
+Namespaced file capabilities are recorded as version 3 (i.e.,
+.BR VFS_CAP_REVISION_3 )
+.I security.capability
+extended attributes.
+Such an attribute is automatically created in the circumstances described
+in
+.I File capability extended attribute versioning
+above.
+When a version 3
+.I security.capability
+extended attribute is created,
+the kernel records not just the capability masks in the extended attribute,
+but also the namespace root user ID.
+.PP
+As with a binary that has
+.B VFS_CAP_REVISION_2
+file capabilities, a binary with
+.B VFS_CAP_REVISION_3
+file capabilities confers capabilities to a process during
+.BR execve (2).
+However, capabilities are conferred only if the binary is executed by
+a process that resides in a user namespace whose
+UID 0 maps to the root user ID that is saved in the extended attribute,
+or when executed by a process that resides in a descendant of such a namespace.
+.\"
+.\"
+.SS Interaction with user namespaces
+For further information on the interaction of
+capabilities and user namespaces, see
+.BR user_namespaces (7).
+.SH STANDARDS
+No standards govern capabilities, but the Linux capability implementation
+is based on the withdrawn
+.UR https://archive.org\:/details\:/posix_1003.1e\-990310
+POSIX.1e draft standard
+.UE .
+.SH NOTES
+When attempting to
+.BR strace (1)
+binaries that have capabilities (or set-user-ID-root binaries),
+you may find the
+.I \-u <username>
+option useful, for example:
+.PP
+.in +4n
+.EX
+$ \fBsudo strace \-o trace.log \-u ceci ./myprivprog\fP
+.EE
+.in
+.PP
+From Linux 2.5.27 to Linux 2.6.26,
+.\" commit 5915eb53861c5776cfec33ca4fcc1fd20d66dd27 removed
+.\" CONFIG_SECURITY_CAPABILITIES
+capabilities were an optional kernel component,
+and could be enabled/disabled via the
+.B CONFIG_SECURITY_CAPABILITIES
+kernel configuration option.
+.PP
+The
+.IR /proc/ pid /task/TID/status
+file can be used to view the capability sets of a thread.
+The
+.IR /proc/ pid /status
+file shows the capability sets of a process's main thread.
+Before Linux 3.8, nonexistent capabilities were shown as being
+enabled (1) in these sets.
+Since Linux 3.8,
+.\" 7b9a7ec565505699f503b4fcf61500dceb36e744
+all nonexistent capabilities (above
+.BR CAP_LAST_CAP )
+are shown as disabled (0).
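+.PP
+The capability sets in these files can also be read programmatically.
+The following sketch assumes the field names documented in
+.BR proc (5)
+(such as
+.I CapPrm
+and
+.IR CapEff ),
+each followed by a colon and a hexadecimal mask;
+the helper name
+.I read_cap_field
+is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: look up a capability field such as "CapEff"
 * in a status file. Returns 0 and stores the mask in *mask on success,
 * or -1 if the file cannot be read or the field is not found.
 */
int
read_cap_field(const char *path, const char *field,
               unsigned long long *mask)
{
    char line[256];
    size_t flen = strlen(field);
    int found = -1;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof(line), fp) != NULL) {
        /* Expect the field name, a colon, then a hexadecimal mask. */
        if (strncmp(line, field, flen) == 0 && line[flen] == ':' &&
            sscanf(line + flen + 1, "%llx", mask) == 1) {
            found = 0;
            break;
        }
    }
    fclose(fp);
    return found;
}
```

+.PP
+Calling the helper with the path
+.I /proc/self/status
+and the field
+.I CapEff
+would retrieve the effective set of the calling process's main thread.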
+.PP
+The
+.I libcap
+package provides a suite of routines for setting and
+getting capabilities that is more convenient and less likely
+to change than the interface provided by
+.BR capset (2)
+and
+.BR capget (2).
+This package also provides the
+.BR setcap (8)
+and
+.BR getcap (8)
+programs.
+It can be found at
+.br
+.UR https://git.kernel.org\:/pub\:/scm\:/libs\:/libcap\:/libcap.git\:/refs/
+.UE .
+.PP
+Before Linux 2.6.24, and from Linux 2.6.24 to Linux 2.6.32 if
+file capabilities are not enabled, a thread with the
+.B CAP_SETPCAP
+capability can manipulate the capabilities of threads other than itself.
+However, this is only theoretically possible,
+since no thread ever has
+.B CAP_SETPCAP
+in either of these cases:
+.IP \[bu] 3
+In the pre-2.6.25 implementation, the system-wide capability bounding set,
+.IR /proc/sys/kernel/cap\-bound ,
+always masks out the
+.B CAP_SETPCAP
+capability, and this cannot be changed
+without modifying the kernel source and rebuilding the kernel.
+.IP \[bu]
+If file capabilities are disabled (i.e., the kernel
+.B CONFIG_SECURITY_FILE_CAPABILITIES
+option is disabled), then
+.B init
+starts out with the
+.B CAP_SETPCAP
+capability removed from its per-process bounding
+set, and that bounding set is inherited by all other processes
+created on the system.
+.SH SEE ALSO
+.BR capsh (1),
+.BR setpriv (1),
+.BR prctl (2),
+.BR setfsuid (2),
+.BR cap_clear (3),
+.BR cap_copy_ext (3),
+.BR cap_from_text (3),
+.BR cap_get_file (3),
+.BR cap_get_proc (3),
+.BR cap_init (3),
+.BR capgetp (3),
+.BR capsetp (3),
+.BR libcap (3),
+.BR proc (5),
+.BR credentials (7),
+.BR pthreads (7),
+.BR user_namespaces (7),
+.BR captest (8), \" from libcap-ng
+.BR filecap (8), \" from libcap-ng
+.BR getcap (8),
+.BR getpcaps (8),
+.BR netcap (8), \" from libcap-ng
+.BR pscap (8), \" from libcap-ng
+.BR setcap (8)
+.PP
+.I include/linux/capability.h
+in the Linux kernel source tree