Diffstat (limited to 'man7/capabilities.7')
-rw-r--r-- | man7/capabilities.7 | 1872 |
1 files changed, 1872 insertions, 0 deletions
diff --git a/man7/capabilities.7 b/man7/capabilities.7 new file mode 100644 index 0000000..c8766d2 --- /dev/null +++ b/man7/capabilities.7 @@ -0,0 +1,1872 @@ +.\" Copyright (c) 2002 by Michael Kerrisk <mtk.manpages@gmail.com> +.\" +.\" SPDX-License-Identifier: Linux-man-pages-copyleft +.\" +.\" 6 Aug 2002 - Initial Creation +.\" Modified 2003-05-23, Michael Kerrisk, <mtk.manpages@gmail.com> +.\" Modified 2004-05-27, Michael Kerrisk, <mtk.manpages@gmail.com> +.\" 2004-12-08, mtk Added O_NOATIME for CAP_FOWNER +.\" 2005-08-16, mtk, Added CAP_AUDIT_CONTROL and CAP_AUDIT_WRITE +.\" 2008-07-15, Serge Hallyn <serue@us.bbm.com> +.\" Document file capabilities, per-process capability +.\" bounding set, changed semantics for CAP_SETPCAP, +.\" and other changes in Linux 2.6.2[45]. +.\" Add CAP_MAC_ADMIN, CAP_MAC_OVERRIDE, CAP_SETFCAP. +.\" 2008-07-15, mtk +.\" Add text describing circumstances in which CAP_SETPCAP +.\" (theoretically) permits a thread to change the +.\" capability sets of another thread. +.\" Add section describing rules for programmatically +.\" adjusting thread capability sets. +.\" Describe rationale for capability bounding set. +.\" Document "securebits" flags. +.\" Add text noting that if we set the effective flag for one file +.\" capability, then we must also set the effective flag for all +.\" other capabilities where the permitted or inheritable bit is set. +.\" 2011-09-07, mtk/Serge hallyn: Add CAP_SYSLOG +.\" +.TH Capabilities 7 2023-05-03 "Linux man-pages 6.05.01" +.SH NAME +capabilities \- overview of Linux capabilities +.SH DESCRIPTION +For the purpose of performing permission checks, +traditional UNIX implementations distinguish two categories of processes: +.I privileged +processes (whose effective user ID is 0, referred to as superuser or root), +and +.I unprivileged +processes (whose effective UID is nonzero). 
+Privileged processes bypass all kernel permission checks, +while unprivileged processes are subject to full permission +checking based on the process's credentials +(usually: effective UID, effective GID, and supplementary group list). +.PP +Starting with Linux 2.2, Linux divides the privileges traditionally +associated with superuser into distinct units, known as +.IR capabilities , +which can be independently enabled and disabled. +Capabilities are a per-thread attribute. +.\" +.SS Capabilities list +The following list shows the capabilities implemented on Linux, +and the operations or behaviors that each capability permits: +.TP +.BR CAP_AUDIT_CONTROL " (since Linux 2.6.11)" +Enable and disable kernel auditing; change auditing filter rules; +retrieve auditing status and filtering rules. +.TP +.BR CAP_AUDIT_READ " (since Linux 3.16)" +.\" commit a29b694aa1739f9d76538e34ae25524f9c549d59 +.\" commit 3a101b8de0d39403b2c7e5c23fd0b005668acf48 +Allow reading the audit log via a multicast netlink socket. +.TP +.BR CAP_AUDIT_WRITE " (since Linux 2.6.11)" +Write records to kernel auditing log. +.\" FIXME Add FAN_ENABLE_AUDIT +.TP +.BR CAP_BLOCK_SUSPEND " (since Linux 3.5)" +Employ features that can block system suspend +.RB ( epoll (7) +.BR EPOLLWAKEUP , +.IR /proc/sys/wake_lock ). +.TP +.BR CAP_BPF " (since Linux 5.8)" +Employ privileged BPF operations; see +.BR bpf (2) +and +.BR bpf\-helpers (7). +.IP +This capability was added in Linux 5.8 to separate out +BPF functionality from the overloaded +.B CAP_SYS_ADMIN +capability. +.TP +.BR CAP_CHECKPOINT_RESTORE " (since Linux 5.9)" +.\" commit 124ea650d3072b005457faed69909221c2905a1f +.PD 0 +.RS +.IP \[bu] 3 +Update +.I /proc/sys/kernel/ns_last_pid +(see +.BR pid_namespaces (7)); +.IP \[bu] +employ the +.I set_tid +feature of +.BR clone3 (2); +.\" FIXME There is also some use case relating to +.\" prctl_set_mm_exe_file(); in the 5.9 sources, see +.\" prctl_set_mm_map(). 
+.IP \[bu] +read the contents of the symbolic links in +.IR /proc/ pid /map_files +for other processes. +.RE +.PD +.IP +This capability was added in Linux 5.9 to separate out +checkpoint/restore functionality from the overloaded +.B CAP_SYS_ADMIN +capability. +.TP +.B CAP_CHOWN +Make arbitrary changes to file UIDs and GIDs (see +.BR chown (2)). +.TP +.B CAP_DAC_OVERRIDE +Bypass file read, write, and execute permission checks. +(DAC is an abbreviation of "discretionary access control".) +.TP +.B CAP_DAC_READ_SEARCH +.PD 0 +.RS +.IP \[bu] 3 +Bypass file read permission checks and +directory read and execute permission checks; +.IP \[bu] +invoke +.BR open_by_handle_at (2); +.IP \[bu] +use the +.BR linkat (2) +.B AT_EMPTY_PATH +flag to create a link to a file referred to by a file descriptor. +.RE +.PD +.TP +.B CAP_FOWNER +.PD 0 +.RS +.IP \[bu] 3 +Bypass permission checks on operations that normally +require the filesystem UID of the process to match the UID of +the file (e.g., +.BR chmod (2), +.BR utime (2)), +excluding those operations covered by +.B CAP_DAC_OVERRIDE +and +.BR CAP_DAC_READ_SEARCH ; +.IP \[bu] +set inode flags (see +.BR ioctl_iflags (2)) +on arbitrary files; +.IP \[bu] +set Access Control Lists (ACLs) on arbitrary files; +.IP \[bu] +ignore directory sticky bit on file deletion; +.IP \[bu] +modify +.I user +extended attributes on sticky directory owned by any user; +.IP \[bu] +specify +.B O_NOATIME +for arbitrary files in +.BR open (2) +and +.BR fcntl (2). +.RE +.PD +.TP +.B CAP_FSETID +.PD 0 +.RS +.IP \[bu] 3 +Don't clear set-user-ID and set-group-ID mode +bits when a file is modified; +.IP \[bu] +set the set-group-ID bit for a file whose GID does not match +the filesystem or any of the supplementary GIDs of the calling process. +.RE +.PD +.TP +.B CAP_IPC_LOCK +.\" FIXME . As at Linux 3.2, there are some strange uses of this capability +.\" in other places; they probably should be replaced with something else. 
+.PD 0 +.RS +.IP \[bu] 3 +Lock memory +.RB ( mlock (2), +.BR mlockall (2), +.BR mmap (2), +.BR shmctl (2)); +.IP \[bu] +Allocate memory using huge pages +.RB ( memfd_create (2), +.BR mmap (2), +.BR shmctl (2)). +.RE +.PD +.TP +.B CAP_IPC_OWNER +Bypass permission checks for operations on System V IPC objects. +.TP +.B CAP_KILL +Bypass permission checks for sending signals (see +.BR kill (2)). +This includes use of the +.BR ioctl (2) +.B KDSIGACCEPT +operation. +.\" FIXME . CAP_KILL also has an effect for threads + setting child +.\" termination signal to other than SIGCHLD: without this +.\" capability, the termination signal reverts to SIGCHLD +.\" if the child does an exec(). What is the rationale +.\" for this? +.TP +.BR CAP_LEASE " (since Linux 2.4)" +Establish leases on arbitrary files (see +.BR fcntl (2)). +.TP +.B CAP_LINUX_IMMUTABLE +Set the +.B FS_APPEND_FL +and +.B FS_IMMUTABLE_FL +inode flags (see +.BR ioctl_iflags (2)). +.TP +.BR CAP_MAC_ADMIN " (since Linux 2.6.25)" +Allow MAC configuration or state changes. +Implemented for the Smack Linux Security Module (LSM). +.TP +.BR CAP_MAC_OVERRIDE " (since Linux 2.6.25)" +Override Mandatory Access Control (MAC). +Implemented for the Smack LSM. +.TP +.BR CAP_MKNOD " (since Linux 2.4)" +Create special files using +.BR mknod (2). +.TP +.B CAP_NET_ADMIN +Perform various network-related operations: +.PD 0 +.RS +.IP \[bu] 3 +interface configuration; +.IP \[bu] +administration of IP firewall, masquerading, and accounting; +.IP \[bu] +modify routing tables; +.IP \[bu] +bind to any address for transparent proxying; +.IP \[bu] +set type-of-service (TOS); +.IP \[bu] +clear driver statistics; +.IP \[bu] +set promiscuous mode; +.IP \[bu] +enabling multicasting; +.IP \[bu] +use +.BR setsockopt (2) +to set the following socket options: +.BR SO_DEBUG , +.BR SO_MARK , +.B SO_PRIORITY +(for a priority outside the range 0 to 6), +.BR SO_RCVBUFFORCE , +and +.BR SO_SNDBUFFORCE . 
+.RE +.PD +.TP +.B CAP_NET_BIND_SERVICE +Bind a socket to Internet domain privileged ports +(port numbers less than 1024). +.TP +.B CAP_NET_BROADCAST +(Unused) Make socket broadcasts, and listen to multicasts. +.\" FIXME Since Linux 4.2, there are use cases for netlink sockets +.\" commit 59324cf35aba5336b611074028777838a963d03b +.TP +.B CAP_NET_RAW +.PD 0 +.RS +.IP \[bu] 3 +Use RAW and PACKET sockets; +.IP \[bu] +bind to any address for transparent proxying. +.RE +.PD +.\" Also various IP options and setsockopt(SO_BINDTODEVICE) +.TP +.BR CAP_PERFMON " (since Linux 5.8)" +Employ various performance-monitoring mechanisms, including: +.RS +.IP \[bu] 3 +.PD 0 +call +.BR perf_event_open (2); +.IP \[bu] +employ various BPF operations that have performance implications. +.RE +.PD +.IP +This capability was added in Linux 5.8 to separate out +performance monitoring functionality from the overloaded +.B CAP_SYS_ADMIN +capability. +See also the kernel source file +.IR Documentation/admin\-guide/perf\-security.rst . +.TP +.B CAP_SETGID +.RS +.PD 0 +.IP \[bu] 3 +Make arbitrary manipulations of process GIDs and supplementary GID list; +.IP \[bu] +forge GID when passing socket credentials via UNIX domain sockets; +.IP \[bu] +write a group ID mapping in a user namespace (see +.BR user_namespaces (7)). +.PD +.RE +.TP +.BR CAP_SETFCAP " (since Linux 2.6.24)" +Set arbitrary capabilities on a file. +.IP +.\" commit db2e718a47984b9d71ed890eb2ea36ecf150de18 +Since Linux 5.12, this capability is +also needed to map user ID 0 in a new user namespace; see +.BR user_namespaces (7) +for details. +.TP +.B CAP_SETPCAP +If file capabilities are supported (i.e., since Linux 2.6.24): +add any capability from the calling thread's bounding set +to its inheritable set; +drop capabilities from the bounding set (via +.BR prctl (2) +.BR PR_CAPBSET_DROP ); +make changes to the +.I securebits +flags. 
+.IP +If file capabilities are not supported (i.e., before Linux 2.6.24): +grant or remove any capability in the +caller's permitted capability set to or from any other process. +(This property of +.B CAP_SETPCAP +is not available when the kernel is configured to support +file capabilities, since +.B CAP_SETPCAP +has entirely different semantics for such kernels.) +.TP +.B CAP_SETUID +.RS +.PD 0 +.IP \[bu] 3 +Make arbitrary manipulations of process UIDs +.RB ( setuid (2), +.BR setreuid (2), +.BR setresuid (2), +.BR setfsuid (2)); +.IP \[bu] +forge UID when passing socket credentials via UNIX domain sockets; +.IP \[bu] +write a user ID mapping in a user namespace (see +.BR user_namespaces (7)). +.PD +.RE +.\" FIXME CAP_SETUID also an effect in exec(); document this. +.TP +.B CAP_SYS_ADMIN +.IR Note : +this capability is overloaded; see +.I Notes to kernel developers +below. +.IP +.PD 0 +.RS +.IP \[bu] 3 +Perform a range of system administration operations including: +.BR quotactl (2), +.BR mount (2), +.BR umount (2), +.BR pivot_root (2), +.BR swapon (2), +.BR swapoff (2), +.BR sethostname (2), +and +.BR setdomainname (2); +.IP \[bu] +perform privileged +.BR syslog (2) +operations (since Linux 2.6.37, +.B CAP_SYSLOG +should be used to permit such operations); +.IP \[bu] +perform +.B VM86_REQUEST_IRQ +.BR vm86 (2) +command; +.IP \[bu] +access the same checkpoint/restore functionality that is governed by +.B CAP_CHECKPOINT_RESTORE +(but the latter, weaker capability is preferred for accessing +that functionality). +.IP \[bu] +perform the same BPF operations as are governed by +.B CAP_BPF +(but the latter, weaker capability is preferred for accessing +that functionality). +.IP \[bu] +employ the same performance monitoring mechanisms as are governed by +.B CAP_PERFMON +(but the latter, weaker capability is preferred for accessing +that functionality). 
+.IP \[bu] +perform +.B IPC_SET +and +.B IPC_RMID +operations on arbitrary System V IPC objects; +.IP \[bu] +override +.B RLIMIT_NPROC +resource limit; +.IP \[bu] +perform operations on +.I trusted +and +.I security +extended attributes (see +.BR xattr (7)); +.IP \[bu] +use +.BR lookup_dcookie (2); +.IP \[bu] +use +.BR ioprio_set (2) +to assign +.B IOPRIO_CLASS_RT +and (before Linux 2.6.25) +.B IOPRIO_CLASS_IDLE +I/O scheduling classes; +.IP \[bu] +forge PID when passing socket credentials via UNIX domain sockets; +.IP \[bu] +exceed +.IR /proc/sys/fs/file\-max , +the system-wide limit on the number of open files, +in system calls that open files (e.g., +.BR accept (2), +.BR execve (2), +.BR open (2), +.BR pipe (2)); +.IP \[bu] +employ +.B CLONE_* +flags that create new namespaces with +.BR clone (2) +and +.BR unshare (2) +(but, since Linux 3.8, +creating user namespaces does not require any capability); +.IP \[bu] +access privileged +.I perf +event information; +.IP \[bu] +call +.BR setns (2) +(requires +.B CAP_SYS_ADMIN +in the +.I target +namespace); +.IP \[bu] +call +.BR fanotify_init (2); +.IP \[bu] +perform privileged +.B KEYCTL_CHOWN +and +.B KEYCTL_SETPERM +.BR keyctl (2) +operations; +.IP \[bu] +perform +.BR madvise (2) +.B MADV_HWPOISON +operation; +.IP \[bu] +employ the +.B TIOCSTI +.BR ioctl (2) +to insert characters into the input queue of a terminal other than +the caller's controlling terminal; +.IP \[bu] +employ the obsolete +.BR nfsservctl (2) +system call; +.IP \[bu] +employ the obsolete +.BR bdflush (2) +system call; +.IP \[bu] +perform various privileged block-device +.BR ioctl (2) +operations; +.IP \[bu] +perform various privileged filesystem +.BR ioctl (2) +operations; +.IP \[bu] +perform privileged +.BR ioctl (2) +operations on the +.I /dev/random +device (see +.BR random (4)); +.IP \[bu] +install a +.BR seccomp (2) +filter without first having to set the +.I no_new_privs +thread attribute; +.IP \[bu] +modify allow/deny rules for device 
control groups; +.IP \[bu] +employ the +.BR ptrace (2) +.B PTRACE_SECCOMP_GET_FILTER +operation to dump tracee's seccomp filters; +.IP \[bu] +employ the +.BR ptrace (2) +.B PTRACE_SETOPTIONS +operation to suspend the tracee's seccomp protections (i.e., the +.B PTRACE_O_SUSPEND_SECCOMP +flag); +.IP \[bu] +perform administrative operations on many device drivers; +.IP \[bu] +modify autogroup nice values by writing to +.IR /proc/ pid /autogroup +(see +.BR sched (7)). +.RE +.PD +.TP +.B CAP_SYS_BOOT +Use +.BR reboot (2) +and +.BR kexec_load (2). +.TP +.B CAP_SYS_CHROOT +.RS +.PD 0 +.IP \[bu] 3 +Use +.BR chroot (2); +.IP \[bu] +change mount namespaces using +.BR setns (2). +.PD +.RE +.TP +.B CAP_SYS_MODULE +.RS +.PD 0 +.IP \[bu] 3 +Load and unload kernel modules +(see +.BR init_module (2) +and +.BR delete_module (2)); +.IP \[bu] +before Linux 2.6.25: +drop capabilities from the system-wide capability bounding set. +.PD +.RE +.TP +.B CAP_SYS_NICE +.PD 0 +.RS +.IP \[bu] 3 +Lower the process nice value +.RB ( nice (2), +.BR setpriority (2)) +and change the nice value for arbitrary processes; +.IP \[bu] +set real-time scheduling policies for calling process, +and set scheduling policies and priorities for arbitrary processes +.RB ( sched_setscheduler (2), +.BR sched_setparam (2), +.BR sched_setattr (2)); +.IP \[bu] +set CPU affinity for arbitrary processes +.RB ( sched_setaffinity (2)); +.IP \[bu] +set I/O scheduling class and priority for arbitrary processes +.RB ( ioprio_set (2)); +.IP \[bu] +apply +.BR migrate_pages (2) +to arbitrary processes and allow processes +to be migrated to arbitrary nodes; +.\" FIXME CAP_SYS_NICE also has the following effect for +.\" migrate_pages(2): +.\" do_migrate_pages(mm, &old, &new, +.\" capable(CAP_SYS_NICE) ? MPOL_MF_MOVE_ALL : MPOL_MF_MOVE); +.\" +.\" Document this. +.IP \[bu] +apply +.BR move_pages (2) +to arbitrary processes; +.IP \[bu] +use the +.B MPOL_MF_MOVE_ALL +flag with +.BR mbind (2) +and +.BR move_pages (2). 
+.RE +.PD +.TP +.B CAP_SYS_PACCT +Use +.BR acct (2). +.TP +.B CAP_SYS_PTRACE +.PD 0 +.RS +.IP \[bu] 3 +Trace arbitrary processes using +.BR ptrace (2); +.IP \[bu] +apply +.BR get_robust_list (2) +to arbitrary processes; +.IP \[bu] +transfer data to or from the memory of arbitrary processes using +.BR process_vm_readv (2) +and +.BR process_vm_writev (2); +.IP \[bu] +inspect processes using +.BR kcmp (2). +.RE +.PD +.TP +.B CAP_SYS_RAWIO +.PD 0 +.RS +.IP \[bu] 3 +Perform I/O port operations +.RB ( iopl (2) +and +.BR ioperm (2)); +.IP \[bu] +access +.IR /proc/kcore ; +.IP \[bu] +employ the +.B FIBMAP +.BR ioctl (2) +operation; +.IP \[bu] +open devices for accessing x86 model-specific registers (MSRs, see +.BR msr (4)); +.IP \[bu] +update +.IR /proc/sys/vm/mmap_min_addr ; +.IP \[bu] +create memory mappings at addresses below the value specified by +.IR /proc/sys/vm/mmap_min_addr ; +.IP \[bu] +map files in +.IR /proc/bus/pci ; +.IP \[bu] +open +.I /dev/mem +and +.IR /dev/kmem ; +.IP \[bu] +perform various SCSI device commands; +.IP \[bu] +perform certain operations on +.BR hpsa (4) +and +.BR cciss (4) +devices; +.IP \[bu] +perform a range of device-specific operations on other devices. 
+.RE +.PD +.TP +.B CAP_SYS_RESOURCE +.PD 0 +.RS +.IP \[bu] 3 +Use reserved space on ext2 filesystems; +.IP \[bu] +make +.BR ioctl (2) +calls controlling ext3 journaling; +.IP \[bu] +override disk quota limits; +.IP \[bu] +increase resource limits (see +.BR setrlimit (2)); +.IP \[bu] +override +.B RLIMIT_NPROC +resource limit; +.IP \[bu] +override maximum number of consoles on console allocation; +.IP \[bu] +override maximum number of keymaps; +.IP \[bu] +allow more than 64 Hz interrupts from the real-time clock; +.IP \[bu] +raise +.I msg_qbytes +limit for a System V message queue above the limit in +.I /proc/sys/kernel/msgmnb +(see +.BR msgop (2) +and +.BR msgctl (2)); +.IP \[bu] +allow the +.B RLIMIT_NOFILE +resource limit on the number of "in-flight" file descriptors +to be bypassed when passing file descriptors to another process +via a UNIX domain socket (see +.BR unix (7)); +.IP \[bu] +override the +.I /proc/sys/fs/pipe\-max\-size +limit when setting the capacity of a pipe using the +.B F_SETPIPE_SZ +.BR fcntl (2) +command; +.IP \[bu] +use +.B F_SETPIPE_SZ +to increase the capacity of a pipe above the limit specified by +.IR /proc/sys/fs/pipe\-max\-size ; +.IP \[bu] +override +.IR /proc/sys/fs/mqueue/queues_max , +.IR /proc/sys/fs/mqueue/msg_max , +and +.I /proc/sys/fs/mqueue/msgsize_max +limits when creating POSIX message queues (see +.BR mq_overview (7)); +.IP \[bu] +employ the +.BR prctl (2) +.B PR_SET_MM +operation; +.IP \[bu] +set +.IR /proc/ pid /oom_score_adj +to a value lower than the value last set by a process with +.BR CAP_SYS_RESOURCE . +.RE +.PD +.TP +.B CAP_SYS_TIME +Set system clock +.RB ( settimeofday (2), +.BR stime (2), +.BR adjtimex (2)); +set real-time (hardware) clock. +.TP +.B CAP_SYS_TTY_CONFIG +Use +.BR vhangup (2); +employ various privileged +.BR ioctl (2) +operations on virtual terminals. +.TP +.BR CAP_SYSLOG " (since Linux 2.6.37)" +.RS +.PD 0 +.IP \[bu] 3 +Perform privileged +.BR syslog (2) +operations.
+See +.BR syslog (2) +for information on which operations require privilege. +.IP \[bu] +View kernel addresses exposed via +.I /proc +and other interfaces when +.I /proc/sys/kernel/kptr_restrict +has the value 1. +(See the discussion of +.I kptr_restrict +in +.BR proc (5).) +.PD +.RE +.TP +.BR CAP_WAKE_ALARM " (since Linux 3.0)" +Trigger something that will wake up the system (set +.B CLOCK_REALTIME_ALARM +and +.B CLOCK_BOOTTIME_ALARM +timers). +.\" +.SS Past and current implementation +A full implementation of capabilities requires that: +.IP \[bu] 3 +For all privileged operations, +the kernel must check whether the thread has the required +capability in its effective set. +.IP \[bu] +The kernel must provide system calls allowing a thread's capability sets to +be changed and retrieved. +.IP \[bu] +The filesystem must support attaching capabilities to an executable file, +so that a process gains those capabilities when the file is executed. +.PP +Before Linux 2.6.24, only the first two of these requirements are met; +since Linux 2.6.24, all three requirements are met. +.\" +.SS Notes to kernel developers +When adding a new kernel feature that should be governed by a capability, +consider the following points. +.IP \[bu] 3 +The goal of capabilities is to divide the power of superuser into pieces, +such that if a program that has one or more capabilities is compromised, +its power to do damage to the system would be less than that of the same program +running with root privilege. +.IP \[bu] +You have the choice of either creating a new capability for your new feature, +or associating the feature with one of the existing capabilities. +In order to keep the set of capabilities to a manageable size, +the latter option is preferable, +unless there are compelling reasons to take the former option. +(There is also a technical limit: +the size of capability sets is currently limited to 64 bits.)
+.IP \[bu] +To determine which existing capability might best be associated +with your new feature, review the list of capabilities above in order +to find a "silo" into which your new feature best fits. +One approach to take is to determine if there are other features +requiring capabilities that will always be used along with the new feature. +If the new feature is useless without these other features, +you should use the same capability as the other features. +.IP \[bu] +.I Don't +choose +.B CAP_SYS_ADMIN +if you can possibly avoid it! +A vast proportion of existing capability checks are associated +with this capability (see the partial list above). +It can plausibly be called "the new root", +since on the one hand, it confers a wide range of powers, +and on the other hand, +its broad scope means that this is the capability +that is required by many privileged programs. +Don't make the problem worse. +The only new features that should be associated with +.B CAP_SYS_ADMIN +are ones that +.I closely +match existing uses in that silo. +.IP \[bu] +If you have determined that it really is necessary to create +a new capability for your feature, +don't make or name it as a "single-use" capability. +Thus, for example, the addition of the highly specific +.B CAP_SYS_PACCT +was probably a mistake. +Instead, try to identify and name your new capability as a broader +silo into which other related future use cases might fit. +.\" +.SS Thread capability sets +Each thread has the following capability sets containing zero or more +of the above capabilities: +.TP +.I Permitted +This is a limiting superset for the effective +capabilities that the thread may assume. +It is also a limiting superset for the capabilities that +may be added to the inheritable set by a thread that does not have the +.B CAP_SETPCAP +capability in its effective set. 
+.IP +If a thread drops a capability from its permitted set, +it can never reacquire that capability (unless it +.BR execve (2)s +either a set-user-ID-root program, or +a program whose associated file capabilities grant that capability). +.TP +.I Inheritable +This is a set of capabilities preserved across an +.BR execve (2). +Inheritable capabilities remain inheritable when executing any program, +and inheritable capabilities are added to the permitted set when executing +a program that has the corresponding bits set in the file inheritable set. +.IP +Because inheritable capabilities are not generally preserved across +.BR execve (2) +when running as a non-root user, applications that wish to run helper +programs with elevated capabilities should consider using +ambient capabilities, described below. +.TP +.I Effective +This is the set of capabilities used by the kernel to +perform permission checks for the thread. +.TP +.IR Bounding " (per-thread since Linux 2.6.25)" +The capability bounding set is a mechanism that can be used +to limit the capabilities that are gained during +.BR execve (2). +.IP +Since Linux 2.6.25, this is a per-thread capability set. +In older kernels, the capability bounding set was a system-wide attribute +shared by all threads on the system. +.IP +For more details, see +.I Capability bounding set +below. +.TP +.IR Ambient " (since Linux 4.3)" +.\" commit 58319057b7847667f0c9585b9de0e8932b0fdb08 +This is a set of capabilities that are preserved across an +.BR execve (2) +of a program that is not privileged. +The ambient capability set obeys the invariant that no capability +can ever be ambient if it is not both permitted and inheritable. +.IP +The ambient capability set can be directly modified using +.BR prctl (2). +Ambient capabilities are automatically lowered if either of +the corresponding permitted or inheritable capabilities is lowered.
+.IP +Executing a program that changes UID or GID due to the +set-user-ID or set-group-ID bits or executing a program that has +any file capabilities set will clear the ambient set. +Ambient capabilities are added to the permitted set and +assigned to the effective set when +.BR execve (2) +is called. +If ambient capabilities cause a process's permitted and effective +capabilities to increase during an +.BR execve (2), +this does not trigger the secure-execution mode described in +.BR ld.so (8). +.PP +A child created via +.BR fork (2) +inherits copies of its parent's capability sets. +For details on how +.BR execve (2) +affects capabilities, see +.I Transformation of capabilities during execve() +below. +.PP +Using +.BR capset (2), +a thread may manipulate its own capability sets; see +.I Programmatically adjusting capability sets +below. +.PP +Since Linux 3.2, the file +.I /proc/sys/kernel/cap_last_cap +.\" commit 73efc0394e148d0e15583e13712637831f926720 +exposes the numerical value of the highest capability +supported by the running kernel; +this can be used to determine the highest bit +that may be set in a capability set. +.\" +.SS File capabilities +Since Linux 2.6.24, the kernel supports +associating capability sets with an executable file using +.BR setcap (8). +The file capability sets are stored in an extended attribute (see +.BR setxattr (2) +and +.BR xattr (7)) +named +.IR "security.capability" . +Writing to this extended attribute requires the +.B CAP_SETFCAP +capability. +The file capability sets, +in conjunction with the capability sets of the thread, +determine the capabilities of a thread after an +.BR execve (2). +.PP +The three file capability sets are: +.TP +.IR Permitted " (formerly known as " forced ): +These capabilities are automatically permitted to the thread, +regardless of the thread's inheritable capabilities. 
+.TP +.IR Inheritable " (formerly known as " allowed ): +This set is ANDed with the thread's inheritable set to determine which +inheritable capabilities are enabled in the permitted set of +the thread after the +.BR execve (2). +.TP +.IR Effective : +This is not a set, but rather just a single bit. +If this bit is set, then during an +.BR execve (2) +all of the new permitted capabilities for the thread are +also raised in the effective set. +If this bit is not set, then after an +.BR execve (2), +none of the new permitted capabilities is in the new effective set. +.IP +Enabling the file effective capability bit implies +that any file permitted or inheritable capability that causes a +thread to acquire the corresponding permitted capability during an +.BR execve (2) +(see +.I Transformation of capabilities during execve() +below) will also acquire that +capability in its effective set. +Therefore, when assigning capabilities to a file +.RB ( setcap (8), +.BR cap_set_file (3), +.BR cap_set_fd (3)), +if we specify the effective flag as being enabled for any capability, +then the effective flag must also be specified as enabled +for all other capabilities for which the corresponding permitted or +inheritable flag is enabled. +.\" +.SS File capability extended attribute versioning +To allow extensibility, +the kernel supports a scheme to encode a version number inside the +.I security.capability +extended attribute that is used to implement file capabilities. +These version numbers are internal to the implementation, +and not directly visible to user-space applications. +To date, the following versions are supported: +.TP +.B VFS_CAP_REVISION_1 +This was the original file capability implementation, +which supported 32-bit masks for file capabilities. 
+.TP +.BR VFS_CAP_REVISION_2 " (since Linux 2.6.25)" +.\" commit e338d263a76af78fe8f38a72131188b58fceb591 +This version allows for file capability masks that are 64 bits in size, +and was necessary as the number of supported capabilities grew beyond 32. +The kernel transparently continues to support the execution of files +that have 32-bit version 1 capability masks, +but when adding capabilities to files that did not previously +have capabilities, or modifying the capabilities of existing files, +it automatically uses the version 2 scheme +(or possibly the version 3 scheme, as described below). +.TP +.BR VFS_CAP_REVISION_3 " (since Linux 4.14)" +.\" commit 8db6c34f1dbc8e06aa016a9b829b06902c3e1340 +Version 3 file capabilities are provided +to support namespaced file capabilities (described below). +.IP +As with version 2 file capabilities, +version 3 capability masks are 64 bits in size. +But in addition, the root user ID of the namespace is encoded in the +.I security.capability +extended attribute. +(A namespace's root user ID is the value that user ID 0 +inside that namespace maps to in the initial user namespace.) +.IP +Version 3 file capabilities are designed to coexist +with version 2 capabilities; +that is, on a modern Linux system, +there may be some files with version 2 capabilities +while others have version 3 capabilities. +.PP +Before Linux 4.14, +the only kind of file capability extended attribute +that could be attached to a file was a +.B VFS_CAP_REVISION_2 +attribute. +Since Linux 4.14, +the version of the +.I security.capability +extended attribute that is attached to a file +depends on the circumstances in which the attribute was created. +.PP +Starting with Linux 4.14, a +.I security.capability +extended attribute is automatically created as (or converted to) +a version 3 +.RB ( VFS_CAP_REVISION_3 ) +attribute if both of the following are true: +.IP \[bu] 3 +The thread writing the attribute resides in a noninitial user namespace.
+(More precisely: the thread resides in a user namespace other +than the one from which the underlying filesystem was mounted.) +.IP \[bu] +The thread has the +.B CAP_SETFCAP +capability over the file inode, +meaning that (a) the thread has the +.B CAP_SETFCAP +capability in its own user namespace; +and (b) the UID and GID of the file inode have mappings in +the writer's user namespace. +.PP +When a +.B VFS_CAP_REVISION_3 +.I security.capability +extended attribute is created, the root user ID of the creating thread's +user namespace is saved in the extended attribute. +.PP +By contrast, creating or modifying a +.I security.capability +extended attribute from a privileged +.RB ( CAP_SETFCAP ) +thread that resides in the +namespace where the underlying filesystem was mounted +(this normally means the initial user namespace) +automatically results in the creation of a version 2 +.RB ( VFS_CAP_REVISION_2 ) +attribute. +.PP +Note that the creation of a version 3 +.I security.capability +extended attribute is automatic. +That is to say, when a user-space application writes +.RB ( setxattr (2)) +a +.I security.capability +attribute in the version 2 format, +the kernel will automatically create a version 3 attribute +if the attribute is created in the circumstances described above. +Correspondingly, when a version 3 +.I security.capability +attribute is retrieved +.RB ( getxattr (2)) +by a process that resides inside a user namespace that was created by the +root user ID (or a descendant of that user namespace), +the returned attribute is (automatically) +simplified to appear as a version 2 attribute +(i.e., the returned value is the size of a version 2 attribute and does +not include the root user ID). +These automatic translations mean that no changes are required to +user-space tools (e.g., +.BR setcap (8) +and +.BR getcap (8)) +in order for those tools to be used to create and retrieve version 3 +.I security.capability +attributes.
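Editorial sketch (not part of the committed page): the versioning scheme above can be illustrated by decoding a raw security.capability value. The byte layout and constants used here are an assumption taken from the kernel UAPI header linux/capability.h (a little-endian 32-bit magic_etc word, then permitted/inheritable 32-bit pairs, then a 32-bit root user ID for version 3); the man page text itself does not spell this layout out. The function name decode_security_capability is hypothetical.

```python
import struct

# Constants assumed from the kernel UAPI header <linux/capability.h>.
VFS_CAP_REVISION_MASK = 0xFF000000
VFS_CAP_FLAGS_EFFECTIVE = 0x000001
VFS_CAP_REVISION_1 = 0x01000000
VFS_CAP_REVISION_2 = 0x02000000
VFS_CAP_REVISION_3 = 0x03000000

# Number of 32-bit (permitted, inheritable) pairs per revision.
_PAIRS = {VFS_CAP_REVISION_1: 1, VFS_CAP_REVISION_2: 2, VFS_CAP_REVISION_3: 2}


def decode_security_capability(raw: bytes) -> dict:
    """Decode a raw security.capability xattr value (hypothetical helper)."""
    if len(raw) < 4:
        raise ValueError("attribute too short")
    (magic_etc,) = struct.unpack_from("<I", raw, 0)
    revision = magic_etc & VFS_CAP_REVISION_MASK
    pairs = _PAIRS.get(revision)
    if pairs is None:
        raise ValueError(f"unknown revision {revision:#010x}")

    has_rootid = revision == VFS_CAP_REVISION_3
    expected = 4 + 8 * pairs + (4 if has_rootid else 0)
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")

    # Each pair holds 32 bits of the permitted and inheritable masks;
    # pair 0 is the low half, pair 1 the high half of the 64-bit mask.
    permitted = inheritable = 0
    for i in range(pairs):
        p, inh = struct.unpack_from("<II", raw, 4 + 8 * i)
        permitted |= p << (32 * i)
        inheritable |= inh << (32 * i)

    rootid = None
    if has_rootid:
        (rootid,) = struct.unpack_from("<I", raw, 4 + 8 * pairs)

    return {
        "version": revision >> 24,
        "effective": bool(magic_etc & VFS_CAP_FLAGS_EFFECTIVE),
        "permitted": permitted,
        "inheritable": inheritable,
        "rootid": rootid,
    }
```

On Linux, the raw value could be obtained with os.getxattr(path, "security.capability"). As the text above notes, a version 3 attribute read from inside the owning user namespace is presented in version 2 form, so this sketch would then report version 2 and no root user ID.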
+.PP +Note that a file can have either a version 2 or a version 3 +.I security.capability +extended attribute associated with it, but not both: +creation or modification of the +.I security.capability +extended attribute will automatically modify the version +according to the circumstances in which the extended attribute is +created or modified. +.\" +.SS Transformation of capabilities during execve() +During an +.BR execve (2), +the kernel calculates the new capabilities of +the process using the following algorithm: +.PP +.in +4n +.EX +P'(ambient) = (file is privileged) ? 0 : P(ambient) +\& +P'(permitted) = (P(inheritable) & F(inheritable)) | + (F(permitted) & P(bounding)) | P'(ambient) +\& +P'(effective) = F(effective) ? P'(permitted) : P'(ambient) +\& +P'(inheritable) = P(inheritable) [i.e., unchanged] +\& +P'(bounding) = P(bounding) [i.e., unchanged] +.EE +.in +.PP +where: +.RS 4 +.TP +P() +denotes the value of a thread capability set before the +.BR execve (2) +.TP +P'() +denotes the value of a thread capability set after the +.BR execve (2) +.TP +F() +denotes a file capability set +.RE +.PP +Note the following details relating to the above capability +transformation rules: +.IP \[bu] 3 +The ambient capability set is present only since Linux 4.3. +When determining the transformation of the ambient set during +.BR execve (2), +a privileged file is one that has capabilities or +has the set-user-ID or set-group-ID bit set. +.IP \[bu] +Prior to Linux 2.6.25, +the bounding set was a system-wide attribute shared by all threads. +That system-wide value was employed to calculate the new permitted set during +.BR execve (2) +in the same manner as shown above for +.IR P(bounding) . +.PP +.IR Note : +during the capability transitions described above, +file capabilities may be ignored (treated as empty) for the same reasons +that the set-user-ID and set-group-ID bits are ignored; see +.BR execve (2). 
+File capabilities are similarly ignored if the kernel was booted with the +.I no_file_caps +option. +.PP +.IR Note : +according to the rules above, +if a process with nonzero user IDs performs an +.BR execve (2) +then any capabilities that are present in +its permitted and effective sets will be cleared. +For the treatment of capabilities when a process with a +user ID of zero performs an +.BR execve (2), +see +.I Capabilities and execution of programs by root +below. +.\" +.SS Safety checking for capability-dumb binaries +A capability-dumb binary is an application that has been +marked to have file capabilities, but has not been converted to use the +.BR libcap (3) +API to manipulate its capabilities. +(In other words, this is a traditional set-user-ID-root program +that has been switched to use file capabilities, +but whose code has not been modified to understand capabilities.) +For such applications, +the effective capability bit is set on the file, +so that the file permitted capabilities are automatically +enabled in the process effective set when executing the file. +The kernel recognizes a file which has the effective capability bit set +as capability-dumb for the purpose of the check described here. +.PP +When executing a capability-dumb binary, +the kernel checks if the process obtained all permitted capabilities +that were specified in the file permitted set, +after the capability transformations described above have been performed. +(The typical reason why this might +.I not +occur is that the capability bounding set masked out some +of the capabilities in the file permitted set.) +If the process did not obtain the full set of +file permitted capabilities, then +.BR execve (2) +fails with the error +.BR EPERM . +This prevents possible security risks that could arise when +a capability-dumb application is executed with less privilege than it needs. 
+Note that, by definition, +the application could not itself recognize this problem, +since it does not employ the +.BR libcap (3) +API. +.\" +.SS Capabilities and execution of programs by root +.\" See cap_bprm_set_creds(), bprm_caps_from_vfs_cap() and +.\" handle_privileged_root() in security/commoncap.c (Linux 5.0 source) +In order to mirror traditional UNIX semantics, +the kernel performs special treatment of file capabilities when +a process with UID 0 (root) executes a program and +when a set-user-ID-root program is executed. +.PP +After having performed any changes to the process effective ID that +were triggered by the set-user-ID mode bit of the binary\[em]e.g., +switching the effective user ID to 0 (root) because +a set-user-ID-root program was executed\[em]the +kernel calculates the file capability sets as follows: +.IP (1) 5 +If the real or effective user ID of the process is 0 (root), +then the file inheritable and permitted sets are ignored; +instead they are notionally considered to be all ones +(i.e., all capabilities enabled). +(There is one exception to this behavior, described in +.I Set-user-ID-root programs that have file capabilities +below.) +.IP (2) +If the effective user ID of the process is 0 (root) or +the file effective bit is in fact enabled, +then the file effective bit is notionally defined to be one (enabled). +.PP +These notional values for the file's capability sets are then used +as described above to calculate the transformation of the process's +capabilities during +.BR execve (2). 
+.PP +Thus, when a process with nonzero UIDs +.BR execve (2)s +a set-user-ID-root program that does not have capabilities attached, +or when a process whose real and effective UIDs are zero +.BR execve (2)s +a program, the calculation of the process's new +permitted capabilities simplifies to: +.PP +.in +4n +.EX +P'(permitted) = P(inheritable) | P(bounding) +\& +P'(effective) = P'(permitted) +.EE +.in +.PP +Consequently, the process gains all capabilities in its permitted and +effective capability sets, +except those masked out by the capability bounding set. +(In the calculation of P'(permitted), +the P'(ambient) term can be simplified away because it is by +definition a subset of P(inheritable).) +.PP +The special treatments of user ID 0 (root) described in this subsection +can be disabled using the securebits mechanism described below. +.\" +.\" +.SS Set-user-ID-root programs that have file capabilities +There is one exception to the behavior described in +.I Capabilities and execution of programs by root +above. +If (a) the binary that is being executed has capabilities attached and +(b) the real user ID of the process is +.I not +0 (root) and +(c) the effective user ID of the process +.I is +0 (root), then the file capability bits are honored +(i.e., they are not notionally considered to be all ones). +The usual way in which this situation can arise is when executing +a set-user-ID-root program that also has file capabilities. +When such a program is executed, +the process gains just the capabilities granted by the program +(i.e., not all capabilities, +as would occur when executing a set-user-ID-root program +that does not have any associated file capabilities). +.PP +Note that one can assign empty capability sets to a program file, +and thus it is possible to create a set-user-ID-root program that +changes the effective and saved set-user-ID of the process +that executes the program to 0, +but confers no capabilities to that process.
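The execve(2) transformation, together with the notional root values and the set-user-ID-root exception described above, can be sketched as a pure function over 64-bit masks. This is a simplified model of the rules as stated in this page, not kernel code: the struct thread_caps and struct exec_file types and the caps_after_execve() function are hypothetical names, and the sketch deliberately ignores securebits, the capability-dumb check, and the cases in which file capabilities are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ALL_CAPS UINT64_MAX     /* notional "all ones" mask */

struct thread_caps {
    uint64_t permitted, effective, inheritable, bounding, ambient;
};

struct exec_file {
    bool     has_caps;   /* a security.capability attribute is attached */
    bool     set_id;     /* set-user-ID or set-group-ID mode bit is set */
    uint64_t permitted, inheritable;
    bool     effective;  /* the file effective bit */
};

/* ruid and euid are the process user IDs after any switch triggered
   by the set-user-ID mode bit of the binary. */
struct thread_caps caps_after_execve(struct thread_caps p,
                                     struct exec_file f,
                                     unsigned ruid, unsigned euid)
{
    uint64_t fp = f.has_caps ? f.permitted : 0;
    uint64_t fi = f.has_caps ? f.inheritable : 0;
    bool     fe = f.has_caps && f.effective;

    /* Exception: set-user-ID-root program that has file capabilities. */
    bool honor_file_caps = f.has_caps && ruid != 0 && euid == 0;

    if ((ruid == 0 || euid == 0) && !honor_file_caps)
        fp = fi = ALL_CAPS;             /* rule (1) */
    if (euid == 0)
        fe = true;                      /* rule (2) */

    bool privileged_file = f.has_caps || f.set_id;

    struct thread_caps n = p;   /* inheritable and bounding unchanged */
    n.ambient   = privileged_file ? 0 : p.ambient;
    n.permitted = (p.inheritable & fi) | (fp & p.bounding) | n.ambient;
    n.effective = fe ? n.permitted : n.ambient;
    return n;
}
```

For a root process executing an ordinary program, the model reduces to P(inheritable) | P(bounding), matching the simplified formula given earlier.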
+.\" +.SS Capability bounding set +The capability bounding set is a security mechanism that can be used +to limit the capabilities that can be gained during an +.BR execve (2). +The bounding set is used in the following ways: +.IP \[bu] 3 +During an +.BR execve (2), +the capability bounding set is ANDed with the file permitted +capability set, and the result of this operation is assigned to the +thread's permitted capability set. +The capability bounding set thus places a limit on the permitted +capabilities that may be granted by an executable file. +.IP \[bu] +(Since Linux 2.6.25) +The capability bounding set acts as a limiting superset for +the capabilities that a thread can add to its inheritable set using +.BR capset (2). +This means that if a capability is not in the bounding set, +then a thread can't add this capability to its +inheritable set, even if it was in its permitted capabilities, +and thereby cannot have this capability preserved in its +permitted set when it +.BR execve (2)s +a file that has the capability in its inheritable set. +.PP +Note that the bounding set masks the file permitted capabilities, +but not the inheritable capabilities. +If a thread maintains a capability in its inheritable set +that is not in its bounding set, +then it can still gain that capability in its permitted set +by executing a file that has the capability in its inheritable set. +.PP +Depending on the kernel version, the capability bounding set is either +a system-wide attribute, or a per-process attribute. +.PP +.B "Capability bounding set from Linux 2.6.25 onward" +.PP +From Linux 2.6.25, the +.I "capability bounding set" +is a per-thread attribute. +(The system-wide capability bounding set described below no longer exists.) +.PP +The bounding set is inherited at +.BR fork (2) +from the thread's parent, and is preserved across an +.BR execve (2). 
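A thread can inspect its own bounding set with the prctl(2) PR_CAPBSET_READ operation, which requires no privilege. The following sketch collects the set into a bit mask by probing capability numbers until the kernel reports EINVAL; the read_bounding_set() helper is a hypothetical name, and the limit of 64 capabilities is an assumption that matches the 64-bit masks discussed in this page.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/prctl.h>

/* Store the calling thread's bounding set in *mask.
   Returns the number of capability numbers the kernel accepted,
   or -1 on an unexpected error. */
int read_bounding_set(uint64_t *mask)
{
    int cap;

    *mask = 0;
    for (cap = 0; cap < 64; cap++) {
        int r = prctl(PR_CAPBSET_READ, cap, 0, 0, 0);
        if (r < 0) {
            if (errno == EINVAL)
                break;          /* cap is not a valid capability */
            return -1;
        }
        if (r == 1)
            *mask |= UINT64_C(1) << cap;
    }
    return cap;
}
```

Calling the helper in a parent and again in a child created with fork(2) should yield the same mask, which is one way to observe the inheritance rule stated above.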
+.PP +A thread may remove capabilities from its capability bounding set using the +.BR prctl (2) +.B PR_CAPBSET_DROP +operation, provided it has the +.B CAP_SETPCAP +capability. +Once a capability has been dropped from the bounding set, +it cannot be restored to that set. +A thread can determine if a capability is in its bounding set using the +.BR prctl (2) +.B PR_CAPBSET_READ +operation. +.PP +Removing capabilities from the bounding set is supported only if file +capabilities are compiled into the kernel. +Before Linux 2.6.33, +file capabilities were an optional feature configurable via the +.B CONFIG_SECURITY_FILE_CAPABILITIES +option. +Since Linux 2.6.33, +.\" commit b3a222e52e4d4be77cc4520a57af1a4a0d8222d1 +the configuration option has been removed +and file capabilities are always part of the kernel. +When file capabilities are compiled into the kernel, the +.B init +process (the ancestor of all processes) begins with a full bounding set. +If file capabilities are not compiled into the kernel, then +.B init +begins with a full bounding set minus +.BR CAP_SETPCAP , +because this capability has a different meaning when there are +no file capabilities. +.PP +Removing a capability from the bounding set does not remove it +from the thread's inheritable set. +However it does prevent the capability from being added +back into the thread's inheritable set in the future. +.PP +.B "Capability bounding set prior to Linux 2.6.25" +.PP +Before Linux 2.6.25, the capability bounding set is a system-wide +attribute that affects all threads on the system. +The bounding set is accessible via the file +.IR /proc/sys/kernel/cap\-bound . +(Confusingly, this bit mask parameter is expressed as a +signed decimal number in +.IR /proc/sys/kernel/cap\-bound .) +.PP +Only the +.B init +process may set capabilities in the capability bounding set; +other than that, the superuser (more precisely: a process with the +.B CAP_SYS_MODULE +capability) may only clear capabilities from this set. 
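Because the historical cap-bound file presents the bit mask as a signed decimal number, it has to be reinterpreted as an unsigned 32-bit value before individual bits can be read. A minimal sketch follows; cap_bound_text_to_mask() is a hypothetical helper, and the 32-bit width is an assumption based on the 32-bit capability masks of that era. For instance, the text -257 denotes the mask 0xfffffeff, in which only bit 8 (the number of CAP_SETPCAP in include/linux/capability.h) is clear.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Convert the signed decimal text of cap-bound to a 32-bit mask.
   Returns 0 on success, -1 if the text is not a valid 32-bit integer. */
int cap_bound_text_to_mask(const char *text, uint32_t *mask)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(text, &end, 10);
    if (errno != 0 || end == text || v < INT32_MIN || v > INT32_MAX)
        return -1;

    /* Reinterpret the two's-complement value as a bit mask. */
    *mask = (uint32_t)(int32_t)v;
    return 0;
}
```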
+.PP +On a standard system the capability bounding set always masks out the +.B CAP_SETPCAP +capability. +To remove this restriction (dangerous!), modify the definition of +.B CAP_INIT_EFF_SET +in +.I include/linux/capability.h +and rebuild the kernel. +.PP +The system-wide capability bounding set feature was added +to Linux 2.2.11. +.\" +.\" +.\" +.SS Effect of user ID changes on capabilities +To preserve the traditional semantics for transitions between +0 and nonzero user IDs, +the kernel makes the following changes to a thread's capability +sets on changes to the thread's real, effective, saved set, +and filesystem user IDs (using +.BR setuid (2), +.BR setresuid (2), +or similar): +.IP \[bu] 3 +If one or more of the real, effective, or saved set user IDs +was previously 0, and as a result of the UID changes all of these IDs +have a nonzero value, +then all capabilities are cleared from the permitted, effective, and ambient +capability sets. +.IP \[bu] +If the effective user ID is changed from 0 to nonzero, +then all capabilities are cleared from the effective set. +.IP \[bu] +If the effective user ID is changed from nonzero to 0, +then the permitted set is copied to the effective set. +.IP \[bu] +If the filesystem user ID is changed from 0 to nonzero (see +.BR setfsuid (2)), +then the following capabilities are cleared from the effective set: +.BR CAP_CHOWN , +.BR CAP_DAC_OVERRIDE , +.BR CAP_DAC_READ_SEARCH , +.BR CAP_FOWNER , +.BR CAP_FSETID , +.B CAP_LINUX_IMMUTABLE +(since Linux 2.6.30), +.BR CAP_MAC_OVERRIDE , +and +.B CAP_MKNOD +(since Linux 2.6.30). +If the filesystem UID is changed from nonzero to 0, +then any of these capabilities that are enabled in the permitted set +are enabled in the effective set. 
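The first three rules in the list above can be modeled as a small function. This is a sketch under simplifying assumptions: the struct uids and struct cap_sets types and the caps_after_uid_change() function are hypothetical names, the filesystem user ID rule is left out, and securebits flags that alter this behavior are not taken into account.

```c
#include <stdbool.h>
#include <stdint.h>

struct uids     { unsigned ruid, euid, suid; };
struct cap_sets { uint64_t permitted, effective, ambient; };

/* Apply the UID-transition rules to c, given the user IDs
   before (old) and after (now) the change. */
struct cap_sets caps_after_uid_change(struct cap_sets c,
                                      struct uids old, struct uids now)
{
    bool had_root = old.ruid == 0 || old.euid == 0 || old.suid == 0;
    bool has_root = now.ruid == 0 || now.euid == 0 || now.suid == 0;

    /* At least one ID was 0 and now all are nonzero. */
    if (had_root && !has_root)
        c.permitted = c.effective = c.ambient = 0;

    /* Effective UID changed from 0 to nonzero. */
    if (old.euid == 0 && now.euid != 0)
        c.effective = 0;

    /* Effective UID changed from nonzero to 0. */
    if (old.euid != 0 && now.euid == 0)
        c.effective = c.permitted;

    return c;
}
```

In this model, a process that keeps a saved set-user-ID of 0 loses only its effective capabilities when it drops its effective UID, and regains them from the permitted set when it switches the effective UID back to 0.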
+.PP +If a thread that has a 0 value for one or more of its user IDs wants +to prevent its permitted capability set being cleared when it resets +all of its user IDs to nonzero values, it can do so using the +.B SECBIT_KEEP_CAPS +securebits flag described below. +.\" +.SS Programmatically adjusting capability sets +A thread can retrieve and change its permitted, effective, and inheritable +capability sets using the +.BR capget (2) +and +.BR capset (2) +system calls. +However, the use of +.BR cap_get_proc (3) +and +.BR cap_set_proc (3), +both provided in the +.I libcap +package, +is preferred for this purpose. +The following rules govern changes to the thread capability sets: +.IP \[bu] 3 +If the caller does not have the +.B CAP_SETPCAP +capability, +the new inheritable set must be a subset of the combination +of the existing inheritable and permitted sets. +.IP \[bu] +(Since Linux 2.6.25) +The new inheritable set must be a subset of the combination of the +existing inheritable set and the capability bounding set. +.IP \[bu] +The new permitted set must be a subset of the existing permitted set +(i.e., it is not possible to acquire permitted capabilities +that the thread does not currently have). +.IP \[bu] +The new effective set must be a subset of the new permitted set. +.SS The securebits flags: establishing a capabilities-only environment +.\" For some background: +.\" see http://lwn.net/Articles/280279/ and +.\" http://article.gmane.org/gmane.linux.kernel.lsm/5476/ +Starting with Linux 2.6.26, +and with a kernel in which file capabilities are enabled, +Linux implements a set of per-thread +.I securebits +flags that can be used to disable special handling of capabilities for UID 0 +.RI ( root ). +These flags are as follows: +.TP +.B SECBIT_KEEP_CAPS +Setting this flag allows a thread that has one or more 0 UIDs to retain +capabilities in its permitted set +when it switches all of its UIDs to nonzero values. 
+If this flag is not set, +then such a UID switch causes the thread to lose all permitted capabilities. +This flag is always cleared on an +.BR execve (2). +.IP +Note that even with the +.B SECBIT_KEEP_CAPS +flag set, the effective capabilities of a thread are cleared when it +switches its effective UID to a nonzero value. +However, +if the thread has set this flag and its effective UID is already nonzero, +and the thread subsequently switches all other UIDs to nonzero values, +then the effective capabilities will not be cleared. +.IP +The setting of the +.B SECBIT_KEEP_CAPS +flag is ignored if the +.B SECBIT_NO_SETUID_FIXUP +flag is set. +(The latter flag provides a superset of the effect of the former flag.) +.IP +This flag provides the same functionality as the older +.BR prctl (2) +.B PR_SET_KEEPCAPS +operation. +.TP +.B SECBIT_NO_SETUID_FIXUP +Setting this flag stops the kernel from adjusting the process's +permitted, effective, and ambient capability sets when +the thread's effective and filesystem UIDs are switched between +zero and nonzero values. +See +.I Effect of user ID changes on capabilities +above. +.TP +.B SECBIT_NOROOT +If this bit is set, then the kernel does not grant capabilities +when a set-user-ID-root program is executed, or when a process with +an effective or real UID of 0 calls +.BR execve (2). +(See +.I Capabilities and execution of programs by root +above.) +.TP +.B SECBIT_NO_CAP_AMBIENT_RAISE +Setting this flag disallows raising ambient capabilities via the +.BR prctl (2) +.B PR_CAP_AMBIENT_RAISE +operation. +.PP +Each of the above "base" flags has a companion "locked" flag. +Setting any of the "locked" flags is irreversible, +and has the effect of preventing further changes to the +corresponding "base" flag. +The locked flags are: +.BR SECBIT_KEEP_CAPS_LOCKED , +.BR SECBIT_NO_SETUID_FIXUP_LOCKED , +.BR SECBIT_NOROOT_LOCKED , +and +.BR SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED . 
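The relationship between base and locked flags can be expressed as a check on a proposed new securebits value. The sketch below models only the locking rule stated above; securebits_change_allowed() is a hypothetical helper, not a kernel or libcap function, and other conditions that the kernel applies when the flags are modified are outside its scope. The SECBIT_* constants come from <linux/securebits.h>.

```c
#include <stdbool.h>
#include <stddef.h>
#include <linux/securebits.h>

/* Return true if changing the flags from old to now respects
   every lock that is already set in old. */
bool securebits_change_allowed(unsigned old, unsigned now)
{
    static const struct { unsigned base, lock; } pairs[] = {
        { SECBIT_KEEP_CAPS,            SECBIT_KEEP_CAPS_LOCKED },
        { SECBIT_NO_SETUID_FIXUP,      SECBIT_NO_SETUID_FIXUP_LOCKED },
        { SECBIT_NOROOT,               SECBIT_NOROOT_LOCKED },
        { SECBIT_NO_CAP_AMBIENT_RAISE, SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED },
    };

    for (size_t i = 0; i < sizeof pairs / sizeof pairs[0]; i++) {
        if (!(old & pairs[i].lock))
            continue;                   /* not locked: free to change */
        if (!(now & pairs[i].lock))
            return false;               /* a lock is irreversible */
        if ((old ^ now) & pairs[i].base)
            return false;               /* locked base flag is frozen */
    }
    return true;
}
```

Flags whose lock is not yet set remain freely changeable, so locking one pair does not constrain the others.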
+.PP +The +.I securebits +flags can be modified and retrieved using the +.BR prctl (2) +.B PR_SET_SECUREBITS +and +.B PR_GET_SECUREBITS +operations. +The +.B CAP_SETPCAP +capability is required to modify the flags. +Note that the +.B SECBIT_* +constants are available only after including the +.I <linux/securebits.h> +header file. +.PP +The +.I securebits +flags are inherited by child processes. +During an +.BR execve (2), +all of the flags are preserved, except +.B SECBIT_KEEP_CAPS +which is always cleared. +.PP +An application can use the following call to lock itself, +and all of its descendants, +into an environment where the only way of gaining capabilities +is by executing a program with associated file capabilities: +.PP +.in +4n +.EX +prctl(PR_SET_SECUREBITS, + /* SECBIT_KEEP_CAPS off */ + SECBIT_KEEP_CAPS_LOCKED | + SECBIT_NO_SETUID_FIXUP | + SECBIT_NO_SETUID_FIXUP_LOCKED | + SECBIT_NOROOT | + SECBIT_NOROOT_LOCKED); + /* Setting/locking SECBIT_NO_CAP_AMBIENT_RAISE + is not required */ +.EE +.in +.\" +.\" +.SS Per-user-namespace """set-user-ID-root""" programs +A set-user-ID program whose UID matches the UID that +created a user namespace will confer capabilities +in the process's permitted and effective sets +when executed by any process inside that namespace +or any descendant user namespace. +.PP +The rules about the transformation of the process's capabilities during the +.BR execve (2) +are exactly as described in +.I Transformation of capabilities during execve() +and +.I Capabilities and execution of programs by root +above, +with the difference that, in the latter subsection, "root" +is the UID of the creator of the user namespace. +.\" +.\" +.SS Namespaced file capabilities +.\" commit 8db6c34f1dbc8e06aa016a9b829b06902c3e1340 +Traditional (i.e., version 2) file capabilities associate +only a set of capability masks with a binary executable file. 
+When a process executes a binary with such capabilities, +it gains the associated capabilities (within its user namespace) +as per the rules described in +.I Transformation of capabilities during execve() +above. +.PP +Because version 2 file capabilities confer capabilities to +the executing process regardless of which user namespace it resides in, +only privileged processes are permitted to associate capabilities with a file. +Here, "privileged" means a process that has the +.B CAP_SETFCAP +capability in the user namespace where the filesystem was mounted +(normally the initial user namespace). +This limitation renders file capabilities useless for certain use cases. +For example, in user-namespaced containers, +it can be desirable to be able to create a binary that +confers capabilities only to processes executed inside that container, +but not to processes that are executed outside the container. +.PP +Linux 4.14 added so-called namespaced file capabilities +to support such use cases. +Namespaced file capabilities are recorded as version 3 (i.e., +.BR VFS_CAP_REVISION_3 ) +.I security.capability +extended attributes. +Such an attribute is automatically created in the circumstances described +in +.I File capability extended attribute versioning +above. +When a version 3 +.I security.capability +extended attribute is created, +the kernel records not just the capability masks in the extended attribute, +but also the namespace root user ID. +.PP +As with a binary that has +.B VFS_CAP_REVISION_2 +file capabilities, a binary with +.B VFS_CAP_REVISION_3 +file capabilities confers capabilities to a process during +.BR execve (2). +However, capabilities are conferred only if the binary is executed by +a process that resides in a user namespace whose +UID 0 maps to the root user ID that is saved in the extended attribute, +or when executed by a process that resides in a descendant of such a namespace.
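The condition under which a version 3 attribute confers capabilities can be pictured as a walk up the chain of user namespaces. The following is a toy model rather than kernel code: struct user_ns_model and v3_caps_apply() are hypothetical names, and each namespace is reduced to a parent pointer plus the value that its UID 0 maps to in the initial user namespace.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of a user namespace. */
struct user_ns_model {
    const struct user_ns_model *parent; /* NULL for the initial namespace */
    uint32_t root_uid;  /* what UID 0 here maps to in the initial namespace */
};

/* Return true if a version 3 attribute carrying saved_rootid would
   confer capabilities to a process residing in ns: either ns itself
   or one of its ancestors must have a matching root user ID. */
bool v3_caps_apply(const struct user_ns_model *ns, uint32_t saved_rootid)
{
    for (; ns != NULL; ns = ns->parent)
        if (ns->root_uid == saved_rootid)
            return true;
    return false;
}
```

In this model, a binary tagged inside a container whose root maps to 100000 confers capabilities in that container and in namespaces nested within it, but not in a sibling container or in the initial user namespace.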
+.\" +.\" +.SS Interaction with user namespaces +For further information on the interaction of +capabilities and user namespaces, see +.BR user_namespaces (7). +.SH STANDARDS +No standards govern capabilities, but the Linux capability implementation +is based on the withdrawn +.UR https://archive.org\:/details\:/posix_1003.1e\-990310 +POSIX.1e draft standard +.UE . +.SH NOTES +When attempting to +.BR strace (1) +binaries that have capabilities (or set-user-ID-root binaries), +you may find the +.I \-u <username> +option useful. +Something like: +.PP +.in +4n +.EX +$ \fBsudo strace \-o trace.log \-u ceci ./myprivprog\fP +.EE +.in +.PP +From Linux 2.5.27 to Linux 2.6.26, +.\" commit 5915eb53861c5776cfec33ca4fcc1fd20d66dd27 removed +.\" CONFIG_SECURITY_CAPABILITIES +capabilities were an optional kernel component, +and could be enabled/disabled via the +.B CONFIG_SECURITY_CAPABILITIES +kernel configuration option. +.PP +The +.IR /proc/ pid /task/TID/status +file can be used to view the capability sets of a thread. +The +.IR /proc/ pid /status +file shows the capability sets of a process's main thread. +Before Linux 3.8, nonexistent capabilities were shown as being +enabled (1) in these sets. +Since Linux 3.8, +.\" 7b9a7ec565505699f503b4fcf61500dceb36e744 +all nonexistent capabilities (above +.BR CAP_LAST_CAP ) +are shown as disabled (0). +.PP +The +.I libcap +package provides a suite of routines for setting and +getting capabilities that is more comfortable and less likely +to change than the interface provided by +.BR capset (2) +and +.BR capget (2). +This package also provides the +.BR setcap (8) +and +.BR getcap (8) +programs. +It can be found at +.br +.UR https://git.kernel.org\:/pub\:/scm\:/libs\:/libcap\:/libcap.git\:/refs/ +.UE . +.PP +Before Linux 2.6.24, and from Linux 2.6.24 to Linux 2.6.32 if +file capabilities are not enabled, a thread with the +.B CAP_SETPCAP +capability can manipulate the capabilities of threads other than itself. 
+However, this is only theoretically possible, +since no thread ever has +.B CAP_SETPCAP +in either of these cases: +.IP \[bu] 3 +In the pre-2.6.25 implementation the system-wide capability bounding set, +.IR /proc/sys/kernel/cap\-bound , +always masks out the +.B CAP_SETPCAP +capability, and this cannot be changed +without modifying the kernel source and rebuilding the kernel. +.IP \[bu] +If file capabilities are disabled (i.e., the kernel +.B CONFIG_SECURITY_FILE_CAPABILITIES +option is disabled), then +.B init +starts out with the +.B CAP_SETPCAP +capability removed from its per-process bounding +set, and that bounding set is inherited by all other processes +created on the system. +.SH SEE ALSO +.BR capsh (1), +.BR setpriv (1), +.BR prctl (2), +.BR setfsuid (2), +.BR cap_clear (3), +.BR cap_copy_ext (3), +.BR cap_from_text (3), +.BR cap_get_file (3), +.BR cap_get_proc (3), +.BR cap_init (3), +.BR capgetp (3), +.BR capsetp (3), +.BR libcap (3), +.BR proc (5), +.BR credentials (7), +.BR pthreads (7), +.BR user_namespaces (7), +.BR captest (8), \" from libcap-ng +.BR filecap (8), \" from libcap-ng +.BR getcap (8), +.BR getpcaps (8), +.BR netcap (8), \" from libcap-ng +.BR pscap (8), \" from libcap-ng +.BR setcap (8) +.PP +.I include/linux/capability.h +in the Linux kernel source tree