Diffstat (limited to 'man2/ptrace.2')
-rw-r--r-- | man2/ptrace.2 | 2974 |
1 files changed, 2974 insertions, 0 deletions
diff --git a/man2/ptrace.2 b/man2/ptrace.2 new file mode 100644 index 0000000..4149a32 --- /dev/null +++ b/man2/ptrace.2 @@ -0,0 +1,2974 @@ +.\" Copyright (c) 1993 Michael Haardt <michael@moria.de> +.\" Fri Apr 2 11:32:09 MET DST 1993 +.\" +.\" and changes Copyright (C) 1999 Mike Coleman (mkc@acm.org) +.\" -- major revision to fully document ptrace semantics per recent Linux +.\" kernel (2.2.10) and glibc (2.1.2) +.\" Sun Nov 7 03:18:35 CST 1999 +.\" +.\" and Copyright (c) 2011, Denys Vlasenko <vda.linux@googlemail.com> +.\" and Copyright (c) 2015, 2016, Michael Kerrisk <mtk.manpages@gmail.com> +.\" +.\" SPDX-License-Identifier: GPL-2.0-or-later +.\" +.\" Modified Fri Jul 23 23:47:18 1993 by Rik Faith <faith@cs.unc.edu> +.\" Modified Fri Jan 31 16:46:30 1997 by Eric S. Raymond <esr@thyrsus.com> +.\" Modified Thu Oct 7 17:28:49 1999 by Andries Brouwer <aeb@cwi.nl> +.\" Modified, 27 May 2004, Michael Kerrisk <mtk.manpages@gmail.com> +.\" Added notes on capability requirements +.\" +.\" 2006-03-24, Chuck Ebbert <76306.1226@compuserve.com> +.\" Added PTRACE_SETOPTIONS, PTRACE_GETEVENTMSG, PTRACE_GETSIGINFO, +.\" PTRACE_SETSIGINFO, PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP +.\" (Thanks to Blaisorblade, Daniel Jacobowitz and others who helped.) 
+.\" 2011-09, major update by Denys Vlasenko <vda.linux@googlemail.com> +.\" 2015-01, Kees Cook <keescook@chromium.org> +.\" Added PTRACE_O_TRACESECCOMP, PTRACE_EVENT_SECCOMP +.\" +.\" FIXME The following are undocumented: +.\" +.\" PTRACE_GETWMMXREGS +.\" PTRACE_SETWMMXREGS +.\" ARM +.\" Linux 2.6.12 +.\" +.\" PTRACE_SET_SYSCALL +.\" ARM and ARM64 +.\" Linux 2.6.16 +.\" commit 3f471126ee53feb5e9b210ea2f525ed3bb9b7a7f +.\" Author: Nicolas Pitre <nico@cam.org> +.\" Date: Sat Jan 14 19:30:04 2006 +0000 +.\" +.\" PTRACE_GETCRUNCHREGS +.\" PTRACE_SETCRUNCHREGS +.\" ARM +.\" Linux 2.6.18 +.\" commit 3bec6ded282b331552587267d67a06ed7fd95ddd +.\" Author: Lennert Buytenhek <buytenh@wantstofly.org> +.\" Date: Tue Jun 27 22:56:18 2006 +0100 +.\" +.\" PTRACE_GETVFPREGS +.\" PTRACE_SETVFPREGS +.\" ARM and ARM64 +.\" Linux 2.6.30 +.\" commit 3d1228ead618b88e8606015cbabc49019981805d +.\" Author: Catalin Marinas <catalin.marinas@arm.com> +.\" Date: Wed Feb 11 13:12:56 2009 +0100 +.\" +.\" PTRACE_GETHBPREGS +.\" PTRACE_SETHBPREGS +.\" ARM and ARM64 +.\" Linux 2.6.37 +.\" commit 864232fa1a2f8dfe003438ef0851a56722740f3e +.\" Author: Will Deacon <will.deacon@arm.com> +.\" Date: Fri Sep 3 10:42:55 2010 +0100 +.\" +.\" PTRACE_SINGLEBLOCK +.\" Since at least Linux 2.4.0 on various architectures +.\" Since Linux 2.6.25 on x86 (and others?) 
+.\" commit 5b88abbf770a0e1975c668743100f42934f385e8 +.\" Author: Roland McGrath <roland@redhat.com> +.\" Date: Wed Jan 30 13:30:53 2008 +0100 +.\" ptrace: generic PTRACE_SINGLEBLOCK +.\" +.\" PTRACE_GETFPXREGS +.\" PTRACE_SETFPXREGS +.\" Since at least Linux 2.4.0 on various architectures +.\" +.\" PTRACE_GETFDPIC +.\" PTRACE_GETFDPIC_EXEC +.\" PTRACE_GETFDPIC_INTERP +.\" blackfin, c6x, frv, sh +.\" First appearance in Linux 2.6.11 on frv +.\" +.\" and others that can be found in the arch/*/include/uapi/asm/ptrace files +.\" +.TH ptrace 2 2023-03-30 "Linux man-pages 6.05.01" +.SH NAME +ptrace \- process trace +.SH LIBRARY +Standard C library +.RI ( libc ", " \-lc ) +.SH SYNOPSIS +.nf +.B #include <sys/ptrace.h> +.PP +.BI "long ptrace(enum __ptrace_request " request ", pid_t " pid , +.BI " void *" addr ", void *" data ); +.fi +.SH DESCRIPTION +The +.BR ptrace () +system call provides a means by which one process (the "tracer") +may observe and control the execution of another process (the "tracee"), +and examine and change the tracee's memory and registers. +It is primarily used to implement breakpoint debugging and system +call tracing. +.PP +A tracee first needs to be attached to the tracer. +Attachment and subsequent commands are per thread: +in a multithreaded process, +every thread can be individually attached to a +(potentially different) tracer, +or left not attached and thus not debugged. +Therefore, "tracee" always means "(one) thread", +never "a (possibly multithreaded) process". +Ptrace commands are always sent to +a specific tracee using a call of the form +.PP +.in +4n +.EX +ptrace(PTRACE_foo, pid, ...) +.EE +.in +.PP +where +.I pid +is the thread ID of the corresponding Linux thread. +.PP +(Note that in this page, a "multithreaded process" +means a thread group consisting of threads created using the +.BR clone (2) +.B CLONE_THREAD +flag.) 
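Each request names exactly one thread. A tracer that wants to follow every thread of a multithreaded process therefore has to find the thread IDs itself. The following is a minimal sketch that assumes a mounted /proc filesystem. The hypothetical helper list_thread_ids() reads the entries of /proc/<pid>/task. Each entry is a thread ID that can be passed as the pid argument of a request such as PTRACE_ATTACH. Threads can appear or exit while the directory is being read, so the result is only a snapshot.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: store up to max thread IDs of process pid in tids[].
   Returns the number of IDs stored, or -1 if /proc/<pid>/task can't be read. */
int list_thread_ids(pid_t pid, pid_t *tids, int max)
{
    char path[64];
    snprintf(path, sizeof(path), "/proc/%ld/task", (long) pid);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int n = 0;
    struct dirent *ent;
    while (n < max && (ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] == '.')          /* skip "." and ".." */
            continue;
        tids[n++] = (pid_t) strtol(ent->d_name, NULL, 10);
    }
    closedir(dir);
    return n;
}
```

In a single-threaded program, list_thread_ids(getpid(), buf, 64) returns 1 and stores getpid() in buf[0], because the main thread's ID equals the process ID.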
+.PP +A process can initiate a trace by calling +.BR fork (2) +and having the resulting child do a +.BR PTRACE_TRACEME , +followed (typically) by an +.BR execve (2). +Alternatively, one process may commence tracing another process using +.B PTRACE_ATTACH +or +.BR PTRACE_SEIZE . +.PP +While being traced, the tracee will stop each time a signal is delivered, +even if the signal is being ignored. +(An exception is +.BR SIGKILL , +which has its usual effect.) +The tracer will be notified at its next call to +.BR waitpid (2) +(or one of the related "wait" system calls); that call will return a +.I status +value containing information that indicates +the cause of the stop in the tracee. +While the tracee is stopped, +the tracer can use various ptrace requests to inspect and modify the tracee. +The tracer then causes the tracee to continue, +optionally ignoring the delivered signal +(or even delivering a different signal instead). +.PP +If the +.B PTRACE_O_TRACEEXEC +option is not in effect, all successful calls to +.BR execve (2) +by the traced process will cause it to be sent a +.B SIGTRAP +signal, +giving the tracer a chance to gain control before the new program +begins execution. +.PP +When the tracer has finished tracing, it can cause the tracee to continue +executing in a normal, untraced mode via +.BR PTRACE_DETACH . +.PP +The value of +.I request +determines the action to be performed: +.TP +.B PTRACE_TRACEME +Indicate that this process is to be traced by its parent. +A process probably shouldn't make this request if its parent +isn't expecting to trace it. +.RI ( pid , +.IR addr , +and +.I data +are ignored.) +.IP +The +.B PTRACE_TRACEME +request is used only by the tracee; +the remaining requests are used only by the tracer. +In the following requests, +.I pid +specifies the thread ID of the tracee to be acted on. +For requests other than +.BR PTRACE_ATTACH , +.BR PTRACE_SEIZE , +.BR PTRACE_INTERRUPT , +and +.BR PTRACE_KILL , +the tracee must be stopped.
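The following C sketch shows the fork-and-PTRACE_TRACEME pattern and the stop/continue cycle described above. To keep the example self-contained, the child stops itself with raise(3) instead of calling execve(2). The name traceme_demo() and the exit codes 42 and 99 are illustrative choices only. The tracer sees the SIGSTOP as a stop reported by waitpid(2). It then restarts the child with PTRACE_CONT and a zero data argument, which suppresses the signal.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that asks to be traced by its parent, stops itself, and exits
   with status 42 once the tracer lets it continue.  Returns the child's exit
   status, -2 if PTRACE_TRACEME was refused (for example, by a seccomp filter),
   or -1 on any other error. */
int traceme_demo(void)
{
    pid_t child = fork();
    if (child == -1)
        return -1;

    if (child == 0) {                          /* tracee */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(99);                         /* report "ptrace unavailable" */
        raise(SIGSTOP);                        /* a real tracee would execve() here */
        _exit(42);
    }

    int status;                                /* tracer */
    if (waitpid(child, &status, 0) == -1)
        return -1;
    if (WIFEXITED(status))                     /* TRACEME failed in the child */
        return WEXITSTATUS(status) == 99 ? -2 : -1;
    if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGSTOP)
        goto fail;

    /* The child is in signal-delivery-stop for SIGSTOP.  A data argument
       of 0 restarts it without delivering that signal. */
    if (ptrace(PTRACE_CONT, child, NULL, NULL) == -1)
        goto fail;
    if (waitpid(child, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;

fail:
    kill(child, SIGKILL);                      /* see PTRACE_KILL below */
    waitpid(child, &status, 0);
    return -1;
}
```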
+.TP +.BR PTRACE_PEEKTEXT ", " PTRACE_PEEKDATA +Read a word at the address +.I addr +in the tracee's memory, returning the word as the result of the +.BR ptrace () +call. +Linux does not have separate text and data address spaces, +so these two requests are currently equivalent. +.RI ( data +is ignored; but see NOTES.) +.TP +.B PTRACE_PEEKUSER +.\" PTRACE_PEEKUSR in kernel source, but glibc uses PTRACE_PEEKUSER, +.\" and that is the name that seems common on other systems. +Read a word at offset +.I addr +in the tracee's USER area, +which holds the registers and other information about the process +(see +.IR <sys/user.h> ). +The word is returned as the result of the +.BR ptrace () +call. +Typically, the offset must be word-aligned, though this might vary by +architecture. +See NOTES. +.RI ( data +is ignored; but see NOTES.) +.TP +.BR PTRACE_POKETEXT ", " PTRACE_POKEDATA +Copy the word +.I data +to the address +.I addr +in the tracee's memory. +As for +.B PTRACE_PEEKTEXT +and +.BR PTRACE_PEEKDATA , +these two requests are currently equivalent. +.TP +.B PTRACE_POKEUSER +.\" PTRACE_POKEUSR in kernel source, but glibc uses PTRACE_POKEUSER, +.\" and that is the name that seems common on other systems. +Copy the word +.I data +to offset +.I addr +in the tracee's USER area. +As for +.BR PTRACE_PEEKUSER , +the offset must typically be word-aligned. +In order to maintain the integrity of the kernel, +some modifications to the USER area are disallowed. +.\" FIXME In the preceding sentence, which modifications are disallowed, +.\" and when they are disallowed, how does user space discover that fact? +.TP +.BR PTRACE_GETREGS ", " PTRACE_GETFPREGS +Copy the tracee's general-purpose or floating-point registers, +respectively, to the address +.I data +in the tracer. +See +.I <sys/user.h> +for information on the format of this data. +.RI ( addr +is ignored.) 
+Note that SPARC systems have the meaning of +.I data +and +.I addr +reversed; that is, +.I data +is ignored and the registers are copied to the address +.IR addr . +.B PTRACE_GETREGS +and +.B PTRACE_GETFPREGS +are not present on all architectures. +.TP +.BR PTRACE_GETREGSET " (since Linux 2.6.34)" +Read the tracee's registers. +.I addr +specifies, in an architecture-dependent way, the type of registers to be read. +.B NT_PRSTATUS +(with numerical value 1) +usually results in reading of general-purpose registers. +If the CPU has, for example, +floating-point and/or vector registers, they can be retrieved by setting +.I addr +to the corresponding +.B NT_foo +constant. +.I data +points to a +.BR "struct iovec" , +which describes the destination buffer's location and length. +On return, the kernel modifies +.I iov_len +to indicate the actual number of bytes returned. +.TP +.BR PTRACE_SETREGS ", " PTRACE_SETFPREGS +Modify the tracee's general-purpose or floating-point registers, +respectively, from the address +.I data +in the tracer. +As for +.BR PTRACE_POKEUSER , +some general-purpose register modifications may be disallowed. +.\" FIXME . In the preceding sentence, which modifications are disallowed, +.\" and when they are disallowed, how does user space discover that fact? +.RI ( addr +is ignored.) +Note that SPARC systems have the meaning of +.I data +and +.I addr +reversed; that is, +.I data +is ignored and the registers are copied from the address +.IR addr . +.B PTRACE_SETREGS +and +.B PTRACE_SETFPREGS +are not present on all architectures. +.TP +.BR PTRACE_SETREGSET " (since Linux 2.6.34)" +Modify the tracee's registers. +The meaning of +.I addr +and +.I data +is analogous to +.BR PTRACE_GETREGSET . +.TP +.BR PTRACE_GETSIGINFO " (since Linux 2.3.99-pre6)" +Retrieve information about the signal that caused the stop. +Copy a +.I siginfo_t +structure (see +.BR sigaction (2)) +from the tracee to the address +.I data +in the tracer. +.RI ( addr +is ignored.)
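A successful PTRACE_PEEKDATA can legitimately return -1, so the caller must clear errno before the request and check it afterward. The sketch below follows that rule. The names peek_word() and peek_demo() are hypothetical. The demo relies on the fact that a static variable has the same address in parent and child after fork(2). This lets the tracer pass its own &secret as addr and read the child's copy.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Read one word from the tracee's memory.  Because -1 is also a valid word,
   errno is cleared before the call and checked afterward. */
int peek_word(pid_t pid, const void *addr, long *out)
{
    errno = 0;
    long word = ptrace(PTRACE_PEEKDATA, pid, (void *) addr, NULL);
    if (word == -1 && errno != 0)
        return -1;
    *out = word;
    return 0;
}

static long secret;   /* same address in parent and child after fork() */

/* Demo: the child changes its copy of secret and stops; the tracer reads it.
   Returns the value read, -2 if PTRACE_TRACEME was refused, or -1 on error. */
long peek_demo(void)
{
    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        secret = 0x1234;
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(99);
        raise(SIGSTOP);
        _exit(0);
    }

    int status;
    long value = -1;
    if (waitpid(child, &status, 0) == -1)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status) == 99 ? -2 : -1;

    if (WIFSTOPPED(status) && peek_word(child, &secret, &value) == -1)
        value = -1;

    /* Terminate the tracee with kill(2) rather than PTRACE_KILL. */
    kill(child, SIGKILL);
    waitpid(child, &status, 0);
    return value;
}
```

The parent's own copy of secret stays 0 throughout. Only the tracee's copy holds 0x1234.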
+.TP +.BR PTRACE_SETSIGINFO " (since Linux 2.3.99-pre6)" +Set signal information: +copy a +.I siginfo_t +structure from the address +.I data +in the tracer to the tracee. +This will affect only signals that would normally be delivered to +the tracee and were caught by the tracer. +It may be difficult to tell +these normal signals from synthetic signals generated by +.BR ptrace () +itself. +.RI ( addr +is ignored.) +.TP +.BR PTRACE_PEEKSIGINFO " (since Linux 3.10)" +.\" commit 84c751bd4aebbaae995fe32279d3dba48327bad4 +Retrieve +.I siginfo_t +structures without removing signals from a queue. +.I addr +points to a +.I ptrace_peeksiginfo_args +structure that specifies the ordinal position from which +copying of signals should start, +and the number of signals to copy. +.I siginfo_t +structures are copied into the buffer pointed to by +.IR data . +The return value contains the number of copied signals (zero indicates +that there is no signal corresponding to the specified ordinal position). +Within the returned +.I siginfo +structures, +the +.I si_code +field includes information +.RB ( __SI_CHLD , +.BR __SI_FAULT , +etc.) that is not otherwise exposed to user space. +.IP +.in +4n +.EX +struct ptrace_peeksiginfo_args { + __u64 off; /* Ordinal position in queue at which + to start copying signals */ + __u32 flags; /* PTRACE_PEEKSIGINFO_SHARED or 0 */ + __s32 nr; /* Number of signals to copy */ +}; +.EE +.in +.IP +Currently, there is only one flag, +.BR PTRACE_PEEKSIGINFO_SHARED , +for dumping signals from the process-wide signal queue. +If this flag is not set, +signals are read from the per-thread queue of the specified thread. +.TP +.BR PTRACE_GETSIGMASK " (since Linux 3.11)" +.\" commit 29000caecbe87b6b66f144f72111f0d02fbbf0c1 +Place a copy of the mask of blocked signals (see +.BR sigprocmask (2)) +in the buffer pointed to by +.IR data , +which should be a pointer to a buffer of type +.IR sigset_t .
+The +.I addr +argument contains the size of the buffer pointed to by +.I data +(i.e., +.IR sizeof(sigset_t) ). +.TP +.BR PTRACE_SETSIGMASK " (since Linux 3.11)" +Change the mask of blocked signals (see +.BR sigprocmask (2)) +to the value specified in the buffer pointed to by +.IR data , +which should be a pointer to a buffer of type +.IR sigset_t . +The +.I addr +argument contains the size of the buffer pointed to by +.I data +(i.e., +.IR sizeof(sigset_t) ). +.TP +.BR PTRACE_SETOPTIONS " (since Linux 2.4.6; see BUGS for caveats)" +Set ptrace options from +.IR data . +.RI ( addr +is ignored.) +.I data +is interpreted as a bit mask of options, +which are specified by the following flags: +.RS +.TP +.BR PTRACE_O_EXITKILL " (since Linux 3.8)" +.\" commit 992fb6e170639b0849bace8e49bf31bd37c4123 +Send a +.B SIGKILL +signal to the tracee if the tracer exits. +This option is useful for ptrace jailers that +want to ensure that tracees can never escape the tracer's control. +.TP +.BR PTRACE_O_TRACECLONE " (since Linux 2.5.46)" +Stop the tracee at the next +.BR clone (2) +and automatically start tracing the newly cloned process, +which will start with a +.BR SIGSTOP , +or +.B PTRACE_EVENT_STOP +if +.B PTRACE_SEIZE +was used. +A +.BR waitpid (2) +by the tracer will return a +.I status +value such that +.IP +.nf + status>>8 == (SIGTRAP | (PTRACE_EVENT_CLONE<<8)) +.fi +.IP +The PID of the new process can be retrieved with +.BR PTRACE_GETEVENTMSG . +.IP +This option may not catch +.BR clone (2) +calls in all cases. +If the tracee calls +.BR clone (2) +with the +.B CLONE_VFORK +flag, +.B PTRACE_EVENT_VFORK +will be delivered instead +if +.B PTRACE_O_TRACEVFORK +is set; otherwise if the tracee calls +.BR clone (2) +with the exit signal set to +.BR SIGCHLD , +.B PTRACE_EVENT_FORK +will be delivered if +.B PTRACE_O_TRACEFORK +is set. +.TP +.BR PTRACE_O_TRACEEXEC " (since Linux 2.5.46)" +Stop the tracee at the next +.BR execve (2). 
+A +.BR waitpid (2) +by the tracer will return a +.I status +value such that +.IP +.nf + status>>8 == (SIGTRAP | (PTRACE_EVENT_EXEC<<8)) +.fi +.IP +If the execing thread is not a thread group leader, +the thread ID is reset to the thread group leader's ID before this stop. +Since Linux 3.0, the former thread ID can be retrieved with +.BR PTRACE_GETEVENTMSG . +.TP +.BR PTRACE_O_TRACEEXIT " (since Linux 2.5.60)" +Stop the tracee at exit. +A +.BR waitpid (2) +by the tracer will return a +.I status +value such that +.IP +.nf + status>>8 == (SIGTRAP | (PTRACE_EVENT_EXIT<<8)) +.fi +.IP +The tracee's exit status can be retrieved with +.BR PTRACE_GETEVENTMSG . +.IP +The tracee is stopped early during process exit, +when registers are still available, +allowing the tracer to see where the exit occurred, +whereas the normal exit notification is done after the process +has finished exiting. +Even though context is available, +the tracer cannot prevent the exit from happening at this point. +.TP +.BR PTRACE_O_TRACEFORK " (since Linux 2.5.46)" +Stop the tracee at the next +.BR fork (2) +and automatically start tracing the newly forked process, +which will start with a +.BR SIGSTOP , +or +.B PTRACE_EVENT_STOP +if +.B PTRACE_SEIZE +was used. +A +.BR waitpid (2) +by the tracer will return a +.I status +value such that +.IP +.nf + status>>8 == (SIGTRAP | (PTRACE_EVENT_FORK<<8)) +.fi +.IP +The PID of the new process can be retrieved with +.BR PTRACE_GETEVENTMSG . +.TP +.BR PTRACE_O_TRACESYSGOOD " (since Linux 2.4.6)" +When delivering system call traps, set bit 7 in the signal number +(i.e., deliver +.IR "SIGTRAP|0x80" ). +This makes it easy for the tracer to distinguish +normal traps from those caused by a system call. +.TP +.BR PTRACE_O_TRACEVFORK " (since Linux 2.5.46)" +Stop the tracee at the next +.BR vfork (2) +and automatically start tracing the newly vforked process, +which will start with a +.BR SIGSTOP , +or +.B PTRACE_EVENT_STOP +if +.B PTRACE_SEIZE +was used.
+A +.BR waitpid (2) +by the tracer will return a +.I status +value such that +.IP +.nf + status>>8 == (SIGTRAP | (PTRACE_EVENT_VFORK<<8)) +.fi +.IP +The PID of the new process can be retrieved with +.BR PTRACE_GETEVENTMSG . +.TP +.BR PTRACE_O_TRACEVFORKDONE " (since Linux 2.5.60)" +Stop the tracee at the completion of the next +.BR vfork (2). +A +.BR waitpid (2) +by the tracer will return a +.I status +value such that +.IP +.nf + status>>8 == (SIGTRAP | (PTRACE_EVENT_VFORK_DONE<<8)) +.fi +.IP +The PID of the new process can (since Linux 2.6.18) be retrieved with +.BR PTRACE_GETEVENTMSG . +.TP +.BR PTRACE_O_TRACESECCOMP " (since Linux 3.5)" +Stop the tracee when a +.BR seccomp (2) +.B SECCOMP_RET_TRACE +rule is triggered. +A +.BR waitpid (2) +by the tracer will return a +.I status +value such that +.IP +.nf + status>>8 == (SIGTRAP | (PTRACE_EVENT_SECCOMP<<8)) +.fi +.IP +While this triggers a +.B PTRACE_EVENT +stop, it is similar to a syscall-enter-stop. +For details, see the note on +.B PTRACE_EVENT_SECCOMP +below. +The seccomp event message data (from the +.B SECCOMP_RET_DATA +portion of the seccomp filter rule) can be retrieved with +.BR PTRACE_GETEVENTMSG . +.TP +.BR PTRACE_O_SUSPEND_SECCOMP " (since Linux 4.3)" +.\" commit 13c4a90119d28cfcb6b5bdd820c233b86c2b0237 +Suspend the tracee's seccomp protections. +This applies regardless of mode, and +can be used when the tracee has not yet installed seccomp filters. +That is, a valid use case is to suspend a tracee's seccomp protections +before they are installed by the tracee, +let the tracee install the filters, +and then clear this flag when the filters should be resumed. +Setting this option requires that the tracer have the +.B CAP_SYS_ADMIN +capability, +not have any seccomp protections installed, and not have +.B PTRACE_O_SUSPEND_SECCOMP +set on itself. 
+.RE +.TP +.BR PTRACE_GETEVENTMSG " (since Linux 2.5.46)" +Retrieve a message (as an +.IR "unsigned long" ) +about the ptrace event +that just happened, placing it at the address +.I data +in the tracer. +For +.BR PTRACE_EVENT_EXIT , +this is the tracee's exit status. +For +.BR PTRACE_EVENT_FORK , +.BR PTRACE_EVENT_VFORK , +.BR PTRACE_EVENT_VFORK_DONE , +and +.BR PTRACE_EVENT_CLONE , +this is the PID of the new process. +For +.BR PTRACE_EVENT_SECCOMP , +this is the +.BR seccomp (2) +filter's +.B SECCOMP_RET_DATA +associated with the triggered rule. +.RI ( addr +is ignored.) +.TP +.B PTRACE_CONT +Restart the stopped tracee process. +If +.I data +is nonzero, +it is interpreted as the number of a signal to be delivered to the tracee; +otherwise, no signal is delivered. +Thus, for example, the tracer can control +whether a signal sent to the tracee is delivered or not. +.RI ( addr +is ignored.) +.TP +.BR PTRACE_SYSCALL ", " PTRACE_SINGLESTEP +Restart the stopped tracee as for +.BR PTRACE_CONT , +but arrange for the tracee to be stopped at +the next entry to or exit from a system call, +or after execution of a single instruction, respectively. +(The tracee will also, as usual, be stopped upon receipt of a signal.) +From the tracer's perspective, the tracee will appear to have been +stopped by receipt of a +.BR SIGTRAP . +So, for +.BR PTRACE_SYSCALL , +for example, the idea is to inspect +the arguments to the system call at the first stop, +then do another +.B PTRACE_SYSCALL +and inspect the return value of the system call at the second stop. +The +.I data +argument is treated as for +.BR PTRACE_CONT . +.RI ( addr +is ignored.) +.TP +.BR PTRACE_SET_SYSCALL " (since Linux 2.6.16)" +.\" commit 3f471126ee53feb5e9b210ea2f525ed3bb9b7a7f +When in syscall-enter-stop, +change the number of the system call that is about to +be executed to the number specified in the +.I data +argument. +The +.I addr +argument is ignored. 
+This request is currently +.\" As of 4.19-rc2 +supported only on arm (and arm64, though only for backwards compatibility), +.\" commit 27aa55c5e5123fa8b8ad0156559d34d7edff58ca +but most other architectures have other means of accomplishing this +(usually by changing the register that the userland code passed the +system call number in). +.\" see change_syscall in tools/testing/selftests/seccomp/seccomp_bpf.c +.\" and also strace's linux/*/set_scno.c files. +.TP +.BR PTRACE_SYSEMU ", " PTRACE_SYSEMU_SINGLESTEP " (since Linux 2.6.14)" +For +.BR PTRACE_SYSEMU , +continue and stop on entry to the next system call, +which will not be executed. +See the documentation on syscall-stops below. +For +.BR PTRACE_SYSEMU_SINGLESTEP , +do the same but also singlestep if not a system call. +This call is used by programs like +User Mode Linux that want to emulate all the tracee's system calls. +The +.I data +argument is treated as for +.BR PTRACE_CONT . +The +.I addr +argument is ignored. +These requests are currently +.\" As at 3.7 +supported only on x86. +.TP +.BR PTRACE_LISTEN " (since Linux 3.4)" +Restart the stopped tracee, but prevent it from executing. +The resulting state of the tracee is similar to a process which +has been stopped by a +.B SIGSTOP +(or other stopping signal). +See the "group-stop" subsection for additional information. +.B PTRACE_LISTEN +works only on tracees attached by +.BR PTRACE_SEIZE . +.TP +.B PTRACE_KILL +Send the tracee a +.B SIGKILL +to terminate it. +.RI ( addr +and +.I data +are ignored.) +.IP +.I This operation is deprecated; do not use it! +Instead, send a +.B SIGKILL +directly using +.BR kill (2) +or +.BR tgkill (2). +The problem with +.B PTRACE_KILL +is that it requires the tracee to be in signal-delivery-stop, +otherwise it may not work +(i.e., may complete successfully but won't kill the tracee). +By contrast, sending a +.B SIGKILL +directly has no such limitation. 
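The enter/exit pairing described for PTRACE_SYSCALL can be observed with a short tracer loop. This sketch assumes that PTRACE_O_TRACESYSGOOD is set, so that syscall-stops report SIGTRAP|0x80. The name count_syscall_stops() is hypothetical, and the child again stops itself rather than calling execve(2). The exact count depends on system calls that the C library makes internally (for example, inside raise(3)), so only a lower bound is predictable. The getpid() call adds an entry stop and an exit stop, and the final exit_group(2) adds only an entry stop. On the error path, the tracee is terminated with kill(2) instead of PTRACE_KILL, as recommended above.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Count the syscall-stops (entries plus exits) of a child that stops itself,
   makes a system call, and exits.  Returns the count, -2 if PTRACE_TRACEME
   was refused, or -1 on error. */
int count_syscall_stops(void)
{
    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(99);
        raise(SIGSTOP);               /* give the tracer a chance to set options */
        syscall(SYS_getpid);          /* syscall-enter-stop + syscall-exit-stop */
        _exit(0);                     /* exit_group(2): syscall-enter-stop only */
    }

    int status, sig = 0, stops = 0;
    if (waitpid(child, &status, 0) == -1)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status) == 99 ? -2 : -1;

    /* Report syscall-stops as SIGTRAP|0x80 so they can't be mistaken
       for a real SIGTRAP. */
    if (ptrace(PTRACE_SETOPTIONS, child, NULL,
               (void *) (long) PTRACE_O_TRACESYSGOOD) == -1)
        goto fail;

    for (;;) {
        /* sig is 0 on the first pass, which suppresses the initial SIGSTOP. */
        if (ptrace(PTRACE_SYSCALL, child, NULL, (void *) (long) sig) == -1)
            goto fail;
        if (waitpid(child, &status, 0) == -1)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            return stops;
        if (WSTOPSIG(status) == (SIGTRAP | 0x80)) {
            stops++;                  /* syscall-enter-stop or syscall-exit-stop */
            sig = 0;
        } else {
            sig = WSTOPSIG(status);   /* signal-delivery-stop: deliver it */
        }
    }

fail:
    kill(child, SIGKILL);
    waitpid(child, &status, 0);
    return -1;
}
```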
+.\" [Note from Denys Vlasenko: +.\" deprecation suggested by Oleg Nesterov. He prefers to deprecate it +.\" instead of describing (and needing to support) PTRACE_KILL's quirks.] +.TP +.BR PTRACE_INTERRUPT " (since Linux 3.4)" +Stop a tracee. +If the tracee is running or sleeping in kernel space and +.B PTRACE_SYSCALL +is in effect, +the system call is interrupted and syscall-exit-stop is reported. +(The interrupted system call is restarted when the tracee is restarted.) +If the tracee was already stopped by a signal and +.B PTRACE_LISTEN +was sent to it, +the tracee stops with +.B PTRACE_EVENT_STOP +and +.I WSTOPSIG(status) +returns the stop signal. +If any other ptrace-stop is generated at the same time (for example, +if a signal is sent to the tracee), this ptrace-stop happens. +If none of the above applies (for example, if the tracee is running in user +space), it stops with +.B PTRACE_EVENT_STOP +with +.I WSTOPSIG(status) +== +.BR SIGTRAP . +.B PTRACE_INTERRUPT +only works on tracees attached by +.BR PTRACE_SEIZE . +.TP +.B PTRACE_ATTACH +Attach to the process specified in +.IR pid , +making it a tracee of the calling process. +.\" No longer true (removed by Denys Vlasenko, 2011, who remarks: +.\" "I think it isn't true in non-ancient 2.4 and in Linux 2.6/3.x. +.\" Basically, it's not true for any Linux in practical use. +.\" ; the behavior of the tracee is as if it had done a +.\" .BR PTRACE_TRACEME . +.\" The calling process actually becomes the parent of the tracee +.\" process for most purposes (e.g., it will receive +.\" notification of tracee events and appears in +.\" .BR ps (1) +.\" output as the tracee's parent), but a +.\" .BR getppid (2) +.\" by the tracee will still return the PID of the original parent. +The tracee is sent a +.BR SIGSTOP , +but will not necessarily have stopped +by the completion of this call; use +.BR waitpid (2) +to wait for the tracee to stop. +See the "Attaching and detaching" subsection for additional information. 
+.RI ( addr +and +.I data +are ignored.) +.IP +Permission to perform a +.B PTRACE_ATTACH +is governed by a ptrace access mode +.B PTRACE_MODE_ATTACH_REALCREDS +check; see below. +.TP +.BR PTRACE_SEIZE " (since Linux 3.4)" +.\" +.\" Noted by Dmitry Levin: +.\" +.\" PTRACE_SEIZE was introduced by commit v3.1-rc1~308^2~28, but +.\" it had to be used along with a temporary flag PTRACE_SEIZE_DEVEL, +.\" which was removed later by commit v3.4-rc1~109^2~20. +.\" +.\" That is, [before] v3.4 we had a test mode of PTRACE_SEIZE API, +.\" which was not compatible with the current PTRACE_SEIZE API introduced +.\" in Linux 3.4. +.\" +Attach to the process specified in +.IR pid , +making it a tracee of the calling process. +Unlike +.BR PTRACE_ATTACH , +.B PTRACE_SEIZE +does not stop the process. +Group-stops are reported as +.B PTRACE_EVENT_STOP +and +.I WSTOPSIG(status) +returns the stop signal. +Automatically attached children stop with +.B PTRACE_EVENT_STOP +and +.I WSTOPSIG(status) +returns +.B SIGTRAP +instead of having +.B SIGSTOP +signal delivered to them. +.BR execve (2) +does not deliver an extra +.BR SIGTRAP . +Only a +.BR PTRACE_SEIZE d +process can accept +.B PTRACE_INTERRUPT +and +.B PTRACE_LISTEN +commands. +The "seized" behavior just described is inherited by +children that are automatically attached using +.BR PTRACE_O_TRACEFORK , +.BR PTRACE_O_TRACEVFORK , +and +.BR PTRACE_O_TRACECLONE . +.I addr +must be zero. +.I data +contains a bit mask of ptrace options to activate immediately. +.IP +Permission to perform a +.B PTRACE_SEIZE +is governed by a ptrace access mode +.B PTRACE_MODE_ATTACH_REALCREDS +check; see below. +.\" +.TP +.BR PTRACE_SECCOMP_GET_FILTER " (since Linux 4.4)" +.\" commit f8e529ed941ba2bbcbf310b575d968159ce7e895 +This operation allows the tracer to dump the tracee's +classic BPF filters. +.IP +.I addr +is an integer specifying the index of the filter to be dumped. +The most recently installed filter has the index 0. 
+If +.I addr +is greater than or equal to the number of installed filters, +the operation fails with the error +.BR ENOENT . +.IP +.I data +is either a pointer to a +.I struct sock_filter +array that is large enough to store the BPF program, +or NULL if the program is not to be stored. +.IP +Upon success, +the return value is the number of instructions in the BPF program. +If +.I data +was NULL, then this return value can be used to correctly size the +.I struct sock_filter +array passed in a subsequent call. +.IP +This operation fails with the error +.B EACCES +if the caller does not have the +.B CAP_SYS_ADMIN +capability or if the caller is in strict or filter seccomp mode. +If the filter referred to by +.I addr +is not a classic BPF filter, the operation fails with the error +.BR EMEDIUMTYPE . +.IP +This operation is available if the kernel was configured with both the +.B CONFIG_SECCOMP_FILTER +and the +.B CONFIG_CHECKPOINT_RESTORE +options. +.TP +.B PTRACE_DETACH +Restart the stopped tracee as for +.BR PTRACE_CONT , +but first detach from it. +Under Linux, a tracee can be detached in this way regardless +of which method was used to initiate tracing. +.RI ( addr +is ignored.) +.\" +.TP +.BR PTRACE_GET_THREAD_AREA " (since Linux 2.6.0)" +This operation performs a similar task to +.BR get_thread_area (2). +It reads the TLS entry in the GDT whose index is given in +.IR addr , +placing a copy of the entry into the +.I struct user_desc +pointed to by +.IR data . +(By contrast with +.BR get_thread_area (2), +the +.I entry_number +of the +.I struct user_desc +is ignored.) +.TP +.BR PTRACE_SET_THREAD_AREA " (since Linux 2.6.0)" +This operation performs a similar task to +.BR set_thread_area (2). +It sets the TLS entry in the GDT whose index is given in +.IR addr , +assigning it the data supplied in the +.I struct user_desc +pointed to by +.IR data .
+(By contrast with +.BR set_thread_area (2), +the +.I entry_number +of the +.I struct user_desc +is ignored; in other words, +this ptrace operation can't be used to allocate a free TLS entry.) +.TP +.BR PTRACE_GET_SYSCALL_INFO " (since Linux 5.3)" +.\" commit 201766a20e30f982ccfe36bebfad9602c3ff574a +Retrieve information about the system call that caused the stop. +The information is placed into the buffer pointed to by the +.I data +argument, which should be a pointer to a buffer of type +.IR "struct ptrace_syscall_info" . +The +.I addr +argument contains the size of the buffer pointed to +by the +.I data +argument (i.e., +.IR "sizeof(struct ptrace_syscall_info)" ). +The return value contains the number of bytes available +to be written by the kernel. +If the size of the data to be written by the kernel exceeds the size +specified by the +.I addr +argument, the output data is truncated. +.IP +The +.I ptrace_syscall_info +structure contains the following fields: +.IP +.in +4n +.EX +struct ptrace_syscall_info { + __u8 op; /* Type of system call stop */ + __u32 arch; /* AUDIT_ARCH_* value; see seccomp(2) */ + __u64 instruction_pointer; /* CPU instruction pointer */ + __u64 stack_pointer; /* CPU stack pointer */ + union { + struct { /* op == PTRACE_SYSCALL_INFO_ENTRY */ + __u64 nr; /* System call number */ + __u64 args[6]; /* System call arguments */ + } entry; + struct { /* op == PTRACE_SYSCALL_INFO_EXIT */ + __s64 rval; /* System call return value */ + __u8 is_error; /* System call error flag; + Boolean: does rval contain + an error value (\-ERRCODE) or + a nonerror return value?
*/ +} exit; + struct { /* op == PTRACE_SYSCALL_INFO_SECCOMP */ + __u64 nr; /* System call number */ + __u64 args[6]; /* System call arguments */ + __u32 ret_data; /* SECCOMP_RET_DATA portion + of SECCOMP_RET_TRACE + return value */ + } seccomp; + }; +}; +.EE +.in +.IP +The +.IR op , +.IR arch , +.IR instruction_pointer , +and +.I stack_pointer +fields are defined for all kinds of ptrace system call stops. +The rest of the structure is a union; one should read only those fields +that are meaningful for the kind of system call stop specified by the +.I op +field. +.IP +The +.I op +field has one of the following values (defined in +.IR <linux/ptrace.h> ) +indicating what type of stop occurred and +which part of the union is filled: +.RS +.TP +.B PTRACE_SYSCALL_INFO_ENTRY +The +.I entry +component of the union contains information relating to a +system call entry stop. +.TP +.B PTRACE_SYSCALL_INFO_EXIT +The +.I exit +component of the union contains information relating to a +system call exit stop. +.TP +.B PTRACE_SYSCALL_INFO_SECCOMP +The +.I seccomp +component of the union contains information relating to a +.B PTRACE_EVENT_SECCOMP +stop. +.TP +.B PTRACE_SYSCALL_INFO_NONE +No component of the union contains relevant information. +.RE +.IP +In case of system call entry or exit stops, +the data returned by +.B PTRACE_GET_SYSCALL_INFO +is limited to type +.B PTRACE_SYSCALL_INFO_NONE +unless the +.B PTRACE_O_TRACESYSGOOD +option is set before the corresponding system call stop has occurred. +.\" +.SS Death under ptrace +When a (possibly multithreaded) process receives a killing signal +(one whose disposition is set to +.B SIG_DFL +and whose default action is to kill the process), +all threads exit. +Tracees report their death to their tracer(s). +Notification of this event is delivered via +.BR waitpid (2).
+.PP +Note that the killing signal will first cause signal-delivery-stop +(on one tracee only), +and only after it is injected by the tracer +(or after it was dispatched to a thread which isn't traced), +will death from the signal happen on +.I all +tracees within a multithreaded process. +(The term "signal-delivery-stop" is explained below.) +.PP +.B SIGKILL +does not generate signal-delivery-stop and +therefore the tracer can't suppress it. +.B SIGKILL +kills even within system calls +(syscall-exit-stop is not generated prior to death by +.BR SIGKILL ). +The net effect is that +.B SIGKILL +always kills the process (all its threads), +even if some threads of the process are ptraced. +.PP +When the tracee calls +.BR _exit (2), +it reports its death to its tracer. +Other threads are not affected. +.PP +When any thread executes +.BR exit_group (2), +every tracee in its thread group reports its death to its tracer. +.PP +If the +.B PTRACE_O_TRACEEXIT +option is on, +.B PTRACE_EVENT_EXIT +will happen before actual death. +This applies to exits via +.BR exit (2), +.BR exit_group (2), +and signal deaths (except +.BR SIGKILL , +depending on the kernel version; see BUGS below), +and when threads are torn down on +.BR execve (2) +in a multithreaded process. +.PP +The tracer cannot assume that the ptrace-stopped tracee exists. +There are many scenarios when the tracee may die while stopped (such as +.BR SIGKILL ). +Therefore, the tracer must be prepared to handle an +.B ESRCH +error on any ptrace operation. +Unfortunately, the same error is returned if the tracee +exists but is not ptrace-stopped +(for commands which require a stopped tracee), +or if it is not traced by the process which issued the ptrace call. +The tracer needs to keep track of the stopped/running state of the tracee, +and interpret +.B ESRCH +as "tracee died unexpectedly" only if it knows that the tracee has +been observed to enter ptrace-stop. 
+Note that there is no guarantee that +.I waitpid(WNOHANG) +will reliably report the tracee's death status if a +ptrace operation returned +.BR ESRCH . +.I waitpid(WNOHANG) +may return 0 instead. +In other words, the tracee may be "not yet fully dead", +but already refusing ptrace requests. +.PP +The tracer can't assume that the tracee +.I always +ends its life by reporting +.I WIFEXITED(status) +or +.IR WIFSIGNALED(status) ; +there are cases where this does not occur. +For example, if a thread other than the thread group leader does an +.BR execve (2), +it disappears; +its PID will never be seen again, +and any subsequent ptrace stops will be reported under +the thread group leader's PID. +.SS Stopped states +A tracee can be in two states: running or stopped. +For the purposes of ptrace, a tracee which is blocked in a system call +(such as +.BR read (2), +.BR pause (2), +etc.) +is nevertheless considered to be running, even if the tracee is blocked +for a long time. +The state of the tracee after +.B PTRACE_LISTEN +is somewhat of a gray area: it is not in any ptrace-stop (ptrace commands +won't work on it, and it will deliver +.BR waitpid (2) +notifications), +but it also may be considered "stopped" because +it is not executing instructions (is not scheduled), and if it was +in group-stop before +.BR PTRACE_LISTEN , +it will not respond to signals until +.B SIGCONT +is received. +.PP +There are many kinds of states when the tracee is stopped, and in ptrace +discussions they are often conflated. +Therefore, it is important to use precise terms. +.PP +In this manual page, any stopped state in which the tracee is ready +to accept ptrace commands from the tracer is called +.IR ptrace-stop . +Ptrace-stops can +be further subdivided into +.IR signal-delivery-stop , +.IR group-stop , +.IR syscall-stop , +.IR "PTRACE_EVENT stops" , +and so on. +These stopped states are described in detail below.
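The previous section says to treat ESRCH as "tracee died unexpectedly" only when the tracee is known to have entered ptrace-stop. That rule implies the tracer keeps some state for each tracee. The sketch below assumes a hypothetical struct tracee whose in_ptrace_stop flag the tracer sets whenever waitpid(2) reports a stop for that thread. The function restart_tracee() is also a hypothetical name.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Hypothetical per-thread record kept by the tracer. */
struct tracee {
    pid_t tid;
    bool in_ptrace_stop;   /* set when waitpid(2) reports a stop for tid */
};

enum restart_result { RESTARTED, TRACEE_GONE, RESTART_FAILED };

/* Restart the tracee with PTRACE_CONT, injecting sig (0 for none).
   ESRCH counts as "tracee died" only if the tracee was known to be in
   ptrace-stop; otherwise it may just mean "running" or "not ours". */
enum restart_result restart_tracee(struct tracee *t, int sig)
{
    if (ptrace(PTRACE_CONT, t->tid, (void *) 0, (void *) (long) sig) == 0) {
        t->in_ptrace_stop = false;
        return RESTARTED;
    }
    if (errno == ESRCH && t->in_ptrace_stop) {
        t->in_ptrace_stop = false;   /* death may not be reported yet */
        return TRACEE_GONE;
    }
    return RESTART_FAILED;
}
```

A result of TRACEE_GONE does not guarantee that the death status is already available: a waitpid(2) call with WNOHANG may still return 0.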
+.PP +When the running tracee enters ptrace-stop, it notifies its tracer using +.BR waitpid (2) +(or one of the other "wait" system calls). +Most of this manual page assumes that the tracer waits with: +.PP +.in +4n +.EX +pid = waitpid(pid_or_minus_1, &status, __WALL); +.EE +.in +.PP +Ptrace-stopped tracees are reported as returns with +.I pid +greater than 0 and +.I WIFSTOPPED(status) +true. +.\" Denys Vlasenko: +.\" Do we require __WALL usage, or will just using 0 be ok? (With 0, +.\" I am not 100% sure there aren't ugly corner cases.) Are the +.\" rules different if user wants to use waitid? Will waitid require +.\" WEXITED? +.\" +.PP +The +.B __WALL +flag does not include the +.B WSTOPPED +and +.B WEXITED +flags, but implies their functionality. +.PP +Setting the +.B WCONTINUED +flag when calling +.BR waitpid (2) +is not recommended: the "continued" state is per-process and +consuming it can confuse the real parent of the tracee. +.PP +Use of the +.B WNOHANG +flag may cause +.BR waitpid (2) +to return 0 ("no wait results available yet") +even if the tracer knows there should be a notification. +Example: +.PP +.in +4n +.EX +errno = 0; +ptrace(PTRACE_CONT, pid, 0L, 0L); +if (errno == ESRCH) { + /* tracee is dead */ + r = waitpid(pid, &status, __WALL | WNOHANG); + /* r can still be 0 here! */ +} +.EE +.in +.\" FIXME . +.\" waitid usage? WNOWAIT? +.\" describe how wait notifications queue (or not queue) +.PP +The following kinds of ptrace-stops exist: signal-delivery-stops, +group-stops, +.B PTRACE_EVENT +stops, syscall-stops. +They are all reported by +.BR waitpid (2) +with +.I WIFSTOPPED(status) +true. +They may be differentiated by examining the value +.IR status>>8 , +and if there is ambiguity in that value, by querying +.BR PTRACE_GETSIGINFO . +(Note: the +.I WSTOPSIG(status) +macro can't be used to perform this examination, +because it returns the value +.IR "(status>>8)\ &\ 0xff" .)
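The status>>8 test described above can be written as a small classifier. This sketch assumes the tracer has set PTRACE_O_TRACESYSGOOD; without that option, syscall-stops look like signal-delivery-stops for SIGTRAP. The names classify_stop() and enum stop_kind are hypothetical. The classifier cannot tell signal-delivery-stop apart from group-stop, because that requires PTRACE_GETSIGINFO as described below.

```c
#include <signal.h>
#include <sys/wait.h>

/* Hypothetical classification of ptrace-stops, assuming the tracer set
   PTRACE_O_TRACESYSGOOD on every tracee. */
enum stop_kind {
    STOP_SYSCALL,          /* syscall-enter-stop or syscall-exit-stop */
    STOP_EVENT,            /* PTRACE_EVENT stop; event number is status>>16 */
    STOP_SIGNAL_OR_GROUP,  /* signal-delivery-stop or group-stop */
    STOP_NONE              /* not a ptrace-stop at all */
};

enum stop_kind classify_stop(int status)
{
    if (!WIFSTOPPED(status))
        return STOP_NONE;
    int hi = status >> 8;      /* not WSTOPSIG(): it masks off the event bits */
    if (hi == (SIGTRAP | 0x80))
        return STOP_SYSCALL;
    if ((hi >> 8) != 0)
        return STOP_EVENT;
    return STOP_SIGNAL_OR_GROUP;
}
```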
+.SS Signal-delivery-stop +When a (possibly multithreaded) process receives any signal except +.BR SIGKILL , +the kernel selects an arbitrary thread which handles the signal. +(If the signal is generated with +.BR tgkill (2), +the target thread can be explicitly selected by the caller.) +If the selected thread is traced, it enters signal-delivery-stop. +At this point, the signal is not yet delivered to the process, +and can be suppressed by the tracer. +If the tracer doesn't suppress the signal, +it passes the signal to the tracee in the next ptrace restart request. +This second step of signal delivery is called +.I "signal injection" +in this manual page. +Note that if the signal is blocked, +signal-delivery-stop doesn't happen until the signal is unblocked, +with the usual exception that +.B SIGSTOP +can't be blocked. +.PP +Signal-delivery-stop is observed by the tracer as +.BR waitpid (2) +returning with +.I WIFSTOPPED(status) +true, with the signal returned by +.IR WSTOPSIG(status) . +If the signal is +.BR SIGTRAP , +this may be a different kind of ptrace-stop; +see the "Syscall-stops" and "execve" sections below for details. +If +.I WSTOPSIG(status) +returns a stopping signal, this may be a group-stop; see below. +.SS Signal injection and suppression +After signal-delivery-stop is observed by the tracer, +the tracer should restart the tracee with the call +.PP +.in +4n +.EX +ptrace(PTRACE_restart, pid, 0, sig) +.EE +.in +.PP +where +.B PTRACE_restart +is one of the restarting ptrace requests. +If +.I sig +is 0, then a signal is not delivered. +Otherwise, the signal +.I sig +is delivered. +This operation is called +.I "signal injection" +in this manual page, to distinguish it from signal-delivery-stop. +.PP +The +.I sig +value may be different from the +.I WSTOPSIG(status) +value: the tracer can cause a different signal to be injected. +.PP +Note that a suppressed signal still causes system calls to return +prematurely. 
+In this case, system calls will be restarted: the tracer will +observe the tracee to reexecute the interrupted system call (or +.BR restart_syscall (2) +system call for a few system calls which use a different mechanism +for restarting) if the tracer uses +.BR PTRACE_SYSCALL . +Even system calls (such as +.BR poll (2)) +which are not restartable after a signal are restarted after +the signal is suppressed; +however, kernel bugs exist which cause some system calls to fail with +.B EINTR +even though no observable signal is injected into the tracee. +.PP +Restarting ptrace commands issued in ptrace-stops other than +signal-delivery-stop are not guaranteed to inject a signal, even if +.I sig +is nonzero. +No error is reported; a nonzero +.I sig +may simply be ignored. +Ptrace users should not try to "create a new signal" this way: use +.BR tgkill (2) +instead. +.PP +The fact that signal injection requests may be ignored +when restarting the tracee after +ptrace stops that are not signal-delivery-stops +is a cause of confusion among ptrace users. +One typical scenario is that the tracer observes group-stop, +mistakes it for signal-delivery-stop, restarts the tracee with +.PP +.in +4n +.EX +ptrace(PTRACE_restart, pid, 0, stopsig) +.EE +.in +.PP +with the intention of injecting +.IR stopsig , +but +.I stopsig +gets ignored and the tracee continues to run. +.PP +The +.B SIGCONT +signal has a side effect of waking up (all threads of) +a group-stopped process. +This side effect happens before signal-delivery-stop. +The tracer can't suppress this side effect (it can +only suppress signal injection, which only causes the +.B SIGCONT +handler to not be executed in the tracee, if such a handler is installed). +In fact, waking up from group-stop may be followed by +signal-delivery-stop for signal(s) +.I other than +.BR SIGCONT , +if they were pending when +.B SIGCONT +was delivered. +In other words, +.B SIGCONT +may not be the first signal observed by the tracee after it was sent.
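To "create a new signal" as recommended above, the tracer sends the signal directly to the intended thread. In the sketch below, send_to_thread() is a hypothetical wrapper. It invokes the raw system call through syscall(2), because glibc provides a tgkill() wrapper only in version 2.30 and later.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: send sig to thread tid in thread group tgid.
   Returns 0 on success, -1 with errno set on failure. */
int send_to_thread(pid_t tgid, pid_t tid, int sig)
{
    return (int) syscall(SYS_tgkill, tgid, tid, sig);
}
```

In a tracer, tgid is the tracee's process ID and tid is the thread ID that waitpid(2) reported. A sig of 0 only checks that the thread exists.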
+.PP +Stopping signals cause (all threads of) a process to enter group-stop. +This side effect happens after signal injection, and therefore can be +suppressed by the tracer. +.PP +In Linux 2.4 and earlier, the +.B SIGSTOP +signal can't be injected. +.\" In the Linux 2.4 sources, in arch/i386/kernel/signal.c::do_signal(), +.\" there is: +.\" +.\" /* The debugger continued. Ignore SIGSTOP. */ +.\" if (signr == SIGSTOP) +.\" continue; +.PP +.B PTRACE_GETSIGINFO +can be used to retrieve a +.I siginfo_t +structure which corresponds to the delivered signal. +.B PTRACE_SETSIGINFO +may be used to modify it. +If +.B PTRACE_SETSIGINFO +has been used to alter +.IR siginfo_t , +the +.I si_signo +field and the +.I sig +parameter in the restarting command must match, +otherwise the result is undefined. +.SS Group-stop +When a (possibly multithreaded) process receives a stopping signal, +all threads stop. +If some threads are traced, they enter a group-stop. +Note that the stopping signal will first cause signal-delivery-stop +(on one tracee only), and only after it is injected by the tracer +(or after it was dispatched to a thread which isn't traced), +will group-stop be initiated on +.I all +tracees within the multithreaded process. +As usual, every tracee reports its group-stop separately +to the corresponding tracer. +.PP +Group-stop is observed by the tracer as +.BR waitpid (2) +returning with +.I WIFSTOPPED(status) +true, with the stopping signal available via +.IR WSTOPSIG(status) . +The same result is returned by some other classes of ptrace-stops, +therefore the recommended practice is to perform the call +.PP +.in +4n +.EX +ptrace(PTRACE_GETSIGINFO, pid, 0, &siginfo) +.EE +.in +.PP +The call can be avoided if the signal is not +.BR SIGSTOP , +.BR SIGTSTP , +.BR SIGTTIN , +or +.BR SIGTTOU ; +only these four signals are stopping signals. +If the tracer sees something else, it can't be a group-stop. +Otherwise, the tracer needs to call +.BR PTRACE_GETSIGINFO . 
+If +.B PTRACE_GETSIGINFO +fails with +.BR EINVAL , +then it is definitely a group-stop. +(Other failure codes are possible, such as +.B ESRCH +("no such process") if a +.B SIGKILL +killed the tracee.) +.PP +If tracee was attached using +.BR PTRACE_SEIZE , +group-stop is indicated by +.BR PTRACE_EVENT_STOP : +.IR "status>>16 == PTRACE_EVENT_STOP" . +This allows detection of group-stops +without requiring an extra +.B PTRACE_GETSIGINFO +call. +.PP +As of Linux 2.6.38, +after the tracer sees the tracee ptrace-stop and until it +restarts or kills it, the tracee will not run, +and will not send notifications (except +.B SIGKILL +death) to the tracer, even if the tracer enters into another +.BR waitpid (2) +call. +.PP +The kernel behavior described in the previous paragraph +causes a problem with transparent handling of stopping signals. +If the tracer restarts the tracee after group-stop, +the stopping signal +is effectively ignored\[em]the tracee doesn't remain stopped, it runs. +If the tracer doesn't restart the tracee before entering into the next +.BR waitpid (2), +future +.B SIGCONT +signals will not be reported to the tracer; +this would cause the +.B SIGCONT +signals to have no effect on the tracee. +.PP +Since Linux 3.4, there is a method to overcome this problem: instead of +.BR PTRACE_CONT , +a +.B PTRACE_LISTEN +command can be used to restart a tracee in a way where it does not execute, +but waits for a new event which it can report via +.BR waitpid (2) +(such as when +it is restarted by a +.BR SIGCONT ). +.SS PTRACE_EVENT stops +If the tracer sets +.B PTRACE_O_TRACE_* +options, the tracee will enter ptrace-stops called +.B PTRACE_EVENT +stops. +.PP +.B PTRACE_EVENT +stops are observed by the tracer as +.BR waitpid (2) +returning with +.IR WIFSTOPPED(status) , +and +.I WSTOPSIG(status) +returns +.B SIGTRAP +(or for +.BR PTRACE_EVENT_STOP , +returns the stopping signal if tracee is in a group-stop). 
+An additional bit is set in the higher byte of the status word: +the value +.I status>>8 +will be +.PP +.in +4n +.EX +((PTRACE_EVENT_foo<<8) | SIGTRAP). +.EE +.in +.PP +The following events exist: +.TP +.B PTRACE_EVENT_VFORK +Stop before return from +.BR vfork (2) +or +.BR clone (2) +with the +.B CLONE_VFORK +flag. +When the tracee is continued after this stop, it will wait for child to +exit/exec before continuing its execution +(in other words, the usual behavior on +.BR vfork (2)). +.TP +.B PTRACE_EVENT_FORK +Stop before return from +.BR fork (2) +or +.BR clone (2) +with the exit signal set to +.BR SIGCHLD . +.TP +.B PTRACE_EVENT_CLONE +Stop before return from +.BR clone (2). +.TP +.B PTRACE_EVENT_VFORK_DONE +Stop before return from +.BR vfork (2) +or +.BR clone (2) +with the +.B CLONE_VFORK +flag, +but after the child unblocked this tracee by exiting or execing. +.PP +For all four stops described above, +the stop occurs in the parent (i.e., the tracee), +not in the newly created thread. +.B PTRACE_GETEVENTMSG +can be used to retrieve the new thread's ID. +.TP +.B PTRACE_EVENT_EXEC +Stop before return from +.BR execve (2). +Since Linux 3.0, +.B PTRACE_GETEVENTMSG +returns the former thread ID. +.TP +.B PTRACE_EVENT_EXIT +Stop before exit (including death from +.BR exit_group (2)), +signal death, or exit caused by +.BR execve (2) +in a multithreaded process. +.B PTRACE_GETEVENTMSG +returns the exit status. +Registers can be examined +(unlike when "real" exit happens). +The tracee is still alive; it needs to be +.BR PTRACE_CONT ed +or +.BR PTRACE_DETACH ed +to finish exiting. +.TP +.B PTRACE_EVENT_STOP +Stop induced by +.B PTRACE_INTERRUPT +command, or group-stop, or initial ptrace-stop when a new child is attached +(only if attached using +.BR PTRACE_SEIZE ). +.TP +.B PTRACE_EVENT_SECCOMP +Stop triggered by a +.BR seccomp (2) +rule on tracee syscall entry when +.B PTRACE_O_TRACESECCOMP +has been set by the tracer. 
+The seccomp event message data (from the +.B SECCOMP_RET_DATA +portion of the seccomp filter rule) can be retrieved with +.BR PTRACE_GETEVENTMSG . +The semantics of this stop are described in +detail in a separate section below. +.PP +.B PTRACE_GETSIGINFO +on +.B PTRACE_EVENT +stops returns +.B SIGTRAP +in +.IR si_signo , +with +.I si_code +set to +.IR "(event<<8)\ |\ SIGTRAP" . +.SS Syscall-stops +If the tracee was restarted by +.B PTRACE_SYSCALL +or +.BR PTRACE_SYSEMU , +the tracee enters +syscall-enter-stop just prior to entering any system call (which +will not be executed if the restart was using +.BR PTRACE_SYSEMU , +regardless of any change made to registers at this point or how the +tracee is restarted after this stop). +No matter which method caused the syscall-entry-stop, +if the tracer restarts the tracee with +.BR PTRACE_SYSCALL , +the tracee enters syscall-exit-stop when the system call is finished, +or if it is interrupted by a signal. +(That is, signal-delivery-stop never happens between syscall-enter-stop +and syscall-exit-stop; it happens +.I after +syscall-exit-stop.) +If the tracee is continued using any other method (including +.BR PTRACE_SYSEMU ), +no syscall-exit-stop occurs. +Note that all mentions of +.B PTRACE_SYSEMU +apply equally to +.BR PTRACE_SYSEMU_SINGLESTEP . +.PP +However, even if the tracee was continued using +.BR PTRACE_SYSCALL , +it is not guaranteed that the next stop will be a syscall-exit-stop. +Other possibilities are that the tracee may stop in a +.B PTRACE_EVENT +stop (including seccomp stops), exit (if it entered +.BR _exit (2) +or +.BR exit_group (2)), +be killed by +.BR SIGKILL , +or die silently (if it is a thread group leader, the +.BR execve (2) +happened in another thread, +and that thread is not traced by the same tracer; +this situation is discussed later).
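The sketch below decodes the PTRACE_EVENT encoding described in the previous section. It also applies the PTRACE_SEIZE group-stop rule from the Group-stop section. It assumes that glibc's <sys/ptrace.h> defines the PTRACE_EVENT_* constants, including PTRACE_EVENT_STOP. The helpers ptrace_event_name(), si_code_is_event(), and is_seized_group_stop() are hypothetical.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/* Hypothetical helper: name of the event in a PTRACE_EVENT stop,
   or NULL if status>>16 holds no known event. */
const char *ptrace_event_name(int status)
{
    switch (status >> 16) {
    case PTRACE_EVENT_FORK:       return "fork";
    case PTRACE_EVENT_VFORK:      return "vfork";
    case PTRACE_EVENT_CLONE:      return "clone";
    case PTRACE_EVENT_EXEC:       return "exec";
    case PTRACE_EVENT_VFORK_DONE: return "vfork_done";
    case PTRACE_EVENT_EXIT:       return "exit";
    case PTRACE_EVENT_SECCOMP:    return "seccomp";
    case PTRACE_EVENT_STOP:       return "stop";
    default:                      return NULL;
    }
}

/* PTRACE_GETSIGINFO on a PTRACE_EVENT stop reports
   si_code == (event << 8) | SIGTRAP. */
int si_code_is_event(int si_code, int event)
{
    return si_code == ((event << 8) | SIGTRAP);
}

/* With PTRACE_SEIZE, group-stop is a PTRACE_EVENT_STOP whose WSTOPSIG()
   is one of the four stopping signals (otherwise it is SIGTRAP). */
int is_seized_group_stop(int status)
{
    int sig = WSTOPSIG(status);
    return (status >> 16) == PTRACE_EVENT_STOP &&
           (sig == SIGSTOP || sig == SIGTSTP || sig == SIGTTIN || sig == SIGTTOU);
}
```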
+.PP +Syscall-enter-stop and syscall-exit-stop are observed by the tracer as +.BR waitpid (2) +returning with +.I WIFSTOPPED(status) +true, and +.I WSTOPSIG(status) +giving +.BR SIGTRAP . +If the +.B PTRACE_O_TRACESYSGOOD +option was set by the tracer, then +.I WSTOPSIG(status) +will give the value +.IR "(SIGTRAP\ |\ 0x80)" . +.PP +Syscall-stops can be distinguished from signal-delivery-stop with +.B SIGTRAP +by querying +.B PTRACE_GETSIGINFO +for the following cases: +.TP +.IR si_code " <= 0" +.B SIGTRAP +was delivered as a result of a user-space action, +for example, a system call +.RB ( tgkill (2), +.BR kill (2), +.BR sigqueue (3), +etc.), +expiration of a POSIX timer, +change of state on a POSIX message queue, +or completion of an asynchronous I/O request. +.TP +.IR si_code " == SI_KERNEL (0x80)" +.B SIGTRAP +was sent by the kernel. +.TP +.IR si_code " == SIGTRAP or " si_code " == (SIGTRAP|0x80)" +This is a syscall-stop. +.PP +However, syscall-stops happen very often (twice per system call), +and performing +.B PTRACE_GETSIGINFO +for every syscall-stop may be somewhat expensive. +.PP +Some architectures allow the cases to be distinguished +by examining registers. +For example, on x86, +.I rax +== +.RB \- ENOSYS +in syscall-enter-stop. +Since +.B SIGTRAP +(like any other signal) always happens +.I after +syscall-exit-stop, +and at this point +.I rax +almost never contains +.RB \- ENOSYS , +the +.B SIGTRAP +looks like "syscall-stop which is not syscall-enter-stop"; +in other words, it looks like a +"stray syscall-exit-stop" and can be detected this way. +But such detection is fragile and is best avoided. +.PP +Using the +.B PTRACE_O_TRACESYSGOOD +option is the recommended method to distinguish syscall-stops +from other kinds of ptrace-stops, +since it is reliable and does not incur a performance penalty. +.PP +Syscall-enter-stop and syscall-exit-stop are +indistinguishable from each other by the tracer. 
+The tracer needs to keep track of the sequence of +ptrace-stops in order not to misinterpret syscall-enter-stop as +syscall-exit-stop or vice versa. +In general, a syscall-enter-stop is +always followed by syscall-exit-stop, +.B PTRACE_EVENT +stop, or the tracee's death; +no other kinds of ptrace-stop can occur in between. +However, note that seccomp stops (see below) can cause syscall-exit-stops, +without preceding syscall-entry-stops. +If seccomp is in use, care needs +to be taken not to misinterpret such stops as syscall-entry-stops. +.PP +If after syscall-enter-stop, +the tracer uses a restarting command other than +.BR PTRACE_SYSCALL , +syscall-exit-stop is not generated. +.PP +.B PTRACE_GETSIGINFO +on syscall-stops returns +.B SIGTRAP +in +.IR si_signo , +with +.I si_code +set to +.B SIGTRAP +or +.IR (SIGTRAP|0x80) . +.\" +.SS PTRACE_EVENT_SECCOMP stops (Linux 3.5 to Linux 4.7) +The behavior of +.B PTRACE_EVENT_SECCOMP +stops and their interaction with other kinds +of ptrace stops has changed between kernel versions. +This section documents the behavior +from their introduction until Linux 4.7 (inclusive). +The behavior in later kernel versions is documented in the next section. +.PP +A +.B PTRACE_EVENT_SECCOMP +stop occurs whenever a +.B SECCOMP_RET_TRACE +rule is triggered. +This is independent of which method was used to restart the system call. +Notably, seccomp still runs even if the tracee was restarted using +.B PTRACE_SYSEMU +and this system call is unconditionally skipped. +.PP +Restarts from this stop will behave as if the stop had occurred right +before the system call in question. +In particular, both +.B PTRACE_SYSCALL +and +.B PTRACE_SYSEMU +will normally cause a subsequent syscall-entry-stop. +However, if after the +.B PTRACE_EVENT_SECCOMP +the system call number is negative, +both the syscall-entry-stop and the system call itself will be skipped.
+This means that if the system call number is negative after a +.B PTRACE_EVENT_SECCOMP +and the tracee is restarted using +.BR PTRACE_SYSCALL , +the next observed stop will be a syscall-exit-stop, +rather than the syscall-entry-stop that might have been expected. +.\" +.SS PTRACE_EVENT_SECCOMP stops (since Linux 4.8) +Starting with Linux 4.8, +.\" commit 93e35efb8de45393cf61ed07f7b407629bf698ea +the +.B PTRACE_EVENT_SECCOMP +stop was reordered to occur between syscall-entry-stop and +syscall-exit-stop. +Note that seccomp no longer runs (and no +.B PTRACE_EVENT_SECCOMP +will be reported) if the system call is skipped due to +.BR PTRACE_SYSEMU . +.PP +Functionally, a +.B PTRACE_EVENT_SECCOMP +stop functions comparably +to a syscall-entry-stop (i.e., continuations using +.B PTRACE_SYSCALL +will cause syscall-exit-stops, +the system call number may be changed and any other modified registers +are visible to the to-be-executed system call as well). +Note that there may be, +but need not have been a preceding syscall-entry-stop. +.PP +After a +.B PTRACE_EVENT_SECCOMP +stop, seccomp will be rerun, with a +.B SECCOMP_RET_TRACE +rule now functioning the same as a +.BR SECCOMP_RET_ALLOW . +Specifically, this means that if registers are not modified during the +.B PTRACE_EVENT_SECCOMP +stop, the system call will then be allowed. +.\" +.SS PTRACE_SINGLESTEP stops +[Details of these kinds of stops are yet to be documented.] +.\" +.\" FIXME . +.\" document stops occurring with PTRACE_SINGLESTEP +.\" +.SS Informational and restarting ptrace commands +Most ptrace commands (all except +.BR PTRACE_ATTACH , +.BR PTRACE_SEIZE , +.BR PTRACE_TRACEME , +.BR PTRACE_INTERRUPT , +and +.BR PTRACE_KILL ) +require the tracee to be in a ptrace-stop, otherwise they fail with +.BR ESRCH . +.PP +When the tracee is in ptrace-stop, +the tracer can read and write data to +the tracee using informational commands. 
+These commands leave the tracee in ptrace-stopped state: +.PP +.in +4n +.EX +ptrace(PTRACE_PEEKTEXT/PEEKDATA/PEEKUSER, pid, addr, 0); +ptrace(PTRACE_POKETEXT/POKEDATA/POKEUSER, pid, addr, long_val); +ptrace(PTRACE_GETREGS/GETFPREGS, pid, 0, &struct); +ptrace(PTRACE_SETREGS/SETFPREGS, pid, 0, &struct); +ptrace(PTRACE_GETREGSET, pid, NT_foo, &iov); +ptrace(PTRACE_SETREGSET, pid, NT_foo, &iov); +ptrace(PTRACE_GETSIGINFO, pid, 0, &siginfo); +ptrace(PTRACE_SETSIGINFO, pid, 0, &siginfo); +ptrace(PTRACE_GETEVENTMSG, pid, 0, &long_var); +ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_flags); +.EE +.in +.PP +Note that some errors are not reported. +For example, setting signal information +.RI ( siginfo ) +may have no effect in some ptrace-stops, yet the call may succeed +(return 0 and not set +.IR errno ); +querying +.B PTRACE_GETEVENTMSG +may succeed and return some random value if current ptrace-stop +is not documented as returning a meaningful event message. +.PP +The call +.PP +.in +4n +.EX +ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_flags); +.EE +.in +.PP +affects one tracee. +The tracee's current flags are replaced. +Flags are inherited by new tracees created and "auto-attached" via active +.BR PTRACE_O_TRACEFORK , +.BR PTRACE_O_TRACEVFORK , +or +.B PTRACE_O_TRACECLONE +options. +.PP +Another group of commands makes the ptrace-stopped tracee run. +They have the form: +.PP +.in +4n +.EX +ptrace(cmd, pid, 0, sig); +.EE +.in +.PP +where +.I cmd +is +.BR PTRACE_CONT , +.BR PTRACE_LISTEN , +.BR PTRACE_DETACH , +.BR PTRACE_SYSCALL , +.BR PTRACE_SINGLESTEP , +.BR PTRACE_SYSEMU , +or +.BR PTRACE_SYSEMU_SINGLESTEP . +If the tracee is in signal-delivery-stop, +.I sig +is the signal to be injected (if it is nonzero). +Otherwise, +.I sig +may be ignored. +(When restarting a tracee from a ptrace-stop other than signal-delivery-stop, +recommended practice is to always pass 0 in +.IR sig .) 
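The recommendation above is to pass 0 in sig for every ptrace-stop other than signal-delivery-stop. One helper can apply that rule. The sketch below assumes PTRACE_O_TRACESYSGOOD is set. The name restart_signal() is hypothetical, and the caller must supply group_stop, for example from a PTRACE_GETSIGINFO call that failed with EINVAL.

```c
#include <signal.h>
#include <sys/wait.h>

/* Hypothetical helper: the sig argument for the next restarting request,
   assuming PTRACE_O_TRACESYSGOOD is set. group_stop is nonzero if the
   caller has established that this stop is a group-stop. */
int restart_signal(int status, int group_stop)
{
    if (!WIFSTOPPED(status) || group_stop)
        return 0;
    int hi = status >> 8;
    if (hi == (SIGTRAP | 0x80))    /* syscall-stop: nothing to inject */
        return 0;
    if ((hi >> 8) != 0)            /* PTRACE_EVENT stop: nothing to inject */
        return 0;
    return WSTOPSIG(status);       /* signal-delivery-stop: reinject it */
}
```

A tracer loop would then call ptrace(PTRACE_SYSCALL, pid, (void *) 0, (void *) (long) restart_signal(status, group_stop)).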
+.SS Attaching and detaching +A thread can be attached to the tracer using the call +.PP +.in +4n +.EX +ptrace(PTRACE_ATTACH, pid, 0, 0); +.EE +.in +.PP +or +.PP +.in +4n +.EX +ptrace(PTRACE_SEIZE, pid, 0, PTRACE_O_flags); +.EE +.in +.PP +.B PTRACE_ATTACH +sends +.B SIGSTOP +to this thread. +If the tracer wants this +.B SIGSTOP +to have no effect, it needs to suppress it. +Note that if other signals are concurrently sent to +this thread during attach, +the tracer may see the tracee enter signal-delivery-stop +with other signal(s) first! +The usual practice is to reinject these signals until +.B SIGSTOP +is seen, then suppress +.B SIGSTOP +injection. +The design bug here is that a ptrace attach and a concurrently delivered +.B SIGSTOP +may race and the concurrent +.B SIGSTOP +may be lost. +.\" +.\" FIXME Describe how to attach to a thread which is already group-stopped. +.PP +Since attaching sends +.B SIGSTOP +and the tracer usually suppresses it, this may cause a stray +.B EINTR +return from the currently executing system call in the tracee, +as described in the "Signal injection and suppression" section. +.PP +Since Linux 3.4, +.B PTRACE_SEIZE +can be used instead of +.BR PTRACE_ATTACH . +.B PTRACE_SEIZE +does not stop the attached process. +If you need to stop +it after attach (or at any other time) without sending it any signals, +use +.B PTRACE_INTERRUPT +command. +.PP +The request +.PP +.in +4n +.EX +ptrace(PTRACE_TRACEME, 0, 0, 0); +.EE +.in +.PP +turns the calling thread into a tracee. +The thread continues to run (doesn't enter ptrace-stop). +A common practice is to follow the +.B PTRACE_TRACEME +with +.PP +.in +4n +.EX +raise(SIGSTOP); +.EE +.in +.PP +and allow the parent (which is our tracer now) to observe our +signal-delivery-stop. 
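The sketch below shows the PTRACE_TRACEME pattern above, with the parent suppressing the child's SIGSTOP. The helper trace_child_once() is hypothetical and the exit code 42 is arbitrary. Error handling is reduced to returning -1, which also covers environments where tracing is not permitted.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Kill and reap a traced child after an unexpected state. */
static int give_up(pid_t pid)
{
    int status;
    kill(pid, SIGKILL);
    waitpid(pid, &status, __WALL);
    return -1;
}

/* Hypothetical helper: run a child that becomes a tracee via
   PTRACE_TRACEME and stops itself with raise(SIGSTOP). The parent
   (now its tracer) observes that signal-delivery-stop, suppresses the
   SIGSTOP, and returns the child's exit status, or -1 on failure. */
int trace_child_once(void)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        if (ptrace(PTRACE_TRACEME, 0, (void *) 0, (void *) 0) == -1)
            _exit(127);                  /* tracing refused: don't stop */
        raise(SIGSTOP);                  /* parent sees signal-delivery-stop */
        _exit(42);
    }

    if (waitpid(pid, &status, __WALL) == -1)
        return -1;
    if (!WIFSTOPPED(status))
        return -1;                       /* child exited or died instead */
    if (WSTOPSIG(status) != SIGSTOP)
        return give_up(pid);
    /* sig = 0 suppresses the SIGSTOP, so the child keeps running. */
    if (ptrace(PTRACE_CONT, pid, (void *) 0, (void *) 0) == -1)
        return give_up(pid);
    if (waitpid(pid, &status, __WALL) == -1)
        return -1;
    if (!WIFEXITED(status))
        return WIFSTOPPED(status) ? give_up(pid) : -1;
    return WEXITSTATUS(status);
}
```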
+.PP +If the +.BR PTRACE_O_TRACEFORK , +.BR PTRACE_O_TRACEVFORK , +or +.B PTRACE_O_TRACECLONE +options are in effect, then children created by, respectively, +.BR fork (2) +or +.BR clone (2) +with the exit signal set to +.BR SIGCHLD , +.BR vfork (2) +or +.BR clone (2) +with the +.B CLONE_VFORK +flag, +and other kinds of +.BR clone (2), +are automatically attached to the same tracer which traced their parent. +.B SIGSTOP +is delivered to the children, causing them to enter +signal-delivery-stop after they exit the system call which created them. +.PP +A tracee is detached with: +.PP +.in +4n +.EX +ptrace(PTRACE_DETACH, pid, 0, sig); +.EE +.in +.PP +.B PTRACE_DETACH +is a restarting operation; +therefore it requires the tracee to be in ptrace-stop. +If the tracee is in signal-delivery-stop, a signal can be injected. +Otherwise, the +.I sig +parameter may be silently ignored. +.PP +If the tracee is running when the tracer wants to detach it, +the usual solution is to send +.B SIGSTOP +(using +.BR tgkill (2), +to make sure it goes to the correct thread), +wait for the tracee to stop in signal-delivery-stop for +.B SIGSTOP +and then detach it (suppressing +.B SIGSTOP +injection). +A design bug is that this can race with concurrent +.BR SIGSTOP s. +Another complication is that the tracee may enter other ptrace-stops +and needs to be restarted and waited for again, until +.B SIGSTOP +is seen. +Yet another complication is making sure that +the tracee is not already ptrace-stopped, +because no signal delivery happens while it is\[em]not even +.BR SIGSTOP . +.\" FIXME Describe how to detach from a group-stopped tracee so that it +.\" doesn't run, but continues to wait for SIGCONT. +.PP +If the tracer dies, all tracees are automatically detached and restarted, +unless they were in group-stop. +Handling of restart from group-stop is currently buggy, +but the "as planned" behavior is to leave the tracee stopped and waiting for +.BR SIGCONT .
+If the tracee is restarted from signal-delivery-stop, +the pending signal is injected. +.SS execve(2) under ptrace +.\" clone(2) CLONE_THREAD says: +.\" If any of the threads in a thread group performs an execve(2), +.\" then all threads other than the thread group leader are terminated, +.\" and the new program is executed in the thread group leader. +.\" +When one thread in a multithreaded process calls +.BR execve (2), +the kernel destroys all other threads in the process, +.\" In Linux 3.1 sources, see fs/exec.c::de_thread() +and resets the thread ID of the execing thread to the +thread group ID (process ID). +(Or, to put things another way, when a multithreaded process does an +.BR execve (2), +at completion of the call, it appears as though the +.BR execve (2) +occurred in the thread group leader, regardless of which thread did the +.BR execve (2).) +This resetting of the thread ID looks very confusing to tracers: +.IP \[bu] 3 +All other threads stop in +.B PTRACE_EVENT_EXIT +stop, if the +.B PTRACE_O_TRACEEXIT +option was turned on. +Then all other threads except the thread group leader report +death as if they exited via +.BR _exit (2) +with exit code 0. +.IP \[bu] +The execing tracee changes its thread ID while it is in the +.BR execve (2). +(Remember, under ptrace, the "pid" returned from +.BR waitpid (2), +or fed into ptrace calls, is the tracee's thread ID.) +That is, the tracee's thread ID is reset to be the same as its process ID, +which is the same as the thread group leader's thread ID. +.IP \[bu] +Then a +.B PTRACE_EVENT_EXEC +stop happens, if the +.B PTRACE_O_TRACEEXEC +option was turned on. +.IP \[bu] +If the thread group leader has reported its +.B PTRACE_EVENT_EXIT +stop by this time, +it appears to the tracer that +the dead thread leader "reappears from nowhere". +(Note: the thread group leader does not report death via +.I WIFEXITED(status) +until there is at least one other live thread. 
+This eliminates the possibility that the tracer will see +it dying and then reappearing.) +If the thread group leader was still alive, +for the tracer this may look as if the thread group leader +returns from a different system call than it entered, +or even "returned from a system call even though +it was not in any system call". +If the thread group leader was not traced +(or was traced by a different tracer), then during +.BR execve (2) +it will appear as if it has become a tracee of +the tracer of the execing tracee. +.PP +All of the above effects are artifacts of +the thread ID change in the tracee. +.PP +The +.B PTRACE_O_TRACEEXEC +option is the recommended tool for dealing with this situation. +First, it enables +.B PTRACE_EVENT_EXEC +stop, +which occurs before +.BR execve (2) +returns. +In this stop, the tracer can use +.B PTRACE_GETEVENTMSG +to retrieve the tracee's former thread ID. +(This feature was introduced in Linux 3.0.) +Second, the +.B PTRACE_O_TRACEEXEC +option disables legacy +.B SIGTRAP +generation on +.BR execve (2). +.PP +When the tracer receives a +.B PTRACE_EVENT_EXEC +stop notification, +it is guaranteed that, apart from this tracee and the thread group leader, +no other threads from the process are alive. +.PP +On receiving the +.B PTRACE_EVENT_EXEC +stop notification, +the tracer should clean up all its internal +data structures describing the threads of this process, +and retain only one data structure\[em]one which +describes the single still running tracee, with +.PP +.in +4n +.EX +thread ID == thread group ID == process ID.
+.EE +.in +.PP +Example: two threads call +.BR execve (2) +at the same time: +.PP +.nf +*** we get syscall-enter-stop in thread 1: ** +PID1 execve("/bin/foo", "foo" <unfinished ...> +*** we issue PTRACE_SYSCALL for thread 1 ** +*** we get syscall-enter-stop in thread 2: ** +PID2 execve("/bin/bar", "bar" <unfinished ...> +*** we issue PTRACE_SYSCALL for thread 2 ** +*** we get PTRACE_EVENT_EXEC for PID0, we issue PTRACE_SYSCALL ** +*** we get syscall-exit-stop for PID0: ** +PID0 <... execve resumed> ) = 0 +.fi +.PP +If the +.B PTRACE_O_TRACEEXEC +option is +.I not +in effect for the execing tracee, +and if the tracee was +.BR PTRACE_ATTACH ed +rather than +.BR PTRACE_SEIZE d, +the kernel delivers an extra +.B SIGTRAP +to the tracee after +.BR execve (2) +returns. +This is an ordinary signal (similar to one which can be +generated by +.IR "kill \-TRAP" ), +not a special kind of ptrace-stop. +Employing +.B PTRACE_GETSIGINFO +for this signal returns +.I si_code +set to 0 +.RI ( SI_USER ). +This signal may be blocked by the signal mask, +and thus may be delivered (much) later. +.PP +Usually, the tracer (for example, +.BR strace (1)) +would not want to show this extra post-execve +.B SIGTRAP +signal to the user, and would suppress its delivery to the tracee (if +.B SIGTRAP +is set to +.BR SIG_DFL , +it is a killing signal). +However, determining +.I which +.B SIGTRAP +to suppress is not easy. +Setting the +.B PTRACE_O_TRACEEXEC +option or using +.B PTRACE_SEIZE +and thus suppressing this extra +.B SIGTRAP +is the recommended approach. +.SS Real parent +The ptrace API (ab)uses the standard UNIX parent/child signaling over +.BR waitpid (2). +This used to cause the real parent of the process to stop receiving +several kinds of +.BR waitpid (2) +notifications when the child process is traced by some other process. +.PP +Many of these bugs have been fixed, but as of Linux 2.6.38 several still +exist; see BUGS below.
+.PP +As of Linux 2.6.38, the following is believed to work correctly: +.IP \[bu] 3 +exit/death by signal is reported first to the tracer, then, +when the tracer consumes the +.BR waitpid (2) +result, to the real parent (to the real parent only when the +whole multithreaded process exits). +If the tracer and the real parent are the same process, +the report is sent only once. +.SH RETURN VALUE +On success, the +.B PTRACE_PEEK* +requests return the requested data (but see NOTES), +the +.B PTRACE_SECCOMP_GET_FILTER +request returns the number of instructions in the BPF program, +the +.B PTRACE_GET_SYSCALL_INFO +request returns the number of bytes available to be written by the kernel, +and other requests return zero. +.PP +On error, all requests return \-1, and +.I errno +is set to indicate the error. +Since the value returned by a successful +.B PTRACE_PEEK* +request may be \-1, the caller must clear +.I errno +before the call, and then check it afterward +to determine whether or not an error occurred. +.SH ERRORS +.TP +.B EBUSY +(i386 only) There was an error with allocating or freeing a debug register. +.TP +.B EFAULT +There was an attempt to read from or write to an invalid area in +the tracer's or the tracee's memory, +probably because the area wasn't mapped or accessible. +Unfortunately, under Linux, different variations of this fault +will return +.B EIO +or +.B EFAULT +more or less arbitrarily. +.TP +.B EINVAL +An attempt was made to set an invalid option. +.TP +.B EIO +.I request +is invalid, or an attempt was made to read from or +write to an invalid area in the tracer's or the tracee's memory, +or there was a word-alignment violation, +or an invalid signal was specified during a restart request. +.TP +.B EPERM +The specified process cannot be traced. 
+This could be because the +tracer has insufficient privileges (the required capability is +.BR CAP_SYS_PTRACE ); +unprivileged processes cannot trace processes that they +cannot send signals to or those running +set-user-ID/set-group-ID programs, for obvious reasons. +Alternatively, the process may already be being traced, +or (before Linux 2.6.26) be +.BR init (1) +(PID 1). +.TP +.B ESRCH +The specified process does not exist, or is not currently being traced +by the caller, or is not stopped +(for requests that require a stopped tracee). +.SH STANDARDS +None. +.SH HISTORY +SVr4, 4.3BSD. +.PP +Before Linux 2.6.26, +.\" See commit 00cd5c37afd5f431ac186dd131705048c0a11fdb +.BR init (1), +the process with PID 1, could not be traced. +.SH NOTES +Although arguments to +.BR ptrace () +are interpreted according to the prototype given, +glibc currently declares +.BR ptrace () +as a variadic function with only the +.I request +argument fixed. +It is recommended to always supply four arguments, +even if the requested operation does not use them, +setting unused/ignored arguments to +.I 0L +or +.IR "(void\ *)\ 0". +.PP +A tracee's parent continues to be the tracer even if that tracer calls +.BR execve (2). +.PP +The layout of the contents of memory and the USER area is +quite operating-system- and architecture-specific. +The offset supplied, and the data returned, +might not entirely match the definition of +.IR "struct user" . +.\" See http://lkml.org/lkml/2008/5/8/375 +.PP +The size of a "word" is determined by the operating-system variant +(e.g., for 32-bit Linux it is 32 bits). +.PP +This page documents the way the +.BR ptrace () +call works currently in Linux. +Its behavior differs significantly on other flavors of UNIX. +In any case, use of +.BR ptrace () +is highly specific to the operating system and architecture.
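As a sketch of the RETURN VALUE and NOTES guidance above, the following helper clears errno before a PTRACE_PEEKDATA request, passes all four arguments, and distinguishes a genuine -1 word from an error. The function name peek_word() and its return convention are hypothetical and chosen only for illustration. The sketch assumes that the caller is already the tracer of pid and that pid is in a ptrace-stop.

```c
#include <errno.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/*
 * Hypothetical helper: read one word from the address space of a
 * stopped tracee.  Returns 0 and stores the word in *out on success;
 * returns -1 with errno set by ptrace() on failure, leaving *out as-is.
 * A successful PTRACE_PEEKDATA may legitimately return -1, so errno is
 * cleared before the call and inspected afterward.
 */
int peek_word(pid_t pid, const void *addr, long *out)
{
    errno = 0;
    /* Supply all four arguments; data is unused for PTRACE_PEEKDATA. */
    long word = ptrace(PTRACE_PEEKDATA, pid, (void *) addr, (void *) 0);
    if (word == -1 && errno != 0)
        return -1;          /* a real error, not a -1 word */
    *out = word;
    return 0;
}
```

A tracer would typically call peek_word() right after waitpid(2) reports the tracee as stopped. On error, errno holds the reason, for example ESRCH if pid is not a stopped tracee of the caller.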
+.\" +.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" +.\" +.SS Ptrace access mode checking +Various parts of the kernel-user-space API (not just +.BR ptrace () +operations) require so-called "ptrace access mode" checks, +whose outcome determines whether an operation is permitted +(or, in a few cases, causes a "read" operation to return sanitized data). +These checks are performed in cases where one process can +inspect sensitive information about, +or in some cases modify the state of, another process. +The checks are based on factors such as the credentials and capabilities +of the two processes, +whether or not the "target" process is dumpable, +and the results of checks performed by any enabled Linux Security Module +(LSM)\[em]for example, SELinux, Yama, or Smack\[em]and by the commoncap LSM +(which is always invoked). +.PP +Prior to Linux 2.6.27, all access checks were of a single type. +Since Linux 2.6.27, +.\" commit 006ebb40d3d65338bd74abb03b945f8d60e362bd +two access mode levels are distinguished: +.TP +.B PTRACE_MODE_READ +For "read" operations or other operations that are less dangerous, +such as: +.BR get_robust_list (2); +.BR kcmp (2); +reading +.IR /proc/ pid /auxv , +.IR /proc/ pid /environ , +or +.IR /proc/ pid /stat ; +or +.BR readlink (2) +of a +.IR /proc/ pid /ns/* +file. +.TP +.B PTRACE_MODE_ATTACH +For "write" operations, or other operations that are more dangerous, +such as: ptrace attaching +.RB ( PTRACE_ATTACH ) +to another process +or calling +.BR process_vm_writev (2). +.RB ( PTRACE_MODE_ATTACH +was effectively the default before Linux 2.6.27.) +.\" +.\" Regarding the above description of the distinction between +.\" PTRACE_MODE_READ and PTRACE_MODE_ATTACH, Stephen Smalley notes: +.\" +.\" That was the intent when the distinction was introduced, but it doesn't +.\" appear to have been properly maintained, e.g.
there is now a common +.\" helper lock_trace() that is used for +.\" /proc/pid/{stack,syscall,personality} but checks PTRACE_MODE_ATTACH, and +.\" PTRACE_MODE_ATTACH is also used in timerslack_ns_write/show(). Likely +.\" should review and make them consistent. There was also some debate +.\" about proper handling of /proc/pid/fd. Arguably that one might belong +.\" back in the _ATTACH camp. +.\" +.PP +Since Linux 4.5, +.\" commit caaee6234d05a58c5b4d05e7bf766131b810a657 +the above access mode checks are combined (ORed) with +one of the following modifiers: +.TP +.B PTRACE_MODE_FSCREDS +Use the caller's filesystem UID and GID (see +.BR credentials (7)) +or effective capabilities for LSM checks. +.TP +.B PTRACE_MODE_REALCREDS +Use the caller's real UID and GID or permitted capabilities for LSM checks. +This was effectively the default before Linux 4.5. +.PP +Because combining one of the credential modifiers with one of +the aforementioned access modes is typical, +some macros are defined in the kernel sources for the combinations: +.TP +.B PTRACE_MODE_READ_FSCREDS +Defined as +.BR "PTRACE_MODE_READ | PTRACE_MODE_FSCREDS" . +.TP +.B PTRACE_MODE_READ_REALCREDS +Defined as +.BR "PTRACE_MODE_READ | PTRACE_MODE_REALCREDS" . +.TP +.B PTRACE_MODE_ATTACH_FSCREDS +Defined as +.BR "PTRACE_MODE_ATTACH | PTRACE_MODE_FSCREDS" . +.TP +.B PTRACE_MODE_ATTACH_REALCREDS +Defined as +.BR "PTRACE_MODE_ATTACH | PTRACE_MODE_REALCREDS" . +.PP +One further modifier can be ORed with the access mode: +.TP +.BR PTRACE_MODE_NOAUDIT " (since Linux 3.3)" +.\" commit 69f594a38967f4540ce7a29b3fd214e68a8330bd +.\" Just for /proc/pid/stat +Don't audit this access mode check. +This modifier is employed for ptrace access mode checks +(such as checks when reading +.IR /proc/ pid /stat ) +that merely cause the output to be filtered or sanitized, +rather than causing an error to be returned to the caller. 
+In these cases, accessing the file is not a security violation and +there is no reason to generate a security audit record. +This modifier suppresses the generation of +such an audit record for the particular access check. +.PP +Note that all of the +.B PTRACE_MODE_* +constants described in this subsection are kernel-internal, +and not visible to user space. +The constant names are mentioned here in order to label the various kinds of +ptrace access mode checks that are performed for various system calls +and accesses to various pseudofiles (e.g., under +.IR /proc ). +These names are used in other manual pages to provide a simple +shorthand for labeling the different kernel checks. +.PP +The algorithm employed for ptrace access mode checking determines whether +the calling process is allowed to perform the corresponding action +on the target process. +(In the case of opening +.IR /proc/ pid +files, the "calling process" is the one opening the file, +and the process with the corresponding PID is the "target process".) +The algorithm is as follows: +.IP (1) 5 +If the calling thread and the target thread are in the same +thread group, access is always allowed. +.IP (2) +If the access mode specifies +.BR PTRACE_MODE_FSCREDS , +then, for the check in the next step, +employ the caller's filesystem UID and GID. +(As noted in +.BR credentials (7), +the filesystem UID and GID almost always have the same values +as the corresponding effective IDs.) +.IP +Otherwise, the access mode specifies +.BR PTRACE_MODE_REALCREDS , +so use the caller's real UID and GID for the checks in the next step. +(Most APIs that check the caller's UID and GID use the effective IDs. +For historical reasons, the +.B PTRACE_MODE_REALCREDS +check uses the real IDs instead.) 
+.IP (3) +Deny access if +.I neither +of the following is true: +.RS +.IP \[bu] 3 +The real, effective, and saved-set user IDs of the target +match the caller's user ID, +.I and +the real, effective, and saved-set group IDs of the target +match the caller's group ID. +.IP \[bu] +The caller has the +.B CAP_SYS_PTRACE +capability in the user namespace of the target. +.RE +.IP (4) +Deny access if the target process's "dumpable" attribute has a value other than 1 +.RB ( SUID_DUMP_USER ; +see the discussion of +.B PR_SET_DUMPABLE +in +.BR prctl (2)), +and the caller does not have the +.B CAP_SYS_PTRACE +capability in the user namespace of the target process. +.IP (5) +The kernel LSM +.IR security_ptrace_access_check () +interface is invoked to see if ptrace access is permitted. +The results depend on the LSM(s). +The implementation of this interface in the commoncap LSM performs +the following steps: +.\" (in cap_ptrace_access_check()): +.RS +.IP (5.1) 7 +If the access mode includes +.BR PTRACE_MODE_FSCREDS , +then use the caller's +.I effective +capability set +in the following check; +otherwise (in which case the access mode specifies +.BR PTRACE_MODE_REALCREDS ) +use the caller's +.I permitted +capability set. +.IP (5.2) +Deny access if +.I neither +of the following is true: +.RS +.IP \[bu] 3 +The caller and the target process are in the same user namespace, +and the caller's capabilities are a superset of the target process's +.I permitted +capabilities. +.IP \[bu] +The caller has the +.B CAP_SYS_PTRACE +capability in the target process's user namespace. +.RE +.IP +Note that the commoncap LSM does not distinguish between +.B PTRACE_MODE_READ +and +.BR PTRACE_MODE_ATTACH . +.RE +.IP (6) +If access has not been denied by any of the preceding steps, +then access is allowed.
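The checking algorithm above can be sketched as a pure C function. This is a simplified model, not kernel code. The types struct caller_model and struct target_model, the MODE_FSCREDS/MODE_REALCREDS enum, and the single cap_sys_ptrace_in_target_ns flag are hypothetical stand-ins for kernel credential data. Capability sets are modeled as plain bitmasks, and only the commoncap LSM part of step (5) is modeled.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical model types; these are NOT kernel data structures. */
enum cred_mode { MODE_FSCREDS, MODE_REALCREDS };

struct caller_model {
    uid_t ruid, fsuid;
    gid_t rgid, fsgid;
    uint64_t effective_caps;      /* capability sets as bitmasks */
    uint64_t permitted_caps;
    bool cap_sys_ptrace_in_target_ns;
    bool same_user_ns_as_target;
};

struct target_model {
    uid_t ruid, euid, suid;
    gid_t rgid, egid, sgid;
    uint64_t permitted_caps;
    int dumpable;                 /* 1 corresponds to SUID_DUMP_USER */
};

/* Returns true if the modeled access mode check would allow access. */
bool ptrace_access_allowed(bool same_thread_group, enum cred_mode mode,
                           const struct caller_model *c,
                           const struct target_model *t)
{
    /* Step 1: threads in the same thread group always have access. */
    if (same_thread_group)
        return true;

    /* Step 2: pick filesystem IDs or real IDs for the comparison. */
    uid_t uid = (mode == MODE_FSCREDS) ? c->fsuid : c->ruid;
    gid_t gid = (mode == MODE_FSCREDS) ? c->fsgid : c->rgid;

    /* Step 3: all target user and group IDs must match, or CAP_SYS_PTRACE. */
    bool ids_match = t->ruid == uid && t->euid == uid && t->suid == uid &&
                     t->rgid == gid && t->egid == gid && t->sgid == gid;
    if (!ids_match && !c->cap_sys_ptrace_in_target_ns)
        return false;

    /* Step 4: a target whose dumpable value is not 1 needs CAP_SYS_PTRACE. */
    if (t->dumpable != 1 && !c->cap_sys_ptrace_in_target_ns)
        return false;

    /* Step 5 (commoncap only): effective caps for FSCREDS, else permitted. */
    uint64_t caps = (mode == MODE_FSCREDS) ? c->effective_caps
                                           : c->permitted_caps;
    bool superset = (caps & t->permitted_caps) == t->permitted_caps;
    if (!(c->same_user_ns_as_target && superset) &&
        !c->cap_sys_ptrace_in_target_ns)
        return false;

    /* Step 6: nothing denied access. */
    return true;
}
```

Because only the commoncap step is modeled, a true result from this sketch does not mean the kernel would grant access. SELinux, Yama, or another enabled LSM may still deny it in step (5).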
+.\" +.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" +.\" +.SS /proc/sys/kernel/yama/ptrace_scope +On systems with the Yama Linux Security Module (LSM) installed +(i.e., the kernel was configured with +.BR CONFIG_SECURITY_YAMA ), +the +.I /proc/sys/kernel/yama/ptrace_scope +file (available since Linux 3.4) +.\" commit 2d514487faf188938a4ee4fb3464eeecfbdcf8eb +can be used to restrict the ability to trace a process with +.BR ptrace () +(and thus also the ability to use tools such as +.BR strace (1) +and +.BR gdb (1)). +The goal of such restrictions is to prevent attack escalation whereby +a compromised process can ptrace-attach to other sensitive processes +(e.g., a GPG agent or an SSH session) owned by the user in order +to gain additional credentials that may exist in memory +and thus expand the scope of the attack. +.PP +More precisely, the Yama LSM limits two types of operations: +.IP \[bu] 3 +Any operation that performs a ptrace access mode +.B PTRACE_MODE_ATTACH +check\[em]for example, +.BR ptrace () +.BR PTRACE_ATTACH . +(See the "Ptrace access mode checking" discussion above.) +.IP \[bu] +.BR ptrace () +.BR PTRACE_TRACEME . +.PP +A process that has the +.B CAP_SYS_PTRACE +capability can update the +.I /proc/sys/kernel/yama/ptrace_scope +file with one of the following values: +.TP +0 ("classic ptrace permissions") +No additional restrictions on operations that perform +.B PTRACE_MODE_ATTACH +checks (beyond those imposed by the commoncap and other LSMs). +.IP +The use of +.B PTRACE_TRACEME +is unchanged. +.TP +1 ("restricted ptrace") [default value] +When performing an operation that requires a +.B PTRACE_MODE_ATTACH +check, the calling process must either have the +.B CAP_SYS_PTRACE +capability in the user namespace of the target process or +it must have a predefined relationship with the target process. +By default, +the predefined relationship is that the target process +must be a descendant of the caller. 
+.IP +A target process can employ the +.BR prctl (2) +.B PR_SET_PTRACER +operation to declare an additional PID that is allowed to perform +.B PTRACE_MODE_ATTACH +operations on the target. +See the kernel source file +.I Documentation/admin\-guide/LSM/Yama.rst +.\" commit 90bb766440f2147486a2acc3e793d7b8348b0c22 +(or +.I Documentation/security/Yama.txt +before Linux 4.13) +for further details. +.IP +The use of +.B PTRACE_TRACEME +is unchanged. +.TP +2 ("admin-only attach") +Only processes with the +.B CAP_SYS_PTRACE +capability in the user namespace of the target process may perform +.B PTRACE_MODE_ATTACH +operations or trace children that employ +.BR PTRACE_TRACEME . +.TP +3 ("no attach") +No process may perform +.B PTRACE_MODE_ATTACH +operations or trace children that employ +.BR PTRACE_TRACEME . +.IP +Once this value has been written to the file, it cannot be changed. +.PP +With respect to values 1 and 2, +note that creating a new user namespace effectively removes the +protection offered by Yama. +This is because a process in the parent user namespace whose effective +UID matches the UID of the creator of a child namespace +has all capabilities (including +.BR CAP_SYS_PTRACE ) +when performing operations within the child user namespace +(and further-removed descendants of that namespace). +Consequently, when a process tries to use user namespaces to sandbox itself, +it inadvertently weakens the protections offered by the Yama LSM. +.\" +.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" +.\" +.SS C library/kernel differences +At the system call level, the +.BR PTRACE_PEEKTEXT , +.BR PTRACE_PEEKDATA , +and +.B PTRACE_PEEKUSER +requests have a different API: they store the result +at the address specified by the +.I data +parameter, and the return value is the error flag. +The glibc wrapper function provides the API given in DESCRIPTION above, +with the result being returned via the function return value. 
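To illustrate the system-call-level API described above, the following sketch bypasses the glibc wrapper by calling syscall(2) directly. The helper name raw_peekdata() is hypothetical. The sketch assumes a Linux system where <sys/syscall.h> defines SYS_ptrace, and that pid is a stopped tracee of the caller.

```c
#include <errno.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: invoke the raw ptrace system call for
 * PTRACE_PEEKDATA.  At this level the kernel stores the peeked word at
 * *out, and the return value only reports success (0) or failure
 * (-1, with errno set by syscall()).
 */
long raw_peekdata(pid_t pid, const void *addr, long *out)
{
    /* syscall() reads its variadic arguments as long, so cast each one. */
    return syscall(SYS_ptrace, (long) PTRACE_PEEKDATA, (long) pid,
                   (long) addr, (long) out);
}
```

Because the word travels through *out rather than through the return value, no errno-clearing step is needed here. A return of -1 always means failure.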
+.SH BUGS +On hosts with Linux 2.6 kernel headers, +.B PTRACE_SETOPTIONS +is declared with a different value than the one for Linux 2.4. +This leads to applications compiled with Linux 2.6 kernel +headers failing when run on Linux 2.4. +This can be worked around by redefining +.B PTRACE_SETOPTIONS +to +.BR PTRACE_OLDSETOPTIONS , +if that is defined. +.PP +Group-stop notifications are sent to the tracer, but not to the real parent. +Last confirmed on Linux 2.6.38.6. +.PP +If a thread group leader is traced and exits by calling +.BR _exit (2), +.\" Note from Denys Vlasenko: +.\" Here "exits" means any kind of death - _exit, exit_group, +.\" signal death. Signal death and exit_group cases are trivial, +.\" though: since signal death and exit_group kill all other threads +.\" too, "until all other threads exit" thing happens rather soon +.\" in these cases. Therefore, only _exit presents observably +.\" puzzling behavior to ptrace users: thread leader _exit's, +.\" but WIFEXITED isn't reported! We are trying to explain here +.\" why it is so. +a +.B PTRACE_EVENT_EXIT +stop will happen for it (if requested), but the subsequent +.B WIFEXITED +notification will not be delivered until all other threads exit. +As explained above, if one of the other threads calls +.BR execve (2), +the death of the thread group leader will +.I never +be reported. +If the execed thread is not traced by this tracer, +the tracer will never know that +.BR execve (2) +happened. +One possible workaround is to +.B PTRACE_DETACH +the thread group leader instead of restarting it in this case. +Last confirmed on Linux 2.6.38.6. +.\" FIXME . need to test/verify this scenario +.PP +A +.B SIGKILL +signal may still cause a +.B PTRACE_EVENT_EXIT +stop before the actual signal death. +This may be changed in the future; +.B SIGKILL +is meant to always immediately kill tasks even under ptrace. +Last confirmed on Linux 3.13.
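The PTRACE_SETOPTIONS workaround described at the start of BUGS could also be applied at run time instead of by redefining the constant, as in the sketch below. The helper set_trace_options() is hypothetical. The retry on EIO is an assumption based on EIO being the documented error for an invalid request. On headers where PTRACE_OLDSETOPTIONS is not a preprocessor macro, the fallback is simply not compiled in.

```c
#include <errno.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/*
 * Hypothetical helper: set ptrace options (for example
 * PTRACE_O_TRACEEXEC) on a stopped tracee.
 * Returns 0 on success, or -1 with errno set.
 */
int set_trace_options(pid_t pid, long options)
{
    if (ptrace(PTRACE_SETOPTIONS, pid, (void *) 0, (void *) options) == 0)
        return 0;
#ifdef PTRACE_OLDSETOPTIONS
    /* Assumption: an old kernel rejects the unknown request with EIO,
       so retry once with the legacy request number. */
    if (errno == EIO &&
        ptrace(PTRACE_OLDSETOPTIONS, pid, (void *) 0, (void *) options) == 0)
        return 0;
#endif
    return -1;
}
```

A tracer would call set_trace_options(pid, PTRACE_O_TRACEEXEC) once, at the first stop after attaching. That also avoids the legacy post-execve SIGTRAP discussed earlier.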
+.PP +Some system calls return with +.B EINTR +if a signal was sent to a tracee, but delivery was suppressed by the tracer. +(This is a very typical operation: it is usually +done by debuggers on every attach, in order not to introduce +a bogus +.BR SIGSTOP .) +As of Linux 3.2.9, the following system calls are affected +(this list is likely incomplete): +.BR epoll_wait (2) +and +.BR read (2) +from an +.BR inotify (7) +file descriptor. +The usual symptom of this bug is that when you attach to +a quiescent process with the command +.PP +.in +4n +.EX +strace \-p <process\-ID> +.EE +.in +.PP +then, instead of the usual +and expected one-line output such as +.PP +.in +4n +.EX +restart_syscall(<... resuming interrupted call ...>_ +.EE +.in +.PP +or +.PP +.in +4n +.EX +select(6, [5], NULL, [5], NULL_ +.EE +.in +.PP +('_' denotes the cursor position), you observe more than one line. +For example: +.PP +.in +4n +.EX + clock_gettime(CLOCK_MONOTONIC, {15370, 690928118}) = 0 + epoll_wait(4,_ +.EE +.in +.PP +What is not visible here is that the process was blocked in +.BR epoll_wait (2) +before +.BR strace (1) +had attached to it. +Attaching caused +.BR epoll_wait (2) +to return to user space with the error +.BR EINTR . +In this particular case, the program reacted to +.B EINTR +by checking the current time, and then executing +.BR epoll_wait (2) +again. +(Programs that do not expect such "stray" +.B EINTR +errors may behave in an unintended way upon an +.BR strace (1) +attach.) +.PP +Contrary to the normal rules, the glibc wrapper for +.BR ptrace () +can set +.I errno +to zero. +.SH SEE ALSO +.BR gdb (1), +.BR ltrace (1), +.BR strace (1), +.BR clone (2), +.BR execve (2), +.BR fork (2), +.BR gettid (2), +.BR prctl (2), +.BR seccomp (2), +.BR sigaction (2), +.BR tgkill (2), +.BR vfork (2), +.BR waitpid (2), +.BR exec (3), +.BR capabilities (7), +.BR signal (7)