Diffstat (limited to 'upstream/opensuse-tumbleweed/man2/perf_event_open.2')
-rw-r--r-- | upstream/opensuse-tumbleweed/man2/perf_event_open.2 | 206 |
1 files changed, 127 insertions, 79 deletions
diff --git a/upstream/opensuse-tumbleweed/man2/perf_event_open.2 b/upstream/opensuse-tumbleweed/man2/perf_event_open.2 index d9e7877c..882797da 100644 --- a/upstream/opensuse-tumbleweed/man2/perf_event_open.2 +++ b/upstream/opensuse-tumbleweed/man2/perf_event_open.2 @@ -5,7 +5,7 @@ .\" This document is based on the perf_event.h header file, the .\" tools/perf/design.txt file, and a lot of bitter experience. .\" -.TH perf_event_open 2 2023-05-03 "Linux man-pages 6.05.01" +.TH perf_event_open 2 2024-05-02 "Linux man-pages (unreleased)" .SH NAME perf_event_open \- set up performance monitoring .SH LIBRARY @@ -17,12 +17,12 @@ Standard C library .BR "#include <linux/hw_breakpoint.h>" " /* Definition of " HW_* " constants */" .BR "#include <sys/syscall.h>" " /* Definition of " SYS_* " constants */" .B #include <unistd.h> -.PP +.P .BI "int syscall(SYS_perf_event_open, struct perf_event_attr *" attr , .BI " pid_t " pid ", int " cpu ", int " group_fd \ ", unsigned long " flags ); .fi -.PP +.P .IR Note : glibc provides no wrapper for .BR perf_event_open (), @@ -32,7 +32,12 @@ necessitating the use of Given a list of parameters, .BR perf_event_open () returns a file descriptor, for use in subsequent system calls -.RB ( read "(2), " mmap "(2), " prctl "(2), " fcntl "(2), etc.)." +(\c +.BR read (2), +.BR mmap (2), +.BR prctl (2), +.BR fcntl (2), +etc.). .PP A call to .BR perf_event_open () @@ -41,14 +46,14 @@ information. Each file descriptor corresponds to one event that is measured; these can be grouped together to measure multiple events simultaneously. -.PP +.P Events can be enabled and disabled in two ways: via .BR ioctl (2) and via .BR prctl (2). When an event is disabled it does not count or generate overflows but does continue to exist and maintain its count value. -.PP +.P Events come in two flavors: counting and sampled. A .I counting @@ -95,7 +100,7 @@ value of less than 1. .TP .BR "pid == \-1" " and " "cpu == \-1" This setting is invalid and will return an error. 
-.PP +.P When .I pid is greater than zero, permission to perform this system call @@ -105,7 +110,7 @@ is governed by .B PTRACE_MODE_READ_REALCREDS check on older Linux versions; see .BR ptrace (2). -.PP +.P The .I group_fd argument allows event groups to be created. @@ -127,7 +132,7 @@ This means that the values of the member events can be meaningfully compared \[em]added, divided (to get ratios), and so on\[em] with each other, since they have counted events for the same set of executed instructions. -.PP +.P The .I flags argument is formed by ORing together zero or more of the following values: @@ -186,12 +191,12 @@ must be passed as the parameter. cgroup monitoring is available only for system-wide events and may therefore require extra permissions. -.PP +.P The .I perf_event_attr structure provides detailed configuration information for the event being created. -.PP +.P .in +4n .EX struct perf_event_attr { @@ -291,7 +296,7 @@ struct perf_event_attr { }; .EE .in -.PP +.P The fields of the .I perf_event_attr structure are described in more detail below: @@ -562,7 +567,7 @@ This counts context switches to a task in a different cgroup. In other words, if the next task is in the same cgroup, it won't count the switch. .RE -.PP +.P .RS If .I type @@ -575,7 +580,7 @@ can be obtained from under debugfs .I tracing/events/*/*/id if ftrace is enabled in the kernel. 
.RE -.PP +.P .RS If .I type @@ -586,7 +591,7 @@ To calculate the appropriate .I config value, use the following equation: .RS 4 -.PP +.P .in +4n .EX config = (perf_hw_cache_id) | @@ -594,7 +599,7 @@ config = (perf_hw_cache_id) | (perf_hw_cache_op_result_id << 16); .EE .in -.PP +.P where .I perf_hw_cache_id is one of: @@ -622,7 +627,7 @@ for measuring the branch prediction unit .\" commit 89d6c0b5bdbb1927775584dcf532d98b3efe1477 for measuring local memory accesses .RE -.PP +.P and .I perf_hw_cache_op_id is one of: @@ -637,7 +642,7 @@ for write accesses .B PERF_COUNT_HW_CACHE_OP_PREFETCH for prefetch accesses .RE -.PP +.P and .I perf_hw_cache_op_result_id is one of: @@ -650,7 +655,7 @@ to measure accesses to measure misses .RE .RE -.PP +.P If .I type is @@ -666,7 +671,7 @@ The libpfm4 library can be used to translate from the name in the architectural manuals to the raw hex value .BR perf_event_open () expects in this field. -.PP +.P If .I type is @@ -675,7 +680,7 @@ then leave .I config set to zero. Its parameters are set in other places. -.PP +.P If .I type is @@ -698,7 +703,13 @@ and for more details. .RE .TP -.IR kprobe_func ", " uprobe_path ", " kprobe_addr ", and " probe_offset +.I kprobe_func +.TQ +.I uprobe_path +.TQ +.I kprobe_addr +.TQ +.I probe_offset These fields describe the kprobe/uprobe for dynamic PMUs .B kprobe and @@ -721,7 +732,9 @@ use and .IR probe_offset . .TP -.IR sample_period ", " sample_freq +.I sample_period +.TQ +.I sample_freq A "sampling" event is one that generates an overflow notification every N events, where N is given by .IR sample_period . @@ -925,7 +938,7 @@ not both. It has the following format and the meaning of each field is dependent on the hardware implementation. -.PP +.P .in +4n .EX union perf_sample_weight { @@ -1354,7 +1367,9 @@ This enables synchronous signal delivery of .B SIGTRAP on event overflow. 
.TP -.IR wakeup_events ", " wakeup_watermark +.I wakeup_events +.TQ +.I wakeup_watermark This union sets how many samples .RI ( wakeup_events ) or bytes @@ -1400,7 +1415,7 @@ Count when we read or write the memory location. .TP .B HW_BREAKPOINT_X Count when we execute code at the memory location. -.PP +.P The values can be combined via a bitwise or, but the combination of .B HW_BREAKPOINT_R @@ -1474,7 +1489,7 @@ Branch target is in hypervisor. .TP .B PERF_SAMPLE_BRANCH_PLM_ALL A convenience value that is the three preceding values ORed together. -.PP +.P In addition to the privilege value, at least one or more of the following bits must be set. .TP @@ -1591,12 +1606,12 @@ The values that are there are specified by the field in the .I attr structure at open time. -.PP +.P If you attempt to read into a buffer that is not big enough to hold the data, the error .B ENOSPC results. -.PP +.P Here is the layout of the data returned by a read: .IP \[bu] 3 If @@ -1635,7 +1650,7 @@ struct read_format { }; .EE .in -.PP +.P The values read are as follows: .TP .I nr @@ -1644,7 +1659,9 @@ Available only if .B PERF_FORMAT_GROUP was specified. .TP -.IR time_enabled ", " time_running +.I time_enabled +.TQ +.I time_running Total time the event was enabled and running. Normally these values are the same. Multiplexing happens if the number of events is more than the @@ -1680,18 +1697,18 @@ mmap tracking) are logged into a ring-buffer. This ring-buffer is created and accessed through .BR mmap (2). -.PP +.P The mmap size should be 1+2\[ha]n pages, where the first page is a metadata page .RI ( "struct perf_event_mmap_page" ) that contains various bits of information such as where the ring-buffer head is. -.PP +.P Before Linux 2.6.39, there is a bug that means you must allocate an mmap ring buffer when sampling even if you do not plan to access it. 
-.PP +.P The structure of the first metadata mmap page is as follows: -.PP +.P .in +4n .EX struct perf_event_mmap_page { @@ -1729,7 +1746,7 @@ struct perf_event_mmap_page { } .EE .in -.PP +.P The following list describes the fields in the .I perf_event_mmap_page structure in more detail: @@ -1861,7 +1878,11 @@ count += pmc; .EE .in .TP -.IR time_shift ", " time_mult ", " time_offset +.I time_shift +.TQ +.I time_mult +.TQ +.I time_offset .IP If .IR cap_usr_time , @@ -1966,7 +1987,13 @@ where perf sample data begins. Contains the size of the perf sample region within the mmap buffer. .TP -.IR aux_head ", " aux_tail ", " aux_offset ", " aux_size " (since Linux 4.1)" +.I aux_head +.TQ +.I aux_tail +.TQ +.I aux_offset +.TQ +.I aux_size " (since Linux 4.1)" .\" commit 45bfb2e50471abbbfd83d40d28c986078b0d24ff The AUX region allows .BR mmap (2)-ing @@ -2011,9 +2038,9 @@ rules as the previous described .I data_head and .IR data_tail . -.PP +.P The following 2^n ring-buffer pages have the layout described below. -.PP +.P If .I perf_event_attr.sample_id_all is set, then all event types will @@ -2027,9 +2054,9 @@ fields, that is, at the end of the payload. This allows a newer perf.data file to be supported by older perf tools, with the new optional fields being ignored. -.PP +.P The mmap values start with a header: -.PP +.P .in +4n .EX struct perf_event_header { @@ -2039,7 +2066,7 @@ struct perf_event_header { }; .EE .in -.PP +.P Below, we describe the .I perf_event_header fields in more detail. @@ -2080,7 +2107,7 @@ Sample happened in the guest kernel. .\" commit 39447b386c846bbf1c56f6403c5282837486200f Sample happened in guest user code. .RE -.PP +.P .RS Since the following three statuses are generated by different record types, they alias to the same bit: @@ -2109,7 +2136,7 @@ record is generated, this bit indicates that the context switch is away from the current process (instead of into the current process). 
.RE -.PP +.P .RS In addition, the following bits can be set: .TP @@ -2260,7 +2287,9 @@ struct { .EE .in .TP -.BR PERF_RECORD_THROTTLE ", " PERF_RECORD_UNTHROTTLE +.B PERF_RECORD_THROTTLE +.TQ +.B PERF_RECORD_UNTHROTTLE This record indicates a throttle/unthrottle event. .IP .in +4n @@ -2373,7 +2402,9 @@ If is enabled, then a 64-bit instruction pointer value is included. .TP -.IR pid ", " tid +.I pid +.TQ +.I tid If .B PERF_SAMPLE_TID is enabled, then a 32-bit process ID @@ -2412,7 +2443,9 @@ the actual ID is returned, not the group leader. This ID is the same as the one returned by .BR PERF_FORMAT_ID . .TP -.IR cpu ", " res +.I cpu +.TQ +.I res If .B PERF_SAMPLE_CPU is enabled, this is a 32-bit value indicating @@ -2436,7 +2469,9 @@ value used at .BR perf_event_open () time. .TP -.IR nr ", " ips[nr] +.I nr +.TQ +.I ips[nr] If .B PERF_SAMPLE_CALLCHAIN is enabled, then a 64-bit number is included @@ -2444,7 +2479,9 @@ which indicates how many following 64-bit instruction pointers will follow. This is the current callchain. .TP -.IR size ", " data[size] +.I size +.TQ +.I data[size] If .B PERF_SAMPLE_RAW is enabled, then a 32-bit value indicating size @@ -2456,7 +2493,9 @@ The ABI doesn't make any promises with respect to the stability of its content, it may vary depending on event, hardware, and kernel version. .TP -.IR bnr ", " lbr[bnr] +.I bnr +.TQ +.I lbr[bnr] If .B PERF_SAMPLE_BRANCH_STACK is enabled, then a 64-bit value indicating @@ -2490,10 +2529,10 @@ The branch was in an aborted transactional memory transaction. .\" commit 71ef3c6b9d4665ee7afbbe4c208a98917dcfc32f This reports the number of cycles elapsed since the previous branch stack update. -.PP +.P The entries are from most to least recent, so the first entry has the most recent branch. -.PP +.P Support for .IR mispred , .IR predicted , @@ -2501,13 +2540,15 @@ and .I cycles is optional; if not supported, those values will be 0. 
-.PP +.P The type of branches recorded is specified by the .I branch_sample_type field. .RE .TP -.IR abi ", " regs[weight(mask)] +.I abi +.TQ +.I regs[weight(mask)] If .B PERF_SAMPLE_REGS_USER is enabled, then the user CPU registers are recorded. @@ -2530,7 +2571,11 @@ The number of values is the number of bits set in the .I sample_regs_user bit mask. .TP -.IR size ", " data[size] ", " dyn_size +.I size +.TQ +.I data[size] +.TQ +.I dyn_size If .B PERF_SAMPLE_STACK_USER is enabled, then the user stack is recorded. @@ -2754,7 +2799,9 @@ the high 32 bits of the field by shifting right by and masking with the value .BR PERF_TXN_ABORT_MASK . .TP -.IR abi ", " regs[weight(mask)] +.I abi +.TQ +.I regs[weight(mask)] If .B PERF_SAMPLE_REGS_INTR is enabled, then the user CPU registers are recorded. @@ -3254,13 +3301,13 @@ and .B F_SETSIG operations in .BR fcntl (2). -.PP +.P Overflows are generated only by sampling events .RI ( sample_period must have a nonzero value). -.PP +.P There are two ways to generate overflow notifications. -.PP +.P The first is to set a .I wakeup_events or @@ -3270,7 +3317,7 @@ or bytes have been written to the mmap ring buffer. In this case, .B POLL_IN is indicated. -.PP +.P The other way is by use of the .B PERF_EVENT_IOC_REFRESH ioctl. @@ -3282,13 +3329,13 @@ once the counter reaches 0 .B POLL_HUP is indicated and the underlying event is disabled. -.PP +.P Refreshing an event group leader refreshes all siblings and refreshing with a parameter of 0 currently enables infinite refreshes; these behaviors are unsupported and should not be relied on. .\" See https://lkml.org/lkml/2011/5/24/337 -.PP +.P Starting with Linux 3.18, .\" commit 179033b3e064d2cd3f5f9945e76b0a0f0fbf4883 .B POLL_HUP @@ -3302,12 +3349,12 @@ instruction to get low-latency reads without having to enter the kernel. Note that using .I rdpmc is not necessarily faster than other methods for reading event values. 
-.PP +.P Support for this can be detected with the .I cap_usr_rdpmc field in the mmap page; documentation on how to calculate event values can be found in that section. -.PP +.P Originally, when rdpmc support was enabled, any process (not just ones with an active perf event) could use the rdpmc instruction to access the counters. @@ -3567,10 +3614,10 @@ Maximum number of pages an unprivileged user can .BR mlock (2). The default is 516 (kB). .RE -.PP +.P Files in .I /sys/bus/event_source/devices/ -.PP +.P .RS 4 Since Linux 2.6.34, the kernel supports having multiple PMUs available for monitoring. @@ -3831,7 +3878,7 @@ The official way of knowing if support is enabled is checking for the existence of the file .IR /proc/sys/kernel/perf_event_paranoid . -.PP +.P .B CAP_PERFMON capability (since Linux 5.8) provides secure approach to performance monitoring and observability operations in a system @@ -3855,7 +3902,7 @@ option to is needed to properly get overflow signals in threads. This was introduced in Linux 2.6.32. .\" commit ba0a6c9f6fceed11c6a99e8326f0477fe383e6b5 -.PP +.P Prior to Linux 2.6.33 (at least for x86), .\" commit b690081d4d3f6a23541493f1682835c3cd5c54a1 the kernel did not check @@ -3865,40 +3912,40 @@ This means to see if a given set of events works you have to .BR perf_event_open (), start, then read before you know for sure you can get valid measurements. -.PP +.P Prior to Linux 2.6.34, .\" FIXME . cannot find a kernel commit for this one event constraints were not enforced by the kernel. In that case, some events would silently return "0" if the kernel scheduled them in an improper counter slot. -.PP +.P Prior to Linux 2.6.34, there was a bug when multiplexing where the wrong results could be returned. .\" commit 45e16a6834b6af098702e5ea6c9a40de42ff77d8 -.PP +.P Kernels from Linux 2.6.35 to Linux 2.6.39 can quickly crash the kernel if "inherit" is enabled and many threads are started. 
.\" commit 38b435b16c36b0d863efcf3f07b34a6fac9873fd -.PP +.P Prior to Linux 2.6.35, .\" commit 050735b08ca8a016bbace4445fa025b88fee770b .B PERF_FORMAT_GROUP did not work with attached processes. -.PP +.P There is a bug in the kernel code between Linux 2.6.36 and Linux 3.0 that ignores the "watermark" field and acts as if a wakeup_event was chosen if the union has a nonzero value in it. .\" commit 4ec8363dfc1451f8c8f86825731fe712798ada02 -.PP +.P From Linux 2.6.31 to Linux 3.4, the .B PERF_IOC_FLAG_GROUP ioctl argument was broken and would repeatedly operate on the event specified rather than iterating across all sibling events in a group. .\" commit 724b6daa13e100067c30cfc4d1ad06629609dc4e -.PP +.P From Linux 3.4 to Linux 3.11, the mmap .\" commit fa7315871046b9a4c48627905691dbde57e51033 .I cap_usr_rdpmc @@ -3910,7 +3957,7 @@ Code should migrate to the new and .I cap_user_time fields instead. -.PP +.P Always double-check your results! Various generalized events have had wrong values. For example, retired branches measured @@ -3920,7 +3967,7 @@ the wrong thing on AMD machines until Linux 2.6.35. The following is a short example that measures the total instruction count of a call to .BR printf (3). -.PP +.P .\" SRC BEGIN (perf_event_open.c) .EX #include <linux/perf_event.h> @@ -3929,6 +3976,7 @@ instruction count of a call to #include <string.h> #include <sys/ioctl.h> #include <sys/syscall.h> +#include <sys/types.h> #include <unistd.h> \& static long @@ -3984,6 +4032,6 @@ main(void) .BR open (2), .BR prctl (2), .BR read (2) -.PP +.P .I Documentation/admin\-guide/perf\-security.rst in the kernel source tree |