From 5d1646d90e1f2cceb9f0828f4b28318cd0ec7744 Mon Sep 17 00:00:00 2001 From: Daniel Baumann Date: Sat, 27 Apr 2024 12:05:51 +0200 Subject: Adding upstream version 5.10.209. Signed-off-by: Daniel Baumann --- tools/perf/Documentation/perf-record.txt | 677 +++++++++++++++++++++++++++++++ 1 file changed, 677 insertions(+) create mode 100644 tools/perf/Documentation/perf-record.txt (limited to 'tools/perf/Documentation/perf-record.txt') diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt new file mode 100644 index 000000000..768888b93 --- /dev/null +++ b/tools/perf/Documentation/perf-record.txt @@ -0,0 +1,677 @@ +perf-record(1) +============== + +NAME +---- +perf-record - Run a command and record its profile into perf.data + +SYNOPSIS +-------- +[verse] +'perf record' [-e | --event=EVENT] [-a] +'perf record' [-e | --event=EVENT] [-a] -- [] + +DESCRIPTION +----------- +This command runs a command and gathers a performance counter profile +from it, into perf.data - without displaying anything. + +This file can then be inspected later on, using 'perf report'. + + +OPTIONS +------- +...:: + Any command you can specify in a shell. + +-e:: +--event=:: + Select the PMU event. Selection can be: + + - a symbolic event name (use 'perf list' to list all events) + + - a raw PMU event (eventsel+umask) in the form of rNNN where NNN is a + hexadecimal event descriptor. + + - a symbolic or raw PMU event followed by an optional colon + and a list of event modifiers, e.g., cpu-cycles:p. See the + linkperf:perf-list[1] man page for details on event modifiers. + + - a symbolically formed PMU event like 'pmu/param1=0x3,param2/' where + 'param1', 'param2', etc are defined as formats for the PMU in + /sys/bus/event_source/devices//format/*. + + - a symbolically formed event like 'pmu/config=M,config1=N,config3=K/' + + where M, N, K are numbers (in decimal, hex, octal format). Acceptable + values for each of 'config', 'config1' and 'config2' are defined by + corresponding entries in /sys/bus/event_source/devices//format/* + param1 and param2 are defined as formats for the PMU in: + /sys/bus/event_source/devices//format/* + + There are also some parameters which are not defined in ...//format/*. + These params can be used to overload default config values per event. + Here are some common parameters: + - 'period': Set event sampling period + - 'freq': Set event sampling frequency + - 'time': Disable/enable time stamping. Acceptable values are 1 for + enabling time stamping. 0 for disabling time stamping. + The default is 1. + - 'call-graph': Disable/enable callgraph. Acceptable str are "fp" for + FP mode, "dwarf" for DWARF mode, "lbr" for LBR mode and + "no" for disable callgraph. + - 'stack-size': user stack size for dwarf mode + - 'name' : User defined event name. Single quotes (') may be used to + escape symbols in the name from parsing by shell and tool + like this: name=\'CPU_CLK_UNHALTED.THREAD:cmask=0x1\'. + - 'aux-output': Generate AUX records instead of events. This requires + that an AUX area event is also provided. + - 'aux-sample-size': Set sample size for AUX area sampling. If the + '--aux-sample' option has been used, set aux-sample-size=0 to disable + AUX area sampling for the event. + + See the linkperf:perf-list[1] man page for more parameters. + + Note: If user explicitly sets options which conflict with the params, + the value set by the parameters will be overridden. + + Also not defined in ...//format/* are PMU driver specific + configuration parameters. Any configuration parameter preceded by + the letter '@' is not interpreted in user space and sent down directly + to the PMU driver. For example: + + perf record -e some_event/@cfg1,@cfg2=config/ ... + + will see 'cfg1' and 'cfg2=config' pushed to the PMU driver associated + with the event for further processing. There is no restriction on + what the configuration parameters are, as long as their semantic is + understood and supported by the PMU driver. + + - a hardware breakpoint event in the form of '\mem:addr[/len][:access]' + where addr is the address in memory you want to break in. + Access is the memory access type (read, write, execute) it can + be passed as follows: '\mem:addr[:[r][w][x]]'. len is the range, + number of bytes from specified addr, which the breakpoint will cover. + If you want to profile read-write accesses in 0x1000, just set + 'mem:0x1000:rw'. + If you want to profile write accesses in [0x1000~1008), just set + 'mem:0x1000/8:w'. + + - a BPF source file (ending in .c) or a precompiled object file (ending + in .o) selects one or more BPF events. + The BPF program can attach to various perf events based on the ELF section + names. + + When processing a '.c' file, perf searches an installed LLVM to compile it + into an object file first. Optional clang options can be passed via the + '--clang-opt' command line option, e.g.: + + perf record --clang-opt "-DLINUX_VERSION_CODE=0x50000" \ + -e tests/bpf-script-example.c + + Note: '--clang-opt' must be placed before '--event/-e'. + + - a group of events surrounded by a pair of brace ("{event1,event2,...}"). + Each event is separated by commas and the group should be quoted to + prevent the shell interpretation. You also need to use --group on + "perf report" to view group events together. + +--filter=:: + Event filter. This option should follow an event selector (-e) which + selects either tracepoint event(s) or a hardware trace PMU + (e.g. Intel PT or CoreSight). + + - tracepoint filters + + In the case of tracepoints, multiple '--filter' options are combined + using '&&'. + + - address filters + + A hardware trace PMU advertises its ability to accept a number of + address filters by specifying a non-zero value in + /sys/bus/event_source/devices//nr_addr_filters. + + Address filters have the format: + + filter|start|stop|tracestop [/ ] [@] + + Where: + - 'filter': defines a region that will be traced. + - 'start': defines an address at which tracing will begin. + - 'stop': defines an address at which tracing will stop. + - 'tracestop': defines a region in which tracing will stop. + + is the name of the object file, is the offset to the + code to trace in that file, and is the size of the region to + trace. 'start' and 'stop' filters need not specify a . + + If no object file is specified then the kernel is assumed, in which case + the start address must be a current kernel memory address. + + can also be specified by providing the name of a symbol. If the + symbol name is not unique, it can be disambiguated by inserting #n where + 'n' selects the n'th symbol in address order. Alternately #0, #g or #G + select only a global symbol. can also be specified by providing + the name of a symbol, in which case the size is calculated to the end + of that symbol. For 'filter' and 'tracestop' filters, if is + omitted and is a symbol, then the size is calculated to the end + of that symbol. + + If is omitted and is '*', then the start and size will + be calculated from the first and last symbols, i.e. to trace the whole + file. + + If symbol names (or '*') are provided, they must be surrounded by white + space. + + The filter passed to the kernel is not necessarily the same as entered. + To see the filter that is passed, use the -v option. + + The kernel may not be able to configure a trace region if it is not + within a single mapping. MMAP events (or /proc//maps) can be + examined to determine if that is a possibility. + + Multiple filters can be separated with space or comma. + +--exclude-perf:: + Don't record events issued by perf itself. This option should follow + an event selector (-e) which selects tracepoint event(s). It adds a + filter expression 'common_pid != $PERFPID' to filters. If other + '--filter' exists, the new filter expression will be combined with + them by '&&'. + +-a:: +--all-cpus:: + System-wide collection from all CPUs (default if no target is specified). + +-p:: +--pid=:: + Record events on existing process ID (comma separated list). + +-t:: +--tid=:: + Record events on existing thread ID (comma separated list). + This option also disables inheritance by default. Enable it by adding + --inherit. + +-u:: +--uid=:: + Record events in threads owned by uid. Name or number. + +-r:: +--realtime=:: + Collect data with this RT SCHED_FIFO priority. + +--no-buffering:: + Collect data without buffering. + +-c:: +--count=:: + Event period to sample. + +-o:: +--output=:: + Output file name. + +-i:: +--no-inherit:: + Child tasks do not inherit counters. + +-F:: +--freq=:: + Profile at this frequency. Use 'max' to use the currently maximum + allowed frequency, i.e. the value in the kernel.perf_event_max_sample_rate + sysctl. Will throttle down to the currently maximum allowed frequency. + See --strict-freq. + +--strict-freq:: + Fail if the specified frequency can't be used. + +-m:: +--mmap-pages=:: + Number of mmap data pages (must be a power of two) or size + specification with appended unit character - B/K/M/G. The + size is rounded up to have nearest pages power of two value. + Also, by adding a comma, the number of mmap pages for AUX + area tracing can be specified. + +--group:: + Put all events in a single event group. This precedes the --event + option and remains only for backward compatibility. See --event. + +-g:: + Enables call-graph (stack chain/backtrace) recording for both + kernel space and user space. + +--call-graph:: + Setup and enable call-graph (stack chain/backtrace) recording, + implies -g. Default is "fp" (for user space). + + The unwinding method used for kernel space is dependent on the + unwinder used by the active kernel configuration, i.e + CONFIG_UNWINDER_FRAME_POINTER (fp) or CONFIG_UNWINDER_ORC (orc) + + Any option specified here controls the method used for user space. + + Valid options are "fp" (frame pointer), "dwarf" (DWARF's CFI - + Call Frame Information) or "lbr" (Hardware Last Branch Record + facility). + + In some systems, where binaries are build with gcc + --fomit-frame-pointer, using the "fp" method will produce bogus + call graphs, using "dwarf", if available (perf tools linked to + the libunwind or libdw library) should be used instead. + Using the "lbr" method doesn't require any compiler options. It + will produce call graphs from the hardware LBR registers. The + main limitation is that it is only available on new Intel + platforms, such as Haswell. It can only get user call chain. It + doesn't work with branch stack sampling at the same time. + + When "dwarf" recording is used, perf also records (user) stack dump + when sampled. Default size of the stack dump is 8192 (bytes). + User can change the size by passing the size after comma like + "--call-graph dwarf,4096". + +-q:: +--quiet:: + Don't print any message, useful for scripting. + +-v:: +--verbose:: + Be more verbose (show counter open errors, etc). + +-s:: +--stat:: + Record per-thread event counts. Use it with 'perf report -T' to see + the values. + +-d:: +--data:: + Record the sample virtual addresses. + +--phys-data:: + Record the sample physical addresses. + +-T:: +--timestamp:: + Record the sample timestamps. Use it with 'perf report -D' to see the + timestamps, for instance. + +-P:: +--period:: + Record the sample period. + +--sample-cpu:: + Record the sample cpu. + +-n:: +--no-samples:: + Don't sample. + +-R:: +--raw-samples:: +Collect raw sample records from all opened counters (default for tracepoint counters). + +-C:: +--cpu:: +Collect samples only on the list of CPUs provided. Multiple CPUs can be provided as a +comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2. +In per-thread mode with inheritance mode on (default), samples are captured only when +the thread executes on the designated CPUs. Default is to monitor all CPUs. + +-B:: +--no-buildid:: +Do not save the build ids of binaries in the perf.data files. This skips +post processing after recording, which sometimes makes the final step in +the recording process to take a long time, as it needs to process all +events looking for mmap records. The downside is that it can misresolve +symbols if the workload binaries used when recording get locally rebuilt +or upgraded, because the only key available in this case is the +pathname. You can also set the "record.build-id" config variable to +'skip to have this behaviour permanently. + +-N:: +--no-buildid-cache:: +Do not update the buildid cache. This saves some overhead in situations +where the information in the perf.data file (which includes buildids) +is sufficient. You can also set the "record.build-id" config variable to +'no-cache' to have the same effect. + +-G name,...:: +--cgroup name,...:: +monitor only in the container (cgroup) called "name". This option is available only +in per-cpu mode. The cgroup filesystem must be mounted. All threads belonging to +container "name" are monitored when they run on the monitored CPUs. Multiple cgroups +can be provided. Each cgroup is applied to the corresponding event, i.e., first cgroup +to first event, second cgroup to second event and so on. It is possible to provide +an empty cgroup (monitor all the time) using, e.g., -G foo,,bar. Cgroups must have +corresponding events, i.e., they always refer to events defined earlier on the command +line. If the user wants to track multiple events for a specific cgroup, the user can +use '-e e1 -e e2 -G foo,foo' or just use '-e e1 -e e2 -G foo'. + +If wanting to monitor, say, 'cycles' for a cgroup and also for system wide, this +command line can be used: 'perf stat -e cycles -G cgroup_name -a -e cycles'. + +-b:: +--branch-any:: +Enable taken branch stack sampling. Any type of taken branch may be sampled. +This is a shortcut for --branch-filter any. See --branch-filter for more infos. + +-j:: +--branch-filter:: +Enable taken branch stack sampling. Each sample captures a series of consecutive +taken branches. The number of branches captured with each sample depends on the +underlying hardware, the type of branches of interest, and the executed code. +It is possible to select the types of branches captured by enabling filters. The +following filters are defined: + + - any: any type of branches + - any_call: any function call or system call + - any_ret: any function return or system call return + - ind_call: any indirect branch + - call: direct calls, including far (to/from kernel) calls + - u: only when the branch target is at the user level + - k: only when the branch target is in the kernel + - hv: only when the target is at the hypervisor level + - in_tx: only when the target is in a hardware transaction + - no_tx: only when the target is not in a hardware transaction + - abort_tx: only when the target is a hardware transaction abort + - cond: conditional branches + - save_type: save branch type during sampling in case binary is not available later + ++ +The option requires at least one branch type among any, any_call, any_ret, ind_call, cond. +The privilege levels may be omitted, in which case, the privilege levels of the associated +event are applied to the branch filter. Both kernel (k) and hypervisor (hv) privilege +levels are subject to permissions. When sampling on multiple events, branch stack sampling +is enabled for all the sampling events. The sampled branch type is the same for all events. +The various filters must be specified as a comma separated list: --branch-filter any_ret,u,k +Note that this feature may not be available on all processors. + +--weight:: +Enable weightened sampling. An additional weight is recorded per sample and can be +displayed with the weight and local_weight sort keys. This currently works for TSX +abort events and some memory events in precise mode on modern Intel CPUs. + +--namespaces:: +Record events of type PERF_RECORD_NAMESPACES. This enables 'cgroup_id' sort key. + +--all-cgroups:: +Record events of type PERF_RECORD_CGROUP. This enables 'cgroup' sort key. + +--transaction:: +Record transaction flags for transaction related events. + +--per-thread:: +Use per-thread mmaps. By default per-cpu mmaps are created. This option +overrides that and uses per-thread mmaps. A side-effect of that is that +inheritance is automatically disabled. --per-thread is ignored with a warning +if combined with -a or -C options. + +-D:: +--delay=:: +After starting the program, wait msecs before measuring (-1: start with events +disabled). This is useful to filter out the startup phase of the program, which +is often very different. + +-I:: +--intr-regs:: +Capture machine state (registers) at interrupt, i.e., on counter overflows for +each sample. List of captured registers depends on the architecture. This option +is off by default. It is possible to select the registers to sample using their +symbolic names, e.g. on x86, ax, si. To list the available registers use +--intr-regs=\?. To name registers, pass a comma separated list such as +--intr-regs=ax,bx. The list of register is architecture dependent. + +--user-regs:: +Similar to -I, but capture user registers at sample time. To list the available +user registers use --user-regs=\?. + +--running-time:: +Record running and enabled time for read events (:S) + +-k:: +--clockid:: +Sets the clock id to use for the various time fields in the perf_event_type +records. See clock_gettime(). In particular CLOCK_MONOTONIC and +CLOCK_MONOTONIC_RAW are supported, some events might also allow +CLOCK_BOOTTIME, CLOCK_REALTIME and CLOCK_TAI. + +-S:: +--snapshot:: +Select AUX area tracing Snapshot Mode. This option is valid only with an +AUX area tracing event. Optionally, certain snapshot capturing parameters +can be specified in a string that follows this option: + 'e': take one last snapshot on exit; guarantees that there is at least one + snapshot in the output file; + : if the PMU supports this, specify the desired snapshot size. + +In Snapshot Mode trace data is captured only when signal SIGUSR2 is received +and on exit if the above 'e' option is given. + +--aux-sample[=OPTIONS]:: +Select AUX area sampling. At least one of the events selected by the -e option +must be an AUX area event. Samples on other events will be created containing +data from the AUX area. Optionally sample size may be specified, otherwise it +defaults to 4KiB. + +--proc-map-timeout:: +When processing pre-existing threads /proc/XXX/mmap, it may take a long time, +because the file may be huge. A time out is needed in such cases. +This option sets the time out limit. The default value is 500 ms. + +--switch-events:: +Record context switch events i.e. events of type PERF_RECORD_SWITCH or +PERF_RECORD_SWITCH_CPU_WIDE. In some cases (e.g. Intel PT or CoreSight) +switch events will be enabled automatically, which can be suppressed by +by the option --no-switch-events. + +--clang-path=PATH:: +Path to clang binary to use for compiling BPF scriptlets. +(enabled when BPF support is on) + +--clang-opt=OPTIONS:: +Options passed to clang when compiling BPF scriptlets. +(enabled when BPF support is on) + +--vmlinux=PATH:: +Specify vmlinux path which has debuginfo. +(enabled when BPF prologue is on) + +--buildid-all:: +Record build-id of all DSOs regardless whether it's actually hit or not. + +--aio[=n]:: +Use control blocks in asynchronous (Posix AIO) trace writing mode (default: 1, max: 4). +Asynchronous mode is supported only when linking Perf tool with libc library +providing implementation for Posix AIO API. + +--affinity=mode:: +Set affinity mask of trace reading thread according to the policy defined by 'mode' value: + node - thread affinity mask is set to NUMA node cpu mask of the processed mmap buffer + cpu - thread affinity mask is set to cpu of the processed mmap buffer + +--mmap-flush=number:: + +Specify minimal number of bytes that is extracted from mmap data pages and +processed for output. One can specify the number using B/K/M/G suffixes. + +The maximal allowed value is a quarter of the size of mmaped data pages. + +The default option value is 1 byte which means that every time that the output +writing thread finds some new data in the mmaped buffer the data is extracted, +possibly compressed (-z) and written to the output, perf.data or pipe. + +Larger data chunks are compressed more effectively in comparison to smaller +chunks so extraction of larger chunks from the mmap data pages is preferable +from the perspective of output size reduction. + +Also at some cases executing less output write syscalls with bigger data size +can take less time than executing more output write syscalls with smaller data +size thus lowering runtime profiling overhead. + +-z:: +--compression-level[=n]:: +Produce compressed trace using specified level n (default: 1 - fastest compression, +22 - smallest trace) + +--all-kernel:: +Configure all used events to run in kernel space. + +--all-user:: +Configure all used events to run in user space. + +--kernel-callchains:: +Collect callchains only from kernel space. I.e. this option sets +perf_event_attr.exclude_callchain_user to 1. + +--user-callchains:: +Collect callchains only from user space. I.e. this option sets +perf_event_attr.exclude_callchain_kernel to 1. + +Don't use both --kernel-callchains and --user-callchains at the same time or no +callchains will be collected. + +--timestamp-filename +Append timestamp to output file name. + +--timestamp-boundary:: +Record timestamp boundary (time of first/last samples). + +--switch-output[=mode]:: +Generate multiple perf.data files, timestamp prefixed, switching to a new one +based on 'mode' value: + "signal" - when receiving a SIGUSR2 (default value) or + - when reaching the size threshold, size is expected to + be a number with appended unit character - B/K/M/G +