diff options
author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-11 08:27:49 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-11 08:27:49 +0000 |
commit | ace9429bb58fd418f0c81d4c2835699bddf6bde6 (patch) | |
tree | b2d64bc10158fdd5497876388cd68142ca374ed3 /tools/perf/Documentation/perf-record.txt | |
parent | Initial commit. (diff) | |
download | linux-ace9429bb58fd418f0c81d4c2835699bddf6bde6.tar.xz linux-ace9429bb58fd418f0c81d4c2835699bddf6bde6.zip |
Adding upstream version 6.6.15.upstream/6.6.15
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'tools/perf/Documentation/perf-record.txt')
-rw-r--r-- | tools/perf/Documentation/perf-record.txt | 828 |
1 files changed, 828 insertions, 0 deletions
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt new file mode 100644 index 0000000000..d5217be012 --- /dev/null +++ b/tools/perf/Documentation/perf-record.txt @@ -0,0 +1,828 @@ +perf-record(1) +============== + +NAME +---- +perf-record - Run a command and record its profile into perf.data + +SYNOPSIS +-------- +[verse] +'perf record' [-e <EVENT> | --event=EVENT] [-a] <command> +'perf record' [-e <EVENT> | --event=EVENT] [-a] \-- <command> [<options>] + +DESCRIPTION +----------- +This command runs a command and gathers a performance counter profile +from it, into perf.data - without displaying anything. + +This file can then be inspected later on, using 'perf report'. + + +OPTIONS +------- +<command>...:: + Any command you can specify in a shell. + +-e:: +--event=:: + Select the PMU event. Selection can be: + + - a symbolic event name (use 'perf list' to list all events) + + - a raw PMU event in the form of rN where N is a hexadecimal value + that represents the raw register encoding with the layout of the + event control registers as described by entries in + /sys/bus/event_source/devices/cpu/format/*. + + - a symbolic or raw PMU event followed by an optional colon + and a list of event modifiers, e.g., cpu-cycles:p. See the + linkperf:perf-list[1] man page for details on event modifiers. + + - a symbolically formed PMU event like 'pmu/param1=0x3,param2/' where + 'param1', 'param2', etc are defined as formats for the PMU in + /sys/bus/event_source/devices/<pmu>/format/*. + + - a symbolically formed event like 'pmu/config=M,config1=N,config3=K/' + + where M, N, K are numbers (in decimal, hex, octal format). Acceptable + values for each of 'config', 'config1' and 'config2' are defined by + corresponding entries in /sys/bus/event_source/devices/<pmu>/format/* + param1 and param2 are defined as formats for the PMU in: + /sys/bus/event_source/devices/<pmu>/format/* + + There are also some parameters which are not defined in .../<pmu>/format/*. + These params can be used to overload default config values per event. + Here are some common parameters: + - 'period': Set event sampling period + - 'freq': Set event sampling frequency + - 'time': Disable/enable time stamping. Acceptable values are 1 for + enabling time stamping. 0 for disabling time stamping. + The default is 1. + - 'call-graph': Disable/enable callgraph. Acceptable str are "fp" for + FP mode, "dwarf" for DWARF mode, "lbr" for LBR mode and + "no" for disable callgraph. + - 'stack-size': user stack size for dwarf mode + - 'name' : User defined event name. Single quotes (') may be used to + escape symbols in the name from parsing by shell and tool + like this: name=\'CPU_CLK_UNHALTED.THREAD:cmask=0x1\'. + - 'aux-output': Generate AUX records instead of events. This requires + that an AUX area event is also provided. + - 'aux-sample-size': Set sample size for AUX area sampling. If the + '--aux-sample' option has been used, set aux-sample-size=0 to disable + AUX area sampling for the event. + + See the linkperf:perf-list[1] man page for more parameters. + + Note: If user explicitly sets options which conflict with the params, + the value set by the parameters will be overridden. + + Also not defined in .../<pmu>/format/* are PMU driver specific + configuration parameters. Any configuration parameter preceded by + the letter '@' is not interpreted in user space and sent down directly + to the PMU driver. For example: + + perf record -e some_event/@cfg1,@cfg2=config/ ... + + will see 'cfg1' and 'cfg2=config' pushed to the PMU driver associated + with the event for further processing. There is no restriction on + what the configuration parameters are, as long as their semantic is + understood and supported by the PMU driver. + + - a hardware breakpoint event in the form of '\mem:addr[/len][:access]' + where addr is the address in memory you want to break in. + Access is the memory access type (read, write, execute) it can + be passed as follows: '\mem:addr[:[r][w][x]]'. len is the range, + number of bytes from specified addr, which the breakpoint will cover. + If you want to profile read-write accesses in 0x1000, just set + 'mem:0x1000:rw'. + If you want to profile write accesses in [0x1000~1008), just set + 'mem:0x1000/8:w'. + + - a group of events surrounded by a pair of brace ("{event1,event2,...}"). + Each event is separated by commas and the group should be quoted to + prevent the shell interpretation. You also need to use --group on + "perf report" to view group events together. + +--filter=<filter>:: + Event filter. This option should follow an event selector (-e). + If the event is a tracepoint, the filter string will be parsed by + the kernel. If the event is a hardware trace PMU (e.g. Intel PT + or CoreSight), it'll be processed as an address filter. Otherwise + it means a general filter using BPF which can be applied for any + kind of event. + + - tracepoint filters + + In the case of tracepoints, multiple '--filter' options are combined + using '&&'. + + - address filters + + A hardware trace PMU advertises its ability to accept a number of + address filters by specifying a non-zero value in + /sys/bus/event_source/devices/<pmu>/nr_addr_filters. + + Address filters have the format: + + filter|start|stop|tracestop <start> [/ <size>] [@<file name>] + + Where: + - 'filter': defines a region that will be traced. + - 'start': defines an address at which tracing will begin. + - 'stop': defines an address at which tracing will stop. + - 'tracestop': defines a region in which tracing will stop. + + <file name> is the name of the object file, <start> is the offset to the + code to trace in that file, and <size> is the size of the region to + trace. 'start' and 'stop' filters need not specify a <size>. + + If no object file is specified then the kernel is assumed, in which case + the start address must be a current kernel memory address. + + <start> can also be specified by providing the name of a symbol. If the + symbol name is not unique, it can be disambiguated by inserting #n where + 'n' selects the n'th symbol in address order. Alternately #0, #g or #G + select only a global symbol. <size> can also be specified by providing + the name of a symbol, in which case the size is calculated to the end + of that symbol. For 'filter' and 'tracestop' filters, if <size> is + omitted and <start> is a symbol, then the size is calculated to the end + of that symbol. + + If <size> is omitted and <start> is '*', then the start and size will + be calculated from the first and last symbols, i.e. to trace the whole + file. + + If symbol names (or '*') are provided, they must be surrounded by white + space. + + The filter passed to the kernel is not necessarily the same as entered. + To see the filter that is passed, use the -v option. + + The kernel may not be able to configure a trace region if it is not + within a single mapping. MMAP events (or /proc/<pid>/maps) can be + examined to determine if that is a possibility. + + Multiple filters can be separated with space or comma. + + - bpf filters + + A BPF filter can access the sample data and make a decision based on the + data. Users need to set an appropriate sample type to use the BPF + filter. BPF filters need root privilege. + + The sample data field can be specified in lower case letter. Multiple + filters can be separated with comma. For example, + + --filter 'period > 1000, cpu == 1' + or + --filter 'mem_op == load || mem_op == store, mem_lvl > l1' + + The former filter only accept samples with period greater than 1000 AND + CPU number is 1. The latter one accepts either load and store memory + operations but it should have memory level above the L1. Since the + mem_op and mem_lvl fields come from the (memory) data_source, it'd only + work with some events which set the data_source field. + + Also user should request to collect that information (with -d option in + the above case). Otherwise, the following message will be shown. + + $ sudo perf record -e cycles --filter 'mem_op == load' + Error: cycles event does not have PERF_SAMPLE_DATA_SRC + Hint: please add -d option to perf record. + failed to set filter "BPF" on event cycles with 22 (Invalid argument) + + Essentially the BPF filter expression is: + + <term> <operator> <value> (("," | "||") <term> <operator> <value>)* + + The <term> can be one of: + ip, id, tid, pid, cpu, time, addr, period, txn, weight, phys_addr, + code_pgsz, data_pgsz, weight1, weight2, weight3, ins_lat, retire_lat, + p_stage_cyc, mem_op, mem_lvl, mem_snoop, mem_remote, mem_lock, + mem_dtlb, mem_blk, mem_hops + + The <operator> can be one of: + ==, !=, >, >=, <, <=, & + + The <value> can be one of: + <number> (for any term) + na, load, store, pfetch, exec (for mem_op) + l1, l2, l3, l4, cxl, io, any_cache, lfb, ram, pmem (for mem_lvl) + na, none, hit, miss, hitm, fwd, peer (for mem_snoop) + remote (for mem_remote) + na, locked (for mem_locked) + na, l1_hit, l1_miss, l2_hit, l2_miss, any_hit, any_miss, walk, fault (for mem_dtlb) + na, by_data, by_addr (for mem_blk) + hops0, hops1, hops2, hops3 (for mem_hops) + +--exclude-perf:: + Don't record events issued by perf itself. This option should follow + an event selector (-e) which selects tracepoint event(s). It adds a + filter expression 'common_pid != $PERFPID' to filters. If other + '--filter' exists, the new filter expression will be combined with + them by '&&'. + +-a:: +--all-cpus:: + System-wide collection from all CPUs (default if no target is specified). + +-p:: +--pid=:: + Record events on existing process ID (comma separated list). + +-t:: +--tid=:: + Record events on existing thread ID (comma separated list). + This option also disables inheritance by default. Enable it by adding + --inherit. + +-u:: +--uid=:: + Record events in threads owned by uid. Name or number. + +-r:: +--realtime=:: + Collect data with this RT SCHED_FIFO priority. + +--no-buffering:: + Collect data without buffering. + +-c:: +--count=:: + Event period to sample. + +-o:: +--output=:: + Output file name. + +-i:: +--no-inherit:: + Child tasks do not inherit counters. + +-F:: +--freq=:: + Profile at this frequency. Use 'max' to use the currently maximum + allowed frequency, i.e. the value in the kernel.perf_event_max_sample_rate + sysctl. Will throttle down to the currently maximum allowed frequency. + See --strict-freq. + +--strict-freq:: + Fail if the specified frequency can't be used. + +-m:: +--mmap-pages=:: + Number of mmap data pages (must be a power of two) or size + specification with appended unit character - B/K/M/G. The + size is rounded up to have nearest pages power of two value. + Also, by adding a comma, the number of mmap pages for AUX + area tracing can be specified. + +-g:: + Enables call-graph (stack chain/backtrace) recording for both + kernel space and user space. + +--call-graph:: + Setup and enable call-graph (stack chain/backtrace) recording, + implies -g. Default is "fp" (for user space). + + The unwinding method used for kernel space is dependent on the + unwinder used by the active kernel configuration, i.e + CONFIG_UNWINDER_FRAME_POINTER (fp) or CONFIG_UNWINDER_ORC (orc) + + Any option specified here controls the method used for user space. + + Valid options are "fp" (frame pointer), "dwarf" (DWARF's CFI - + Call Frame Information) or "lbr" (Hardware Last Branch Record + facility). + + In some systems, where binaries are build with gcc + --fomit-frame-pointer, using the "fp" method will produce bogus + call graphs, using "dwarf", if available (perf tools linked to + the libunwind or libdw library) should be used instead. + Using the "lbr" method doesn't require any compiler options. It + will produce call graphs from the hardware LBR registers. The + main limitation is that it is only available on new Intel + platforms, such as Haswell. It can only get user call chain. It + doesn't work with branch stack sampling at the same time. + + When "dwarf" recording is used, perf also records (user) stack dump + when sampled. Default size of the stack dump is 8192 (bytes). + User can change the size by passing the size after comma like + "--call-graph dwarf,4096". + + When "fp" recording is used, perf tries to save stack enties + up to the number specified in sysctl.kernel.perf_event_max_stack + by default. User can change the number by passing it after comma + like "--call-graph fp,32". + +-q:: +--quiet:: + Don't print any warnings or messages, useful for scripting. + +-v:: +--verbose:: + Be more verbose (show counter open errors, etc). + +-s:: +--stat:: + Record per-thread event counts. Use it with 'perf report -T' to see + the values. + +-d:: +--data:: + Record the sample virtual addresses. + +--phys-data:: + Record the sample physical addresses. + +--data-page-size:: + Record the sampled data address data page size. + +--code-page-size:: + Record the sampled code address (ip) page size + +-T:: +--timestamp:: + Record the sample timestamps. Use it with 'perf report -D' to see the + timestamps, for instance. + +-P:: +--period:: + Record the sample period. + +--sample-cpu:: + Record the sample cpu. + +--sample-identifier:: + Record the sample identifier i.e. PERF_SAMPLE_IDENTIFIER bit set in + the sample_type member of the struct perf_event_attr argument to the + perf_event_open system call. + +-n:: +--no-samples:: + Don't sample. + +-R:: +--raw-samples:: +Collect raw sample records from all opened counters (default for tracepoint counters). + +-C:: +--cpu:: +Collect samples only on the list of CPUs provided. Multiple CPUs can be provided as a +comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2. +In per-thread mode with inheritance mode on (default), samples are captured only when +the thread executes on the designated CPUs. Default is to monitor all CPUs. + +-B:: +--no-buildid:: +Do not save the build ids of binaries in the perf.data files. This skips +post processing after recording, which sometimes makes the final step in +the recording process to take a long time, as it needs to process all +events looking for mmap records. The downside is that it can misresolve +symbols if the workload binaries used when recording get locally rebuilt +or upgraded, because the only key available in this case is the +pathname. You can also set the "record.build-id" config variable to +'skip to have this behaviour permanently. + +-N:: +--no-buildid-cache:: +Do not update the buildid cache. This saves some overhead in situations +where the information in the perf.data file (which includes buildids) +is sufficient. You can also set the "record.build-id" config variable to +'no-cache' to have the same effect. + +-G name,...:: +--cgroup name,...:: +monitor only in the container (cgroup) called "name". This option is available only +in per-cpu mode. The cgroup filesystem must be mounted. All threads belonging to +container "name" are monitored when they run on the monitored CPUs. Multiple cgroups +can be provided. Each cgroup is applied to the corresponding event, i.e., first cgroup +to first event, second cgroup to second event and so on. It is possible to provide +an empty cgroup (monitor all the time) using, e.g., -G foo,,bar. Cgroups must have +corresponding events, i.e., they always refer to events defined earlier on the command +line. If the user wants to track multiple events for a specific cgroup, the user can +use '-e e1 -e e2 -G foo,foo' or just use '-e e1 -e e2 -G foo'. + +If wanting to monitor, say, 'cycles' for a cgroup and also for system wide, this +command line can be used: 'perf stat -e cycles -G cgroup_name -a -e cycles'. + +-b:: +--branch-any:: +Enable taken branch stack sampling. Any type of taken branch may be sampled. +This is a shortcut for --branch-filter any. See --branch-filter for more infos. + +-j:: +--branch-filter:: +Enable taken branch stack sampling. Each sample captures a series of consecutive +taken branches. The number of branches captured with each sample depends on the +underlying hardware, the type of branches of interest, and the executed code. +It is possible to select the types of branches captured by enabling filters. The +following filters are defined: + + - any: any type of branches + - any_call: any function call or system call + - any_ret: any function return or system call return + - ind_call: any indirect branch + - ind_jmp: any indirect jump + - call: direct calls, including far (to/from kernel) calls + - u: only when the branch target is at the user level + - k: only when the branch target is in the kernel + - hv: only when the target is at the hypervisor level + - in_tx: only when the target is in a hardware transaction + - no_tx: only when the target is not in a hardware transaction + - abort_tx: only when the target is a hardware transaction abort + - cond: conditional branches + - call_stack: save call stack + - no_flags: don't save branch flags e.g prediction, misprediction etc + - no_cycles: don't save branch cycles + - hw_index: save branch hardware index + - save_type: save branch type during sampling in case binary is not available later + For the platforms with Intel Arch LBR support (12th-Gen+ client or + 4th-Gen Xeon+ server), the save branch type is unconditionally enabled + when the taken branch stack sampling is enabled. + - priv: save privilege state during sampling in case binary is not available later + ++ +The option requires at least one branch type among any, any_call, any_ret, ind_call, cond. +The privilege levels may be omitted, in which case, the privilege levels of the associated +event are applied to the branch filter. Both kernel (k) and hypervisor (hv) privilege +levels are subject to permissions. When sampling on multiple events, branch stack sampling +is enabled for all the sampling events. The sampled branch type is the same for all events. +The various filters must be specified as a comma separated list: --branch-filter any_ret,u,k +Note that this feature may not be available on all processors. + +-W:: +--weight:: +Enable weightened sampling. An additional weight is recorded per sample and can be +displayed with the weight and local_weight sort keys. This currently works for TSX +abort events and some memory events in precise mode on modern Intel CPUs. + +--namespaces:: +Record events of type PERF_RECORD_NAMESPACES. This enables 'cgroup_id' sort key. + +--all-cgroups:: +Record events of type PERF_RECORD_CGROUP. This enables 'cgroup' sort key. + +--transaction:: +Record transaction flags for transaction related events. + +--per-thread:: +Use per-thread mmaps. By default per-cpu mmaps are created. This option +overrides that and uses per-thread mmaps. A side-effect of that is that +inheritance is automatically disabled. --per-thread is ignored with a warning +if combined with -a or -C options. + +-D:: +--delay=:: +After starting the program, wait msecs before measuring (-1: start with events +disabled), or enable events only for specified ranges of msecs (e.g. +-D 10-20,30-40 means wait 10 msecs, enable for 10 msecs, wait 10 msecs, enable +for 10 msecs, then stop). Note, delaying enabling of events is useful to filter +out the startup phase of the program, which is often very different. + +-I:: +--intr-regs:: +Capture machine state (registers) at interrupt, i.e., on counter overflows for +each sample. List of captured registers depends on the architecture. This option +is off by default. It is possible to select the registers to sample using their +symbolic names, e.g. on x86, ax, si. To list the available registers use +--intr-regs=\?. To name registers, pass a comma separated list such as +--intr-regs=ax,bx. The list of register is architecture dependent. + +--user-regs:: +Similar to -I, but capture user registers at sample time. To list the available +user registers use --user-regs=\?. + +--running-time:: +Record running and enabled time for read events (:S) + +-k:: +--clockid:: +Sets the clock id to use for the various time fields in the perf_event_type +records. See clock_gettime(). In particular CLOCK_MONOTONIC and +CLOCK_MONOTONIC_RAW are supported, some events might also allow +CLOCK_BOOTTIME, CLOCK_REALTIME and CLOCK_TAI. + +-S:: +--snapshot:: +Select AUX area tracing Snapshot Mode. This option is valid only with an +AUX area tracing event. Optionally, certain snapshot capturing parameters +can be specified in a string that follows this option: + + - 'e': take one last snapshot on exit; guarantees that there is at least one + snapshot in the output file; + - <size>: if the PMU supports this, specify the desired snapshot size. + +In Snapshot Mode trace data is captured only when signal SIGUSR2 is received +and on exit if the above 'e' option is given. + +--aux-sample[=OPTIONS]:: +Select AUX area sampling. At least one of the events selected by the -e option +must be an AUX area event. Samples on other events will be created containing +data from the AUX area. Optionally sample size may be specified, otherwise it +defaults to 4KiB. + +--proc-map-timeout:: +When processing pre-existing threads /proc/XXX/mmap, it may take a long time, +because the file may be huge. A time out is needed in such cases. +This option sets the time out limit. The default value is 500 ms. + +--switch-events:: +Record context switch events i.e. events of type PERF_RECORD_SWITCH or +PERF_RECORD_SWITCH_CPU_WIDE. In some cases (e.g. Intel PT, CoreSight or Arm SPE) +switch events will be enabled automatically, which can be suppressed by +by the option --no-switch-events. + +--vmlinux=PATH:: +Specify vmlinux path which has debuginfo. +(enabled when BPF prologue is on) + +--buildid-all:: +Record build-id of all DSOs regardless whether it's actually hit or not. + +--buildid-mmap:: +Record build ids in mmap2 events, disables build id cache (implies --no-buildid). + +--aio[=n]:: +Use <n> control blocks in asynchronous (Posix AIO) trace writing mode (default: 1, max: 4). +Asynchronous mode is supported only when linking Perf tool with libc library +providing implementation for Posix AIO API. + +--affinity=mode:: +Set affinity mask of trace reading thread according to the policy defined by 'mode' value: + + - node - thread affinity mask is set to NUMA node cpu mask of the processed mmap buffer + - cpu - thread affinity mask is set to cpu of the processed mmap buffer + +--mmap-flush=number:: + +Specify minimal number of bytes that is extracted from mmap data pages and +processed for output. One can specify the number using B/K/M/G suffixes. + +The maximal allowed value is a quarter of the size of mmaped data pages. + +The default option value is 1 byte which means that every time that the output +writing thread finds some new data in the mmaped buffer the data is extracted, +possibly compressed (-z) and written to the output, perf.data or pipe. + +Larger data chunks are compressed more effectively in comparison to smaller +chunks so extraction of larger chunks from the mmap data pages is preferable +from the perspective of output size reduction. + +Also at some cases executing less output write syscalls with bigger data size +can take less time than executing more output write syscalls with smaller data +size thus lowering runtime profiling overhead. + +-z:: +--compression-level[=n]:: +Produce compressed trace using specified level n (default: 1 - fastest compression, +22 - smallest trace) + +--all-kernel:: +Configure all used events to run in kernel space. + +--all-user:: +Configure all used events to run in user space. + +--kernel-callchains:: +Collect callchains only from kernel space. I.e. this option sets +perf_event_attr.exclude_callchain_user to 1. + +--user-callchains:: +Collect callchains only from user space. I.e. this option sets +perf_event_attr.exclude_callchain_kernel to 1. + +Don't use both --kernel-callchains and --user-callchains at the same time or no +callchains will be collected. + +--timestamp-filename +Append timestamp to output file name. + +--timestamp-boundary:: +Record timestamp boundary (time of first/last samples). + +--switch-output[=mode]:: +Generate multiple perf.data files, timestamp prefixed, switching to a new one +based on 'mode' value: + + - "signal" - when receiving a SIGUSR2 (default value) or + - <size> - when reaching the size threshold, size is expected to + be a number with appended unit character - B/K/M/G + - <time> - when reaching the time threshold, size is expected to + be a number with appended unit character - s/m/h/d + + Note: the precision of the size threshold hugely depends + on your configuration - the number and size of your ring + buffers (-m). It is generally more precise for higher sizes + (like >5M), for lower values expect different sizes. + +A possible use case is to, given an external event, slice the perf.data file +that gets then processed, possibly via a perf script, to decide if that +particular perf.data snapshot should be kept or not. + +Implies --timestamp-filename, --no-buildid and --no-buildid-cache. +The reason for the latter two is to reduce the data file switching +overhead. You can still switch them on with: + + --switch-output --no-no-buildid --no-no-buildid-cache + +--switch-output-event:: +Events that will cause the switch of the perf.data file, auto-selecting +--switch-output=signal, the results are similar as internally the side band +thread will also send a SIGUSR2 to the main one. + +Uses the same syntax as --event, it will just not be recorded, serving only to +switch the perf.data file as soon as the --switch-output event is processed by +a separate sideband thread. + +This sideband thread is also used to other purposes, like processing the +PERF_RECORD_BPF_EVENT records as they happen, asking the kernel for extra BPF +information, etc. + +--switch-max-files=N:: + +When rotating perf.data with --switch-output, only keep N files. + +--dry-run:: +Parse options then exit. --dry-run can be used to detect errors in cmdline +options. + +'perf record --dry-run -e' can act as a BPF script compiler if llvm.dump-obj +in config file is set to true. + +--synth=TYPE:: +Collect and synthesize given type of events (comma separated). Note that +this option controls the synthesis from the /proc filesystem which represent +task status for pre-existing threads. + +Kernel (and some other) events are recorded regardless of the +choice in this option. For example, --synth=no would have MMAP events for +kernel and modules. + +Available types are: + + - 'task' - synthesize FORK and COMM events for each task + - 'mmap' - synthesize MMAP events for each process (implies 'task') + - 'cgroup' - synthesize CGROUP events for each cgroup + - 'all' - synthesize all events (default) + - 'no' - do not synthesize any of the above events + +--tail-synthesize:: +Instead of collecting non-sample events (for example, fork, comm, mmap) at +the beginning of record, collect them during finalizing an output file. +The collected non-sample events reflects the status of the system when +record is finished. + +--overwrite:: +Makes all events use an overwritable ring buffer. An overwritable ring +buffer works like a flight recorder: when it gets full, the kernel will +overwrite the oldest records, that thus will never make it to the +perf.data file. + +When '--overwrite' and '--switch-output' are used perf records and drops +events until it receives a signal, meaning that something unusual was +detected that warrants taking a snapshot of the most current events, +those fitting in the ring buffer at that moment. + +'overwrite' attribute can also be set or canceled for an event using +config terms. For example: 'cycles/overwrite/' and 'instructions/no-overwrite/'. + +Implies --tail-synthesize. + +--kcore:: +Make a copy of /proc/kcore and place it into a directory with the perf data file. + +--max-size=<size>:: +Limit the sample data max size, <size> is expected to be a number with +appended unit character - B/K/M/G + +--num-thread-synthesize:: + The number of threads to run when synthesizing events for existing processes. + By default, the number of threads equals 1. + +ifdef::HAVE_LIBPFM[] +--pfm-events events:: +Select a PMU event using libpfm4 syntax (see http://perfmon2.sf.net) +including support for event filters. For example '--pfm-events +inst_retired:any_p:u:c=1:i'. More than one event can be passed to the +option using the comma separator. Hardware events and generic hardware +events cannot be mixed together. The latter must be used with the -e +option. The -e option and this one can be mixed and matched. Events +can be grouped using the {} notation. +endif::HAVE_LIBPFM[] + +--control=fifo:ctl-fifo[,ack-fifo]:: +--control=fd:ctl-fd[,ack-fd]:: +ctl-fifo / ack-fifo are opened and used as ctl-fd / ack-fd as follows. +Listen on ctl-fd descriptor for command to control measurement. + +Available commands: + + - 'enable' : enable events + - 'disable' : disable events + - 'enable name' : enable event 'name' + - 'disable name' : disable event 'name' + - 'snapshot' : AUX area tracing snapshot). + - 'stop' : stop perf record + - 'ping' : ping + - 'evlist [-v|-g|-F] : display all events + + -F Show just the sample frequency used for each event. + -v Show all fields. + -g Show event group information. + +Measurements can be started with events disabled using --delay=-1 option. Optionally +send control command completion ('ack\n') to ack-fd descriptor to synchronize with the +controlling process. Example of bash shell script to enable and disable events during +measurements: + + #!/bin/bash + + ctl_dir=/tmp/ + + ctl_fifo=${ctl_dir}perf_ctl.fifo + test -p ${ctl_fifo} && unlink ${ctl_fifo} + mkfifo ${ctl_fifo} + exec {ctl_fd}<>${ctl_fifo} + + ctl_ack_fifo=${ctl_dir}perf_ctl_ack.fifo + test -p ${ctl_ack_fifo} && unlink ${ctl_ack_fifo} + mkfifo ${ctl_ack_fifo} + exec {ctl_fd_ack}<>${ctl_ack_fifo} + + perf record -D -1 -e cpu-cycles -a \ + --control fd:${ctl_fd},${ctl_fd_ack} \ + -- sleep 30 & + perf_pid=$! + + sleep 5 && echo 'enable' >&${ctl_fd} && read -u ${ctl_fd_ack} e1 && echo "enabled(${e1})" + sleep 10 && echo 'disable' >&${ctl_fd} && read -u ${ctl_fd_ack} d1 && echo "disabled(${d1})" + + exec {ctl_fd_ack}>&- + unlink ${ctl_ack_fifo} + + exec {ctl_fd}>&- + unlink ${ctl_fifo} + + wait -n ${perf_pid} + exit $? + +--threads=<spec>:: +Write collected trace data into several data files using parallel threads. +<spec> value can be user defined list of masks. Masks separated by colon +define CPUs to be monitored by a thread and affinity mask of that thread +is separated by slash: + + <cpus mask 1>/<affinity mask 1>:<cpus mask 2>/<affinity mask 2>:... + +CPUs or affinity masks must not overlap with other corresponding masks. +Invalid CPUs are ignored, but masks containing only invalid CPUs are not +allowed. + +For example user specification like the following: + + 0,2-4/2-4:1,5-7/5-7 + +specifies parallel threads layout that consists of two threads, +the first thread monitors CPUs 0 and 2-4 with the affinity mask 2-4, +the second monitors CPUs 1 and 5-7 with the affinity mask 5-7. + +<spec> value can also be a string meaning predefined parallel threads +layout: + + - cpu - create new data streaming thread for every monitored cpu + - core - create new thread to monitor CPUs grouped by a core + - package - create new thread to monitor CPUs grouped by a package + - numa - create new threed to monitor CPUs grouped by a NUMA domain + +Predefined layouts can be used on systems with large number of CPUs in +order not to spawn multiple per-cpu streaming threads but still avoid LOST +events in data directory files. Option specified with no or empty value +defaults to CPU layout. Masks defined or provided by the option value are +filtered through the mask provided by -C option. + +--debuginfod[=URLs]:: + Specify debuginfod URL to be used when cacheing perf.data binaries, + it follows the same syntax as the DEBUGINFOD_URLS variable, like: + + http://192.168.122.174:8002 + + If the URLs is not specified, the value of DEBUGINFOD_URLS + system environment variable is used. + +--off-cpu:: + Enable off-cpu profiling with BPF. The BPF program will collect + task scheduling information with (user) stacktrace and save them + as sample data of a software event named "offcpu-time". The + sample period will have the time the task slept in nanoseconds. + + Note that BPF can collect stack traces using frame pointer ("fp") + only, as of now. So the applications built without the frame + pointer might see bogus addresses. + +include::intel-hybrid.txt[] + +SEE ALSO +-------- +linkperf:perf-stat[1], linkperf:perf-list[1], linkperf:perf-intel-pt[1] |