diff options
author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-08-07 13:11:22 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-08-07 13:11:22 +0000 |
commit | b20732900e4636a467c0183a47f7396700f5f743 (patch) | |
tree | 42f079ff82e701ebcb76829974b4caca3e5b6798 /tools/perf/Documentation | |
parent | Adding upstream version 6.8.12. (diff) | |
download | linux-b20732900e4636a467c0183a47f7396700f5f743.tar.xz linux-b20732900e4636a467c0183a47f7396700f5f743.zip |
Adding upstream version 6.9.7.upstream/6.9.7
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'tools/perf/Documentation')
-rw-r--r-- | tools/perf/Documentation/perf-intel-pt.txt | 14 | ||||
-rw-r--r-- | tools/perf/Documentation/perf-list.txt | 1 | ||||
-rw-r--r-- | tools/perf/Documentation/perf-report.txt | 29 | ||||
-rw-r--r-- | tools/perf/Documentation/perf-script-python.txt | 4 | ||||
-rw-r--r-- | tools/perf/Documentation/perf-script.txt | 20 | ||||
-rw-r--r-- | tools/perf/Documentation/perf-stat.txt | 11 | ||||
-rw-r--r-- | tools/perf/Documentation/perf-top.txt | 32 | ||||
-rw-r--r-- | tools/perf/Documentation/perf.txt | 2 | ||||
-rw-r--r-- | tools/perf/Documentation/tips.txt | 31 |
9 files changed, 121 insertions, 23 deletions
diff --git a/tools/perf/Documentation/perf-intel-pt.txt b/tools/perf/Documentation/perf-intel-pt.txt index 2109690b0d..59ab1ff9d7 100644 --- a/tools/perf/Documentation/perf-intel-pt.txt +++ b/tools/perf/Documentation/perf-intel-pt.txt @@ -115,9 +115,13 @@ toggle respectively. perf script also supports higher level ways to dump instruction traces: + perf script --insn-trace=disasm + +or to use the xed disassembler, which requires installing the xed tool +(see XED below): + perf script --insn-trace --xed -Dump all instructions. This requires installing the xed tool (see XED below) Dumping all instructions in a long trace can be fairly slow. It is usually better to start with higher level decoding, like @@ -130,12 +134,12 @@ or and then select a time range of interest. The time range can then be examined in detail with - perf script --time starttime,stoptime --insn-trace --xed + perf script --time starttime,stoptime --insn-trace=disasm While examining the trace it's also useful to filter on specific CPUs using the -C option - perf script --time starttime,stoptime --insn-trace --xed -C 1 + perf script --time starttime,stoptime --insn-trace=disasm -C 1 Dump all instructions in time range on CPU 1. @@ -1306,7 +1310,7 @@ Without timestamps, --per-thread must be specified to distinguish threads. perf script can be used to provide an instruction trace - $ perf script --guestkallsyms $KALLSYMS --insn-trace --xed -F+ipc | grep -C10 vmresume | head -21 + $ perf script --guestkallsyms $KALLSYMS --insn-trace=disasm -F+ipc | grep -C10 vmresume | head -21 CPU 0/KVM 1440 ffffffff82133cdd __vmx_vcpu_run+0x3d ([kernel.kallsyms]) movq 0x48(%rax), %r9 CPU 0/KVM 1440 ffffffff82133ce1 __vmx_vcpu_run+0x41 ([kernel.kallsyms]) movq 0x50(%rax), %r10 CPU 0/KVM 1440 ffffffff82133ce5 __vmx_vcpu_run+0x45 ([kernel.kallsyms]) movq 0x58(%rax), %r11 @@ -1407,7 +1411,7 @@ There were none. 'perf script' can be used to provide an instruction trace showing timestamps - $ perf script -i perf.data.kvm --guestkallsyms $KALLSYMS --insn-trace --xed -F+ipc | grep -C10 vmresume | head -21 + $ perf script -i perf.data.kvm --guestkallsyms $KALLSYMS --insn-trace=disasm -F+ipc | grep -C10 vmresume | head -21 CPU 1/KVM 17006 [001] 11500.262865593: ffffffff82133cdd __vmx_vcpu_run+0x3d ([kernel.kallsyms]) movq 0x48(%rax), %r9 CPU 1/KVM 17006 [001] 11500.262865593: ffffffff82133ce1 __vmx_vcpu_run+0x41 ([kernel.kallsyms]) movq 0x50(%rax), %r10 CPU 1/KVM 17006 [001] 11500.262865593: ffffffff82133ce5 __vmx_vcpu_run+0x45 ([kernel.kallsyms]) movq 0x58(%rax), %r11 diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt index 3b12595193..6bf2468f59 100644 --- a/tools/perf/Documentation/perf-list.txt +++ b/tools/perf/Documentation/perf-list.txt @@ -71,6 +71,7 @@ counted. The following modifiers exist: D - pin the event to the PMU W - group is weak and will fallback to non-group if not schedulable, e - group or event are exclusive and do not share the PMU + b - use BPF aggregration (see perf stat --bpf-counters) The 'p' modifier can be used for specifying how precise the instruction address should be. The 'p' modifier can be specified multiple times: diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt index 38f59ac064..d8b863e01f 100644 --- a/tools/perf/Documentation/perf-report.txt +++ b/tools/perf/Documentation/perf-report.txt @@ -531,8 +531,35 @@ include::itrace.txt[] --raw-trace:: When displaying traceevent output, do not use print fmt or plugins. +-H:: --hierarchy:: - Enable hierarchical output. + Enable hierarchical output. In the hierarchy mode, each sort key groups + samples based on the criteria and then sub-divide it using the lower + level sort key. + + For example: + In normal output: + + perf report -s dso,sym + # Overhead Shared Object Symbol + 50.00% [kernel.kallsyms] [k] kfunc1 + 20.00% perf [.] foo + 15.00% [kernel.kallsyms] [k] kfunc2 + 10.00% perf [.] bar + 5.00% libc.so [.] libcall + + In hierarchy output: + + perf report -s dso,sym --hierarchy + # Overhead Shared Object / Symbol + 65.00% [kernel.kallsyms] + 50.00% [k] kfunc1 + 15.00% [k] kfunc2 + 30.00% perf + 20.00% [.] foo + 10.00% [.] bar + 5.00% libc.so + 5.00% [.] libcall --inline:: If a callgraph address belongs to an inlined function, the inline stack diff --git a/tools/perf/Documentation/perf-script-python.txt b/tools/perf/Documentation/perf-script-python.txt index 6a8581012e..13e37e9385 100644 --- a/tools/perf/Documentation/perf-script-python.txt +++ b/tools/perf/Documentation/perf-script-python.txt @@ -642,8 +642,8 @@ SUPPORTED FIELDS Currently supported fields: -ev_name, comm, pid, tid, cpu, ip, time, period, phys_addr, addr, -symbol, symoff, dso, time_enabled, time_running, values, callchain, +ev_name, comm, id, stream_id, pid, tid, cpu, ip, time, period, phys_addr, +addr, symbol, symoff, dso, time_enabled, time_running, values, callchain, brstack, brstacksym, datasrc, datasrc_decode, iregs, uregs, weight, transaction, raw_buf, attr, cpumode. diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt index ff9a52e446..005e51df85 100644 --- a/tools/perf/Documentation/perf-script.txt +++ b/tools/perf/Documentation/perf-script.txt @@ -132,9 +132,10 @@ OPTIONS Comma separated list of fields to print. Options are: comm, tid, pid, time, cpu, event, trace, ip, sym, dso, dsoff, addr, symoff, srcline, period, iregs, uregs, brstack, brstacksym, flags, bpf-output, - brstackinsn, brstackinsnlen, brstackoff, callindent, insn, insnlen, synth, - phys_addr, metric, misc, srccode, ipc, data_page_size, code_page_size, ins_lat, - machine_pid, vcpu, cgroup, retire_lat. + brstackinsn, brstackinsnlen, brstackoff, callindent, insn, disasm, + insnlen, synth, phys_addr, metric, misc, srccode, ipc, data_page_size, + code_page_size, ins_lat, machine_pid, vcpu, cgroup, retire_lat. + Field list can be prepended with the type, trace, sw or hw, to indicate to which event type the field list applies. e.g., -F sw:comm,tid,time,ip,sym and -F trace:time,cpu,trace @@ -217,9 +218,9 @@ OPTIONS Instruction Trace decoding. For calls and returns, it will display the name of the symbol indented with spaces to reflect the stack depth. - When doing instruction trace decoding insn and insnlen give the - instruction bytes and the instruction length of the current - instruction. + When doing instruction trace decoding, insn, disasm and insnlen give the + instruction bytes, disassembled instructions (requires libcapstone support) + and the instruction length of the current instruction respectively. The synth field is used by synthesized events which may be created when Instruction Trace decoding. @@ -441,9 +442,10 @@ include::itrace.txt[] will be printed. Each entry has function name and file/line. Enabled by default, disable with --no-inline. ---insn-trace:: - Show instruction stream for intel_pt traces. Combine with --xed to - show disassembly. +--insn-trace[=<raw|disasm>]:: + Show instruction stream in bytes (raw) or disassembled (disasm) + for intel_pt traces. The default is 'raw'. To use xed, combine + 'raw' with --xed to show disassembly done by xed. --xed:: Run xed disassembler on output. Requires installing the xed disassembler. diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt index 5af2e432b5..29756a87ab 100644 --- a/tools/perf/Documentation/perf-stat.txt +++ b/tools/perf/Documentation/perf-stat.txt @@ -308,6 +308,14 @@ use --per-die in addition to -a. (system-wide). The output includes the die number and the number of online processors on that die. This is useful to gauge the amount of aggregation. +--per-cluster:: +Aggregate counts per processor cluster for system-wide mode measurement. This +is a useful mode to detect imbalance between clusters. To enable this mode, +use --per-cluster in addition to -a. (system-wide). The output includes the +cluster number and the number of online processors on that cluster. This is +useful to gauge the amount of aggregation. The information of cluster ID and +related CPUs can be gotten from /sys/devices/system/cpu/cpuX/topology/cluster_{id, cpus}. + --per-cache:: Aggregate counts per cache instance for system-wide mode measurements. By default, the aggregation happens for the cache level at the highest index @@ -396,6 +404,9 @@ Aggregate counts per processor socket for system-wide mode measurements. --per-die:: Aggregate counts per processor die for system-wide mode measurements. +--per-cluster:: +Aggregate counts perf processor cluster for system-wide mode measurements. + --per-cache:: Aggregate counts per cache instance for system-wide mode measurements. By default, the aggregation happens for the cache level at the highest index diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt index 3c202ec080..a754875fa5 100644 --- a/tools/perf/Documentation/perf-top.txt +++ b/tools/perf/Documentation/perf-top.txt @@ -261,8 +261,38 @@ Default is to monitor all CPUS. --raw-trace:: When displaying traceevent output, do not use print fmt or plugins. +-H:: --hierarchy:: - Enable hierarchy output. + Enable hierarchical output. In the hierarchy mode, each sort key groups + samples based on the criteria and then sub-divide it using the lower + level sort key. + + For example, in normal output: + + perf report -s dso,sym + # + # Overhead Shared Object Symbol + # ........ ................. ........... + 50.00% [kernel.kallsyms] [k] kfunc1 + 20.00% perf [.] foo + 15.00% [kernel.kallsyms] [k] kfunc2 + 10.00% perf [.] bar + 5.00% libc.so [.] libcall + + In hierarchy output: + + perf report -s dso,sym --hierarchy + # + # Overhead Shared Object / Symbol + # .......... ...................... + 65.00% [kernel.kallsyms] + 50.00% [k] kfunc1 + 15.00% [k] kfunc2 + 30.00% perf + 20.00% [.] foo + 10.00% [.] bar + 5.00% libc.so + 5.00% [.] libcall --overwrite:: Enable this to use just the most recent records, which helps in high core count diff --git a/tools/perf/Documentation/perf.txt b/tools/perf/Documentation/perf.txt index a7cf7bc2f9..09f516f3fd 100644 --- a/tools/perf/Documentation/perf.txt +++ b/tools/perf/Documentation/perf.txt @@ -63,6 +63,8 @@ OPTIONS in browser mode perf-event-open - Print perf_event_open() arguments and return value + kmaps - Print kernel and module maps (perf script + and perf report without browser) --debug-file:: Write debug output to a specified file. diff --git a/tools/perf/Documentation/tips.txt b/tools/perf/Documentation/tips.txt index 825745a645..67b326ba00 100644 --- a/tools/perf/Documentation/tips.txt +++ b/tools/perf/Documentation/tips.txt @@ -2,6 +2,7 @@ For a higher level overview, try: perf report --sort comm,dso Sample related events with: perf record -e '{cycles,instructions}:S' Compare performance results with: perf diff [<old file> <new file>] Boolean options have negative forms, e.g.: perf report --no-children +To not accumulate CPU time of children symbols add --no-children Customize output of perf script with: perf script -F event,ip,sym Generate a script for your data: perf script -g <lang> Save output of perf stat using: perf stat record <target workload> @@ -12,32 +13,52 @@ List events using substring match: perf list <keyword> To see list of saved events and attributes: perf evlist -v Use --symfs <dir> if your symbol files are in non-standard locations To see callchains in a more compact form: perf report -g folded +To see call chains by final symbol taking CPU time (bottom up) use perf report -G Show individual samples with: perf script Limit to show entries above 5% only: perf report --percent-limit 5 Profiling branch (mis)predictions with: perf record -b / perf report -To show assembler sample contexts use perf record -b / perf script -F +brstackinsn --xed -Treat branches as callchains: perf report --branch-history -To count events in every 1000 msec: perf stat -I 1000 -Print event counts in CSV format with: perf stat -x, +To show assembler sample context control flow use perf record -b / perf report --samples 10 and then browse context +To adjust path to source files to local file system use perf report --prefix=... --prefix-strip=... +Treat branches as callchains: perf record -b ... ; perf report --branch-history +Show estimate cycles per function and IPC in annotate use perf record -b ... ; perf report --total-cycles +To count events every 1000 msec: perf stat -I 1000 +Print event counts in machine readable CSV format with: perf stat -x\; If you have debuginfo enabled, try: perf report -s sym,srcline For memory address profiling, try: perf mem record / perf mem report For tracepoint events, try: perf report -s trace_fields To record callchains for each sample: perf record -g +If call chains don't work try perf record --call-graph dwarf or --call-graph lbr To record every process run by a user: perf record -u <user> +To show inline functions in call traces add --inline to perf report +To not record events from perf itself add --exclude-perf Skip collecting build-id when recording: perf record -B To change sampling frequency to 100 Hz: perf record -F 100 +To show information about system the samples were collected on use perf report --header +To only collect call graph on one event use perf record -e cpu/cpu-cycles,callgraph=1/,branches ; perf report --show-ref-call-graph +To set sampling period of individual events use perf record -e cpu/cpu-cycles,period=100001/,cpu/branches,period=10001/ ... +To group events which need to be collected together for accuracy use {}: perf record -e {cycles,branches}' ... +To compute metrics for samples use perf record -e '{cycles,instructions}' ... ; perf script -F +metric See assembly instructions with percentage: perf annotate <symbol> If you prefer Intel style assembly, try: perf annotate -M intel +When collecting LBR backtraces use --stitch-lbr to handle more than 32 deep entries: perf record --call-graph lbr ; perf report --stitch-lbr For hierarchical output, try: perf report --hierarchy Order by the overhead of source file name and line number: perf report -s srcline System-wide collection from all CPUs: perf record -a Show current config key-value pairs: perf config --list +To collect Processor Trace with samples use perf record -e '{intel_pt//,cycles}' ; perf script --call-trace or --insn-trace --xed -F +ipc (remove --xed if no xed) +To trace calls using Processor Trace use perf record -e intel_pt// ... ; perf script --call-trace. Then use perf script --time A-B --insn-trace to look at region of interest. +To measure approximate function latency with Processor Trace use perf record -e intel_pt// ... ; perf script --call-ret-trace +To trace only single function with Processor Trace use perf record --filter 'filter func @ program' -e intel_pt//u ./program ; perf script --insn-trace Show user configuration overrides: perf config --user --list To add Node.js USDT(User-Level Statically Defined Tracing): perf buildid-cache --add `which node` -To report cacheline events from previous recording: perf c2c report +To analyze cache line scalability issues use perf c2c record ... ; perf c2c report To browse sample contexts use perf report --sample 10 and select in context menu To separate samples by time use perf report --sort time,overhead,sym +To filter subset of samples with report or script add --time X-Y or --cpu A,B,C or --socket-filter ... To set sample time separation other than 100ms with --sort time use --time-quantum Add -I to perf record to sample register values, which will be visible in perf report sample context. To show IPC for sampling periods use perf record -e '{cycles,instructions}:S' and then browse context To show context switches in perf report sample context add --switch-events to perf record. +To show time in nanoseconds in record/report add --ns +To compare hot regions in two workloads use perf record -b -o file ... ; perf diff --stream file1 file2 +To compare scalability of two workload samples use perf diff -c ratio file1 file2 |