diff options
Diffstat (limited to '')
-rw-r--r-- | tools/perf/Documentation/perf-trace.txt | 243 |
1 files changed, 243 insertions, 0 deletions
diff --git a/tools/perf/Documentation/perf-trace.txt b/tools/perf/Documentation/perf-trace.txt new file mode 100644 index 000000000..115db9e06 --- /dev/null +++ b/tools/perf/Documentation/perf-trace.txt @@ -0,0 +1,243 @@ +perf-trace(1) +============= + +NAME +---- +perf-trace - strace inspired tool + +SYNOPSIS +-------- +[verse] +'perf trace' +'perf trace record' + +DESCRIPTION +----------- +This command will show the events associated with the target, initially +syscalls, but other system events like pagefaults, task lifetime events, +scheduling events, etc. + +This is a live mode tool in addition to working with perf.data files like +the other perf tools. Files can be generated using the 'perf record' command +but the session needs to include the raw_syscalls events (-e 'raw_syscalls:*'). +Alternatively, 'perf trace record' can be used as a shortcut to +automatically include the raw_syscalls events when writing events to a file. + +The following options apply to perf trace; options to perf trace record are +found in the perf record man page. + +OPTIONS +------- + +-a:: +--all-cpus:: + System-wide collection from all CPUs. + +-e:: +--expr:: +--event:: + List of syscalls and other perf events (tracepoints, HW cache events, + etc) to show. Globbing is supported, e.g.: "epoll_*", "*msg*", etc. + See 'perf list' for a complete list of events. + Prefixing with ! shows all syscalls but the ones specified. You may + need to escape it. + +-D msecs:: +--delay msecs:: +After starting the program, wait msecs before measuring. This is useful to +filter out the startup phase of the program, which is often very different. + +-o:: +--output=:: + Output file name. + +-p:: +--pid=:: + Record events on existing process ID (comma separated list). + +-t:: +--tid=:: + Record events on existing thread ID (comma separated list). + +-u:: +--uid=:: + Record events in threads owned by uid. Name or number. + +-G:: +--cgroup:: + Record events in threads in a cgroup. + + Look for cgroups to set at the /sys/fs/cgroup/perf_event directory, then + remove the /sys/fs/cgroup/perf_event/ part and try: + + perf trace -G A -e sched:*switch + + Will set all raw_syscalls:sys_{enter,exit}, pgfault, vfs_getname, etc + _and_ sched:sched_switch to the 'A' cgroup, while: + + perf trace -e sched:*switch -G A + + will only set the sched:sched_switch event to the 'A' cgroup, all the + other events (raw_syscalls:sys_{enter,exit}, etc are left "without" + a cgroup (on the root cgroup, sys wide, etc). + + Multiple cgroups: + + perf trace -G A -e sched:*switch -G B + + the syscall ones go to the 'A' cgroup, the sched:sched_switch goes + to the 'B' cgroup. + +--filter-pids=:: + Filter out events for these pids and for 'trace' itself (comma separated list). + +-v:: +--verbose=:: + Verbosity level. + +--no-inherit:: + Child tasks do not inherit counters. + +-m:: +--mmap-pages=:: + Number of mmap data pages (must be a power of two) or size + specification with appended unit character - B/K/M/G. The + size is rounded up to have nearest pages power of two value. + +-C:: +--cpu:: +Collect samples only on the list of CPUs provided. Multiple CPUs can be provided as a +comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2. +In per-thread mode with inheritance mode on (default), Events are captured only when +the thread executes on the designated CPUs. Default is to monitor all CPUs. + +--duration:: + Show only events that had a duration greater than N.M ms. + +--sched:: + Accrue thread runtime and provide a summary at the end of the session. + +--failure:: + Show only syscalls that failed, i.e. that returned < 0. + +-i:: +--input:: + Process events from a given perf data file. + +-T:: +--time:: + Print full timestamp rather time relative to first sample. + +--comm:: + Show process COMM right beside its ID, on by default, disable with --no-comm. + +-s:: +--summary:: + Show only a summary of syscalls by thread with min, max, and average times + (in msec) and relative stddev. + +-S:: +--with-summary:: + Show all syscalls followed by a summary by thread with min, max, and + average times (in msec) and relative stddev. + +--tool_stats:: + Show tool stats such as number of times fd->pathname was discovered thru + hooking the open syscall return + vfs_getname or via reading /proc/pid/fd, etc. + +-f:: +--force:: + Don't complain, do it. + +-F=[all|min|maj]:: +--pf=[all|min|maj]:: + Trace pagefaults. Optionally, you can specify whether you want minor, + major or all pagefaults. Default value is maj. + +--syscalls:: + Trace system calls. This options is enabled by default, disable with + --no-syscalls. + +--call-graph [mode,type,min[,limit],order[,key][,branch]]:: + Setup and enable call-graph (stack chain/backtrace) recording. + See `--call-graph` section in perf-record and perf-report + man pages for details. The ones that are most useful in 'perf trace' + are 'dwarf' and 'lbr', where available, try: 'perf trace --call-graph dwarf'. + + Using this will, for the root user, bump the value of --mmap-pages to 4 + times the maximum for non-root users, based on the kernel.perf_event_mlock_kb + sysctl. This is done only if the user doesn't specify a --mmap-pages value. + +--kernel-syscall-graph:: + Show the kernel callchains on the syscall exit path. + +--max-stack:: + Set the stack depth limit when parsing the callchain, anything + beyond the specified depth will be ignored. Note that at this point + this is just about the presentation part, i.e. the kernel is still + not limiting, the overhead of callchains needs to be set via the + knobs in --call-graph dwarf. + + Implies '--call-graph dwarf' when --call-graph not present on the + command line, on systems where DWARF unwinding was built in. + + Default: /proc/sys/kernel/perf_event_max_stack when present for + live sessions (without --input/-i), 127 otherwise. + +--min-stack:: + Set the stack depth limit when parsing the callchain, anything + below the specified depth will be ignored. Disabled by default. + + Implies '--call-graph dwarf' when --call-graph not present on the + command line, on systems where DWARF unwinding was built in. + +--print-sample:: + Print the PERF_RECORD_SAMPLE PERF_SAMPLE_ info for the + raw_syscalls:sys_{enter,exit} tracepoints, for debugging. + +--proc-map-timeout:: + When processing pre-existing threads /proc/XXX/mmap, it may take a long time, + because the file may be huge. A time out is needed in such cases. + This option sets the time out limit. The default value is 500 ms. + +PAGEFAULTS +---------- + +When tracing pagefaults, the format of the trace is as follows: + +<min|maj>fault [<ip.symbol>+<ip.offset>] => <addr.dso@addr.offset> (<map type><addr level>). + +- min/maj indicates whether fault event is minor or major; +- ip.symbol shows symbol for instruction pointer (the code that generated the + fault); if no debug symbols available, perf trace will print raw IP; +- addr.dso shows DSO for the faulted address; +- map type is either 'd' for non-executable maps or 'x' for executable maps; +- addr level is either 'k' for kernel dso or '.' for user dso. + +For symbols resolution you may need to install debugging symbols. + +Please be aware that duration is currently always 0 and doesn't reflect actual +time it took for fault to be handled! + +When --verbose specified, perf trace tries to print all available information +for both IP and fault address in the form of dso@symbol+offset. + +EXAMPLES +-------- + +Trace only major pagefaults: + + $ perf trace --no-syscalls -F + +Trace syscalls, major and minor pagefaults: + + $ perf trace -F all + + 1416.547 ( 0.000 ms): python/20235 majfault [CRYPTO_push_info_+0x0] => /lib/x86_64-linux-gnu/libcrypto.so.1.0.0@0x61be0 (x.) + + As you can see, there was major pagefault in python process, from + CRYPTO_push_info_ routine which faulted somewhere in libcrypto.so. + +SEE ALSO +-------- +linkperf:perf-record[1], linkperf:perf-script[1] |