Diffstat (limited to 'third_party/rust/packed_simd/perf-guide/src/prof/linux.md')
-rw-r--r-- third_party/rust/packed_simd/perf-guide/src/prof/linux.md 107
1 files changed, 0 insertions, 107 deletions
diff --git a/third_party/rust/packed_simd/perf-guide/src/prof/linux.md b/third_party/rust/packed_simd/perf-guide/src/prof/linux.md
deleted file mode 100644
index 96c7d67bc4..0000000000
--- a/third_party/rust/packed_simd/perf-guide/src/prof/linux.md
+++ /dev/null
@@ -1,107 +0,0 @@
-# Performance profiling on Linux
-
-## Using `perf`
-
-[perf](https://perf.wiki.kernel.org/) is one of the most powerful performance
-profilers for Linux. It supports various hardware Performance Monitoring Units
-and integrates with the kernel's performance events framework.
-
-We will only look at how the `perf` command can be used to profile SIMD code.
-Full system profiling is outside of the scope of this book.
-
-### Recording
-
-The first step is to record a program's execution during an average workload.
-It helps if you can isolate the parts of your program which have performance
-issues, and set up a benchmark which can be easily (re)run.
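-
-As a minimal sketch of such a benchmark, assuming a stand-in workload (the
-function names `sum_of_square_roots` and `run_benchmark` are hypothetical and
-not part of this book's benchmarks), a small binary can run the code of
-interest in a loop so that the profiler collects enough samples:
-
-```rust
-use std::hint::black_box;
-use std::time::Instant;
-
-/// Hypothetical workload standing in for the code you want to profile.
-fn sum_of_square_roots(values: &[f64]) -> f64 {
-    values.iter().map(|v| v.sqrt()).sum()
-}
-
-/// Runs the workload `iterations` times and accumulates the results.
-fn run_benchmark(values: &[f64], iterations: usize) -> f64 {
-    let mut total = 0.0;
-    for _ in 0..iterations {
-        // `black_box` discourages the optimizer from removing or hoisting the work.
-        total += black_box(sum_of_square_roots(black_box(values)));
-    }
-    total
-}
-
-fn main() {
-    let values: Vec<f64> = (0..1_000_000).map(|i| i as f64).collect();
-    let start = Instant::now();
-    let total = run_benchmark(&values, 100);
-    println!("total = {total}, elapsed = {:?}", start.elapsed());
-}
-```
-
-Keeping the input size and iteration count fixed makes successive runs
-comparable.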
-
-Build the benchmark binary in release mode, after enabling debug info
-(for example, by setting `debug = true` under `[profile.release]` in `Cargo.toml`):
-
-```sh
-$ cargo build --release
-Finished release [optimized + debuginfo] target(s) in 0.02s
-```
-
-Then use the `perf record` subcommand:
-
-```sh
-$ perf record --call-graph=dwarf ./target/release/my-program
-[ perf record: Woken up 10 times to write data ]
-[ perf record: Captured and wrote 2,356 MB perf.data (292 samples) ]
-```
-
-Instead of `--call-graph=dwarf`, which copies part of the stack with every
-sample and can therefore be slow and produce large `perf.data` files, you can
-use `--call-graph=lbr` if your processor supports Last Branch Record
-(i.e. Intel Haswell and newer).
-
-`perf` will, by default, record the count of CPU cycles it takes to execute
-various parts of your program. You can use the `-e` command line option
-to enable other performance events, such as `cache-misses`. Use `perf list`
-to get a list of all events, including the hardware counters, available on your system.
-
-### Viewing the report
-
-The next step is getting a bird's eye view of the program's execution.
-`perf` provides a terminal-based interface (TUI) which will get you started.
-
-Use `perf report` to open a visualization of your program's performance:
-
-```sh
-$ perf report --hierarchy -M intel
-```
-
-`--hierarchy` will display a tree-like structure of where your program spent
-most of its time. `-M intel` enables disassembly output with Intel syntax, which
-is subjectively more readable than the default AT&T syntax.
-
-Here is the output from profiling the `nbody` benchmark:
-
-```
-- 100,00% nbody
-  - 94,18% nbody
-    + 93,48% [.] nbody_lib::simd::advance
-    + 0,70% [.] nbody_lib::run
-  + 5,06% libc-2.28.so
-```
-
-If you move to any node in the tree with the arrow keys, you can then press `a`
-to have `perf` _annotate_ that node. This means it will:
-
-- disassemble the function
-
-- associate every instruction with the percentage of time which was spent executing it
-
-- interleave the disassembly with the source code,
-  assuming it found the debug symbols
-  (you can use `s` to toggle this behaviour)
-
-`perf` will, by default, jump to the instruction which it identified as being the
-hottest spot in the function:
-
-```
- 0,76 │ movapd   xmm2,xmm0
- 0,38 │ movhlps  xmm2,xmm0
-      │ addpd    xmm2,xmm0
-      │ unpcklpd xmm1,xmm2
-12,50 │ sqrtpd   xmm0,xmm1
- 1,52 │ mulpd    xmm0,xmm1
-```
-
-In this case, `sqrtpd` will be highlighted in red, since that's the instruction
-which the CPU spends most of its time executing.
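-
-The source code behind this hot spot is not shown here. Purely as an
-illustration, a scalar Rust sketch of the kind of computation that compiles to
-a square root followed by a multiplication might look like the following. The
-function name `interaction_magnitude` and the formula are assumptions, not the
-benchmark's actual code:
-
-```rust
-/// Hypothetical helper: computes dt / |d|^3 for a displacement vector `d`.
-fn interaction_magnitude(d: [f64; 3], dt: f64) -> f64 {
-    let distance_squared = d[0] * d[0] + d[1] * d[1] + d[2] * d[2];
-    // This square root is the kind of operation that shows up as a
-    // `sqrtsd`/`sqrtpd` instruction in the annotated disassembly.
-    let distance = distance_squared.sqrt();
-    dt / (distance_squared * distance)
-}
-
-fn main() {
-    println!("{}", interaction_magnitude([3.0, 4.0, 0.0], 0.01));
-}
-```
-
-Seeing which source expression maps to the hot instruction is what the
-interleaved source view in `perf annotate` is for.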
-
-## Using Valgrind
-
-Valgrind is a set of tools which initially helped C/C++ programmers find unsafe
-memory accesses in their code. Nowadays the project also includes:
-
-- a heap profiler called `massif`
-
-- a cache utilization profiler called `cachegrind`
-
-- a call-graph performance profiler called `callgrind`
-
-<!--
-TODO: explain valgrind's dynamic binary translation, warn about massive
-slowdown, talk about `kcachegrind` for a GUI
--->