diff options
Diffstat (limited to 'third_party/rust/packed_simd/perf-guide/src/target-feature')
6 files changed, 136 insertions, 0 deletions
diff --git a/third_party/rust/packed_simd/perf-guide/src/target-feature/attribute.md b/third_party/rust/packed_simd/perf-guide/src/target-feature/attribute.md new file mode 100644 index 0000000000..ee670fea5b --- /dev/null +++ b/third_party/rust/packed_simd/perf-guide/src/target-feature/attribute.md @@ -0,0 +1,5 @@ +# The `target_feature` attribute + +<!-- TODO: +Explain the `#[target_feature]` attribute +--> diff --git a/third_party/rust/packed_simd/perf-guide/src/target-feature/features.md b/third_party/rust/packed_simd/perf-guide/src/target-feature/features.md new file mode 100644 index 0000000000..b93030ca67 --- /dev/null +++ b/third_party/rust/packed_simd/perf-guide/src/target-feature/features.md @@ -0,0 +1,13 @@ +# Enabling target features + +Not all processors of a certain architecture will have SIMD processing units, +and using a SIMD instruction which is not supported will trigger undefined behavior. + +To allow building safe, portable programs, the Rust compiler will **not**, by default, +generate any sort of vector instructions, unless it can statically determine +they are supported. For example, on AMD64, SSE2 support is architecturally guaranteed. +The `x86_64-apple-darwin` target enables up to SSSE3. The get a defintive list of +which features are enabled by default on various platforms, refer to the target +specifications [in the compiler's source code][targets]. + +[targets]: https://github.com/rust-lang/rust/tree/master/src/librustc_target/spec diff --git a/third_party/rust/packed_simd/perf-guide/src/target-feature/inlining.md b/third_party/rust/packed_simd/perf-guide/src/target-feature/inlining.md new file mode 100644 index 0000000000..86705102a7 --- /dev/null +++ b/third_party/rust/packed_simd/perf-guide/src/target-feature/inlining.md @@ -0,0 +1,5 @@ +# Inlining + +<!-- TODO: +Explain how the `#[target_feature]` attribute interacts with inlining +--> diff --git a/third_party/rust/packed_simd/perf-guide/src/target-feature/practice.md b/third_party/rust/packed_simd/perf-guide/src/target-feature/practice.md new file mode 100644 index 0000000000..5b55c61c26 --- /dev/null +++ b/third_party/rust/packed_simd/perf-guide/src/target-feature/practice.md @@ -0,0 +1,31 @@ +# Target features in practice + +Using `RUSTFLAGS` will allow the crate being compiled, as well as all its +transitive dependencies to use certain target features. + +A tehnique used to avoid undefined behavior at runtime is to compile and +ship multiple binaries, each compiled with a certain set of features. +This might not be feasible in some cases, and can quickly get out of hand +as more and more vector extensions are added to an architecture. + +Rust can be more flexible: you can build a single binary/library which automatically +picks the best supported vector instructions depending on the host machine. +The trick consists of monomorphizing parts of the code during building, and then +using run-time feature detection to select the right code path when running. + +<!-- TODO +Explain how to create efficient functions that dispatch to different +implementations at run-time without issues (e.g. using `#[inline(always)]` for +the impls, wrapping in `#[target_feature]`, and the wrapping those in a function +that does run-time feature detection). +--> + +**NOTE** (x86 specific): because the AVX (256-bit) registers extend the existing +SSE (128-bit) registers, mixing SSE and AVX instructions in a program can cause +performance issues. + +The solution is to compile all code, even the code written with 128-bit vectors, +with the AVX target feature enabled. This will cause the compiler to prefix the +generated instructions with the [VEX] prefix. + +[VEX]: https://en.wikipedia.org/wiki/VEX_prefix diff --git a/third_party/rust/packed_simd/perf-guide/src/target-feature/runtime.md b/third_party/rust/packed_simd/perf-guide/src/target-feature/runtime.md new file mode 100644 index 0000000000..47ddcc8660 --- /dev/null +++ b/third_party/rust/packed_simd/perf-guide/src/target-feature/runtime.md @@ -0,0 +1,5 @@ +# Detecting host features at runtime + +<!-- TODO: +Explain cost (how it works). +--> diff --git a/third_party/rust/packed_simd/perf-guide/src/target-feature/rustflags.md b/third_party/rust/packed_simd/perf-guide/src/target-feature/rustflags.md new file mode 100644 index 0000000000..f4c1d1304a --- /dev/null +++ b/third_party/rust/packed_simd/perf-guide/src/target-feature/rustflags.md @@ -0,0 +1,77 @@ +# Using RUSTFLAGS + +One of the easiest ways to benefit from SIMD is to allow the compiler +to generate code using certain vector instruction extensions. + +The environment variable `RUSTFLAGS` can be used to pass options for code +generation to the Rust compiler. These flags will affect **all** compiled crates. + +There are two flags which can be used to enable specific vector extensions: + +## target-feature + +- Syntax: `-C target-feature=<features>` + +- Provides the compiler with a comma-separated set of instruction extensions + to enable. + + **Example**: Use `-C target-feature=+sse3,+avx` to enable generating instructions + for [Streaming SIMD Extensions 3](https://en.wikipedia.org/wiki/SSE3) and + [Advanced Vector Extensions](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions). + +- To list target triples for all targets supported by Rust, use: + + ```sh + rustc --print target-list + ``` + +- To list all support target features for a certain target triple, use: + + ```sh + rustc --target=${TRIPLE} --print target-features + ``` + +- Note that all CPU features are independent, and will have to be enabled individually. + + **Example**: Setting `-C target-feature=+avx2` will _not_ enable `fma`, even though + all CPUs which support AVX2 also support FMA. To enable both, one has to use + `-C target-feature=+avx2,+fma` + +- Some features also depend on other features, which need to be enabled for the + target instructions to be generated. + + **Example**: Unless `v7` is specified as the target CPU (see below), to enable + NEON on ARM it is necessary to use `-C target-feature=+v7,+neon`. + +## target-cpu + +- Syntax: `-C target-cpu=<cpu>` + +- Sets the identifier of a CPU family / model for which to build and optimize the code. + + **Example**: `RUSTFLAGS='-C target-cpu=cortex-a75'` + +- To list all supported target CPUs for a certain target triple, use: + + ```sh + rustc --target=${TRIPLE} --print target-cpus + ``` + + **Example**: + + ```sh + rustc --target=i686-pc-windows-msvc --print target-cpus + ``` + +- The compiler will translate this into a list of target features. Therefore, + individual feature checks (`#[cfg(target_feature = "...")]`) will still + work properly. + +- It will cause the code generator to optimize the generated code for that + specific CPU model. + +- Using `native` as the CPU model will cause Rust to generate and optimize code + for the CPU running the compiler. It is useful when building programs which you + plan to only use locally. This should never be used when the generated programs + are meant to be run on other computers, such as when packaging for distribution + or cross-compiling. |