# Code generation attributes

The following [attributes] are used for controlling code generation.

## Optimization hints

The `cold` and `inline` [attributes] give suggestions to generate code in a
way that may be faster than the code the compiler would generate without the
hint. The attributes are only hints, and may be ignored.

Both attributes can be used on [functions]. When applied to a function in a
[trait], they apply only to that function's own body, which is used when a
trait implementation relies on it as the default; they do not apply to
implementations that override it. The attributes have no effect on a trait
function without a body.
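
As a minimal sketch of this rule (the trait `Shape` and the types `Square` and `Rect` are hypothetical), the `#[inline]` hint below is attached to a default method body. It accompanies that body where `Square` uses the default, but it is not carried over to the body `Rect` writes to override it.

```rust
trait Shape {
    // No body: an `#[inline]` hint placed here would have no effect.
    fn area(&self) -> f64;

    // The hint belongs to this default body only.
    #[inline]
    fn describe(&self) -> String {
        format!("shape with area {}", self.area())
    }
}

struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
    // Uses the default `describe`, which carries the `#[inline]` hint.
}

struct Rect(f64, f64);

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }

    // Overrides the default; this body does not inherit the hint.
    fn describe(&self) -> String {
        format!("{} x {} rectangle", self.0, self.1)
    }
}
```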

### The `inline` attribute

The *`inline` [attribute]* suggests that a copy of the attributed function
should be placed in the caller, rather than generating code to call the
function where it is defined.

> ***Note***: The `rustc` compiler automatically inlines functions based on
> internal heuristics. Incorrectly inlining functions can make the program
> slower, so this attribute should be used with care.

There are three ways to use the inline attribute:

* `#[inline]` *suggests* performing an inline expansion.
* `#[inline(always)]` *suggests* that an inline expansion should always be
  performed.
* `#[inline(never)]` *suggests* that an inline expansion should never be
  performed.

> ***Note***: `#[inline]` in every form is a hint, with no *requirements*
> on the language to place a copy of the attributed function in the caller.
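
The following sketch shows the three forms side by side; the function names are hypothetical, and in every case the compiler remains free to ignore the hint.

```rust
// Small and called often: a plain hint that inlining is probably worthwhile.
#[inline]
pub fn square(x: u64) -> u64 {
    x * x
}

// Suggests inlining at every call site (still only a hint).
#[inline(always)]
pub fn is_power_of_two(x: u64) -> bool {
    x != 0 && (x & (x - 1)) == 0
}

// Suggests keeping this function out of line, for example so that a
// rarely used path does not bloat its callers.
#[inline(never)]
pub fn sum_of_squares(xs: &[u64]) -> u64 {
    xs.iter().map(|&x| square(x)).sum()
}
```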

### The `cold` attribute

The *`cold` [attribute]* suggests that the attributed function is unlikely to
be called.
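
A typical use is an error path that is expected to run rarely. In this sketch, `overflow_error` and `checked_double` are hypothetical names. Pairing `#[cold]` with `#[inline(never)]` is a common idiom for keeping such a path out of its callers, though both attributes remain hints.

```rust
// Only reached when doubling overflows, so it is marked as unlikely to run.
#[cold]
#[inline(never)]
fn overflow_error(value: i32) -> String {
    format!("doubling {value} overflows i32")
}

fn checked_double(value: i32) -> Result<i32, String> {
    match value.checked_mul(2) {
        Some(doubled) => Ok(doubled),
        // The rare branch calls into the cold function.
        None => Err(overflow_error(value)),
    }
}
```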

## The `no_builtins` attribute

The *`no_builtins` [attribute]* may be applied at the crate level to disable
optimizing certain code patterns to invocations of library functions that are
assumed to exist.
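
A minimal sketch of where the attribute goes: it is an inner attribute at the top of the crate root. The `fill` function is hypothetical. A byte-filling loop like this one is the kind of pattern an optimizer may otherwise turn into a call to `memset`; exactly which transformations are suppressed is up to the compiler.

```rust
#![no_builtins]

/// Fills `buf` with `value` one byte at a time. With `no_builtins` in effect,
/// the compiler is not expected to rewrite this loop into a `memset` call.
pub fn fill(buf: &mut [u8], value: u8) {
    for byte in buf.iter_mut() {
        *byte = value;
    }
}
```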

## The `target_feature` attribute

The *`target_feature` [attribute]* may be applied to a function to
enable code generation of that function for specific platform architecture
features. It uses the [_MetaListNameValueStr_] syntax with a single key of
`enable` whose value is a string of comma-separated feature names to enable.

```rust
# #[cfg(target_feature = "avx2")]
#[target_feature(enable = "avx2")]
unsafe fn foo_avx2() {}
```

Each [target architecture] has a set of features that may be enabled. It is an
error to specify a feature for a target architecture that the crate is not
being compiled for.
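
Because naming a feature of another architecture is an error, code meant to build for several targets usually gates each annotated function with `cfg(target_arch)`. The sketch below does this and also shows the comma-separated form of `enable`. The function `scale` is hypothetical, and calling it in the usage code assumes a typical hosted x86_64 or aarch64 target, where `sse`/`sse2` and `neon` respectively are enabled by default.

```rust
// x86_64 only: `sse` and `sse2` are x86 feature names, so on other
// architectures this item is compiled out instead of causing an error.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse,sse2")]
unsafe fn scale(xs: &mut [f32], k: f32) {
    for x in xs.iter_mut() {
        *x *= k;
    }
}

// aarch64 only: `neon` is an aarch64 feature name.
#[cfg(target_arch = "aarch64")]
#[target_feature(enable = "neon")]
unsafe fn scale(xs: &mut [f32], k: f32) {
    for x in xs.iter_mut() {
        *x *= k;
    }
}

// Every other architecture: no target features are requested.
#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
unsafe fn scale(xs: &mut [f32], k: f32) {
    for x in xs.iter_mut() {
        *x *= k;
    }
}
```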

It is [undefined behavior] to call a function that is compiled with a feature
that is not supported on the platform the code is running on, *except* if the
platform explicitly documents this to be safe.

Functions marked with `target_feature` are not inlined into a context that
does not support the given features. The `#[inline(always)]` attribute may not
be used with a `target_feature` attribute.

### Available features

The following is a list of the available feature names.

#### `x86` or `x86_64`

Executing code with unsupported features is undefined behavior on this platform.
Hence this platform requires that `#[target_feature]` is only applied to [`unsafe`
functions][unsafe function].

Feature     | Implicitly Enables | Description
------------|--------------------|-------------------
`adx`       |          | [ADX] — Multi-Precision Add-Carry Instruction Extensions
`aes`       | `sse2`   | [AES] — Advanced Encryption Standard
`avx`       | `sse4.2` | [AVX] — Advanced Vector Extensions
`avx2`      | `avx`    | [AVX2] — Advanced Vector Extensions 2
`bmi1`      |          | [BMI1] — Bit Manipulation Instruction Sets
`bmi2`      |          | [BMI2] — Bit Manipulation Instruction Sets 2
`fma`       | `avx`    | [FMA3] — Three-operand fused multiply-add
`fxsr`      |          | [`fxsave`] and [`fxrstor`] — Save and restore x87 FPU, MMX Technology, and SSE State
`lzcnt`     |          | [`lzcnt`] — Leading zeros count
`pclmulqdq` | `sse2`   | [`pclmulqdq`] — Packed carry-less multiplication quadword
`popcnt`    |          | [`popcnt`] — Count of bits set to 1
`rdrand`    |          | [`rdrand`] — Read random number
`rdseed`    |          | [`rdseed`] — Read random seed
`sha`       | `sse2`   | [SHA] — Secure Hash Algorithm
`sse`       |          | [SSE] — Streaming <abbr title="Single Instruction Multiple Data">SIMD</abbr> Extensions
`sse2`      | `sse`    | [SSE2] — Streaming SIMD Extensions 2
`sse3`      | `sse2`   | [SSE3] — Streaming SIMD Extensions 3
`sse4.1`    | `ssse3`  | [SSE4.1] — Streaming SIMD Extensions 4.1
`sse4.2`    | `sse4.1` | [SSE4.2] — Streaming SIMD Extensions 4.2
`ssse3`     | `sse3`   | [SSSE3] — Supplemental Streaming SIMD Extensions 3
`xsave`     |          | [`xsave`] — Save processor extended states
`xsavec`    |          | [`xsavec`] — Save processor extended states with compaction
`xsaveopt`  |          | [`xsaveopt`] — Save processor extended states optimized
`xsaves`    |          | [`xsaves`] — Save processor extended states supervisor

<!-- Keep links near each table to make it easier to move and update. -->

[ADX]: https://en.wikipedia.org/wiki/Intel_ADX
[AES]: https://en.wikipedia.org/wiki/AES_instruction_set
[AVX]: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions
[AVX2]: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#AVX2
[BMI1]: https://en.wikipedia.org/wiki/Bit_Manipulation_Instruction_Sets
[BMI2]: https://en.wikipedia.org/wiki/Bit_Manipulation_Instruction_Sets#BMI2
[FMA3]: https://en.wikipedia.org/wiki/FMA_instruction_set
[`fxsave`]: https://www.felixcloutier.com/x86/fxsave
[`fxrstor`]: https://www.felixcloutier.com/x86/fxrstor
[`lzcnt`]: https://www.felixcloutier.com/x86/lzcnt
[`pclmulqdq`]: https://www.felixcloutier.com/x86/pclmulqdq
[`popcnt`]: https://www.felixcloutier.com/x86/popcnt
[`rdrand`]: https://en.wikipedia.org/wiki/RdRand
[`rdseed`]: https://en.wikipedia.org/wiki/RdRand
[SHA]: https://en.wikipedia.org/wiki/Intel_SHA_extensions
[SSE]: https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions
[SSE2]: https://en.wikipedia.org/wiki/SSE2
[SSE3]: https://en.wikipedia.org/wiki/SSE3
[SSE4.1]: https://en.wikipedia.org/wiki/SSE4#SSE4.1
[SSE4.2]: https://en.wikipedia.org/wiki/SSE4#SSE4.2
[SSSE3]: https://en.wikipedia.org/wiki/SSSE3
[`xsave`]: https://www.felixcloutier.com/x86/xsave
[`xsavec`]: https://www.felixcloutier.com/x86/xsavec
[`xsaveopt`]: https://www.felixcloutier.com/x86/xsaveopt
[`xsaves`]: https://www.felixcloutier.com/x86/xsaves

#### `aarch64`

This platform requires that `#[target_feature]` is only applied to [`unsafe`
functions][unsafe function].

Further documentation on these features can be found in the [ARM Architecture
Reference Manual], or elsewhere on [developer.arm.com].

[ARM Architecture Reference Manual]: https://developer.arm.com/documentation/ddi0487/latest
[developer.arm.com]: https://developer.arm.com

> ***Note***: The following pairs of features should both be marked as enabled
> or disabled together if used:
> - `paca` and `pacg`, which LLVM currently implements as one feature.


Feature        | Implicitly Enables | Feature Name
---------------|--------------------|-------------------
`aes`          | `neon`         | FEAT_AES - Advanced <abbr title="Single Instruction Multiple Data">SIMD</abbr> AES instructions
`bf16`         |                | FEAT_BF16 - BFloat16 instructions
`bti`          |                | FEAT_BTI - Branch Target Identification
`crc`          |                | FEAT_CRC - CRC32 checksum instructions
`dit`          |                | FEAT_DIT - Data Independent Timing instructions
`dotprod`      |                | FEAT_DotProd - Advanced SIMD Int8 dot product instructions
`dpb`          |                | FEAT_DPB - Data cache clean to point of persistence
`dpb2`         |                | FEAT_DPB2 - Data cache clean to point of deep persistence
`f32mm`        | `sve`          | FEAT_F32MM - SVE single-precision FP matrix multiply instruction
`f64mm`        | `sve`          | FEAT_F64MM - SVE double-precision FP matrix multiply instruction
`fcma`         | `neon`         | FEAT_FCMA - Floating point complex number support
`fhm`          | `fp16`         | FEAT_FHM - Half-precision FP FMLAL instructions
`flagm`        |                | FEAT_FlagM - Conditional flag manipulation
`fp16`         | `neon`         | FEAT_FP16 - Half-precision FP data processing
`frintts`      |                | FEAT_FRINTTS - Floating-point to int helper instructions
`i8mm`         |                | FEAT_I8MM - Int8 Matrix Multiplication
`jsconv`       | `neon`         | FEAT_JSCVT - JavaScript conversion instruction
`lse`          |                | FEAT_LSE - Large System Extension
`lor`          |                | FEAT_LOR - Limited Ordering Regions extension
`mte`          |                | FEAT_MTE - Memory Tagging Extension
`neon`         |                | FEAT_FP & FEAT_AdvSIMD - Floating Point and Advanced SIMD extension
`pan`          |                | FEAT_PAN - Privileged Access-Never extension
`paca`         |                | FEAT_PAuth - Pointer Authentication (address authentication)
`pacg`         |                | FEAT_PAuth - Pointer Authentication (generic authentication)
`pmuv3`        |                | FEAT_PMUv3 - Performance Monitors extension (v3)
`rand`         |                | FEAT_RNG - Random Number Generator
`ras`          |                | FEAT_RAS - Reliability, Availability and Serviceability extension
`rcpc`         |                | FEAT_LRCPC - Release consistent Processor Consistent
`rcpc2`        | `rcpc`         | FEAT_LRCPC2 - RcPc with immediate offsets
`rdm`          |                | FEAT_RDM - Rounding Double Multiply accumulate
`sb`           |                | FEAT_SB - Speculation Barrier
`sha2`         | `neon`         | FEAT_SHA1 & FEAT_SHA256 - Advanced SIMD SHA instructions
`sha3`         | `sha2`         | FEAT_SHA512 & FEAT_SHA3 - Advanced SIMD SHA instructions
`sm4`          | `neon`         | FEAT_SM3 & FEAT_SM4 - Advanced SIMD SM3/4 instructions
`spe`          |                | FEAT_SPE - Statistical Profiling Extension
`ssbs`         |                | FEAT_SSBS - Speculative Store Bypass Safe
`sve`          | `fp16`         | FEAT_SVE - Scalable Vector Extension
`sve2`         | `sve`          | FEAT_SVE2 - Scalable Vector Extension 2
`sve2-aes`     | `sve2`, `aes`  | FEAT_SVE_AES - SVE AES instructions
`sve2-sm4`     | `sve2`, `sm4`  | FEAT_SVE_SM4 - SVE SM4 instructions
`sve2-sha3`    | `sve2`, `sha3` | FEAT_SVE_SHA3 - SVE SHA3 instructions
`sve2-bitperm` | `sve2`         | FEAT_SVE_BitPerm - SVE Bit Permute
`tme`          |                | FEAT_TME - Transactional Memory Extension
`vh`           |                | FEAT_VHE - Virtualization Host Extensions

#### `wasm32` or `wasm64`

`#[target_feature]` may be used with both safe and
[`unsafe` functions][unsafe function] on Wasm platforms. It is impossible to
cause undefined behavior via the `#[target_feature]` attribute because
attempting to use instructions unsupported by the Wasm engine will fail at load
time without the risk of being interpreted in a way different from what the
compiler expected.

Feature     | Description
------------|-------------------
`simd128`   | [WebAssembly simd proposal][simd128]

[simd128]: https://github.com/webassembly/simd
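
As a sketch (the function `sum_i32` is hypothetical), a safe function can carry `#[target_feature(enable = "simd128")]` on Wasm targets, with a plain fallback so the same source builds elsewhere. The attributed version is compiled only for `wasm32` or `wasm64`, and it assumes a toolchain that accepts the attribute on safe functions there, as described above.

```rust
// On Wasm the attribute may sit on a safe function; an engine without SIMD
// support rejects the module at load time instead of misbehaving.
#[cfg(any(target_arch = "wasm32", target_arch = "wasm64"))]
#[target_feature(enable = "simd128")]
pub fn sum_i32(xs: &[i32]) -> i32 {
    xs.iter().copied().fold(0, i32::wrapping_add)
}

// Fallback so the same source also builds on non-Wasm targets.
#[cfg(not(any(target_arch = "wasm32", target_arch = "wasm64")))]
pub fn sum_i32(xs: &[i32]) -> i32 {
    xs.iter().copied().fold(0, i32::wrapping_add)
}
```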

### Additional information

See the [`target_feature` conditional compilation option] for selectively
enabling or disabling compilation of code based on compile-time settings. Note
that this option is not affected by the `target_feature` attribute, and is
only driven by the features enabled for the entire crate.

See the [`is_x86_feature_detected`] or [`is_aarch64_feature_detected`] macros
in the standard library for runtime feature detection on these platforms.

> Note: `rustc` has a default set of features enabled for each target and CPU.
> The CPU may be chosen with the [`-C target-cpu`] flag. Individual features
> may be enabled or disabled for an entire crate with the
> [`-C target-feature`] flag.
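
Combining the attribute with runtime detection is a common pattern: a feature-specific function is called only after the detection macro confirms support, and a portable fallback handles everything else. This is a minimal sketch with hypothetical function names. The specialized versions reuse the same loop and rely on the optimizer possibly vectorizing it; they do not call explicit intrinsics.

```rust
/// Portable fallback, used when no specialized version applies.
fn sum_portable(xs: &[u32]) -> u32 {
    xs.iter().copied().fold(0, u32::wrapping_add)
}

/// Same loop, compiled with AVX2 enabled so the optimizer may vectorize it.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(xs: &[u32]) -> u32 {
    xs.iter().copied().fold(0, u32::wrapping_add)
}

/// Same loop, compiled with NEON enabled.
#[cfg(target_arch = "aarch64")]
#[target_feature(enable = "neon")]
unsafe fn sum_neon(xs: &[u32]) -> u32 {
    xs.iter().copied().fold(0, u32::wrapping_add)
}

/// Safe entry point: picks an implementation after checking the running CPU.
pub fn sum(xs: &[u32]) -> u32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was confirmed at runtime just above.
            return unsafe { sum_avx2(xs) };
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            // SAFETY: NEON support was confirmed at runtime just above.
            return unsafe { sum_neon(xs) };
        }
    }
    sum_portable(xs)
}
```

Keeping the `unsafe` call behind a safe wrapper confines the undefined-behavior risk to one checked call site.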

## The `track_caller` attribute

The `track_caller` attribute may be applied to any function with [`"Rust"` ABI][rust-abi]
with the exception of the entry point `fn main`. When applied to functions and methods in
trait declarations, the attribute applies to all implementations. If the trait provides a
default implementation with the attribute, then the attribute also applies to override implementations.
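
A minimal sketch of the trait case, with a hypothetical trait `Checked` and type `Probe`: the attribute appears only on the trait declaration, yet the implementation still observes its caller's location.

```rust
use std::panic::Location;

trait Checked {
    // Declared on the trait, so every implementation is tracked.
    #[track_caller]
    fn reported_line(&self) -> u32;
}

struct Probe;

impl Checked for Probe {
    // No attribute here: it is inherited from the trait declaration, so
    // `Location::caller()` reports the caller's line, not this one.
    fn reported_line(&self) -> u32 {
        Location::caller().line()
    }
}
```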

When applied to a function in an `extern` block the attribute must also be applied to any linked
implementations, otherwise undefined behavior results. When applied to a function which is made
available to an `extern` block, the declaration in the `extern` block must also have the attribute,
otherwise undefined behavior results.

### Behavior

Applying the attribute to a function `f` allows code within `f` to get a hint of the [`Location`] of
the "topmost" tracked call that led to `f`'s invocation. At the point of observation, an
implementation behaves as if it walks up the stack from `f`'s frame to find the nearest frame of an
*unattributed* function `outer`, and it returns the [`Location`] of the tracked call in `outer`.

```rust
#[track_caller]
fn f() {
    println!("{}", std::panic::Location::caller());
}
```

> Note: `core` provides [`core::panic::Location::caller`] for observing caller locations. It wraps
> the [`core::intrinsics::caller_location`] intrinsic implemented by `rustc`.

> Note: because the resulting `Location` is a hint, an implementation may halt its walk up the stack
> early. See [Limitations](#limitations) for important caveats.

#### Examples

When `f` is called directly by `calls_f`, code in `f` observes its callsite within `calls_f`:

```rust
# #[track_caller]
# fn f() {
#     println!("{}", std::panic::Location::caller());
# }
fn calls_f() {
    f(); // <-- f() prints this location
}
```

When `f` is called by another attributed function `g` which is in turn called by `calls_g`, code in
both `f` and `g` observes `g`'s callsite within `calls_g`:

```rust
# #[track_caller]
# fn f() {
#     println!("{}", std::panic::Location::caller());
# }
#[track_caller]
fn g() {
    println!("{}", std::panic::Location::caller());
    f();
}

fn calls_g() {
    g(); // <-- g() prints this location twice, once from g() itself and once from f()
}
```

When `g` is called by another attributed function `h` which is in turn called by `calls_h`, all code
in `f`, `g`, and `h` observes `h`'s callsite within `calls_h`:

```rust
# #[track_caller]
# fn f() {
#     println!("{}", std::panic::Location::caller());
# }
# #[track_caller]
# fn g() {
#     println!("{}", std::panic::Location::caller());
#     f();
# }
#[track_caller]
fn h() {
    println!("{}", std::panic::Location::caller());
    g();
}

fn calls_h() {
    h(); // <-- prints this location three times: once from h() itself, once from g(), and once from f()
}
```

And so on.

### Limitations

This information is a hint and implementations are not required to preserve it.

In particular, coercing a function with `#[track_caller]` to a function pointer creates a shim which
appears to observers to have been called at the attributed function's definition site, losing actual
caller information across virtual calls. A common example of this coercion is the creation of a
trait object whose methods are attributed.

> Note: The aforementioned shim for function pointers is necessary because `rustc` implements
> `track_caller` in a codegen context by appending an implicit parameter to the function ABI, but
> this would be unsound for an indirect call because the parameter is not a part of the function's
> type and a given function pointer type may or may not refer to a function with the attribute. The
> creation of a shim hides the implicit parameter from callers of the function pointer, preserving
> soundness.
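
The loss of caller information can be observed directly. In this sketch (`caller_line` and `compare` are hypothetical), a direct call reports the line it appears on, while a call made from the same line through a `fn` pointer goes through the shim and reports some other location. The exact location the shim reports is an implementation detail, so it is only checked to differ.

```rust
use std::panic::Location;

#[track_caller]
fn caller_line() -> u32 {
    Location::caller().line()
}

/// Returns the current line plus the lines reported by a direct call and by
/// a call through a function pointer, both made from that same line.
fn compare() -> (u32, u32, u32) {
    let as_pointer: fn() -> u32 = caller_line;
    let (here, direct, indirect) = (line!(), caller_line(), as_pointer());
    (here, direct, indirect)
}
```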

[_MetaListNameValueStr_]: ../attributes.md#meta-item-attribute-syntax
[`-C target-cpu`]: ../../rustc/codegen-options/index.html#target-cpu
[`-C target-feature`]: ../../rustc/codegen-options/index.html#target-feature
[`is_x86_feature_detected`]: ../../std/arch/macro.is_x86_feature_detected.html
[`is_aarch64_feature_detected`]: ../../std/arch/macro.is_aarch64_feature_detected.html
[`target_feature` conditional compilation option]: ../conditional-compilation.md#target_feature
[attribute]: ../attributes.md
[attributes]: ../attributes.md
[functions]: ../items/functions.md
[target architecture]: ../conditional-compilation.md#target_arch
[trait]: ../items/traits.md
[undefined behavior]: ../behavior-considered-undefined.md
[unsafe function]: ../unsafe-keyword.md
[rust-abi]: ../items/external-blocks.md#abi
[`core::intrinsics::caller_location`]: ../../core/intrinsics/fn.caller_location.html
[`core::panic::Location::caller`]: ../../core/panic/struct.Location.html#method.caller
[`Location`]: ../../core/panic/struct.Location.html

## The `instruction_set` attribute

The *`instruction_set` attribute* may be applied to a function to enable code generation for a specific
instruction set supported by the target architecture. It uses the [_MetaListPath_] syntax and a path
composed of the architecture and instruction set to specify how to generate the code for
architectures where a single program may use multiple instruction sets.

The following values are available on targets for the `ARMv4` and `ARMv5te` architectures:

* `arm::a32` - Uses ARM code.
* `arm::t32` - Uses Thumb code.

<!-- ignore: arm-only -->
```rust,ignore
#[instruction_set(arm::a32)]
fn foo_arm_code() {}

#[instruction_set(arm::t32)]
fn bar_thumb_code() {}
```

[_MetaListPath_]: ../attributes.md#meta-item-attribute-syntax