# Unchecked Uninitialized Memory

One interesting exception to this rule is working with arrays. Safe Rust doesn't
permit you to partially initialize an array. When you initialize an array, you
can either set every value to the same thing with `let x = [val; N]`, or you can
specify each member individually with `let x = [val1, val2, val3]`.
Unfortunately this is pretty rigid, especially if you need to initialize your
array in a more incremental or dynamic way.

Unsafe Rust gives us a powerful tool to handle this problem:
[`MaybeUninit`]. This type can be used to handle memory that has not been fully
initialized yet.

With `MaybeUninit`, we can initialize an array element-for-element as follows:

```rust
use std::mem::{self, MaybeUninit};

// Size of the array is hard-coded but easy to change (meaning, changing just
// the constant is sufficient). This means we can't use [a, b, c] syntax to
// initialize the array, though, as we would have to keep that in sync
// with `SIZE`!
const SIZE: usize = 10;

let x = {
    // Create an uninitialized array of `MaybeUninit`. The `assume_init` is
    // safe because the type we are claiming to have initialized here is a
    // bunch of `MaybeUninit`s, which do not require initialization.
    let mut x: [MaybeUninit<Box<u32>>; SIZE] = unsafe {
        MaybeUninit::uninit().assume_init()
    };

    // Dropping a `MaybeUninit` does nothing. Thus using raw pointer
    // assignment instead of `ptr::write` does not cause the old
    // uninitialized value to be dropped.
    // Exception safety is not a concern because Box can't panic
    for i in 0..SIZE {
        x[i] = MaybeUninit::new(Box::new(i as u32));
    }

    // Everything is initialized. Transmute the array to the
    // initialized type.
    unsafe { mem::transmute::<_, [Box<u32>; SIZE]>(x) }
};

dbg!(x);
```

This code proceeds in three steps:

1. Create an array of `MaybeUninit<T>`. With current stable Rust, we have to use
   unsafe code for this: we take some uninitialized piece of memory
   (`MaybeUninit::uninit()`) and claim we have fully initialized it
   ([`assume_init()`][assume_init]). This seems ridiculous, because we didn't!
   The reason this is correct is that the array itself consists entirely of
   `MaybeUninit`s, which do not actually require initialization. For most other
   types, doing `MaybeUninit::uninit().assume_init()` produces an invalid
   instance of said type, so you get yourself some Undefined Behavior.

2. Initialize the array. The subtle aspect of this is that usually, when we use
   `=` to assign to a value that the Rust type checker considers to already be
   initialized (like `x[i]`), the old value stored on the left-hand side gets
   dropped. This would be a disaster. However, in this case, the type of the
   left-hand side is `MaybeUninit<Box<u32>>`, and dropping that does not do
   anything! See below for some more discussion of this `drop` issue.

3. Finally, we have to change the type of our array to remove the
   `MaybeUninit`. With current stable Rust, this requires a `transmute`.
   This transmute is legal because in memory, `MaybeUninit<T>` looks the same as `T`.

   However, note that in general, `Container<MaybeUninit<T>>` does *not* look
   the same as `Container<T>`! Imagine if `Container` was `Option`, and `T` was
   `bool`: then `Option<bool>` exploits the fact that `bool` only has two valid
   values, but `Option<MaybeUninit<bool>>` cannot do that because the `bool`
   does not have to be initialized.

   So, it depends on `Container` whether transmuting away the `MaybeUninit` is
   allowed. For arrays, it is (and eventually the standard library will
   acknowledge that by providing appropriate methods).
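
The layout claims in step 3 can be checked with `size_of`. The following is a
minimal sketch; the helper names `option_sizes` and `array_sizes_match` are
made up for illustration, and the exact sizes are what current compilers
produce rather than a promise about every `Container`:

```rust
use std::mem::{size_of, MaybeUninit};

/// Returns the sizes of `Option<bool>` and `Option<MaybeUninit<bool>>`.
/// (Hypothetical helper, not part of the standard library.)
fn option_sizes() -> (usize, usize) {
    (
        // `bool` has spare invalid bit patterns, so `None` can be stored
        // inside the `bool`'s own byte.
        size_of::<Option<bool>>(),
        // Every bit pattern is allowed inside a `MaybeUninit<bool>`, so
        // `Option` needs extra space for its discriminant.
        size_of::<Option<MaybeUninit<bool>>>(),
    )
}

/// Checks that the two array types from the example have the same size.
/// (Hypothetical helper, not part of the standard library.)
fn array_sizes_match() -> bool {
    size_of::<[MaybeUninit<Box<u32>>; 10]>() == size_of::<[Box<u32>; 10]>()
}
```

The first size should come out smaller than the second, while the two array
types should agree, which is why the transmute is fine for arrays but would
not be for `Option`.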

It's worth spending a bit more time on the loop in the middle, and in particular
the assignment operator and its interaction with `drop`. If we had written
something like:

<!-- ignore: simplified code -->
```rust,ignore
*x[i].as_mut_ptr() = Box::new(i as u32); // WRONG!
```

we would actually overwrite a `Box<u32>`, leading to `drop` of uninitialized
data, which will cause much sadness and pain.

The correct alternative, if for some reason we cannot use `MaybeUninit::new`, is
to use the [`ptr`] module. In particular, it provides three functions that allow
us to assign bytes to a location in memory without dropping the old value:
[`write`], [`copy`], and [`copy_nonoverlapping`].

* `ptr::write(ptr, val)` takes a `val` and moves it into the address pointed
  to by `ptr`.
* `ptr::copy(src, dest, count)` copies the bits that `count` T's would occupy
  from `src` to `dest`. (This is equivalent to C's `memmove` -- note that the
  argument order is reversed!)
* `ptr::copy_nonoverlapping(src, dest, count)` does what `copy` does, but a
  little faster on the assumption that the two ranges of memory don't overlap.
  (This is equivalent to C's `memcpy` -- note that the argument order is reversed!)
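
To make the three functions above a bit more concrete, here is a small sketch
of each one in use. The function names (`write_demo`, `copy_demo`,
`copy_nonoverlapping_demo`) and the constant `LEN` are invented for this
illustration; only the `ptr` and `MaybeUninit` calls come from the standard
library:

```rust
use std::mem::{self, MaybeUninit};
use std::ptr;

const LEN: usize = 4;

/// `ptr::write`: fill an uninitialized array without ever dropping the
/// garbage that was there before.
fn write_demo() -> [Box<u32>; LEN] {
    let mut x: [MaybeUninit<Box<u32>>; LEN] =
        unsafe { MaybeUninit::uninit().assume_init() };
    for i in 0..LEN {
        // `as_mut_ptr` yields a `*mut Box<u32>`; `write` moves the new box
        // in and does not run `drop` on the old (uninitialized) contents.
        unsafe { ptr::write(x[i].as_mut_ptr(), Box::new(i as u32)) };
    }
    unsafe { mem::transmute::<_, [Box<u32>; LEN]>(x) }
}

/// `ptr::copy`: source and destination may overlap (like `memmove`).
fn copy_demo() -> [u8; 5] {
    let mut buf = [1u8, 2, 3, 4, 5];
    let base = buf.as_mut_ptr();
    // Shift the first four elements one slot to the right.
    unsafe { ptr::copy(base, base.add(1), 4) };
    buf
}

/// `ptr::copy_nonoverlapping`: the ranges must not overlap (like `memcpy`).
fn copy_nonoverlapping_demo() -> [u8; 3] {
    let src = [10u8, 20, 30];
    let mut dst = [0u8; 3];
    unsafe { ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr(), src.len()) };
    dst
}
```

Note that in all three cases the source comes first and the destination
second, which is the reverse of the C functions.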

It should go without saying that these functions, if misused, will cause serious
havoc or just straight up Undefined Behavior. The only thing that these
functions *themselves* require is that the locations you want to read and write
are allocated and properly aligned. However, the ways writing arbitrary bits to
arbitrary locations of memory can break things are basically uncountable!

It's worth noting that you don't need to worry about `ptr::write`-style
shenanigans with types which don't implement `Drop` or contain `Drop` types,
because Rust knows not to try to drop them. This is what we relied on in the
above example.

However, when working with uninitialized memory you need to be ever-vigilant for
Rust trying to drop values you make like this before they're fully initialized.
Every control path through that variable's scope must initialize the value
before it ends, if it has a destructor.
*[This includes code panicking](unwinding.html)*. `MaybeUninit` helps a bit
here, because it does not implicitly drop its content - but all this really
means in case of a panic is that instead of a double-free of the not yet
initialized parts, you end up with a memory leak of the already initialized
parts.
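
The "leak instead of double-free" behavior can be observed directly. The
sketch below uses a made-up type `CountsDrops` that bumps a counter whenever
its destructor runs; `drops_observed` is likewise a name invented for this
illustration:

```rust
use std::cell::Cell;
use std::mem::MaybeUninit;

/// Hypothetical helper type: increments the shared counter when dropped.
struct CountsDrops<'a>(&'a Cell<usize>);

impl Drop for CountsDrops<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Returns the drop count after (1) letting a `MaybeUninit` go out of scope
/// and (2) explicitly dropping the contents of a second one.
fn drops_observed() -> (usize, usize) {
    let counter = Cell::new(0);

    {
        // Going out of scope does *not* run the destructor of the contents:
        // the value is simply leaked.
        let _leaked = MaybeUninit::new(CountsDrops(&counter));
    }
    let after_scope_exit = counter.get();

    {
        let mut slot = MaybeUninit::new(CountsDrops(&counter));
        // We know `slot` is initialized, so we may drop its contents in place.
        unsafe { slot.assume_init_drop() };
    }
    let after_explicit_drop = counter.get();

    (after_scope_exit, after_explicit_drop)
}
```

The counter only moves once we explicitly ask for the contents to be dropped,
so if a panic unwinds past a partially initialized array, cleaning up the
initialized elements is our job.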

Note that, to use the `ptr` methods, you need to first obtain a *raw pointer* to
the data you want to initialize. It is illegal to construct a *reference* to
uninitialized data, which implies that you have to be careful when obtaining
said raw pointer:

* For an array of `T`, you can use `base_ptr.add(idx)` where `base_ptr: *mut T`
  to compute the address of array index `idx`. This relies on
  how arrays are laid out in memory.
* For a struct, however, in general we do not know how it is laid out, and we
  also cannot use `&mut base_ptr.field` as that would be creating a
  reference. So, you must carefully use the [`addr_of_mut`] macro. This creates
  a raw pointer to the field without creating an intermediate reference:

```rust
use std::{ptr, mem::MaybeUninit};

struct Demo {
    field: bool,
}

let mut uninit = MaybeUninit::<Demo>::uninit();
// `&uninit.as_mut().field` would create a reference to an uninitialized `bool`,
// and thus be Undefined Behavior!
let f1_ptr = unsafe { ptr::addr_of_mut!((*uninit.as_mut_ptr()).field) };
unsafe { f1_ptr.write(true); }

let init = unsafe { uninit.assume_init() };
```
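
The array case from the list above can be sketched in the same spirit. Here
`squares` and `LEN` are names made up for this illustration, and the element
type `u64` is chosen arbitrarily:

```rust
use std::mem::MaybeUninit;

const LEN: usize = 4;

/// Hypothetical helper: builds `[0, 1, 4, 9]` one element at a time.
fn squares() -> [u64; LEN] {
    let mut arr = MaybeUninit::<[u64; LEN]>::uninit();
    // A pointer to the array is also a pointer to its first element,
    // so this cast gives us the `base_ptr: *mut T` from the text.
    let base_ptr = arr.as_mut_ptr() as *mut u64;
    for idx in 0..LEN {
        // `add(idx)` stays within the array's allocation, and `write`
        // never reads or drops the old (uninitialized) contents.
        unsafe { base_ptr.add(idx).write((idx * idx) as u64) };
    }
    // Every element has been written, so the array is fully initialized.
    unsafe { arr.assume_init() }
}
```

At no point does this create a `&mut u64` or `&mut [u64; LEN]` to
uninitialized data; everything goes through raw pointers until the final
`assume_init`.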

One last remark: when reading old Rust code, you might stumble upon the
deprecated `mem::uninitialized` function. That function used to be the only way
to deal with uninitialized memory on the stack, but it turned out to be
impossible to properly integrate with the rest of the language. Always use
`MaybeUninit` instead in new code, and port old code over when you get the
opportunity.

And that's about it for working with uninitialized memory! Basically nothing
anywhere expects to be handed uninitialized memory, so if you're going to pass
it around at all, be sure to be *really* careful.

[`MaybeUninit`]: ../core/mem/union.MaybeUninit.html
[assume_init]: ../core/mem/union.MaybeUninit.html#method.assume_init
[`ptr`]: ../core/ptr/index.html
[`addr_of_mut`]: ../core/ptr/macro.addr_of_mut.html
[`write`]: ../core/ptr/fn.write.html
[`copy`]: ../std/ptr/fn.copy.html
[`copy_nonoverlapping`]: ../std/ptr/fn.copy_nonoverlapping.html