//! Implementation of mio for Windows using IOCP
//!
//! This module uses I/O Completion Ports (IOCP) on Windows to implement mio's
//! Unix epoll-like interface. Unfortunately these two I/O models are
//! fundamentally incompatible:
//!
//! * IOCP is a completion-based model where work is submitted to the kernel and
//!   a program is notified later when the work has finished.
//! * epoll is a readiness-based model where the kernel is queried as to what
//!   work can be done, and afterwards the work is done.
//!
//! As a result, this implementation for Windows is much less "low level" than
//! the Unix implementation of mio. This design decision was intentional,
//! however.
//!
//! ## What is IOCP?
//!
//! The [official docs][docs] have a comprehensive explanation of what IOCP is,
//! but at a high level it requires the following operations to be executed to
//! perform some I/O:
//!
//! 1. A completion port is created.
//! 2. An I/O handle and a token are registered with this completion port.
//! 3. Some I/O is issued on the handle. This generally means that an API was
//!    invoked with a zeroed `OVERLAPPED` structure. The API will immediately
//!    return.
//! 4. After some time, the application queries the I/O port for completed
//!    events. The port will return a pointer to the `OVERLAPPED` along with
//!    the token presented at registration time.
//!
//! Many I/O operations can be fired off before waiting on a port, and the port
//! will block execution of the calling thread until an I/O event has completed
//! (or a timeout has elapsed).
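//!
//! As a rough sketch, these steps map onto the `miow` crate's `CompletionPort`
//! API as follows (simplified; `token` and `socket` are hypothetical, and the
//! actual I/O issuance in step 3 is elided):
//!
//! ```ignore
//! use miow::iocp::{CompletionPort, CompletionStatus};
//!
//! // 1. Create a completion port (allowing one concurrently running thread).
//! let port = CompletionPort::new(1)?;
//! // 2. Register an I/O handle along with a token.
//! port.add_socket(token, &socket)?;
//! // 3. Issue some I/O with a zeroed `OVERLAPPED`; the call returns
//! //    immediately while the kernel performs the work.
//! // 4. Block until a completion arrives; the status carries the
//! //    `OVERLAPPED` pointer and the token from step 2.
//! let status: CompletionStatus = port.get(None)?;
//! ```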
//!
//! Currently all of these low-level operations are housed in a separate `miow`
//! crate to provide a 0-cost abstraction over IOCP. This module uses that
//! crate for all the fiddly bits, so there are very few actual Windows API
//! calls or `unsafe` blocks here as a result.
//!
//! [docs]: https://msdn.microsoft.com/en-us/library/windows/desktop/aa365198%28v=vs.85%29.aspx
//!
//! ## Safety of IOCP
//!
//! Unfortunately for us, IOCP is pretty unsafe in terms of Rust lifetimes and
//! such. When an I/O operation is submitted to the kernel, it involves handing
//! the kernel a few pointers like a buffer to read/write, an `OVERLAPPED`
//! structure pointer, and perhaps some other buffers such as for socket
//! addresses. These pointers all have to remain valid **for the entire I/O
//! operation's duration**.
//!
//! There's no way to define a safe lifetime for these pointers/buffers over
//! the span of an I/O operation, so we're forced to add a layer of abstraction
//! (not 0-cost) to make these APIs safe. Currently this implementation
//! basically just boxes everything up on the heap to give it a stable address
//! and then keys off that most of the time.
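//!
//! The key property relied on here is that a `Box`'s heap allocation does not
//! move when the `Box` itself is moved, so a pointer handed to the kernel
//! remains valid even as the owning handle is shuffled around:
//!
//! ```
//! let buf = Box::new([0u8; 16]);
//! let ptr = buf.as_ptr();
//! let moved = buf; // moving the box moves only the pointer, not the data
//! assert_eq!(moved.as_ptr(), ptr);
//! ```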
//!
//! ## From completion to readiness
//!
//! Translating a completion-based model to a readiness-based model is also no
//! easy task, and a significant portion of this implementation is managing this
//! translation. The basic idea behind this implementation is to issue I/O
//! operations preemptively and then translate their completions to an "I'm
//! ready" event.
//!
//! For example, in the case of reading a `TcpStream`, as soon as a socket is
//! connected (or registered after an accept) a read operation is executed.
//! While the read is in progress calls to `read` will return `WouldBlock`, and
//! once the read is completed we translate the completion notification into a
//! `readable` event. Once the internal buffer is drained (e.g. all data from it
//! has been read) a read operation is re-issued.
//!
//! Write operations are a little different from reads, and the current
//! implementation is to just schedule a write as soon as `write` is first
//! called. While that write operation is in progress all future calls to
//! `write` will return `WouldBlock`. Completion of the write then translates to
//! a `writable` event. Note that some layer of internal buffering will likely
//! be added in the future.
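//!
//! The read half of this translation can be modeled, in simplified form, as a
//! small state machine (names are hypothetical; the real implementation tracks
//! more state than this):
//!
//! ```
//! enum ReadState {
//!     Scheduled,      // a read is in flight; `read` returns `WouldBlock`
//!     Ready(Vec<u8>), // the read completed; `read` drains this buffer
//!     Drained,        // buffer emptied; a new read is about to be issued
//! }
//! ```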
//!
//! ## Buffer Management
//!
//! As there are many I/O operations in flight at any one point in time, there
//! are many live buffers that need to be juggled around (e.g. this
//! implementation's own internal buffers).
//!
//! Currently all buffers are created for the I/O operation at hand and are then
//! discarded when it completes (this is listed as future work below).
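//!
//! The intended fix is a simple pool that reuses allocations instead of
//! discarding them. A minimal sketch of the idea (the real `buffer_pool`
//! module may differ):
//!
//! ```
//! struct BufferPool {
//!     pool: Vec<Vec<u8>>,
//! }
//!
//! impl BufferPool {
//!     /// Reuse a pooled buffer if available, otherwise allocate a new one.
//!     fn get(&mut self, default_cap: usize) -> Vec<u8> {
//!         self.pool.pop().unwrap_or_else(|| Vec::with_capacity(default_cap))
//!     }
//!
//!     /// Return a buffer to the pool, keeping its allocation for reuse.
//!     fn put(&mut self, mut buf: Vec<u8>) {
//!         buf.clear();
//!         self.pool.push(buf);
//!     }
//! }
//! ```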
//!
//! ## Callback Management
//!
//! When the main event loop receives a notification that an I/O operation has
//! completed, some work needs to be done to translate that to a set of events
//! or perhaps some more I/O needs to be scheduled. For example after a
//! `TcpStream` is connected it generates a writable event and also schedules a
//! read.
//!
//! To manage all this the `Selector` uses the `OVERLAPPED` pointer from the
//! completion status. The selector assumes that all `OVERLAPPED` pointers are
//! actually pointers to the interior of a `selector::Overlapped` which means
//! that right after the `OVERLAPPED` itself there's a function pointer. This
//! function pointer is given the completion status as well as another callback
//! to push events onto the selector.
//!
//! The callback for each I/O operation doesn't have any environment, so it
//! relies on memory layout and unsafe casting to translate an `OVERLAPPED`
//! pointer (or in this case a `selector::Overlapped` pointer) to a type of
//! `FromRawArc<T>` (see module docs for why this type exists).
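//!
//! The layout trick can be sketched as follows (field names and the callback
//! signature are hypothetical; `selector::Overlapped` defines the real layout):
//!
//! ```ignore
//! #[repr(C)]
//! struct Overlapped {
//!     inner: OVERLAPPED, // must be the first field, at offset zero
//!     callback: fn(&OVERLAPPED, &mut Vec<Event>),
//! }
//!
//! // Given the `*mut OVERLAPPED` returned by the completion port, the
//! // selector recovers the callback by casting, relying on `repr(C)`
//! // placing `inner` at offset zero.
//! let overlapped: *mut OVERLAPPED = status.overlapped();
//! let wrapper = &*(overlapped as *const Overlapped);
//! (wrapper.callback)(&wrapper.inner, &mut events);
//! ```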
//!
//! ## Thread Safety
//!
//! Currently all of the I/O primitives make liberal use of `Arc` and `Mutex`
//! as an implementation detail. The main reason for this is to ensure that the
//! types are `Send` and `Sync`, but the implementations have not been stressed
//! in multithreaded situations yet. As a result, there are bound to be
//! functional surprises in using these concurrently.
//!
//! ## Future Work
//!
//! First up, let's take a look at unimplemented portions of this module:
//!
//! * The `PollOpt::level()` option is currently entirely unimplemented.
//! * Each `EventLoop` currently owns its completion port, but this prevents an
//!   I/O handle from being added to multiple event loops (something that can
//!   be done on Unix). Additionally, it hinders event loops moving across
//!   threads. This should likely be solved by having a global `Selector` which
//!   all others then communicate with.
//! * Although Unix sockets don't exist on Windows, there are named pipes and
//!   those should likely be bound here in a similar fashion to `TcpStream`.
//!
//! Next up, there are a few performance improvements and optimizations that
//! can still be implemented:
//!
//! * Buffer management right now is pretty bad: buffers are allocated right
//!   before each I/O operation and discarded right after. There should at
//!   least be some form of pooling and reuse of buffers.
//! * No calls to `write` are internally buffered before being scheduled, which
//!   means that write performance is abysmal compared to Unix. There should
//!   probably be some level of buffering of writes.

use std::io;
use std::os::windows::prelude::*;

mod kernel32 {
    pub use ::winapi::um::ioapiset::CancelIoEx;
    pub use ::winapi::um::winbase::SetFileCompletionNotificationModes;
}

mod winapi {
    pub use ::winapi::shared::minwindef::{TRUE, UCHAR};
    pub use ::winapi::um::winnt::HANDLE;
}

mod awakener;
#[macro_use]
mod selector;
mod tcp;
mod udp;
mod from_raw_arc;
mod buffer_pool;

pub use self::awakener::Awakener;
pub use self::selector::{Events, Selector, Overlapped, Binding};
pub use self::tcp::{TcpStream, TcpListener};
pub use self::udp::UdpSocket;

#[derive(Copy, Clone)]
enum Family {
    V4, V6,
}

unsafe fn cancel(socket: &dyn AsRawSocket,
                 overlapped: &Overlapped) -> io::Result<()> {
    let handle = socket.as_raw_socket() as winapi::HANDLE;
    let ret = kernel32::CancelIoEx(handle, overlapped.as_mut_ptr());
    if ret == 0 {
        Err(io::Error::last_os_error())
    } else {
        Ok(())
    }
}

unsafe fn no_notify_on_instant_completion(handle: winapi::HANDLE) -> io::Result<()> {
    // TODO: move those to winapi
    const FILE_SKIP_COMPLETION_PORT_ON_SUCCESS: winapi::UCHAR = 1;
    const FILE_SKIP_SET_EVENT_ON_HANDLE: winapi::UCHAR = 2;

    let flags = FILE_SKIP_COMPLETION_PORT_ON_SUCCESS | FILE_SKIP_SET_EVENT_ON_HANDLE;
    let r = kernel32::SetFileCompletionNotificationModes(handle, flags);
    if r == winapi::TRUE {
        Ok(())
    } else {
        Err(io::Error::last_os_error())
    }
}