.\" Copyright (c) International Business Machines Corp., 2006
.\"
.\" SPDX-License-Identifier: GPL-2.0-or-later
.\"
.\" HISTORY:
.\" 2006-04-27, created by Eduardo M. Fleury <efleury@br.ibm.com>
.\" with various additions by Michael Kerrisk <mtk.manpages@gmail.com>
.\"
.\"
.TH ioprio_set 2 2024-05-02 "Linux man-pages (unreleased)"
.SH NAME
ioprio_get, ioprio_set \- get/set I/O scheduling class and priority
.SH LIBRARY
Standard C library
.RI ( libc ", " \-lc )
.SH SYNOPSIS
.nf
.BR "#include <linux/ioprio.h> " "/* Definition of " IOPRIO_* " constants */"
.BR "#include <sys/syscall.h> " "/* Definition of " SYS_* " constants */"
.B #include <unistd.h>
.P
.BI "int syscall(SYS_ioprio_get, int " which ", int " who );
.BI "int syscall(SYS_ioprio_set, int " which ", int " who ", int " ioprio );
.fi
.P
.IR Note :
glibc provides no wrappers for these system calls,
necessitating the use of
.BR syscall (2).
.SH DESCRIPTION
The
.BR ioprio_get ()
and
.BR ioprio_set ()
system calls get and set the I/O scheduling class and
priority of one or more threads.
.P
The
.I which
and
.I who
arguments identify the thread(s) on which the system
calls operate.
The
.I which
argument determines how
.I who
is interpreted, and has one of the following values:
.TP
.B IOPRIO_WHO_PROCESS
.I who
is a process ID or thread ID identifying a single process or thread.
If
.I who
is 0, then operate on the calling thread.
.TP
.B IOPRIO_WHO_PGRP
.I who
is a process group ID identifying all the members of a process group.
If
.I who
is 0, then operate on the process group of which the caller is a member.
.TP
.B IOPRIO_WHO_USER
.I who
is a user ID identifying all of the processes that
have a matching real UID.
.\" FIXME . Need to document the behavior when 'who' is specified as 0
.\" See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=652443
.P
If
.I which
is specified as
.B IOPRIO_WHO_PGRP
or
.B IOPRIO_WHO_USER
when calling
.BR ioprio_get (),
and more than one process matches
.IR who ,
then the returned priority will be the highest one found among
all of the matching processes.
One priority is said to be
higher than another one if it belongs to a higher priority
class
.RB ( IOPRIO_CLASS_RT
is the highest priority class;
.B IOPRIO_CLASS_IDLE
is the lowest)
or if it belongs to the same priority class as the other process but
has a higher priority level (a lower priority number means a
higher priority level).
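.P
The ordering rule above can be sketched as a small comparison helper.
This is only an illustration, not the kernel's implementation:
the name
.I ioprio_is_higher
is hypothetical,
both arguments are assumed to carry one of the three classes listed above,
and the fallback definitions are an assumption for systems where
.I linux/ioprio.h
is not installed.
.P
.in +4n
.EX
```c
#include <assert.h>
#if defined(__has_include)
# if __has_include(<linux/ioprio.h>)
#  include <linux/ioprio.h>
# endif
#endif
#ifndef IOPRIO_CLASS_SHIFT      /* assumed fallback if the header is absent */
# define IOPRIO_CLASS_SHIFT 13
# define IOPRIO_PRIO_CLASS(mask) ((mask) >> IOPRIO_CLASS_SHIFT)
# define IOPRIO_PRIO_DATA(mask) ((mask) & ((1 << IOPRIO_CLASS_SHIFT) - 1))
# define IOPRIO_PRIO_VALUE(class, data) \
         (((class) << IOPRIO_CLASS_SHIFT) | (data))
enum { IOPRIO_CLASS_NONE, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE };
#endif

/* Return nonzero if ioprio value a ranks strictly higher than b. */
int ioprio_is_higher(int a, int b)
{
    int class_a = (int) IOPRIO_PRIO_CLASS(a);
    int class_b = (int) IOPRIO_PRIO_CLASS(b);

    /* This sketch does not handle values that carry no class. */
    assert(class_a != IOPRIO_CLASS_NONE && class_b != IOPRIO_CLASS_NONE);

    /* RT (1) outranks BE (2), which outranks IDLE (3). */
    if (class_a != class_b)
        return class_a < class_b;

    /* Same class: the lower priority number is the higher level. */
    return (int) IOPRIO_PRIO_DATA(a) < (int) IOPRIO_PRIO_DATA(b);
}
```
.EE
.in
.P
Under these assumptions, a real-time value at level 7 still ranks above
a best-effort value at level 0, because the class is compared first.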
.P
The
.I ioprio
argument given to
.BR ioprio_set ()
is a bit mask that specifies both the scheduling class and the
priority to be assigned to the target process(es).
The following macros are used for assembling and dissecting
.I ioprio
values:
.TP
.BI IOPRIO_PRIO_VALUE( class ", " data )
Given a scheduling
.I class
and priority
.RI ( data ),
this macro combines the two values to produce an
.I ioprio
value, which is returned as the result of the macro.
.TP
.BI IOPRIO_PRIO_CLASS( mask )
Given
.I mask
(an
.I ioprio
value), this macro returns its I/O class component, that is,
one of the values
.BR IOPRIO_CLASS_RT ,
.BR IOPRIO_CLASS_BE ,
or
.BR IOPRIO_CLASS_IDLE .
.TP
.BI IOPRIO_PRIO_DATA( mask )
Given
.I mask
(an
.I ioprio
value), this macro returns its priority
.RI ( data )
component.
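.P
As a minimal sketch of how the three macros fit together,
the helpers below build an
.I ioprio
value and take it apart again.
The helper names are hypothetical,
and the fallback definitions are an assumption for systems where
.I linux/ioprio.h
is not installed.
.P
.in +4n
.EX
```c
#include <assert.h>
#if defined(__has_include)
# if __has_include(<linux/ioprio.h>)
#  include <linux/ioprio.h>
# endif
#endif
#ifndef IOPRIO_CLASS_SHIFT      /* assumed fallback if the header is absent */
# define IOPRIO_CLASS_SHIFT 13
# define IOPRIO_PRIO_CLASS(mask) ((mask) >> IOPRIO_CLASS_SHIFT)
# define IOPRIO_PRIO_DATA(mask) ((mask) & ((1 << IOPRIO_CLASS_SHIFT) - 1))
# define IOPRIO_PRIO_VALUE(class, data) \
         (((class) << IOPRIO_CLASS_SHIFT) | (data))
enum { IOPRIO_CLASS_NONE, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE };
#endif

/* Combine a scheduling class and a priority level (0 to 7). */
int make_ioprio(int class, int level)
{
    assert(class >= IOPRIO_CLASS_RT && class <= IOPRIO_CLASS_IDLE);
    assert(level >= 0 && level <= 7);
    return (int) IOPRIO_PRIO_VALUE(class, level);
}

/* Extract the scheduling class from an ioprio value. */
int ioprio_class_of(int ioprio)
{
    return (int) IOPRIO_PRIO_CLASS(ioprio);
}

/* Extract the priority (data) component from an ioprio value. */
int ioprio_level_of(int ioprio)
{
    return (int) IOPRIO_PRIO_DATA(ioprio);
}
```
.EE
.in
.P
For instance,
.I make_ioprio(IOPRIO_CLASS_BE, 4)
yields a value from which
.I ioprio_class_of
recovers
.B IOPRIO_CLASS_BE
and
.I ioprio_level_of
recovers 4.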
.P
See the NOTES section for more
information on scheduling classes and priorities,
as well as the meaning of specifying
.I ioprio
as 0.
.P
I/O priorities are supported for reads and for synchronous
.RB ( O_DIRECT ,
.BR O_SYNC )
writes.
I/O priorities are not supported for asynchronous
writes because they are issued outside the context of the program
dirtying the memory, and thus program-specific priorities do not apply.
.SH RETURN VALUE
On success,
.BR ioprio_get ()
returns the
.I ioprio
value of the process with highest I/O priority of any of the processes
that match the criteria specified in
.I which
and
.IR who .
On error, \-1 is returned, and
.I errno
is set to indicate the error.
.P
On success,
.BR ioprio_set ()
returns 0.
On error, \-1 is returned, and
.I errno
is set to indicate the error.
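.P
Because glibc provides no wrappers, callers typically write their own.
The following is one possible sketch using
.BR syscall (2)
on a single thread;
the wrapper names are hypothetical,
and the fallback definitions are an assumption for systems where
.I linux/ioprio.h
is not installed.
.P
.in +4n
.EX
```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>
#if defined(__has_include)
# if __has_include(<linux/ioprio.h>)
#  include <linux/ioprio.h>
# endif
#endif
#ifndef IOPRIO_CLASS_SHIFT      /* assumed fallback if the header is absent */
# define IOPRIO_CLASS_SHIFT 13
# define IOPRIO_PRIO_CLASS(mask) ((mask) >> IOPRIO_CLASS_SHIFT)
# define IOPRIO_PRIO_DATA(mask) ((mask) & ((1 << IOPRIO_CLASS_SHIFT) - 1))
# define IOPRIO_PRIO_VALUE(class, data) \
         (((class) << IOPRIO_CLASS_SHIFT) | (data))
enum { IOPRIO_CLASS_NONE, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE };
enum { IOPRIO_WHO_PROCESS = 1, IOPRIO_WHO_PGRP, IOPRIO_WHO_USER };
#endif

/* Return the ioprio value of thread tid (0 means the calling thread),
   or -1 with errno set on error. */
int ioprio_get_thread(int tid)
{
    return (int) syscall(SYS_ioprio_get, IOPRIO_WHO_PROCESS, tid);
}

/* Set the class and level of thread tid (0 means the calling thread).
   Returns 0 on success, or -1 with errno set on error. */
int ioprio_set_thread(int tid, int class, int level)
{
    assert(level >= 0 && level <= 7);
    return (int) syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, tid,
                         (int) IOPRIO_PRIO_VALUE(class, level));
}
```
.EE
.in
.P
A caller would check the result against \-1 and inspect
.I errno
before passing the value returned by
.I ioprio_get_thread
to
.BR IOPRIO_PRIO_CLASS ()
or
.BR IOPRIO_PRIO_DATA ().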
.SH ERRORS
.TP
.B EINVAL
Invalid value for
.I which
or
.IR ioprio .
Refer to the NOTES section for available scheduler
classes and priority levels for
.IR ioprio .
.TP
.B EPERM
The calling process does not have the privilege needed to assign this
.I ioprio
to the specified process(es).
See the NOTES section for more information on required
privileges for
.BR ioprio_set ().
.TP
.B ESRCH
No process(es) could be found that matched the specification in
.I which
and
.IR who .
.SH STANDARDS
Linux.
.SH HISTORY
Linux 2.6.13.
.SH NOTES
Two or more processes or threads can share an I/O context.
This will be the case when
.BR clone (2)
was called with the
.B CLONE_IO
flag.
However, by default, the distinct threads of a process will
.B not
share the same I/O context.
This means that if you want to change the I/O
priority of all threads in a process, you may need to call
.BR ioprio_set ()
on each of the threads.
The thread ID that you would need for this operation
is the one that is returned by
.BR gettid (2)
or
.BR clone (2).
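.P
One way to apply a priority to every thread of the calling process
is sketched below.
It assumes that the thread IDs can be enumerated from the
.I /proc/self/task
directory (see
.BR proc (5)),
which this page does not itself prescribe;
the function name is hypothetical,
and the fallback definitions are an assumption for systems where
.I linux/ioprio.h
is not installed.
.P
.in +4n
.EX
```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>
#if defined(__has_include)
# if __has_include(<linux/ioprio.h>)
#  include <linux/ioprio.h>
# endif
#endif
#ifndef IOPRIO_CLASS_SHIFT      /* assumed fallback if the header is absent */
# define IOPRIO_CLASS_SHIFT 13
# define IOPRIO_PRIO_VALUE(class, data) \
         (((class) << IOPRIO_CLASS_SHIFT) | (data))
enum { IOPRIO_CLASS_NONE, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE };
enum { IOPRIO_WHO_PROCESS = 1, IOPRIO_WHO_PGRP, IOPRIO_WHO_USER };
#endif

/* Call ioprio_set() once per thread of the calling process.
   Returns the number of threads updated, or -1 if the task
   directory cannot be opened. */
int ioprio_set_all_threads(int class, int level)
{
    DIR *dir;
    struct dirent *ent;
    int updated = 0;

    assert(level >= 0 && level <= 7);

    dir = opendir("/proc/self/task");
    if (dir == NULL)
        return -1;

    while ((ent = readdir(dir)) != NULL) {
        int tid = atoi(ent->d_name);    /* "." and ".." convert to 0 */

        if (tid <= 0)
            continue;
        if (syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, tid,
                    (int) IOPRIO_PRIO_VALUE(class, level)) == 0)
            updated++;
    }
    closedir(dir);
    return updated;
}
```
.EE
.in
.P
Threads created while the directory is being read may be missed,
so a program that needs a uniform priority might prefer to set it
before creating further threads.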
.P
These system calls have an effect only when used
in conjunction with an I/O scheduler that supports I/O priorities.
As of Linux 2.6.17, the only such scheduler is the Completely Fair Queuing
(CFQ) I/O scheduler.
.P
If no I/O priority has been set for a thread,
then by default the I/O priority will follow the CPU nice value
.RB ( setpriority (2)).
Before Linux 2.6.24,
once an I/O priority had been set using
.BR ioprio_set (),
there was no way to reset the I/O scheduling behavior to the default.
Since Linux 2.6.24,
.\" commit 8ec680e4c3ec818efd1652f15199ed1c216ab550
specifying
.I ioprio
as 0 can be used to reset to the default I/O scheduling behavior.
.SS Selecting an I/O scheduler
I/O schedulers are selected on a per-device basis via the special
file
.IR /sys/block/ device /queue/scheduler .
.P
One can view the current I/O scheduler via the
.I /sys
filesystem.
For example, the following command
displays a list of all schedulers currently loaded in the kernel:
.P
.in +4n
.EX
.RB "$" " cat /sys/block/sda/queue/scheduler"
noop anticipatory deadline [cfq]
.EE
.in
.P
The scheduler surrounded by brackets is the one actually
in use for the device
.RI ( sda
in the example).
Setting another scheduler is done by writing the name of the
new scheduler to this file.
For example, the following command will set the
scheduler for the
.I sda
device to
.IR cfq :
.P
.in +4n
.EX
.RB "$" " su"
Password:
.RB "#" " echo cfq > /sys/block/sda/queue/scheduler"
.EE
.in
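.P
A program can find the scheduler in use by looking for the
bracketed name in a line read from the scheduler file.
The helper below is a minimal sketch of that parsing step only;
its name is hypothetical, and it assumes that the caller has already
read the line into a string.
.P
.in +4n
.EX
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the bracketed scheduler name found in line into out.
   Returns 0 on success, or -1 if there is no bracketed name
   or it does not fit in outlen bytes (including the terminator). */
int active_scheduler(const char *line, char *out, size_t outlen)
{
    const char *open, *close;
    size_t len;

    assert(line != NULL && out != NULL);

    open = strchr(line, '[');
    if (open == NULL)
        return -1;
    close = strchr(open, ']');
    if (close == NULL)
        return -1;

    len = (size_t) (close - open - 1);
    if (len == 0 || len >= outlen)
        return -1;

    memcpy(out, open + 1, len);
    out[len] = 0;               /* terminate the copied name */
    return 0;
}
```
.EE
.in
.P
Given the sample output shown above, the helper stores
.I cfq
in the output buffer.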
.\"
.SS The Completely Fair Queuing (CFQ) I/O scheduler
Since version 3 (also known as CFQ Time Sliced), CFQ implements
I/O nice levels similar to those
of CPU scheduling.
These nice levels are grouped into three scheduling classes,
each one containing one or more priority levels:
.TP
.BR IOPRIO_CLASS_RT " (1)"
This is the real-time I/O class.
This scheduling class is given
higher priority than any other class:
processes from this class are
given first access to the disk every time.
Thus, this I/O class needs to be used with some
care: one I/O real-time process can starve the entire system.
Within the real-time class,
there are 8 levels of class data (priority) that determine exactly
how much time this process needs the disk for on each service.
The highest real-time priority level is 0; the lowest is 7.
In the future, this might change to be more directly mappable to
performance, by passing in a desired data rate instead.
.TP
.BR IOPRIO_CLASS_BE " (2)"
This is the best-effort scheduling class,
which is the default for any process
that hasn't set a specific I/O priority.
The class data (priority) determines how much
I/O bandwidth the process will get.
Best-effort priority levels are analogous to CPU nice values
(see
.BR getpriority (2)).
The priority level determines a priority relative
to other processes in the best-effort scheduling class.
Priority levels range from 0 (highest) to 7 (lowest).
.TP
.BR IOPRIO_CLASS_IDLE " (3)"
This is the idle scheduling class.
Processes running at this level get I/O
time only when no one else needs the disk.
The idle class has no class data.
Attention is required when assigning this priority class to a process,
since it may become starved if higher priority processes are
constantly accessing the disk.
.P
Refer to the kernel source file
.I Documentation/block/ioprio.txt
for more information on the CFQ I/O Scheduler and an example program.
.SS Required permissions to set I/O priorities
Permission to change a process's priority is granted or denied based
on two criteria:
.TP
.B "Process ownership"
An unprivileged process may set the I/O priority only for a process
whose real UID
matches the real or effective UID of the calling process.
A process which has the
.B CAP_SYS_NICE
capability can change the priority of any process.
.TP
.B "What is the desired priority"
Attempts to set very high priorities
.RB ( IOPRIO_CLASS_RT )
require the
.B CAP_SYS_ADMIN
capability.
Linux 2.6.24 and earlier also required
.B CAP_SYS_ADMIN
to set a very low priority
.RB ( IOPRIO_CLASS_IDLE ),
but since Linux 2.6.25, this is no longer required.
.P
A call to
.BR ioprio_set ()
must follow both rules, or the call will fail with the error
.BR EPERM .
.SH BUGS
.\" 6 May 07: Bug report raised:
.\" https://www.sourceware.org/bugzilla/show_bug.cgi?id=4464
.\" Ulrich Drepper replied that he wasn't going to add these
.\" to glibc.
glibc does not yet provide a suitable header file defining
the function prototypes and macros described on this page.
Suitable definitions can be found in
.IR linux/ioprio.h .
.SH SEE ALSO
.BR ionice (1),
.BR getpriority (2),
.BR open (2),
.BR capabilities (7),
.BR cgroups (7)
.P
.I Documentation/block/ioprio.txt
in the Linux kernel source tree