Diffstat (limited to 'man7/sched.7')
-rw-r--r-- man7/sched.7 130
1 files changed, 65 insertions, 65 deletions
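Apart from the `.TH` header update and one wording fix ("threads priority" to "the thread's priority"), every change in this diff replaces a line consisting solely of `.PP` with `.P`. In man(7), both are paragraph macros with the same effect. The following is a minimal Python sketch of that mechanical rewrite. It assumes that only bare `.PP` lines are touched. The function names `pp_to_p` and `count_changes` are hypothetical, and the sketch is not presented as the tool that produced this commit.

```python
"""Sketch: rewrite bare .PP paragraph macros as .P in man-page source.

Assumption (hypothetical helper, not the upstream tooling): only lines
whose entire content is ".PP" are rewritten; everything else is copied.
"""
from typing import Iterable, Iterator


def pp_to_p(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line, replacing a bare ".PP" request with ".P".

    The original line ending ("\\n", "\\r\\n", or none) is preserved.
    Comment lines, text lines, and lines that merely contain ".PP"
    pass through unchanged.
    """
    for line in lines:
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        yield ".P" + ending if body == ".PP" else line


def count_changes(lines: Iterable[str]) -> int:
    """Return how many lines pp_to_p() would rewrite."""
    return sum(1 for line in lines if line.rstrip("\r\n") == ".PP")
```

For example, `"".join(pp_to_p(open("man7/sched.7")))` returns the rewritten source. Each rewritten line shows up in a diff as one deletion and one insertion. The 63 `.PP` lines in this diff, together with the `.TH` line and the wording fix, therefore account for the diffstat's 65 insertions and 65 deletions.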
diff --git a/man7/sched.7 b/man7/sched.7
index 7854505..7e1212f 100644
--- a/man7/sched.7
+++ b/man7/sched.7
@@ -10,7 +10,7 @@
.\"
.\" Worth looking at: http://rt.wiki.kernel.org/index.php
.\"
-.TH sched 7 2023-02-10 "Linux man-pages 6.05.01"
+.TH sched 7 2024-02-18 "Linux man-pages 6.7"
.SH NAME
sched \- overview of CPU scheduling
.SH DESCRIPTION
@@ -91,12 +91,12 @@ scheduling priority,
.IR sched_priority .
The scheduler makes its decisions based on knowledge of the scheduling
policy and static priority of all threads on the system.
-.PP
+.P
For threads scheduled under one of the normal scheduling policies
(\fBSCHED_OTHER\fP, \fBSCHED_IDLE\fP, \fBSCHED_BATCH\fP),
\fIsched_priority\fP is not used in scheduling
decisions (it must be specified as 0).
-.PP
+.P
Processes scheduled under one of the real-time policies
(\fBSCHED_FIFO\fP, \fBSCHED_RR\fP) have a
\fIsched_priority\fP value in the range 1 (low) to 99 (high).
@@ -110,17 +110,17 @@ Portable programs should use
and
.BR sched_get_priority_max (2)
to find the range of priorities supported for a particular policy.
-.PP
+.P
Conceptually, the scheduler maintains a list of runnable
threads for each possible \fIsched_priority\fP value.
In order to determine which thread runs next, the scheduler looks for
the nonempty list with the highest static priority and selects the
thread at the head of this list.
-.PP
+.P
A thread's scheduling policy determines
where it will be inserted into the list of threads
with equal static priority and how it will move inside this list.
-.PP
+.P
All scheduling is preemptive: if a thread with a higher static
priority becomes ready to run, the currently running thread
will be preempted and
@@ -158,7 +158,7 @@ changes the priority of the running or runnable
thread identified by
.I pid
the effect on the thread's position in the list depends on
-the direction of the change to threads priority:
+the direction of the change to the thread's priority:
.RS
.IP (a) 5
If the thread's priority is raised,
@@ -184,11 +184,11 @@ the list for its priority.
A thread calling
.BR sched_yield (2)
will be put at the end of the list.
-.PP
+.P
No other events will move a thread
scheduled under the \fBSCHED_FIFO\fP policy in the wait list of
runnable threads with equal static priority.
-.PP
+.P
A \fBSCHED_FIFO\fP
thread runs until either it is blocked by an I/O request, it is
preempted by a higher priority thread, or it calls
@@ -224,7 +224,7 @@ one must use the Linux-specific
and
.BR sched_getattr (2)
system calls.
-.PP
+.P
A sporadic task is one that has a sequence of jobs, where each
job is activated at most once per period.
Each job also has a
@@ -242,9 +242,9 @@ is the time at which a task starts its execution.
The
.I absolute deadline
is thus obtained by adding the relative deadline to the arrival time.
-.PP
+.P
The following diagram clarifies these terms:
-.PP
+.P
.in +4n
.EX
arrival/wakeup absolute deadline
@@ -257,7 +257,7 @@ arrival/wakeup absolute deadline
|<-------------- period ------------------->|
.EE
.in
-.PP
+.P
When setting a
.B SCHED_DEADLINE
policy for a thread using
@@ -274,7 +274,7 @@ Deadline to the relative deadline, and Period to the period of the task.
Thus, for
.B SCHED_DEADLINE
scheduling, we have:
-.PP
+.P
.in +4n
.EX
arrival/wakeup absolute deadline
@@ -287,7 +287,7 @@ arrival/wakeup absolute deadline
|<-------------- Period ------------------->|
.EE
.in
-.PP
+.P
The three deadline-scheduling parameters correspond to the
.IR sched_runtime ,
.IR sched_deadline ,
@@ -305,15 +305,15 @@ If
.I sched_period
is specified as 0, then it is made the same as
.IR sched_deadline .
-.PP
+.P
The kernel requires that:
-.PP
+.P
.in +4n
.EX
sched_runtime <= sched_deadline <= sched_period
.EE
.in
-.PP
+.P
.\" See __checkparam_dl in kernel/sched/core.c
In addition, under the current implementation,
all of the parameter values must be at least 1024
@@ -323,10 +323,10 @@ If any of these checks fails,
.BR sched_setattr (2)
fails with the error
.BR EINVAL .
-.PP
+.P
The CBS guarantees non-interference between tasks, by throttling
threads that attempt to over-run their specified Runtime.
-.PP
+.P
To ensure deadline scheduling guarantees,
the kernel must prevent situations where the set of
.B SCHED_DEADLINE
@@ -339,13 +339,13 @@ if it is not,
.BR sched_setattr (2)
fails with the error
.BR EBUSY .
-.PP
+.P
For example, it is required (but not necessarily sufficient) for
the total utilization to be less than or equal to the total number of
CPUs available, where, since each thread can maximally run for
Runtime per Period, that thread's utilization is its
Runtime divided by its Period.
-.PP
+.P
In order to fulfill the guarantees that are made when
a thread is admitted to the
.B SCHED_DEADLINE
@@ -356,7 +356,7 @@ system; if any
.B SCHED_DEADLINE
thread is runnable,
it will preempt any thread scheduled under one of the other policies.
-.PP
+.P
A call to
.BR fork (2)
by a thread scheduled under the
@@ -364,7 +364,7 @@ by a thread scheduled under the
policy fails with the error
.BR EAGAIN ,
unless the thread has its reset-on-fork flag set (see below).
-.PP
+.P
A
.B SCHED_DEADLINE
thread that calls
@@ -383,7 +383,7 @@ processes).
\fBSCHED_OTHER\fP is the standard Linux time-sharing scheduler that is
intended for all threads that do not require the special
real-time mechanisms.
-.PP
+.P
The thread to run is chosen from the static
priority 0 list based on a \fIdynamic\fP priority that is determined only
inside this list.
@@ -391,7 +391,7 @@ The dynamic priority is based on the nice value (see below)
and is increased for each time quantum the thread is ready to run,
but denied to run by the scheduler.
This ensures fair progress among all \fBSCHED_OTHER\fP threads.
-.PP
+.P
In the Linux kernel source code, the
.B SCHED_OTHER
policy is actually named
@@ -411,12 +411,12 @@ The nice value can be modified using
.BR setpriority (2),
or
.BR sched_setattr (2).
-.PP
+.P
According to POSIX.1, the nice value is a per-process attribute;
that is, the threads in a process should share a nice value.
However, on Linux, the nice value is a per-thread attribute:
different threads in the same process may have different nice values.
-.PP
+.P
The range of the nice value
varies across UNIX systems.
On modern Linux, the range is \-20 (high priority) to +19 (low priority).
@@ -424,12 +424,12 @@ On some other systems, the range is \-20..20.
Very early Linux kernels (before Linux 2.0) had the range \-infinity..15.
.\" Linux before 1.3.36 had \-infinity..15.
.\" Since Linux 1.3.43, Linux has the range \-20..19.
-.PP
+.P
The degree to which the nice value affects the relative scheduling of
.B SCHED_OTHER
processes likewise varies across UNIX systems and
across Linux kernel versions.
-.PP
+.P
With the advent of the CFS scheduler in Linux 2.6.23,
Linux adopted an algorithm that causes
relative differences in nice values to have a much stronger effect.
@@ -441,14 +441,14 @@ to a process whenever there is any other
higher priority load on the system,
and makes high nice values (\-20) deliver most of the CPU to applications
that require it (e.g., some audio applications).
-.PP
+.P
On Linux, the
.B RLIMIT_NICE
resource limit can be used to define a limit to which
an unprivileged process's nice value can be raised; see
.BR setrlimit (2)
for details.
-.PP
+.P
For further details on the nice value, see the subsections on
the autogroup feature and group scheduling, below.
.\"
@@ -464,7 +464,7 @@ that the thread is CPU-intensive.
Consequently, the scheduler will apply a small scheduling
penalty with respect to wakeup behavior,
so that this thread is mildly disfavored in scheduling decisions.
-.PP
+.P
.\" The following paragraph is drawn largely from the text that
.\" accompanied Ingo Molnar's patch for the implementation of
.\" SCHED_BATCH.
@@ -478,7 +478,7 @@ interactivity causing extra preemptions (between the workload's tasks).
(Since Linux 2.6.23.)
\fBSCHED_IDLE\fP can be used only at static priority 0;
the process nice value has no influence for this policy.
-.PP
+.P
This policy is intended for running jobs at extremely low
priority (lower even than a +19 nice value with the
.B SCHED_OTHER
@@ -508,20 +508,20 @@ flag in
.I attr.sched_flags
when calling
.BR sched_setattr (2).
-.PP
+.P
Note that the constants used with these two APIs have different names.
The state of the reset-on-fork flag can analogously be retrieved using
.BR sched_getscheduler (2)
and
.BR sched_getattr (2).
-.PP
+.P
The reset-on-fork feature is intended for media-playback applications,
and can be used to prevent applications evading the
.B RLIMIT_RTTIME
resource limit (see
.BR getrlimit (2))
by creating multiple child processes.
-.PP
+.P
More precisely, if the reset-on-fork flag is set,
the following rules apply for subsequently created children:
.IP \[bu] 3
@@ -535,7 +535,7 @@ in child processes.
.IP \[bu]
If the calling process has a negative nice value,
the nice value is reset to zero in child processes.
-.PP
+.P
After the reset-on-fork flag has been enabled,
it can be reset only if the thread has the
.B CAP_SYS_NICE
@@ -555,13 +555,13 @@ matches the real or effective user ID of the target thread
(i.e., the thread specified by
.IR pid )
whose policy is being changed.
-.PP
+.P
A thread must be privileged
.RB ( CAP_SYS_NICE )
in order to set or modify a
.B SCHED_DEADLINE
policy.
-.PP
+.P
Since Linux 2.6.12, the
.B RLIMIT_RTPRIO
resource limit defines a ceiling on an unprivileged thread's
@@ -608,7 +608,7 @@ policy so long as its nice value falls within the range permitted by its
.B RLIMIT_NICE
resource limit (see
.BR getrlimit (2)).
-.PP
+.P
Privileged
.RB ( CAP_SYS_NICE )
threads ignore the
@@ -632,7 +632,7 @@ process from freezing the system was to run (at the console)
a shell scheduled under a higher static priority than the tested application.
This allows an emergency kill of tested
real-time applications that do not block or terminate as expected.
-.PP
+.P
Since Linux 2.6.25, there are other techniques for dealing with runaway
real-time and deadline processes.
One of these is to use the
@@ -642,7 +642,7 @@ a real-time process may consume.
See
.BR getrlimit (2)
for details.
-.PP
+.P
Since Linux 2.6.25, Linux also provides two
.I /proc
files that can be used to reserve a certain amount of CPU time
@@ -684,7 +684,7 @@ Child processes inherit the scheduling policy and parameters across a
.BR fork (2).
The scheduling policy and parameters are preserved across
.BR execve (2).
-.PP
+.P
Memory locking is usually needed for real-time processes to avoid
paging delays; this can be done with
.BR mlock (2)
@@ -701,7 +701,7 @@ parallel build processes (i.e., the
.BR make (1)
.B \-j
flag).
-.PP
+.P
This feature operates in conjunction with the
CFS scheduler and requires a kernel that is configured with
.BR CONFIG_SCHED_AUTOGROUP .
@@ -711,7 +711,7 @@ a value of 0 disables the feature, while a value of 1 enables it.
The default value in this file is 1, unless the kernel was booted with the
.I noautogroup
parameter.
-.PP
+.P
A new autogroup is created when a new session is created via
.BR setsid (2);
this happens, for example, when a new terminal window is started.
@@ -721,14 +721,14 @@ inherits its parent's autogroup membership.
Thus, all of the processes in a session are members of the same autogroup.
An autogroup is automatically destroyed when the last process
in the group terminates.
-.PP
+.P
When autogrouping is enabled, all of the members of an autogroup
are placed in the same kernel scheduler "task group".
The CFS scheduler employs an algorithm that equalizes the
distribution of CPU cycles across task groups.
The benefits of this for interactive desktop performance
can be described via the following example.
-.PP
+.P
Suppose that there are two autogroups competing for the same CPU
(i.e., presume either a single CPU system or the use of
.BR taskset (1)
@@ -759,17 +759,17 @@ the scheduler distributes CPU cycles across task groups such that
an autogroup that contains a large number of CPU-bound processes
does not end up hogging CPU cycles at the expense of the other
jobs on the system.
-.PP
+.P
A process's autogroup (task group) membership can be viewed via the file
.IR /proc/ pid /autogroup :
-.PP
+.P
.in +4n
.EX
$ \fBcat /proc/1/autogroup\fP
/autogroup\-1 nice 0
.EE
.in
-.PP
+.P
This file can also be used to modify the CPU bandwidth allocated
to an autogroup.
This is done by writing a number in the "nice" range to the file
@@ -791,7 +791,7 @@ to fail with the error
.\" A patch was posted on 23 Nov 2016
.\" ("sched/autogroup: Fix 64bit kernel nice adjustment";
.\" check later to see in which kernel version it lands.
-.PP
+.P
The autogroup nice setting has the same meaning as the process nice value,
but applies to distribution of CPU cycles to the autogroup as a whole,
based on the relative nice values of other autogroups.
@@ -800,12 +800,12 @@ will be a product of the autogroup's nice value
(compared to other autogroups)
and the process's nice value
(compared to other processes in the same autogroup.
-.PP
+.P
The use of the
.BR cgroups (7)
CPU controller to place processes in cgroups other than the
root CPU cgroup overrides the effect of autogrouping.
-.PP
+.P
The autogroup feature groups only processes scheduled under
non-real-time policies
.RB ( SCHED_OTHER ,
@@ -826,7 +826,7 @@ policies), the CFS scheduler employs a technique known as "group scheduling",
if the kernel was configured with the
.B CONFIG_FAIR_GROUP_SCHED
option (which is typical).
-.PP
+.P
Under group scheduling, threads are scheduled in "task groups".
Task groups have a hierarchical relationship,
rooted under the initial task group on the system,
@@ -856,7 +856,7 @@ If group scheduling was disabled (i.e., the kernel was configured without
.BR CONFIG_FAIR_GROUP_SCHED ),
then all of the processes on the system are notionally placed
in a single task group.
-.PP
+.P
Under group scheduling,
a thread's nice value has an effect for scheduling decisions
.IR "only relative to other threads in the same task group" .
@@ -870,7 +870,7 @@ or
on a process has an effect only for scheduling relative
to other processes executed in the same session
(typically: the same terminal window).
-.PP
+.P
Conversely, for two processes that are (for example)
the sole CPU-bound processes in different sessions
(e.g., different terminal windows,
@@ -886,7 +886,7 @@ A possibly useful workaround here is to use a command such as
the following to modify the autogroup nice value for
.I all
of the processes in a terminal session:
-.PP
+.P
.in +4n
.EX
$ \fBecho 10 > /proc/self/autogroup\fP
@@ -904,17 +904,17 @@ Until the patches have been completely merged into the
mainline kernel,
they must be installed to achieve the best real-time performance.
These patches are named:
-.PP
+.P
.in +4n
.EX
patch\-\fIkernelversion\fP\-rt\fIpatchversion\fP
.EE
.in
-.PP
+.P
and can be downloaded from
.UR http://www.kernel.org\:/pub\:/linux\:/kernel\:/projects\:/rt/
.UE .
-.PP
+.P
Without the patches and prior to their full inclusion into the mainline
kernel, the kernel configuration offers only the three preemption classes
.BR CONFIG_PREEMPT_NONE ,
@@ -923,7 +923,7 @@ and
.B CONFIG_PREEMPT_DESKTOP
which respectively provide no, some, and considerable
reduction of the worst-case scheduling latency.
-.PP
+.P
With the patches applied or after their full inclusion into the mainline
kernel, the additional configuration item
.B CONFIG_PREEMPT_RT
@@ -937,7 +937,7 @@ The
.BR cgroups (7)
CPU controller can be used to limit the CPU consumption of
groups of processes.
-.PP
+.P
Originally, Standard Linux was intended as a general-purpose operating
system being able to handle background processes, interactive
applications, and less demanding real-time applications (applications that
@@ -980,10 +980,10 @@ was not possible up to Linux 2.6.17.
.BR capabilities (7),
.BR cpuset (7)
.ad
-.PP
+.P
.I Programming for the real world \- POSIX.4
by Bill O.\& Gallmeister, O'Reilly & Associates, Inc., ISBN 1-56592-074-0.
-.PP
+.P
The Linux kernel source files
.IR \%Documentation/\:scheduler/\:sched\-deadline\:.txt ,
.IR \%Documentation/\:scheduler/\:sched\-rt\-group\:.txt ,