summaryrefslogtreecommitdiffstats
path: root/man7/sched.7
diff options
context:
space:
mode:
Diffstat (limited to 'man7/sched.7')
-rw-r--r--man7/sched.7992
1 files changed, 0 insertions, 992 deletions
diff --git a/man7/sched.7 b/man7/sched.7
deleted file mode 100644
index 7e1212f..0000000
--- a/man7/sched.7
+++ /dev/null
@@ -1,992 +0,0 @@
-.\" Copyright (C) 2014 Michael Kerrisk <mtk.manpages@gmail.com>
-.\" and Copyright (C) 2014 Peter Zijlstra <peterz@infradead.org>
-.\" and Copyright (C) 2014 Juri Lelli <juri.lelli@gmail.com>
-.\" Various pieces from the old sched_setscheduler(2) page
-.\" Copyright (C) Tom Bjorkholm, Markus Kuhn & David A. Wheeler 1996-1999
-.\" and Copyright (C) 2007 Carsten Emde <Carsten.Emde@osadl.org>
-.\" and Copyright (C) 2008 Michael Kerrisk <mtk.manpages@gmail.com>
-.\"
-.\" SPDX-License-Identifier: GPL-2.0-or-later
-.\"
-.\" Worth looking at: http://rt.wiki.kernel.org/index.php
-.\"
-.TH sched 7 2024-02-18 "Linux man-pages 6.7"
-.SH NAME
-sched \- overview of CPU scheduling
-.SH DESCRIPTION
-Since Linux 2.6.23, the default scheduler is CFS,
-the "Completely Fair Scheduler".
-The CFS scheduler replaced the earlier "O(1)" scheduler.
-.\"
-.SS API summary
-Linux provides the following system calls for controlling
-the CPU scheduling behavior, policy, and priority of processes
-(or, more precisely, threads).
-.TP
-.BR nice (2)
-Set a new nice value for the calling thread,
-and return the new nice value.
-.TP
-.BR getpriority (2)
-Return the nice value of a thread, a process group,
-or the set of threads owned by a specified user.
-.TP
-.BR setpriority (2)
-Set the nice value of a thread, a process group,
-or the set of threads owned by a specified user.
-.TP
-.BR sched_setscheduler (2)
-Set the scheduling policy and parameters of a specified thread.
-.TP
-.BR sched_getscheduler (2)
-Return the scheduling policy of a specified thread.
-.TP
-.BR sched_setparam (2)
-Set the scheduling parameters of a specified thread.
-.TP
-.BR sched_getparam (2)
-Fetch the scheduling parameters of a specified thread.
-.TP
-.BR sched_get_priority_max (2)
-Return the maximum priority available in a specified scheduling policy.
-.TP
-.BR sched_get_priority_min (2)
-Return the minimum priority available in a specified scheduling policy.
-.TP
-.BR sched_rr_get_interval (2)
-Fetch the quantum used for threads that are scheduled under
-the "round-robin" scheduling policy.
-.TP
-.BR sched_yield (2)
-Cause the caller to relinquish the CPU,
-so that some other thread be executed.
-.TP
-.BR sched_setaffinity (2)
-(Linux-specific)
-Set the CPU affinity of a specified thread.
-.TP
-.BR sched_getaffinity (2)
-(Linux-specific)
-Get the CPU affinity of a specified thread.
-.TP
-.BR sched_setattr (2)
-Set the scheduling policy and parameters of a specified thread.
-This (Linux-specific) system call provides a superset of the functionality of
-.BR sched_setscheduler (2)
-and
-.BR sched_setparam (2).
-.TP
-.BR sched_getattr (2)
-Fetch the scheduling policy and parameters of a specified thread.
-This (Linux-specific) system call provides a superset of the functionality of
-.BR sched_getscheduler (2)
-and
-.BR sched_getparam (2).
-.\"
-.SS Scheduling policies
-The scheduler is the kernel component that decides which runnable thread
-will be executed by the CPU next.
-Each thread has an associated scheduling policy and a \fIstatic\fP
-scheduling priority,
-.IR sched_priority .
-The scheduler makes its decisions based on knowledge of the scheduling
-policy and static priority of all threads on the system.
-.P
-For threads scheduled under one of the normal scheduling policies
-(\fBSCHED_OTHER\fP, \fBSCHED_IDLE\fP, \fBSCHED_BATCH\fP),
-\fIsched_priority\fP is not used in scheduling
-decisions (it must be specified as 0).
-.P
-Processes scheduled under one of the real-time policies
-(\fBSCHED_FIFO\fP, \fBSCHED_RR\fP) have a
-\fIsched_priority\fP value in the range 1 (low) to 99 (high).
-(As the numbers imply, real-time threads always have higher priority
-than normal threads.)
-Note well: POSIX.1 requires an implementation to support only a
-minimum 32 distinct priority levels for the real-time policies,
-and some systems supply just this minimum.
-Portable programs should use
-.BR sched_get_priority_min (2)
-and
-.BR sched_get_priority_max (2)
-to find the range of priorities supported for a particular policy.
-.P
-Conceptually, the scheduler maintains a list of runnable
-threads for each possible \fIsched_priority\fP value.
-In order to determine which thread runs next, the scheduler looks for
-the nonempty list with the highest static priority and selects the
-thread at the head of this list.
-.P
-A thread's scheduling policy determines
-where it will be inserted into the list of threads
-with equal static priority and how it will move inside this list.
-.P
-All scheduling is preemptive: if a thread with a higher static
-priority becomes ready to run, the currently running thread
-will be preempted and
-returned to the wait list for its static priority level.
-The scheduling policy determines the
-ordering only within the list of runnable threads with equal static
-priority.
-.SS SCHED_FIFO: First in-first out scheduling
-\fBSCHED_FIFO\fP can be used only with static priorities higher than
-0, which means that when a \fBSCHED_FIFO\fP thread becomes runnable,
-it will always immediately preempt any currently running
-\fBSCHED_OTHER\fP, \fBSCHED_BATCH\fP, or \fBSCHED_IDLE\fP thread.
-\fBSCHED_FIFO\fP is a simple scheduling
-algorithm without time slicing.
-For threads scheduled under the
-\fBSCHED_FIFO\fP policy, the following rules apply:
-.IP \[bu] 3
-A running \fBSCHED_FIFO\fP thread that has been preempted by another thread of
-higher priority will stay at the head of the list for its priority and
-will resume execution as soon as all threads of higher priority are
-blocked again.
-.IP \[bu]
-When a blocked \fBSCHED_FIFO\fP thread becomes runnable, it
-will be inserted at the end of the list for its priority.
-.IP \[bu]
-If a call to
-.BR sched_setscheduler (2),
-.BR sched_setparam (2),
-.BR sched_setattr (2),
-.BR pthread_setschedparam (3),
-or
-.BR pthread_setschedprio (3)
-changes the priority of the running or runnable
-.B SCHED_FIFO
-thread identified by
-.I pid
-the effect on the thread's position in the list depends on
-the direction of the change to the thread's priority:
-.RS
-.IP (a) 5
-If the thread's priority is raised,
-it is placed at the end of the list for its new priority.
-As a consequence,
-it may preempt a currently running thread with the same priority.
-.IP (b)
-If the thread's priority is unchanged,
-its position in the run list is unchanged.
-.IP (c)
-If the thread's priority is lowered,
-it is placed at the front of the list for its new priority.
-.RE
-.IP
-According to POSIX.1-2008,
-changes to a thread's priority (or policy) using any mechanism other than
-.BR pthread_setschedprio (3)
-should result in the thread being placed at the end of
-the list for its priority.
-.\" In Linux 2.2.x and Linux 2.4.x, the thread is placed at the front of the queue
-.\" In Linux 2.0.x, the Right Thing happened: the thread went to the back -- MTK
-.IP \[bu]
-A thread calling
-.BR sched_yield (2)
-will be put at the end of the list.
-.P
-No other events will move a thread
-scheduled under the \fBSCHED_FIFO\fP policy in the wait list of
-runnable threads with equal static priority.
-.P
-A \fBSCHED_FIFO\fP
-thread runs until either it is blocked by an I/O request, it is
-preempted by a higher priority thread, or it calls
-.BR sched_yield (2).
-.SS SCHED_RR: Round-robin scheduling
-\fBSCHED_RR\fP is a simple enhancement of \fBSCHED_FIFO\fP.
-Everything
-described above for \fBSCHED_FIFO\fP also applies to \fBSCHED_RR\fP,
-except that each thread is allowed to run only for a maximum time
-quantum.
-If a \fBSCHED_RR\fP thread has been running for a time
-period equal to or longer than the time quantum, it will be put at the
-end of the list for its priority.
-A \fBSCHED_RR\fP thread that has
-been preempted by a higher priority thread and subsequently resumes
-execution as a running thread will complete the unexpired portion of
-its round-robin time quantum.
-The length of the time quantum can be
-retrieved using
-.BR sched_rr_get_interval (2).
-.\" On Linux 2.4, the length of the RR interval is influenced
-.\" by the process nice value -- MTK
-.\"
-.SS SCHED_DEADLINE: Sporadic task model deadline scheduling
-Since Linux 3.14, Linux provides a deadline scheduling policy
-.RB ( SCHED_DEADLINE ).
-This policy is currently implemented using
-GEDF (Global Earliest Deadline First)
-in conjunction with CBS (Constant Bandwidth Server).
-To set and fetch this policy and associated attributes,
-one must use the Linux-specific
-.BR sched_setattr (2)
-and
-.BR sched_getattr (2)
-system calls.
-.P
-A sporadic task is one that has a sequence of jobs, where each
-job is activated at most once per period.
-Each job also has a
-.IR "relative deadline" ,
-before which it should finish execution, and a
-.IR "computation time" ,
-which is the CPU time necessary for executing the job.
-The moment when a task wakes up
-because a new job has to be executed is called the
-.I arrival time
-(also referred to as the request time or release time).
-The
-.I start time
-is the time at which a task starts its execution.
-The
-.I absolute deadline
-is thus obtained by adding the relative deadline to the arrival time.
-.P
-The following diagram clarifies these terms:
-.P
-.in +4n
-.EX
-arrival/wakeup absolute deadline
- | start time |
- | | |
- v v v
------x--------xooooooooooooooooo--------x--------x---
- |<- comp. time ->|
- |<------- relative deadline ------>|
- |<-------------- period ------------------->|
-.EE
-.in
-.P
-When setting a
-.B SCHED_DEADLINE
-policy for a thread using
-.BR sched_setattr (2),
-one can specify three parameters:
-.IR Runtime ,
-.IR Deadline ,
-and
-.IR Period .
-These parameters do not necessarily correspond to the aforementioned terms:
-usual practice is to set Runtime to something bigger than the average
-computation time (or worst-case execution time for hard real-time tasks),
-Deadline to the relative deadline, and Period to the period of the task.
-Thus, for
-.B SCHED_DEADLINE
-scheduling, we have:
-.P
-.in +4n
-.EX
-arrival/wakeup absolute deadline
- | start time |
- | | |
- v v v
------x--------xooooooooooooooooo--------x--------x---
- |<-- Runtime ------->|
- |<----------- Deadline ----------->|
- |<-------------- Period ------------------->|
-.EE
-.in
-.P
-The three deadline-scheduling parameters correspond to the
-.IR sched_runtime ,
-.IR sched_deadline ,
-and
-.I sched_period
-fields of the
-.I sched_attr
-structure; see
-.BR sched_setattr (2).
-These fields express values in nanoseconds.
-.\" FIXME It looks as though specifying sched_period as 0 means
-.\" "make sched_period the same as sched_deadline".
-.\" This needs to be documented.
-If
-.I sched_period
-is specified as 0, then it is made the same as
-.IR sched_deadline .
-.P
-The kernel requires that:
-.P
-.in +4n
-.EX
-sched_runtime <= sched_deadline <= sched_period
-.EE
-.in
-.P
-.\" See __checkparam_dl in kernel/sched/core.c
-In addition, under the current implementation,
-all of the parameter values must be at least 1024
-(i.e., just over one microsecond,
-which is the resolution of the implementation), and less than 2\[ha]63.
-If any of these checks fails,
-.BR sched_setattr (2)
-fails with the error
-.BR EINVAL .
-.P
-The CBS guarantees non-interference between tasks, by throttling
-threads that attempt to over-run their specified Runtime.
-.P
-To ensure deadline scheduling guarantees,
-the kernel must prevent situations where the set of
-.B SCHED_DEADLINE
-threads is not feasible (schedulable) within the given constraints.
-The kernel thus performs an admittance test when setting or changing
-.B SCHED_DEADLINE
-policy and attributes.
-This admission test calculates whether the change is feasible;
-if it is not,
-.BR sched_setattr (2)
-fails with the error
-.BR EBUSY .
-.P
-For example, it is required (but not necessarily sufficient) for
-the total utilization to be less than or equal to the total number of
-CPUs available, where, since each thread can maximally run for
-Runtime per Period, that thread's utilization is its
-Runtime divided by its Period.
-.P
-In order to fulfill the guarantees that are made when
-a thread is admitted to the
-.B SCHED_DEADLINE
-policy,
-.B SCHED_DEADLINE
-threads are the highest priority (user controllable) threads in the
-system; if any
-.B SCHED_DEADLINE
-thread is runnable,
-it will preempt any thread scheduled under one of the other policies.
-.P
-A call to
-.BR fork (2)
-by a thread scheduled under the
-.B SCHED_DEADLINE
-policy fails with the error
-.BR EAGAIN ,
-unless the thread has its reset-on-fork flag set (see below).
-.P
-A
-.B SCHED_DEADLINE
-thread that calls
-.BR sched_yield (2)
-will yield the current job and wait for a new period to begin.
-.\"
-.\" FIXME Calling sched_getparam() on a SCHED_DEADLINE thread
-.\" fails with EINVAL, but sched_getscheduler() succeeds.
-.\" Is that intended? (Why?)
-.\"
-.SS SCHED_OTHER: Default Linux time-sharing scheduling
-\fBSCHED_OTHER\fP can be used at only static priority 0
-(i.e., threads under real-time policies always have priority over
-.B SCHED_OTHER
-processes).
-\fBSCHED_OTHER\fP is the standard Linux time-sharing scheduler that is
-intended for all threads that do not require the special
-real-time mechanisms.
-.P
-The thread to run is chosen from the static
-priority 0 list based on a \fIdynamic\fP priority that is determined only
-inside this list.
-The dynamic priority is based on the nice value (see below)
-and is increased for each time quantum the thread is ready to run,
-but denied to run by the scheduler.
-This ensures fair progress among all \fBSCHED_OTHER\fP threads.
-.P
-In the Linux kernel source code, the
-.B SCHED_OTHER
-policy is actually named
-.BR SCHED_NORMAL .
-.\"
-.SS The nice value
-The nice value is an attribute
-that can be used to influence the CPU scheduler to
-favor or disfavor a process in scheduling decisions.
-It affects the scheduling of
-.B SCHED_OTHER
-and
-.B SCHED_BATCH
-(see below) processes.
-The nice value can be modified using
-.BR nice (2),
-.BR setpriority (2),
-or
-.BR sched_setattr (2).
-.P
-According to POSIX.1, the nice value is a per-process attribute;
-that is, the threads in a process should share a nice value.
-However, on Linux, the nice value is a per-thread attribute:
-different threads in the same process may have different nice values.
-.P
-The range of the nice value
-varies across UNIX systems.
-On modern Linux, the range is \-20 (high priority) to +19 (low priority).
-On some other systems, the range is \-20..20.
-Very early Linux kernels (before Linux 2.0) had the range \-infinity..15.
-.\" Linux before 1.3.36 had \-infinity..15.
-.\" Since Linux 1.3.43, Linux has the range \-20..19.
-.P
-The degree to which the nice value affects the relative scheduling of
-.B SCHED_OTHER
-processes likewise varies across UNIX systems and
-across Linux kernel versions.
-.P
-With the advent of the CFS scheduler in Linux 2.6.23,
-Linux adopted an algorithm that causes
-relative differences in nice values to have a much stronger effect.
-In the current implementation, each unit of difference in the
-nice values of two processes results in a factor of 1.25
-in the degree to which the scheduler favors the higher priority process.
-This causes very low nice values (+19) to truly provide little CPU
-to a process whenever there is any other
-higher priority load on the system,
-and makes high nice values (\-20) deliver most of the CPU to applications
-that require it (e.g., some audio applications).
-.P
-On Linux, the
-.B RLIMIT_NICE
-resource limit can be used to define a limit to which
-an unprivileged process's nice value can be raised; see
-.BR setrlimit (2)
-for details.
-.P
-For further details on the nice value, see the subsections on
-the autogroup feature and group scheduling, below.
-.\"
-.SS SCHED_BATCH: Scheduling batch processes
-(Since Linux 2.6.16.)
-\fBSCHED_BATCH\fP can be used only at static priority 0.
-This policy is similar to \fBSCHED_OTHER\fP in that it schedules
-the thread according to its dynamic priority
-(based on the nice value).
-The difference is that this policy
-will cause the scheduler to always assume
-that the thread is CPU-intensive.
-Consequently, the scheduler will apply a small scheduling
-penalty with respect to wakeup behavior,
-so that this thread is mildly disfavored in scheduling decisions.
-.P
-.\" The following paragraph is drawn largely from the text that
-.\" accompanied Ingo Molnar's patch for the implementation of
-.\" SCHED_BATCH.
-.\" commit b0a9499c3dd50d333e2aedb7e894873c58da3785
-This policy is useful for workloads that are noninteractive,
-but do not want to lower their nice value,
-and for workloads that want a deterministic scheduling policy without
-interactivity causing extra preemptions (between the workload's tasks).
-.\"
-.SS SCHED_IDLE: Scheduling very low priority jobs
-(Since Linux 2.6.23.)
-\fBSCHED_IDLE\fP can be used only at static priority 0;
-the process nice value has no influence for this policy.
-.P
-This policy is intended for running jobs at extremely low
-priority (lower even than a +19 nice value with the
-.B SCHED_OTHER
-or
-.B SCHED_BATCH
-policies).
-.\"
-.SS Resetting scheduling policy for child processes
-Each thread has a reset-on-fork scheduling flag.
-When this flag is set, children created by
-.BR fork (2)
-do not inherit privileged scheduling policies.
-The reset-on-fork flag can be set by either:
-.IP \[bu] 3
-ORing the
-.B SCHED_RESET_ON_FORK
-flag into the
-.I policy
-argument when calling
-.BR sched_setscheduler (2)
-(since Linux 2.6.32);
-or
-.IP \[bu]
-specifying the
-.B SCHED_FLAG_RESET_ON_FORK
-flag in
-.I attr.sched_flags
-when calling
-.BR sched_setattr (2).
-.P
-Note that the constants used with these two APIs have different names.
-The state of the reset-on-fork flag can analogously be retrieved using
-.BR sched_getscheduler (2)
-and
-.BR sched_getattr (2).
-.P
-The reset-on-fork feature is intended for media-playback applications,
-and can be used to prevent applications evading the
-.B RLIMIT_RTTIME
-resource limit (see
-.BR getrlimit (2))
-by creating multiple child processes.
-.P
-More precisely, if the reset-on-fork flag is set,
-the following rules apply for subsequently created children:
-.IP \[bu] 3
-If the calling thread has a scheduling policy of
-.B SCHED_FIFO
-or
-.BR SCHED_RR ,
-the policy is reset to
-.B SCHED_OTHER
-in child processes.
-.IP \[bu]
-If the calling process has a negative nice value,
-the nice value is reset to zero in child processes.
-.P
-After the reset-on-fork flag has been enabled,
-it can be reset only if the thread has the
-.B CAP_SYS_NICE
-capability.
-This flag is disabled in child processes created by
-.BR fork (2).
-.\"
-.SS Privileges and resource limits
-Before Linux 2.6.12, only privileged
-.RB ( CAP_SYS_NICE )
-threads can set a nonzero static priority (i.e., set a real-time
-scheduling policy).
-The only change that an unprivileged thread can make is to set the
-.B SCHED_OTHER
-policy, and this can be done only if the effective user ID of the caller
-matches the real or effective user ID of the target thread
-(i.e., the thread specified by
-.IR pid )
-whose policy is being changed.
-.P
-A thread must be privileged
-.RB ( CAP_SYS_NICE )
-in order to set or modify a
-.B SCHED_DEADLINE
-policy.
-.P
-Since Linux 2.6.12, the
-.B RLIMIT_RTPRIO
-resource limit defines a ceiling on an unprivileged thread's
-static priority for the
-.B SCHED_RR
-and
-.B SCHED_FIFO
-policies.
-The rules for changing scheduling policy and priority are as follows:
-.IP \[bu] 3
-If an unprivileged thread has a nonzero
-.B RLIMIT_RTPRIO
-soft limit, then it can change its scheduling policy and priority,
-subject to the restriction that the priority cannot be set to a
-value higher than the maximum of its current priority and its
-.B RLIMIT_RTPRIO
-soft limit.
-.IP \[bu]
-If the
-.B RLIMIT_RTPRIO
-soft limit is 0, then the only permitted changes are to lower the priority,
-or to switch to a non-real-time policy.
-.IP \[bu]
-Subject to the same rules,
-another unprivileged thread can also make these changes,
-as long as the effective user ID of the thread making the change
-matches the real or effective user ID of the target thread.
-.IP \[bu]
-Special rules apply for the
-.B SCHED_IDLE
-policy.
-Before Linux 2.6.39,
-an unprivileged thread operating under this policy cannot
-change its policy, regardless of the value of its
-.B RLIMIT_RTPRIO
-resource limit.
-Since Linux 2.6.39,
-.\" commit c02aa73b1d18e43cfd79c2f193b225e84ca497c8
-an unprivileged thread can switch to either the
-.B SCHED_BATCH
-or the
-.B SCHED_OTHER
-policy so long as its nice value falls within the range permitted by its
-.B RLIMIT_NICE
-resource limit (see
-.BR getrlimit (2)).
-.P
-Privileged
-.RB ( CAP_SYS_NICE )
-threads ignore the
-.B RLIMIT_RTPRIO
-limit; as with older kernels,
-they can make arbitrary changes to scheduling policy and priority.
-See
-.BR getrlimit (2)
-for further information on
-.BR RLIMIT_RTPRIO .
-.SS Limiting the CPU usage of real-time and deadline processes
-A nonblocking infinite loop in a thread scheduled under the
-.BR SCHED_FIFO ,
-.BR SCHED_RR ,
-or
-.B SCHED_DEADLINE
-policy can potentially block all other threads from accessing
-the CPU forever.
-Before Linux 2.6.25, the only way of preventing a runaway real-time
-process from freezing the system was to run (at the console)
-a shell scheduled under a higher static priority than the tested application.
-This allows an emergency kill of tested
-real-time applications that do not block or terminate as expected.
-.P
-Since Linux 2.6.25, there are other techniques for dealing with runaway
-real-time and deadline processes.
-One of these is to use the
-.B RLIMIT_RTTIME
-resource limit to set a ceiling on the CPU time that
-a real-time process may consume.
-See
-.BR getrlimit (2)
-for details.
-.P
-Since Linux 2.6.25, Linux also provides two
-.I /proc
-files that can be used to reserve a certain amount of CPU time
-to be used by non-real-time processes.
-Reserving CPU time in this fashion allows some CPU time to be
-allocated to (say) a root shell that can be used to kill a runaway process.
-Both of these files specify time values in microseconds:
-.TP
-.I /proc/sys/kernel/sched_rt_period_us
-This file specifies a scheduling period that is equivalent to
-100% CPU bandwidth.
-The value in this file can range from 1 to
-.BR INT_MAX ,
-giving an operating range of 1 microsecond to around 35 minutes.
-The default value in this file is 1,000,000 (1 second).
-.TP
-.I /proc/sys/kernel/sched_rt_runtime_us
-The value in this file specifies how much of the "period" time
-can be used by all real-time and deadline scheduled processes
-on the system.
-The value in this file can range from \-1 to
-.BR INT_MAX \-1.
-Specifying \-1 makes the run time the same as the period;
-that is, no CPU time is set aside for non-real-time processes
-(which was the behavior before Linux 2.6.25).
-The default value in this file is 950,000 (0.95 seconds),
-meaning that 5% of the CPU time is reserved for processes that
-don't run under a real-time or deadline scheduling policy.
-.SS Response time
-A blocked high priority thread waiting for I/O has a certain
-response time before it is scheduled again.
-The device driver writer
-can greatly reduce this response time by using a "slow interrupt"
-interrupt handler.
-.\" as described in
-.\" .BR request_irq (9).
-.SS Miscellaneous
-Child processes inherit the scheduling policy and parameters across a
-.BR fork (2).
-The scheduling policy and parameters are preserved across
-.BR execve (2).
-.P
-Memory locking is usually needed for real-time processes to avoid
-paging delays; this can be done with
-.BR mlock (2)
-or
-.BR mlockall (2).
-.\"
-.SS The autogroup feature
-.\" commit 5091faa449ee0b7d73bc296a93bca9540fc51d0a
-Since Linux 2.6.38,
-the kernel provides a feature known as autogrouping to improve interactive
-desktop performance in the face of multiprocess, CPU-intensive
-workloads such as building the Linux kernel with large numbers of
-parallel build processes (i.e., the
-.BR make (1)
-.B \-j
-flag).
-.P
-This feature operates in conjunction with the
-CFS scheduler and requires a kernel that is configured with
-.BR CONFIG_SCHED_AUTOGROUP .
-On a running system, this feature is enabled or disabled via the file
-.IR /proc/sys/kernel/sched_autogroup_enabled ;
-a value of 0 disables the feature, while a value of 1 enables it.
-The default value in this file is 1, unless the kernel was booted with the
-.I noautogroup
-parameter.
-.P
-A new autogroup is created when a new session is created via
-.BR setsid (2);
-this happens, for example, when a new terminal window is started.
-A new process created by
-.BR fork (2)
-inherits its parent's autogroup membership.
-Thus, all of the processes in a session are members of the same autogroup.
-An autogroup is automatically destroyed when the last process
-in the group terminates.
-.P
-When autogrouping is enabled, all of the members of an autogroup
-are placed in the same kernel scheduler "task group".
-The CFS scheduler employs an algorithm that equalizes the
-distribution of CPU cycles across task groups.
-The benefits of this for interactive desktop performance
-can be described via the following example.
-.P
-Suppose that there are two autogroups competing for the same CPU
-(i.e., presume either a single CPU system or the use of
-.BR taskset (1)
-to confine all the processes to the same CPU on an SMP system).
-The first group contains ten CPU-bound processes from
-a kernel build started with
-.IR "make\~\-j10" .
-The other contains a single CPU-bound process: a video player.
-The effect of autogrouping is that the two groups will
-each receive half of the CPU cycles.
-That is, the video player will receive 50% of the CPU cycles,
-rather than just 9% of the cycles,
-which would likely lead to degraded video playback.
-The situation on an SMP system is more complex,
-.\" Mike Galbraith, 25 Nov 2016:
-.\" I'd say something more wishy-washy here, like cycles are
-.\" distributed fairly across groups and leave it at that, as your
-.\" detailed example is incorrect due to SMP fairness (which I don't
-.\" like much because [very unlikely] worst case scenario
-.\" renders a box sized group incapable of utilizing more that
-.\" a single CPU total). For example, if a group of NR_CPUS
-.\" size competes with a singleton, load balancing will try to give
-.\" the singleton a full CPU of its very own. If groups intersect for
-.\" whatever reason on say my quad lappy, distribution is 80/20 in
-.\" favor of the singleton.
-but the general effect is the same:
-the scheduler distributes CPU cycles across task groups such that
-an autogroup that contains a large number of CPU-bound processes
-does not end up hogging CPU cycles at the expense of the other
-jobs on the system.
-.P
-A process's autogroup (task group) membership can be viewed via the file
-.IR /proc/ pid /autogroup :
-.P
-.in +4n
-.EX
-$ \fBcat /proc/1/autogroup\fP
-/autogroup\-1 nice 0
-.EE
-.in
-.P
-This file can also be used to modify the CPU bandwidth allocated
-to an autogroup.
-This is done by writing a number in the "nice" range to the file
-to set the autogroup's nice value.
-The allowed range is from +19 (low priority) to \-20 (high priority).
-(Writing values outside of this range causes
-.BR write (2)
-to fail with the error
-.BR EINVAL .)
-.\" FIXME .
-.\" Because of a bug introduced in Linux 4.7
-.\" (commit 2159197d66770ec01f75c93fb11dc66df81fd45b made changes
-.\" that exposed the fact that autogroup didn't call scale_load()),
-.\" it happened that *all* values in this range caused a task group
-.\" to be further disfavored by the scheduler, with \-20 resulting
-.\" in the scheduler mildly disfavoring the task group and +19 greatly
-.\" disfavoring it.
-.\"
-.\" A patch was posted on 23 Nov 2016
-.\" ("sched/autogroup: Fix 64bit kernel nice adjustment";
-.\" check later to see in which kernel version it lands.
-.P
-The autogroup nice setting has the same meaning as the process nice value,
-but applies to distribution of CPU cycles to the autogroup as a whole,
-based on the relative nice values of other autogroups.
-For a process inside an autogroup, the CPU cycles that it receives
-will be a product of the autogroup's nice value
-(compared to other autogroups)
-and the process's nice value
-(compared to other processes in the same autogroup.
-.P
-The use of the
-.BR cgroups (7)
-CPU controller to place processes in cgroups other than the
-root CPU cgroup overrides the effect of autogrouping.
-.P
-The autogroup feature groups only processes scheduled under
-non-real-time policies
-.RB ( SCHED_OTHER ,
-.BR SCHED_BATCH ,
-and
-.BR SCHED_IDLE ).
-It does not group processes scheduled under real-time and
-deadline policies.
-Those processes are scheduled according to the rules described earlier.
-.\"
-.SS The nice value and group scheduling
-When scheduling non-real-time processes (i.e., those scheduled under the
-.BR SCHED_OTHER ,
-.BR SCHED_BATCH ,
-and
-.B SCHED_IDLE
-policies), the CFS scheduler employs a technique known as "group scheduling",
-if the kernel was configured with the
-.B CONFIG_FAIR_GROUP_SCHED
-option (which is typical).
-.P
-Under group scheduling, threads are scheduled in "task groups".
-Task groups have a hierarchical relationship,
-rooted under the initial task group on the system,
-known as the "root task group".
-Task groups are formed in the following circumstances:
-.IP \[bu] 3
-All of the threads in a CPU cgroup form a task group.
-The parent of this task group is the task group of the
-corresponding parent cgroup.
-.IP \[bu]
-If autogrouping is enabled,
-then all of the threads that are (implicitly) placed in an autogroup
-(i.e., the same session, as created by
-.BR setsid (2))
-form a task group.
-Each new autogroup is thus a separate task group.
-The root task group is the parent of all such autogroups.
-.IP \[bu]
-If autogrouping is enabled, then the root task group consists of
-all processes in the root CPU cgroup that were not
-otherwise implicitly placed into a new autogroup.
-.IP \[bu]
-If autogrouping is disabled, then the root task group consists of
-all processes in the root CPU cgroup.
-.IP \[bu]
-If group scheduling was disabled (i.e., the kernel was configured without
-.BR CONFIG_FAIR_GROUP_SCHED ),
-then all of the processes on the system are notionally placed
-in a single task group.
-.P
-Under group scheduling,
-a thread's nice value has an effect for scheduling decisions
-.IR "only relative to other threads in the same task group" .
-This has some surprising consequences in terms of the traditional semantics
-of the nice value on UNIX systems.
-In particular, if autogrouping
-is enabled (which is the default in various distributions), then employing
-.BR setpriority (2)
-or
-.BR nice (1)
-on a process has an effect only for scheduling relative
-to other processes executed in the same session
-(typically: the same terminal window).
-.P
-Conversely, for two processes that are (for example)
-the sole CPU-bound processes in different sessions
-(e.g., different terminal windows,
-each of whose jobs are tied to different autogroups),
-.I modifying the nice value of the process in one of the sessions
-.I has no effect
-in terms of the scheduler's decisions relative to the
-process in the other session.
-.\" More succinctly: the nice(1) command is in many cases a no-op since
-.\" Linux 2.6.38.
-.\"
-A possibly useful workaround here is to use a command such as
-the following to modify the autogroup nice value for
-.I all
-of the processes in a terminal session:
-.P
-.in +4n
-.EX
-$ \fBecho 10 > /proc/self/autogroup\fP
-.EE
-.in
-.SS Real-time features in the mainline Linux kernel
-.\" FIXME . Probably this text will need some minor tweaking
-.\" ask Carsten Emde about this.
-Since Linux 2.6.18, Linux is gradually
-becoming equipped with real-time capabilities,
-most of which are derived from the former
-.I realtime\-preempt
-patch set.
-Until the patches have been completely merged into the
-mainline kernel,
-they must be installed to achieve the best real-time performance.
-These patches are named:
-.P
-.in +4n
-.EX
-patch\-\fIkernelversion\fP\-rt\fIpatchversion\fP
-.EE
-.in
-.P
-and can be downloaded from
-.UR http://www.kernel.org\:/pub\:/linux\:/kernel\:/projects\:/rt/
-.UE .
-.P
-Without the patches and prior to their full inclusion into the mainline
-kernel, the kernel configuration offers only the three preemption classes
-.BR CONFIG_PREEMPT_NONE ,
-.BR CONFIG_PREEMPT_VOLUNTARY ,
-and
-.B CONFIG_PREEMPT_DESKTOP
-which respectively provide no, some, and considerable
-reduction of the worst-case scheduling latency.
-.P
-With the patches applied or after their full inclusion into the mainline
-kernel, the additional configuration item
-.B CONFIG_PREEMPT_RT
-becomes available.
-If this is selected, Linux is transformed into a regular
-real-time operating system.
-The FIFO and RR scheduling policies are then used to run a thread
-with true real-time priority and a minimum worst-case scheduling latency.
-.SH NOTES
-The
-.BR cgroups (7)
-CPU controller can be used to limit the CPU consumption of
-groups of processes.
-.P
-Originally, Standard Linux was intended as a general-purpose operating
-system being able to handle background processes, interactive
-applications, and less demanding real-time applications (applications that
-need to usually meet timing deadlines).
-Although the Linux 2.6
-allowed for kernel preemption and the newly introduced O(1) scheduler
-ensures that the time needed to schedule is fixed and deterministic
-irrespective of the number of active tasks, true real-time computing
-was not possible up to Linux 2.6.17.
-.SH SEE ALSO
-.ad l
-.nh
-.BR chcpu (1),
-.BR chrt (1),
-.BR lscpu (1),
-.BR ps (1),
-.BR taskset (1),
-.BR top (1),
-.BR getpriority (2),
-.BR mlock (2),
-.BR mlockall (2),
-.BR munlock (2),
-.BR munlockall (2),
-.BR nice (2),
-.BR sched_get_priority_max (2),
-.BR sched_get_priority_min (2),
-.BR sched_getaffinity (2),
-.BR sched_getparam (2),
-.BR sched_getscheduler (2),
-.BR sched_rr_get_interval (2),
-.BR sched_setaffinity (2),
-.BR sched_setparam (2),
-.BR sched_setscheduler (2),
-.BR sched_yield (2),
-.BR setpriority (2),
-.BR pthread_getaffinity_np (3),
-.BR pthread_getschedparam (3),
-.BR pthread_setaffinity_np (3),
-.BR sched_getcpu (3),
-.BR capabilities (7),
-.BR cpuset (7)
-.ad
-.P
-.I Programming for the real world \- POSIX.4
-by Bill O.\& Gallmeister, O'Reilly & Associates, Inc., ISBN 1-56592-074-0.
-.P
-The Linux kernel source files
-.IR \%Documentation/\:scheduler/\:sched\-deadline\:.txt ,
-.IR \%Documentation/\:scheduler/\:sched\-rt\-group\:.txt ,
-.IR \%Documentation/\:scheduler/\:sched\-design\-CFS\:.txt ,
-and
-.I \%Documentation/\:scheduler/\:sched\-nice\-design\:.txt