Diffstat (limited to 'man7/sched.7')
-rw-r--r-- | man7/sched.7 | 992 |
1 files changed, 0 insertions, 992 deletions
diff --git a/man7/sched.7 b/man7/sched.7
deleted file mode 100644
index 7e1212f..0000000
--- a/man7/sched.7
+++ /dev/null
@@ -1,992 +0,0 @@
-.\" Copyright (C) 2014 Michael Kerrisk <mtk.manpages@gmail.com>
-.\" and Copyright (C) 2014 Peter Zijlstra <peterz@infradead.org>
-.\" and Copyright (C) 2014 Juri Lelli <juri.lelli@gmail.com>
-.\" Various pieces from the old sched_setscheduler(2) page
-.\" Copyright (C) Tom Bjorkholm, Markus Kuhn & David A. Wheeler 1996-1999
-.\" and Copyright (C) 2007 Carsten Emde <Carsten.Emde@osadl.org>
-.\" and Copyright (C) 2008 Michael Kerrisk <mtk.manpages@gmail.com>
-.\"
-.\" SPDX-License-Identifier: GPL-2.0-or-later
-.\"
-.\" Worth looking at: http://rt.wiki.kernel.org/index.php
-.\"
-.TH sched 7 2024-02-18 "Linux man-pages 6.7"
-.SH NAME
-sched \- overview of CPU scheduling
-.SH DESCRIPTION
-Since Linux 2.6.23, the default scheduler is CFS,
-the "Completely Fair Scheduler".
-The CFS scheduler replaced the earlier "O(1)" scheduler.
-.\"
-.SS API summary
-Linux provides the following system calls for controlling
-the CPU scheduling behavior, policy, and priority of processes
-(or, more precisely, threads).
-.TP
-.BR nice (2)
-Set a new nice value for the calling thread,
-and return the new nice value.
-.TP
-.BR getpriority (2)
-Return the nice value of a thread, a process group,
-or the set of threads owned by a specified user.
-.TP
-.BR setpriority (2)
-Set the nice value of a thread, a process group,
-or the set of threads owned by a specified user.
-.TP
-.BR sched_setscheduler (2)
-Set the scheduling policy and parameters of a specified thread.
-.TP
-.BR sched_getscheduler (2)
-Return the scheduling policy of a specified thread.
-.TP
-.BR sched_setparam (2)
-Set the scheduling parameters of a specified thread.
-.TP
-.BR sched_getparam (2)
-Fetch the scheduling parameters of a specified thread.
-.TP
-.BR sched_get_priority_max (2)
-Return the maximum priority available in a specified scheduling policy.
-.TP
-.BR sched_get_priority_min (2)
-Return the minimum priority available in a specified scheduling policy.
-.TP
-.BR sched_rr_get_interval (2)
-Fetch the quantum used for threads that are scheduled under
-the "round-robin" scheduling policy.
-.TP
-.BR sched_yield (2)
-Cause the caller to relinquish the CPU,
-so that some other thread be executed.
-.TP
-.BR sched_setaffinity (2)
-(Linux-specific)
-Set the CPU affinity of a specified thread.
-.TP
-.BR sched_getaffinity (2)
-(Linux-specific)
-Get the CPU affinity of a specified thread.
-.TP
-.BR sched_setattr (2)
-Set the scheduling policy and parameters of a specified thread.
-This (Linux-specific) system call provides a superset of the functionality of
-.BR sched_setscheduler (2)
-and
-.BR sched_setparam (2).
-.TP
-.BR sched_getattr (2)
-Fetch the scheduling policy and parameters of a specified thread.
-This (Linux-specific) system call provides a superset of the functionality of
-.BR sched_getscheduler (2)
-and
-.BR sched_getparam (2).
-.\"
-.SS Scheduling policies
-The scheduler is the kernel component that decides which runnable thread
-will be executed by the CPU next.
-Each thread has an associated scheduling policy and a \fIstatic\fP
-scheduling priority,
-.IR sched_priority .
-The scheduler makes its decisions based on knowledge of the scheduling
-policy and static priority of all threads on the system.
-.P
-For threads scheduled under one of the normal scheduling policies
-(\fBSCHED_OTHER\fP, \fBSCHED_IDLE\fP, \fBSCHED_BATCH\fP),
-\fIsched_priority\fP is not used in scheduling
-decisions (it must be specified as 0).
-.P
-Processes scheduled under one of the real-time policies
-(\fBSCHED_FIFO\fP, \fBSCHED_RR\fP) have a
-\fIsched_priority\fP value in the range 1 (low) to 99 (high).
-(As the numbers imply, real-time threads always have higher priority
-than normal threads.)
-Note well: POSIX.1 requires an implementation to support only a
-minimum 32 distinct priority levels for the real-time policies,
-and some systems supply just this minimum.
-Portable programs should use
-.BR sched_get_priority_min (2)
-and
-.BR sched_get_priority_max (2)
-to find the range of priorities supported for a particular policy.
-.P
-Conceptually, the scheduler maintains a list of runnable
-threads for each possible \fIsched_priority\fP value.
-In order to determine which thread runs next, the scheduler looks for
-the nonempty list with the highest static priority and selects the
-thread at the head of this list.
-.P
-A thread's scheduling policy determines
-where it will be inserted into the list of threads
-with equal static priority and how it will move inside this list.
-.P
-All scheduling is preemptive: if a thread with a higher static
-priority becomes ready to run, the currently running thread
-will be preempted and
-returned to the wait list for its static priority level.
-The scheduling policy determines the
-ordering only within the list of runnable threads with equal static
-priority.
-.SS SCHED_FIFO: First in-first out scheduling
-\fBSCHED_FIFO\fP can be used only with static priorities higher than
-0, which means that when a \fBSCHED_FIFO\fP thread becomes runnable,
-it will always immediately preempt any currently running
-\fBSCHED_OTHER\fP, \fBSCHED_BATCH\fP, or \fBSCHED_IDLE\fP thread.
-\fBSCHED_FIFO\fP is a simple scheduling
-algorithm without time slicing.
-For threads scheduled under the
-\fBSCHED_FIFO\fP policy, the following rules apply:
-.IP \[bu] 3
-A running \fBSCHED_FIFO\fP thread that has been preempted by another thread of
-higher priority will stay at the head of the list for its priority and
-will resume execution as soon as all threads of higher priority are
-blocked again.
-.IP \[bu]
-When a blocked \fBSCHED_FIFO\fP thread becomes runnable, it
-will be inserted at the end of the list for its priority.
-.IP \[bu]
-If a call to
-.BR sched_setscheduler (2),
-.BR sched_setparam (2),
-.BR sched_setattr (2),
-.BR pthread_setschedparam (3),
-or
-.BR pthread_setschedprio (3)
-changes the priority of the running or runnable
-.B SCHED_FIFO
-thread identified by
-.I pid
-the effect on the thread's position in the list depends on
-the direction of the change to the thread's priority:
-.RS
-.IP (a) 5
-If the thread's priority is raised,
-it is placed at the end of the list for its new priority.
-As a consequence,
-it may preempt a currently running thread with the same priority.
-.IP (b)
-If the thread's priority is unchanged,
-its position in the run list is unchanged.
-.IP (c)
-If the thread's priority is lowered,
-it is placed at the front of the list for its new priority.
-.RE
-.IP
-According to POSIX.1-2008,
-changes to a thread's priority (or policy) using any mechanism other than
-.BR pthread_setschedprio (3)
-should result in the thread being placed at the end of
-the list for its priority.
-.\" In Linux 2.2.x and Linux 2.4.x, the thread is placed at the front of the queue
-.\" In Linux 2.0.x, the Right Thing happened: the thread went to the back -- MTK
-.IP \[bu]
-A thread calling
-.BR sched_yield (2)
-will be put at the end of the list.
-.P
-No other events will move a thread
-scheduled under the \fBSCHED_FIFO\fP policy in the wait list of
-runnable threads with equal static priority.
-.P
-A \fBSCHED_FIFO\fP
-thread runs until either it is blocked by an I/O request, it is
-preempted by a higher priority thread, or it calls
-.BR sched_yield (2).
-.SS SCHED_RR: Round-robin scheduling
-\fBSCHED_RR\fP is a simple enhancement of \fBSCHED_FIFO\fP.
-Everything
-described above for \fBSCHED_FIFO\fP also applies to \fBSCHED_RR\fP,
-except that each thread is allowed to run only for a maximum time
-quantum.
-If a \fBSCHED_RR\fP thread has been running for a time
-period equal to or longer than the time quantum, it will be put at the
-end of the list for its priority.
-A \fBSCHED_RR\fP thread that has
-been preempted by a higher priority thread and subsequently resumes
-execution as a running thread will complete the unexpired portion of
-its round-robin time quantum.
-The length of the time quantum can be
-retrieved using
-.BR sched_rr_get_interval (2).
-.\" On Linux 2.4, the length of the RR interval is influenced
-.\" by the process nice value -- MTK
-.\"
-.SS SCHED_DEADLINE: Sporadic task model deadline scheduling
-Since Linux 3.14, Linux provides a deadline scheduling policy
-.RB ( SCHED_DEADLINE ).
-This policy is currently implemented using
-GEDF (Global Earliest Deadline First)
-in conjunction with CBS (Constant Bandwidth Server).
-To set and fetch this policy and associated attributes,
-one must use the Linux-specific
-.BR sched_setattr (2)
-and
-.BR sched_getattr (2)
-system calls.
-.P
-A sporadic task is one that has a sequence of jobs, where each
-job is activated at most once per period.
-Each job also has a
-.IR "relative deadline" ,
-before which it should finish execution, and a
-.IR "computation time" ,
-which is the CPU time necessary for executing the job.
-The moment when a task wakes up
-because a new job has to be executed is called the
-.I arrival time
-(also referred to as the request time or release time).
-The
-.I start time
-is the time at which a task starts its execution.
-The
-.I absolute deadline
-is thus obtained by adding the relative deadline to the arrival time.
-.P
-The following diagram clarifies these terms:
-.P
-.in +4n
-.EX
-arrival/wakeup                    absolute deadline
-     |    start time                    |
-     |        |                         |
-     v        v                         v
------x--------xooooooooooooooooo--------x--------x---
-              |<- comp. time ->|
-     |<------- relative deadline ------>|
-     |<-------------- period ------------------->|
-.EE
-.in
-.P
-When setting a
-.B SCHED_DEADLINE
-policy for a thread using
-.BR sched_setattr (2),
-one can specify three parameters:
-.IR Runtime ,
-.IR Deadline ,
-and
-.IR Period .
-These parameters do not necessarily correspond to the aforementioned terms:
-usual practice is to set Runtime to something bigger than the average
-computation time (or worst-case execution time for hard real-time tasks),
-Deadline to the relative deadline, and Period to the period of the task.
-Thus, for
-.B SCHED_DEADLINE
-scheduling, we have:
-.P
-.in +4n
-.EX
-arrival/wakeup                    absolute deadline
-     |    start time                    |
-     |        |                         |
-     v        v                         v
------x--------xooooooooooooooooo--------x--------x---
-              |<-- Runtime ------->|
-     |<----------- Deadline ----------->|
-     |<-------------- Period ------------------->|
-.EE
-.in
-.P
-The three deadline-scheduling parameters correspond to the
-.IR sched_runtime ,
-.IR sched_deadline ,
-and
-.I sched_period
-fields of the
-.I sched_attr
-structure; see
-.BR sched_setattr (2).
-These fields express values in nanoseconds.
-.\" FIXME It looks as though specifying sched_period as 0 means
-.\" "make sched_period the same as sched_deadline".
-.\" This needs to be documented.
-If
-.I sched_period
-is specified as 0, then it is made the same as
-.IR sched_deadline .
-.P
-The kernel requires that:
-.P
-.in +4n
-.EX
-sched_runtime <= sched_deadline <= sched_period
-.EE
-.in
-.P
-.\" See __checkparam_dl in kernel/sched/core.c
-In addition, under the current implementation,
-all of the parameter values must be at least 1024
-(i.e., just over one microsecond,
-which is the resolution of the implementation), and less than 2\[ha]63.
-If any of these checks fails,
-.BR sched_setattr (2)
-fails with the error
-.BR EINVAL .
-.P
-The CBS guarantees non-interference between tasks, by throttling
-threads that attempt to over-run their specified Runtime.
-.P
-To ensure deadline scheduling guarantees,
-the kernel must prevent situations where the set of
-.B SCHED_DEADLINE
-threads is not feasible (schedulable) within the given constraints.
-The kernel thus performs an admittance test when setting or changing
-.B SCHED_DEADLINE
-policy and attributes.
-This admission test calculates whether the change is feasible;
-if it is not,
-.BR sched_setattr (2)
-fails with the error
-.BR EBUSY .
-.P
-For example, it is required (but not necessarily sufficient) for
-the total utilization to be less than or equal to the total number of
-CPUs available, where, since each thread can maximally run for
-Runtime per Period, that thread's utilization is its
-Runtime divided by its Period.
-.P
-In order to fulfill the guarantees that are made when
-a thread is admitted to the
-.B SCHED_DEADLINE
-policy,
-.B SCHED_DEADLINE
-threads are the highest priority (user controllable) threads in the
-system; if any
-.B SCHED_DEADLINE
-thread is runnable,
-it will preempt any thread scheduled under one of the other policies.
-.P
-A call to
-.BR fork (2)
-by a thread scheduled under the
-.B SCHED_DEADLINE
-policy fails with the error
-.BR EAGAIN ,
-unless the thread has its reset-on-fork flag set (see below).
-.P
-A
-.B SCHED_DEADLINE
-thread that calls
-.BR sched_yield (2)
-will yield the current job and wait for a new period to begin.
-.\"
-.\" FIXME Calling sched_getparam() on a SCHED_DEADLINE thread
-.\" fails with EINVAL, but sched_getscheduler() succeeds.
-.\" Is that intended? (Why?)
-.\"
-.SS SCHED_OTHER: Default Linux time-sharing scheduling
-\fBSCHED_OTHER\fP can be used at only static priority 0
-(i.e., threads under real-time policies always have priority over
-.B SCHED_OTHER
-processes).
-\fBSCHED_OTHER\fP is the standard Linux time-sharing scheduler that is
-intended for all threads that do not require the special
-real-time mechanisms.
-.P
-The thread to run is chosen from the static
-priority 0 list based on a \fIdynamic\fP priority that is determined only
-inside this list.
-The dynamic priority is based on the nice value (see below)
-and is increased for each time quantum the thread is ready to run,
-but denied to run by the scheduler.
-This ensures fair progress among all \fBSCHED_OTHER\fP threads.
-.P
-In the Linux kernel source code, the
-.B SCHED_OTHER
-policy is actually named
-.BR SCHED_NORMAL .
-.\"
-.SS The nice value
-The nice value is an attribute
-that can be used to influence the CPU scheduler to
-favor or disfavor a process in scheduling decisions.
-It affects the scheduling of
-.B SCHED_OTHER
-and
-.B SCHED_BATCH
-(see below) processes.
-The nice value can be modified using
-.BR nice (2),
-.BR setpriority (2),
-or
-.BR sched_setattr (2).
-.P
-According to POSIX.1, the nice value is a per-process attribute;
-that is, the threads in a process should share a nice value.
-However, on Linux, the nice value is a per-thread attribute:
-different threads in the same process may have different nice values.
-.P
-The range of the nice value
-varies across UNIX systems.
-On modern Linux, the range is \-20 (high priority) to +19 (low priority).
-On some other systems, the range is \-20..20.
-Very early Linux kernels (before Linux 2.0) had the range \-infinity..15.
-.\" Linux before 1.3.36 had \-infinity..15.
-.\" Since Linux 1.3.43, Linux has the range \-20..19.
-.P
-The degree to which the nice value affects the relative scheduling of
-.B SCHED_OTHER
-processes likewise varies across UNIX systems and
-across Linux kernel versions.
-.P
-With the advent of the CFS scheduler in Linux 2.6.23,
-Linux adopted an algorithm that causes
-relative differences in nice values to have a much stronger effect.
-In the current implementation, each unit of difference in the
-nice values of two processes results in a factor of 1.25
-in the degree to which the scheduler favors the higher priority process.
-This causes very low nice values (+19) to truly provide little CPU
-to a process whenever there is any other
-higher priority load on the system,
-and makes high nice values (\-20) deliver most of the CPU to applications
-that require it (e.g., some audio applications).
-.P
-On Linux, the
-.B RLIMIT_NICE
-resource limit can be used to define a limit to which
-an unprivileged process's nice value can be raised; see
-.BR setrlimit (2)
-for details.
-.P
-For further details on the nice value, see the subsections on
-the autogroup feature and group scheduling, below.
-.\"
-.SS SCHED_BATCH: Scheduling batch processes
-(Since Linux 2.6.16.)
-\fBSCHED_BATCH\fP can be used only at static priority 0.
-This policy is similar to \fBSCHED_OTHER\fP in that it schedules
-the thread according to its dynamic priority
-(based on the nice value).
-The difference is that this policy
-will cause the scheduler to always assume
-that the thread is CPU-intensive.
-Consequently, the scheduler will apply a small scheduling
-penalty with respect to wakeup behavior,
-so that this thread is mildly disfavored in scheduling decisions.
-.P
-.\" The following paragraph is drawn largely from the text that
-.\" accompanied Ingo Molnar's patch for the implementation of
-.\" SCHED_BATCH.
-.\" commit b0a9499c3dd50d333e2aedb7e894873c58da3785
-This policy is useful for workloads that are noninteractive,
-but do not want to lower their nice value,
-and for workloads that want a deterministic scheduling policy without
-interactivity causing extra preemptions (between the workload's tasks).
-.\"
-.SS SCHED_IDLE: Scheduling very low priority jobs
-(Since Linux 2.6.23.)
-\fBSCHED_IDLE\fP can be used only at static priority 0;
-the process nice value has no influence for this policy.
-.P
-This policy is intended for running jobs at extremely low
-priority (lower even than a +19 nice value with the
-.B SCHED_OTHER
-or
-.B SCHED_BATCH
-policies).
-.\"
-.SS Resetting scheduling policy for child processes
-Each thread has a reset-on-fork scheduling flag.
-When this flag is set, children created by
-.BR fork (2)
-do not inherit privileged scheduling policies.
-The reset-on-fork flag can be set by either:
-.IP \[bu] 3
-ORing the
-.B SCHED_RESET_ON_FORK
-flag into the
-.I policy
-argument when calling
-.BR sched_setscheduler (2)
-(since Linux 2.6.32);
-or
-.IP \[bu]
-specifying the
-.B SCHED_FLAG_RESET_ON_FORK
-flag in
-.I attr.sched_flags
-when calling
-.BR sched_setattr (2).
-.P
-Note that the constants used with these two APIs have different names.
-The state of the reset-on-fork flag can analogously be retrieved using
-.BR sched_getscheduler (2)
-and
-.BR sched_getattr (2).
-.P
-The reset-on-fork feature is intended for media-playback applications,
-and can be used to prevent applications evading the
-.B RLIMIT_RTTIME
-resource limit (see
-.BR getrlimit (2))
-by creating multiple child processes.
-.P
-More precisely, if the reset-on-fork flag is set,
-the following rules apply for subsequently created children:
-.IP \[bu] 3
-If the calling thread has a scheduling policy of
-.B SCHED_FIFO
-or
-.BR SCHED_RR ,
-the policy is reset to
-.B SCHED_OTHER
-in child processes.
-.IP \[bu]
-If the calling process has a negative nice value,
-the nice value is reset to zero in child processes.
-.P
-After the reset-on-fork flag has been enabled,
-it can be reset only if the thread has the
-.B CAP_SYS_NICE
-capability.
-This flag is disabled in child processes created by
-.BR fork (2).
-.\"
-.SS Privileges and resource limits
-Before Linux 2.6.12, only privileged
-.RB ( CAP_SYS_NICE )
-threads can set a nonzero static priority (i.e., set a real-time
-scheduling policy).
-The only change that an unprivileged thread can make is to set the
-.B SCHED_OTHER
-policy, and this can be done only if the effective user ID of the caller
-matches the real or effective user ID of the target thread
-(i.e., the thread specified by
-.IR pid )
-whose policy is being changed.
-.P
-A thread must be privileged
-.RB ( CAP_SYS_NICE )
-in order to set or modify a
-.B SCHED_DEADLINE
-policy.
-.P
-Since Linux 2.6.12, the
-.B RLIMIT_RTPRIO
-resource limit defines a ceiling on an unprivileged thread's
-static priority for the
-.B SCHED_RR
-and
-.B SCHED_FIFO
-policies.
-The rules for changing scheduling policy and priority are as follows:
-.IP \[bu] 3
-If an unprivileged thread has a nonzero
-.B RLIMIT_RTPRIO
-soft limit, then it can change its scheduling policy and priority,
-subject to the restriction that the priority cannot be set to a
-value higher than the maximum of its current priority and its
-.B RLIMIT_RTPRIO
-soft limit.
-.IP \[bu]
-If the
-.B RLIMIT_RTPRIO
-soft limit is 0, then the only permitted changes are to lower the priority,
-or to switch to a non-real-time policy.
-.IP \[bu]
-Subject to the same rules,
-another unprivileged thread can also make these changes,
-as long as the effective user ID of the thread making the change
-matches the real or effective user ID of the target thread.
-.IP \[bu]
-Special rules apply for the
-.B SCHED_IDLE
-policy.
-Before Linux 2.6.39,
-an unprivileged thread operating under this policy cannot
-change its policy, regardless of the value of its
-.B RLIMIT_RTPRIO
-resource limit.
-Since Linux 2.6.39,
-.\" commit c02aa73b1d18e43cfd79c2f193b225e84ca497c8
-an unprivileged thread can switch to either the
-.B SCHED_BATCH
-or the
-.B SCHED_OTHER
-policy so long as its nice value falls within the range permitted by its
-.B RLIMIT_NICE
-resource limit (see
-.BR getrlimit (2)).
-.P
-Privileged
-.RB ( CAP_SYS_NICE )
-threads ignore the
-.B RLIMIT_RTPRIO
-limit; as with older kernels,
-they can make arbitrary changes to scheduling policy and priority.
-See
-.BR getrlimit (2)
-for further information on
-.BR RLIMIT_RTPRIO .
-.SS Limiting the CPU usage of real-time and deadline processes
-A nonblocking infinite loop in a thread scheduled under the
-.BR SCHED_FIFO ,
-.BR SCHED_RR ,
-or
-.B SCHED_DEADLINE
-policy can potentially block all other threads from accessing
-the CPU forever.
-Before Linux 2.6.25, the only way of preventing a runaway real-time
-process from freezing the system was to run (at the console)
-a shell scheduled under a higher static priority than the tested application.
-This allows an emergency kill of tested
-real-time applications that do not block or terminate as expected.
-.P
-Since Linux 2.6.25, there are other techniques for dealing with runaway
-real-time and deadline processes.
-One of these is to use the
-.B RLIMIT_RTTIME
-resource limit to set a ceiling on the CPU time that
-a real-time process may consume.
-See
-.BR getrlimit (2)
-for details.
-.P
-Since Linux 2.6.25, Linux also provides two
-.I /proc
-files that can be used to reserve a certain amount of CPU time
-to be used by non-real-time processes.
-Reserving CPU time in this fashion allows some CPU time to be
-allocated to (say) a root shell that can be used to kill a runaway process.
-Both of these files specify time values in microseconds:
-.TP
-.I /proc/sys/kernel/sched_rt_period_us
-This file specifies a scheduling period that is equivalent to
-100% CPU bandwidth.
-The value in this file can range from 1 to
-.BR INT_MAX ,
-giving an operating range of 1 microsecond to around 35 minutes.
-The default value in this file is 1,000,000 (1 second).
-.TP
-.I /proc/sys/kernel/sched_rt_runtime_us
-The value in this file specifies how much of the "period" time
-can be used by all real-time and deadline scheduled processes
-on the system.
-The value in this file can range from \-1 to
-.BR INT_MAX \-1.
-Specifying \-1 makes the run time the same as the period;
-that is, no CPU time is set aside for non-real-time processes
-(which was the behavior before Linux 2.6.25).
-The default value in this file is 950,000 (0.95 seconds),
-meaning that 5% of the CPU time is reserved for processes that
-don't run under a real-time or deadline scheduling policy.
-.SS Response time
-A blocked high priority thread waiting for I/O has a certain
-response time before it is scheduled again.
-The device driver writer
-can greatly reduce this response time by using a "slow interrupt"
-interrupt handler.
-.\" as described in
-.\" .BR request_irq (9).
-.SS Miscellaneous
-Child processes inherit the scheduling policy and parameters across a
-.BR fork (2).
-The scheduling policy and parameters are preserved across
-.BR execve (2).
-.P
-Memory locking is usually needed for real-time processes to avoid
-paging delays; this can be done with
-.BR mlock (2)
-or
-.BR mlockall (2).
-.\"
-.SS The autogroup feature
-.\" commit 5091faa449ee0b7d73bc296a93bca9540fc51d0a
-Since Linux 2.6.38,
-the kernel provides a feature known as autogrouping to improve interactive
-desktop performance in the face of multiprocess, CPU-intensive
-workloads such as building the Linux kernel with large numbers of
-parallel build processes (i.e., the
-.BR make (1)
-.B \-j
-flag).
-.P
-This feature operates in conjunction with the
-CFS scheduler and requires a kernel that is configured with
-.BR CONFIG_SCHED_AUTOGROUP .
-On a running system, this feature is enabled or disabled via the file
-.IR /proc/sys/kernel/sched_autogroup_enabled ;
-a value of 0 disables the feature, while a value of 1 enables it.
-The default value in this file is 1, unless the kernel was booted with the
-.I noautogroup
-parameter.
-.P
-A new autogroup is created when a new session is created via
-.BR setsid (2);
-this happens, for example, when a new terminal window is started.
-A new process created by
-.BR fork (2)
-inherits its parent's autogroup membership.
-Thus, all of the processes in a session are members of the same autogroup.
-An autogroup is automatically destroyed when the last process
-in the group terminates.
-.P
-When autogrouping is enabled, all of the members of an autogroup
-are placed in the same kernel scheduler "task group".
-The CFS scheduler employs an algorithm that equalizes the
-distribution of CPU cycles across task groups.
-The benefits of this for interactive desktop performance
-can be described via the following example.
-.P
-Suppose that there are two autogroups competing for the same CPU
-(i.e., presume either a single CPU system or the use of
-.BR taskset (1)
-to confine all the processes to the same CPU on an SMP system).
-The first group contains ten CPU-bound processes from
-a kernel build started with
-.IR "make\~\-j10" .
-The other contains a single CPU-bound process: a video player.
-The effect of autogrouping is that the two groups will
-each receive half of the CPU cycles.
-That is, the video player will receive 50% of the CPU cycles,
-rather than just 9% of the cycles,
-which would likely lead to degraded video playback.
-The situation on an SMP system is more complex,
-.\" Mike Galbraith, 25 Nov 2016:
-.\" I'd say something more wishy-washy here, like cycles are
-.\" distributed fairly across groups and leave it at that, as your
-.\" detailed example is incorrect due to SMP fairness (which I don't
-.\" like much because [very unlikely] worst case scenario
-.\" renders a box sized group incapable of utilizing more that
-.\" a single CPU total). For example, if a group of NR_CPUS
-.\" size competes with a singleton, load balancing will try to give
-.\" the singleton a full CPU of its very own. If groups intersect for
-.\" whatever reason on say my quad lappy, distribution is 80/20 in
-.\" favor of the singleton.
-but the general effect is the same:
-the scheduler distributes CPU cycles across task groups such that
-an autogroup that contains a large number of CPU-bound processes
-does not end up hogging CPU cycles at the expense of the other
-jobs on the system.
-.P
-A process's autogroup (task group) membership can be viewed via the file
-.IR /proc/ pid /autogroup :
-.P
-.in +4n
-.EX
-$ \fBcat /proc/1/autogroup\fP
-/autogroup\-1 nice 0
-.EE
-.in
-.P
-This file can also be used to modify the CPU bandwidth allocated
-to an autogroup.
-This is done by writing a number in the "nice" range to the file
-to set the autogroup's nice value.
-The allowed range is from +19 (low priority) to \-20 (high priority).
-(Writing values outside of this range causes
-.BR write (2)
-to fail with the error
-.BR EINVAL .)
-.\" FIXME .
-.\" Because of a bug introduced in Linux 4.7
-.\" (commit 2159197d66770ec01f75c93fb11dc66df81fd45b made changes
-.\" that exposed the fact that autogroup didn't call scale_load()),
-.\" it happened that *all* values in this range caused a task group
-.\" to be further disfavored by the scheduler, with \-20 resulting
-.\" in the scheduler mildly disfavoring the task group and +19 greatly
-.\" disfavoring it.
-.\"
-.\" A patch was posted on 23 Nov 2016
-.\" ("sched/autogroup: Fix 64bit kernel nice adjustment";
-.\" check later to see in which kernel version it lands.
-.P
-The autogroup nice setting has the same meaning as the process nice value,
-but applies to distribution of CPU cycles to the autogroup as a whole,
-based on the relative nice values of other autogroups.
-For a process inside an autogroup, the CPU cycles that it receives
-will be a product of the autogroup's nice value
-(compared to other autogroups)
-and the process's nice value
-(compared to other processes in the same autogroup.
-.P
-The use of the
-.BR cgroups (7)
-CPU controller to place processes in cgroups other than the
-root CPU cgroup overrides the effect of autogrouping.
-.P
-The autogroup feature groups only processes scheduled under
-non-real-time policies
-.RB ( SCHED_OTHER ,
-.BR SCHED_BATCH ,
-and
-.BR SCHED_IDLE ).
-It does not group processes scheduled under real-time and
-deadline policies.
-Those processes are scheduled according to the rules described earlier.
-.\"
-.SS The nice value and group scheduling
-When scheduling non-real-time processes (i.e., those scheduled under the
-.BR SCHED_OTHER ,
-.BR SCHED_BATCH ,
-and
-.B SCHED_IDLE
-policies), the CFS scheduler employs a technique known as "group scheduling",
-if the kernel was configured with the
-.B CONFIG_FAIR_GROUP_SCHED
-option (which is typical).
-.P
-Under group scheduling, threads are scheduled in "task groups".
-Task groups have a hierarchical relationship,
-rooted under the initial task group on the system,
-known as the "root task group".
-Task groups are formed in the following circumstances:
-.IP \[bu] 3
-All of the threads in a CPU cgroup form a task group.
-The parent of this task group is the task group of the
-corresponding parent cgroup.
-.IP \[bu]
-If autogrouping is enabled,
-then all of the threads that are (implicitly) placed in an autogroup
-(i.e., the same session, as created by
-.BR setsid (2))
-form a task group.
-Each new autogroup is thus a separate task group.
-The root task group is the parent of all such autogroups.
-.IP \[bu]
-If autogrouping is enabled, then the root task group consists of
-all processes in the root CPU cgroup that were not
-otherwise implicitly placed into a new autogroup.
-.IP \[bu]
-If autogrouping is disabled, then the root task group consists of
-all processes in the root CPU cgroup.
-.IP \[bu]
-If group scheduling was disabled (i.e., the kernel was configured without
-.BR CONFIG_FAIR_GROUP_SCHED ),
-then all of the processes on the system are notionally placed
-in a single task group.
-.P
-Under group scheduling,
-a thread's nice value has an effect for scheduling decisions
-.IR "only relative to other threads in the same task group" .
-This has some surprising consequences in terms of the traditional semantics
-of the nice value on UNIX systems.
-In particular, if autogrouping
-is enabled (which is the default in various distributions), then employing
-.BR setpriority (2)
-or
-.BR nice (1)
-on a process has an effect only for scheduling relative
-to other processes executed in the same session
-(typically: the same terminal window).
-.P
-Conversely, for two processes that are (for example)
-the sole CPU-bound processes in different sessions
-(e.g., different terminal windows,
-each of whose jobs are tied to different autogroups),
-.I modifying the nice value of the process in one of the sessions
-.I has no effect
-in terms of the scheduler's decisions relative to the
-process in the other session.
-.\" More succinctly: the nice(1) command is in many cases a no-op since
-.\" Linux 2.6.38.
-.\"
-A possibly useful workaround here is to use a command such as
-the following to modify the autogroup nice value for
-.I all
-of the processes in a terminal session:
-.P
-.in +4n
-.EX
-$ \fBecho 10 > /proc/self/autogroup\fP
-.EE
-.in
-.SS Real-time features in the mainline Linux kernel
-.\" FIXME . Probably this text will need some minor tweaking
-.\" ask Carsten Emde about this.
-Since Linux 2.6.18, Linux is gradually
-becoming equipped with real-time capabilities,
-most of which are derived from the former
-.I realtime\-preempt
-patch set.
-Until the patches have been completely merged into the
-mainline kernel,
-they must be installed to achieve the best real-time performance.
-These patches are named:
-.P
-.in +4n
-.EX
-patch\-\fIkernelversion\fP\-rt\fIpatchversion\fP
-.EE
-.in
-.P
-and can be downloaded from
-.UR http://www.kernel.org\:/pub\:/linux\:/kernel\:/projects\:/rt/
-.UE .
-.P
-Without the patches and prior to their full inclusion into the mainline
-kernel, the kernel configuration offers only the three preemption classes
-.BR CONFIG_PREEMPT_NONE ,
-.BR CONFIG_PREEMPT_VOLUNTARY ,
-and
-.B CONFIG_PREEMPT_DESKTOP
-which respectively provide no, some, and considerable
-reduction of the worst-case scheduling latency.
-.P
-With the patches applied or after their full inclusion into the mainline
-kernel, the additional configuration item
-.B CONFIG_PREEMPT_RT
-becomes available.
-If this is selected, Linux is transformed into a regular
-real-time operating system.
-The FIFO and RR scheduling policies are then used to run a thread
-with true real-time priority and a minimum worst-case scheduling latency.
-.SH NOTES
-The
-.BR cgroups (7)
-CPU controller can be used to limit the CPU consumption of
-groups of processes.
-.P
-Originally, Standard Linux was intended as a general-purpose operating
-system being able to handle background processes, interactive
-applications, and less demanding real-time applications (applications that
-need to usually meet timing deadlines).
-Although the Linux 2.6
-allowed for kernel preemption and the newly introduced O(1) scheduler
-ensures that the time needed to schedule is fixed and deterministic
-irrespective of the number of active tasks, true real-time computing
-was not possible up to Linux 2.6.17.
-.SH SEE ALSO
-.ad l
-.nh
-.BR chcpu (1),
-.BR chrt (1),
-.BR lscpu (1),
-.BR ps (1),
-.BR taskset (1),
-.BR top (1),
-.BR getpriority (2),
-.BR mlock (2),
-.BR mlockall (2),
-.BR munlock (2),
-.BR munlockall (2),
-.BR nice (2),
-.BR sched_get_priority_max (2),
-.BR sched_get_priority_min (2),
-.BR sched_getaffinity (2),
-.BR sched_getparam (2),
-.BR sched_getscheduler (2),
-.BR sched_rr_get_interval (2),
-.BR sched_setaffinity (2),
-.BR sched_setparam (2),
-.BR sched_setscheduler (2),
-.BR sched_yield (2),
-.BR setpriority (2),
-.BR pthread_getaffinity_np (3),
-.BR pthread_getschedparam (3),
-.BR pthread_setaffinity_np (3),
-.BR sched_getcpu (3),
-.BR capabilities (7),
-.BR cpuset (7)
-.ad
-.P
-.I Programming for the real world \- POSIX.4
-by Bill O.\& Gallmeister, O'Reilly & Associates, Inc., ISBN 1-56592-074-0.
-.P
-The Linux kernel source files
-.IR \%Documentation/\:scheduler/\:sched\-deadline\:.txt ,
-.IR \%Documentation/\:scheduler/\:sched\-rt\-group\:.txt ,
-.IR \%Documentation/\:scheduler/\:sched\-design\-CFS\:.txt ,
-and
-.I \%Documentation/\:scheduler/\:sched\-nice\-design\:.txt
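
Reviewer note on the removed SCHED_DEADLINE text: the page states several parameter rules in prose (a sched_period of 0 is treated as sched_deadline, sched_runtime <= sched_deadline <= sched_period, every value at least 1024 and less than 2^63, and total utilization no greater than the number of CPUs as a necessary condition). The sketch below models only those stated rules so they can be checked against real parameter sets. It is an illustration, not the kernel's code: the names `check_deadline_params`, `utilization`, and `passes_utilization_test` are hypothetical, and the kernel's actual admission test is more involved than this necessary condition.

```python
import errno
from fractions import Fraction

# Bounds quoted in the removed page (values are in nanoseconds).
MIN_NS = 1024        # "at least 1024", just over one microsecond
MAX_NS = 2 ** 63     # "less than 2^63"


def check_deadline_params(runtime_ns, deadline_ns, period_ns=0):
    """Model the documented checks; return 0 on success, errno.EINVAL otherwise.

    Hypothetical helper: it mirrors the prose of the page, not kernel source.
    """
    # The page says a period of 0 is made the same as the deadline.
    if period_ns == 0:
        period_ns = deadline_ns
    # Every parameter must lie in [1024, 2^63).
    for value in (runtime_ns, deadline_ns, period_ns):
        if value < MIN_NS or value >= MAX_NS:
            return errno.EINVAL
    # Required ordering: runtime <= deadline <= period.
    if not (runtime_ns <= deadline_ns <= period_ns):
        return errno.EINVAL
    return 0


def utilization(runtime_ns, period_ns):
    """Utilization of one thread: Runtime divided by Period (exact fraction)."""
    return Fraction(runtime_ns, period_ns)


def passes_utilization_test(tasks, ncpus):
    """Necessary (not sufficient) condition: total utilization <= CPU count.

    tasks is an iterable of (runtime_ns, period_ns) pairs.
    """
    total = sum((utilization(r, p) for r, p in tasks), Fraction(0))
    return total <= ncpus


if __name__ == "__main__":
    # Two illustrative tasks: 10 ms every 100 ms, and 5 ms every 20 ms.
    tasks = [(10_000_000, 100_000_000), (5_000_000, 20_000_000)]
    for runtime, period in tasks:
        print(check_deadline_params(runtime, period, period))
    print(passes_utilization_test(tasks, ncpus=1))
```

Exact fractions are used instead of floats so that a task set whose utilization sums to exactly the CPU count is not rejected by rounding error. A set that passes this test could still be refused by the kernel with EBUSY, since the page describes the condition as required but not necessarily sufficient.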