diff options
Diffstat (limited to 'Documentation/scheduler')
-rw-r--r-- | Documentation/scheduler/sched-arch.rst | 4 | ||||
-rw-r--r-- | Documentation/scheduler/sched-capacity.rst | 13 | ||||
-rw-r--r-- | Documentation/scheduler/sched-energy.rst | 29 | ||||
-rw-r--r-- | Documentation/scheduler/sched-rt-group.rst | 40 |
4 files changed, 32 insertions, 54 deletions
diff --git a/Documentation/scheduler/sched-arch.rst b/Documentation/scheduler/sched-arch.rst index 505cd27f9a..ed07efea7d 100644 --- a/Documentation/scheduler/sched-arch.rst +++ b/Documentation/scheduler/sched-arch.rst @@ -10,7 +10,7 @@ Context switch By default, the switch_to arch function is called with the runqueue locked. This is usually not a problem unless switch_to may need to take the runqueue lock. This is usually due to a wake up operation in -the context switch. See arch/ia64/include/asm/switch_to.h for an example. +the context switch. To request the scheduler call switch_to with the runqueue unlocked, you must `#define __ARCH_WANT_UNLOCKED_CTXSW` in a header file @@ -68,7 +68,5 @@ Possible arch/ problems Possible arch problems I found (and either tried to fix or didn't): -ia64 - is safe_halt call racy vs interrupts? (does it sleep?) (See #4a) - sparc - IRQs on at this point(?), change local_irq_save to _disable. - TODO: needs secondary CPUs to disable preempt (See #1) diff --git a/Documentation/scheduler/sched-capacity.rst b/Documentation/scheduler/sched-capacity.rst index e2c1cf7431..de414b33dd 100644 --- a/Documentation/scheduler/sched-capacity.rst +++ b/Documentation/scheduler/sched-capacity.rst @@ -39,14 +39,15 @@ per Hz, leading to:: ------------------- Two different capacity values are used within the scheduler. A CPU's -``capacity_orig`` is its maximum attainable capacity, i.e. its maximum -attainable performance level. A CPU's ``capacity`` is its ``capacity_orig`` to -which some loss of available performance (e.g. time spent handling IRQs) is -subtracted. +``original capacity`` is its maximum attainable capacity, i.e. its maximum +attainable performance level. This original capacity is returned by +the function arch_scale_cpu_capacity(). A CPU's ``capacity`` is its ``original +capacity`` to which some loss of available performance (e.g. time spent +handling IRQs) is subtracted. Note that a CPU's ``capacity`` is solely intended to be used by the CFS class, -while ``capacity_orig`` is class-agnostic. The rest of this document will use -the term ``capacity`` interchangeably with ``capacity_orig`` for the sake of +while ``original capacity`` is class-agnostic. The rest of this document will use +the term ``capacity`` interchangeably with ``original capacity`` for the sake of brevity. 1.3 Platform examples diff --git a/Documentation/scheduler/sched-energy.rst b/Documentation/scheduler/sched-energy.rst index fc853c8cc3..70e2921ef7 100644 --- a/Documentation/scheduler/sched-energy.rst +++ b/Documentation/scheduler/sched-energy.rst @@ -359,32 +359,9 @@ in milli-Watts or in an 'abstract scale'. 6.3 - Energy Model complexity ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -The task wake-up path is very latency-sensitive. When the EM of a platform is -too complex (too many CPUs, too many performance domains, too many performance -states, ...), the cost of using it in the wake-up path can become prohibitive. -The energy-aware wake-up algorithm has a complexity of: - - C = Nd * (Nc + Ns) - -with: Nd the number of performance domains; Nc the number of CPUs; and Ns the -total number of OPPs (ex: for two perf. domains with 4 OPPs each, Ns = 8). - -A complexity check is performed at the root domain level, when scheduling -domains are built. EAS will not start on a root domain if its C happens to be -higher than the completely arbitrary EM_MAX_COMPLEXITY threshold (2048 at the -time of writing). - -If you really want to use EAS but the complexity of your platform's Energy -Model is too high to be used with a single root domain, you're left with only -two possible options: - - 1. split your system into separate, smaller, root domains using exclusive - cpusets and enable EAS locally on each of them. This option has the - benefit to work out of the box but the drawback of preventing load - balance between root domains, which can result in an unbalanced system - overall; - 2. submit patches to reduce the complexity of the EAS wake-up algorithm, - hence enabling it to cope with larger EMs in reasonable time. +EAS does not impose any complexity limit on the number of PDs/OPPs/CPUs but +restricts the number of CPUs to EM_MAX_NUM_CPUS to prevent overflows during +the energy estimation. 6.4 - Schedutil governor diff --git a/Documentation/scheduler/sched-rt-group.rst b/Documentation/scheduler/sched-rt-group.rst index 655a096ec8..d685609ed3 100644 --- a/Documentation/scheduler/sched-rt-group.rst +++ b/Documentation/scheduler/sched-rt-group.rst @@ -39,10 +39,10 @@ Most notable: 1.1 The problem --------------- -Realtime scheduling is all about determinism, a group has to be able to rely on +Real-time scheduling is all about determinism, a group has to be able to rely on the amount of bandwidth (eg. CPU time) being constant. In order to schedule -multiple groups of realtime tasks, each group must be assigned a fixed portion -of the CPU time available. Without a minimum guarantee a realtime group can +multiple groups of real-time tasks, each group must be assigned a fixed portion +of the CPU time available. Without a minimum guarantee a real-time group can obviously fall short. A fuzzy upper limit is of no use since it cannot be relied upon. Which leaves us with just the single fixed portion. @@ -50,14 +50,14 @@ relied upon. Which leaves us with just the single fixed portion. ---------------- CPU time is divided by means of specifying how much time can be spent running -in a given period. We allocate this "run time" for each realtime group which -the other realtime groups will not be permitted to use. +in a given period. We allocate this "run time" for each real-time group which +the other real-time groups will not be permitted to use. -Any time not allocated to a realtime group will be used to run normal priority +Any time not allocated to a real-time group will be used to run normal priority tasks (SCHED_OTHER). Any allocated run time not used will also be picked up by SCHED_OTHER. -Let's consider an example: a frame fixed realtime renderer must deliver 25 +Let's consider an example: a frame fixed real-time renderer must deliver 25 frames a second, which yields a period of 0.04s per frame. Now say it will also have to play some music and respond to input, leaving it with around 80% CPU time dedicated for the graphics. We can then give this group a run time of 0.8 @@ -70,7 +70,7 @@ needs only about 3% CPU time to do so, it can do with a 0.03 * 0.005s = of 0.00015s. The remaining CPU time will be used for user input and other tasks. Because -realtime tasks have explicitly allocated the CPU time they need to perform +real-time tasks have explicitly allocated the CPU time they need to perform their tasks, buffer underruns in the graphics or audio can be eliminated. NOTE: the above example is not fully implemented yet. We still @@ -87,18 +87,20 @@ lack an EDF scheduler to make non-uniform periods usable. The system wide settings are configured under the /proc virtual file system: /proc/sys/kernel/sched_rt_period_us: - The scheduling period that is equivalent to 100% CPU bandwidth + The scheduling period that is equivalent to 100% CPU bandwidth. /proc/sys/kernel/sched_rt_runtime_us: - A global limit on how much time realtime scheduling may use. Even without - CONFIG_RT_GROUP_SCHED enabled, this will limit time reserved to realtime - processes. With CONFIG_RT_GROUP_SCHED it signifies the total bandwidth - available to all realtime groups. + A global limit on how much time real-time scheduling may use. This is always + less or equal to the period_us, as it denotes the time allocated from the + period_us for the real-time tasks. Even without CONFIG_RT_GROUP_SCHED enabled, + this will limit time reserved to real-time processes. With + CONFIG_RT_GROUP_SCHED=y it signifies the total bandwidth available to all + real-time groups. * Time is specified in us because the interface is s32. This gives an operating range from 1us to about 35 minutes. * sched_rt_period_us takes values from 1 to INT_MAX. - * sched_rt_runtime_us takes values from -1 to (INT_MAX - 1). + * sched_rt_runtime_us takes values from -1 to sched_rt_period_us. * A run time of -1 specifies runtime == period, ie. no limit. @@ -108,7 +110,7 @@ The system wide settings are configured under the /proc virtual file system: The default values for sched_rt_period_us (1000000 or 1s) and sched_rt_runtime_us (950000 or 0.95s). This gives 0.05s to be used by SCHED_OTHER (non-RT tasks). These defaults were chosen so that a run-away -realtime tasks will not lock up the machine but leave a little time to recover +real-time tasks will not lock up the machine but leave a little time to recover it. By setting runtime to -1 you'd get the old behaviour back. By default all bandwidth is assigned to the root group and new groups get the @@ -116,10 +118,10 @@ period from /proc/sys/kernel/sched_rt_period_us and a run time of 0. If you want to assign bandwidth to another group, reduce the root group's bandwidth and assign some or all of the difference to another group. -Realtime group scheduling means you have to assign a portion of total CPU -bandwidth to the group before it will accept realtime tasks. Therefore you will -not be able to run realtime tasks as any user other than root until you have -done that, even if the user has the rights to run processes with realtime +Real-time group scheduling means you have to assign a portion of total CPU +bandwidth to the group before it will accept real-time tasks. Therefore you will +not be able to run real-time tasks as any user other than root until you have +done that, even if the user has the rights to run processes with real-time priority! |