=========================
CPU hotplug in the Kernel
=========================

:Date: September, 2021
:Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
         Rusty Russell <rusty@rustcorp.com.au>,
         Srivatsa Vaddagiri <vatsa@in.ibm.com>,
         Ashok Raj <ashok.raj@intel.com>,
         Joel Schopp <jschopp@austin.ibm.com>,
         Thomas Gleixner <tglx@linutronix.de>

Introduction
============

Modern advances in system architectures have introduced advanced error
reporting and correction capabilities in processors. There are a couple of
OEMs that support NUMA hardware which is hot pluggable as well, where physical
node insertion and removal require support for CPU hotplug.

Such advances require CPUs available to a kernel to be removed either for
provisioning reasons, or for RAS purposes to keep an offending CPU off the
system execution path. Hence the need for CPU hotplug support in the
Linux kernel.

A more novel use of CPU hotplug support is its use today in suspend/resume
support for SMP. Dual-core and HT support makes even a laptop run SMP kernels,
which previously did not support these methods.


Command Line Switches
=====================
``maxcpus=n``
  Restrict boot time CPUs to *n*. If you have four CPUs, for example, using
  ``maxcpus=2`` will only boot two. The remaining CPUs can be brought
  online later.

``nr_cpus=n``
  Restrict the total number of CPUs the kernel will support. If the number
  supplied here is lower than the number of physically available CPUs, then
  those CPUs can not be brought online later.

``additional_cpus=n``
  Use this to limit hotpluggable CPUs. This option sets
  ``cpu_possible_mask = cpu_present_mask + additional_cpus``

  This option is limited to the IA64 architecture.

``possible_cpus=n``
  This option sets the ``possible_cpus`` bits in ``cpu_possible_mask``.

  This option is limited to the X86 and S390 architectures.

``cpu0_hotplug``
  Allow to shut down CPU0.

  This option is limited to the X86 architecture.

CPU maps
========

``cpu_possible_mask``
  Bitmap of possible CPUs that can ever be available in the
  system. This is used to allocate some boot time memory for per_cpu variables
  that aren't designed to grow/shrink as CPUs are made available or removed.
  Once set during the boot time discovery phase, the map is static, i.e. no
  bits are added or removed anytime. Trimming it accurately for your system
  needs upfront can save some boot time memory.

``cpu_online_mask``
  Bitmap of all CPUs currently online. It is set in ``__cpu_up()``
  after a CPU is available for kernel scheduling and ready to receive
  interrupts from devices. It is cleared when a CPU is brought down using
  ``__cpu_disable()``, before which all OS services including interrupts are
  migrated to another target CPU.

``cpu_present_mask``
  Bitmap of CPUs currently present in the system. Not all
  of them may be online. When physical hotplug is processed by the relevant
  subsystem (e.g ACPI), the map can change and a new bit is either added to or
  removed from it, depending on whether the event is a hot-add or a
  hot-remove. There are currently no locking rules. Typical usage is to init
  topology during boot, at which time hotplug is disabled.

You really don't need to manipulate any of the system CPU maps. They should
be read-only for most use. When setting up per-cpu resources almost always use
``cpu_possible_mask`` or ``for_each_possible_cpu()`` to iterate. The macro
``for_each_cpu()`` can be used to iterate over a custom CPU mask.
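For illustration, a minimal sketch of allocating a per-CPU resource for every
possible CPU at init time; the ``subsys_buf`` structure and
``subsys_alloc_buffers()`` function are hypothetical and error unwinding is
omitted for brevity (needs <linux/percpu.h> and <linux/slab.h>)::

   /* Hypothetical per-CPU data, sized by cpu_possible_mask at init time. */
   struct subsys_buf {
           unsigned long hits;
   };

   static DEFINE_PER_CPU(struct subsys_buf *, subsys_buf);

   static int __init subsys_alloc_buffers(void)
   {
           unsigned int cpu;

           /* Cover every possible CPU, not only those currently online. */
           for_each_possible_cpu(cpu) {
                   struct subsys_buf *buf = kzalloc(sizeof(*buf), GFP_KERNEL);

                   if (!buf)
                           return -ENOMEM;
                   per_cpu(subsys_buf, cpu) = buf;
           }
           return 0;
   }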
Never use anything other than ``cpumask_t`` to represent a bitmap of CPUs.


Using CPU hotplug
=================

The kernel option *CONFIG_HOTPLUG_CPU* needs to be enabled. It is currently
available on multiple architectures including ARM, MIPS, PowerPC and X86. The
configuration is done via the sysfs interface::

  $ ls -lh /sys/devices/system/cpu
  total 0
  drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu0
  drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu1
  drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu2
  drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu3
  drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu4
  drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu5
  drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu6
  drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu7
  drwxr-xr-x  2 root root    0 Dec 21 16:33 hotplug
  -r--r--r--  1 root root 4.0K Dec 21 16:33 offline
  -r--r--r--  1 root root 4.0K Dec 21 16:33 online
  -r--r--r--  1 root root 4.0K Dec 21 16:33 possible
  -r--r--r--  1 root root 4.0K Dec 21 16:33 present

The files *offline*, *online*, *possible*, *present* represent the CPU masks.
Each CPU folder contains an *online* file which controls the logical on (1)
and off (0) state. To logically shut down CPU4::

  $ echo 0 > /sys/devices/system/cpu/cpu4/online
   smpboot: CPU 4 is now offline

Once the CPU is shut down, it will be removed from */proc/interrupts*,
*/proc/cpuinfo* and should also no longer be visible in the output of the
*top* command. To bring CPU4 back online::

  $ echo 1 > /sys/devices/system/cpu/cpu4/online
  smpboot: Booting Node 0 Processor 4 APIC 0x1

The CPU is usable again. This should work on all CPUs. CPU0 is often special
and excluded from CPU hotplug. On X86 the kernel option
*CONFIG_BOOTPARAM_HOTPLUG_CPU0* has to be enabled in order to be able to
shut down CPU0. Alternatively the kernel command line option *cpu0_hotplug*
can be used. Some known dependencies of CPU0:

* Resume from hibernate/suspend. Hibernate/suspend will fail if CPU0 is
  offline.
* PIC interrupts. CPU0 can't be removed if a PIC interrupt is detected.

Please let Fenghua Yu <fenghua.yu@intel.com> know if you find any dependencies
on CPU0.

The CPU hotplug coordination
============================

The offline case
----------------

Once a CPU has been logically shut down the teardown callbacks of registered
hotplug states will be invoked, starting with ``CPUHP_ONLINE`` and terminating
at state ``CPUHP_OFFLINE``. This includes:

* If tasks are frozen due to a suspend operation then *cpuhp_tasks_frozen*
  will be set to true.
* All processes are migrated away from this outgoing CPU to new CPUs.
  The new CPU is chosen from each process' current cpuset, which may be
  a subset of all online CPUs.
* All interrupts targeted to this CPU are migrated to a new CPU.
* Timers are also migrated to a new CPU.
* Once all services are migrated, the kernel calls an arch specific routine
  ``__cpu_disable()`` to perform arch specific cleanup.


The CPU hotplug API
===================

CPU hotplug state machine
-------------------------

CPU hotplug uses a trivial state machine with a linear state space from
CPUHP_OFFLINE to CPUHP_ONLINE. Each state has a startup and a teardown
callback.

When a CPU is onlined, the startup callbacks are invoked sequentially until
the state CPUHP_ONLINE is reached.
They can also be invoked when the
callbacks of a state are set up or an instance is added to a multi-instance
state.

When a CPU is offlined the teardown callbacks are invoked in the reverse
order sequentially until the state CPUHP_OFFLINE is reached. They can also
be invoked when the callbacks of a state are removed or an instance is
removed from a multi-instance state.

If a usage site requires only a callback in one direction of the hotplug
operations (CPU online or CPU offline) then the other not-required callback
can be set to NULL when the state is set up.

The state space is divided into three sections:

* The PREPARE section

  The PREPARE section covers the state space from CPUHP_OFFLINE to
  CPUHP_BRINGUP_CPU.

  The startup callbacks in this section are invoked before the CPU is
  started during a CPU online operation. The teardown callbacks are invoked
  after the CPU has become dysfunctional during a CPU offline operation.

  The callbacks are invoked on a control CPU as they obviously can't run on
  the hotplugged CPU which is either not yet started or has become
  dysfunctional already.

  The startup callbacks are used to setup resources which are required to
  bring a CPU successfully online. The teardown callbacks are used to free
  resources or to move pending work to an online CPU after the hotplugged
  CPU became dysfunctional. A sketch of such a callback pair follows this
  list.

  The startup callbacks are allowed to fail. If a callback fails, the CPU
  online operation is aborted and the CPU is brought down to the previous
  state (usually CPUHP_OFFLINE) again.

  The teardown callbacks in this section are not allowed to fail.

* The STARTING section

  The STARTING section covers the state space between CPUHP_BRINGUP_CPU + 1
  and CPUHP_AP_ONLINE.

  The startup callbacks in this section are invoked on the hotplugged CPU
  with interrupts disabled during a CPU online operation in the early CPU
  setup code. The teardown callbacks are invoked with interrupts disabled
  on the hotplugged CPU during a CPU offline operation shortly before the
  CPU is completely shut down.

  The callbacks in this section are not allowed to fail.

  The callbacks are used for low level hardware initialization/shutdown and
  for core subsystems.

* The ONLINE section

  The ONLINE section covers the state space between CPUHP_AP_ONLINE + 1 and
  CPUHP_ONLINE.

  The startup callbacks in this section are invoked on the hotplugged CPU
  during a CPU online operation. The teardown callbacks are invoked on the
  hotplugged CPU during a CPU offline operation.

  The callbacks are invoked in the context of the per CPU hotplug thread,
  which is pinned on the hotplugged CPU. The callbacks are invoked with
  interrupts and preemption enabled.

  The callbacks are allowed to fail. When a callback fails the hotplug
  operation is aborted and the CPU is brought back to the previous state.
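As a minimal sketch of a PREPARE section callback pair, assume a hypothetical
subsystem that needs a per-CPU scratch page (all ``subsys_*`` names are made
up for illustration): the prepare callback allocates the resource on a control
CPU before the new CPU runs and may fail, the dead callback frees it after the
CPU went down and must not fail::

   static DEFINE_PER_CPU(void *, subsys_scratch);

   /* PREPARE section startup: runs on a control CPU, may fail. */
   static int subsys_cpu_prepare(unsigned int cpu)
   {
           void *p = kzalloc(PAGE_SIZE, GFP_KERNEL);

           if (!p)
                   return -ENOMEM;
           per_cpu(subsys_scratch, cpu) = p;
           return 0;
   }

   /* PREPARE section teardown: runs after @cpu became dysfunctional. */
   static int subsys_cpu_dead(unsigned int cpu)
   {
           kfree(per_cpu(subsys_scratch, cpu));
           per_cpu(subsys_scratch, cpu) = NULL;
           return 0;
   }

How such a pair is registered with the core is described in the API sections
below.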
CPU online/offline operations
-----------------------------

A successful online operation looks like this::

  [CPUHP_OFFLINE]
  [CPUHP_OFFLINE + 1]->startup()       -> success
  [CPUHP_OFFLINE + 2]->startup()       -> success
  [CPUHP_OFFLINE + 3]                  -> skipped because startup == NULL
  ...
  [CPUHP_BRINGUP_CPU]->startup()       -> success
  === End of PREPARE section
  [CPUHP_BRINGUP_CPU + 1]->startup()   -> success
  ...
  [CPUHP_AP_ONLINE]->startup()         -> success
  === End of STARTING section
  [CPUHP_AP_ONLINE + 1]->startup()     -> success
  ...
  [CPUHP_ONLINE - 1]->startup()        -> success
  [CPUHP_ONLINE]

A successful offline operation looks like this::

  [CPUHP_ONLINE]
  [CPUHP_ONLINE - 1]->teardown()       -> success
  ...
  [CPUHP_AP_ONLINE + 1]->teardown()    -> success
  === Start of STARTING section
  [CPUHP_AP_ONLINE]->teardown()        -> success
  ...
  [CPUHP_BRINGUP_ONLINE - 1]->teardown()
  ...
  === Start of PREPARE section
  [CPUHP_BRINGUP_CPU]->teardown()
  [CPUHP_OFFLINE + 3]->teardown()
  [CPUHP_OFFLINE + 2]                  -> skipped because teardown == NULL
  [CPUHP_OFFLINE + 1]->teardown()
  [CPUHP_OFFLINE]

A failed online operation looks like this::

  [CPUHP_OFFLINE]
  [CPUHP_OFFLINE + 1]->startup()       -> success
  [CPUHP_OFFLINE + 2]->startup()       -> success
  [CPUHP_OFFLINE + 3]                  -> skipped because startup == NULL
  ...
  [CPUHP_BRINGUP_CPU]->startup()       -> success
  === End of PREPARE section
  [CPUHP_BRINGUP_CPU + 1]->startup()   -> success
  ...
  [CPUHP_AP_ONLINE]->startup()         -> success
  === End of STARTING section
  [CPUHP_AP_ONLINE + 1]->startup()     -> success
  ---
  [CPUHP_AP_ONLINE + N]->startup()     -> fail
  [CPUHP_AP_ONLINE + (N - 1)]->teardown()
  ...
  [CPUHP_AP_ONLINE + 1]->teardown()
  === Start of STARTING section
  [CPUHP_AP_ONLINE]->teardown()
  ...
  [CPUHP_BRINGUP_ONLINE - 1]->teardown()
  ...
  === Start of PREPARE section
  [CPUHP_BRINGUP_CPU]->teardown()
  [CPUHP_OFFLINE + 3]->teardown()
  [CPUHP_OFFLINE + 2]                  -> skipped because teardown == NULL
  [CPUHP_OFFLINE + 1]->teardown()
  [CPUHP_OFFLINE]

A failed offline operation looks like this::

  [CPUHP_ONLINE]
  [CPUHP_ONLINE - 1]->teardown()       -> success
  ...
  [CPUHP_ONLINE - N]->teardown()       -> fail
  [CPUHP_ONLINE - (N - 1)]->startup()
  ...
  [CPUHP_ONLINE - 1]->startup()
  [CPUHP_ONLINE]

Recursive failures cannot be handled sensibly. Look at the following
example of a recursive failure due to a failed offline operation::

  [CPUHP_ONLINE]
  [CPUHP_ONLINE - 1]->teardown()       -> success
  ...
  [CPUHP_ONLINE - N]->teardown()       -> fail
  [CPUHP_ONLINE - (N - 1)]->startup()  -> success
  [CPUHP_ONLINE - (N - 2)]->startup()  -> fail

The CPU hotplug state machine stops right here and does not try to go back
down again because that would likely result in an endless loop::

  [CPUHP_ONLINE - (N - 1)]->teardown() -> success
  [CPUHP_ONLINE - N]->teardown()       -> fail
  [CPUHP_ONLINE - (N - 1)]->startup()  -> success
  [CPUHP_ONLINE - (N - 2)]->startup()  -> fail
  [CPUHP_ONLINE - (N - 1)]->teardown() -> success
  [CPUHP_ONLINE - N]->teardown()       -> fail

Lather, rinse and repeat. In this case the CPU is left in state::

  [CPUHP_ONLINE - (N - 1)]

which at least lets the system make progress and gives the user a chance to
debug or even resolve the situation.

Allocating a state
------------------

There are two ways to allocate a CPU hotplug state:

* Static allocation

  Static allocation has to be used when the subsystem or driver has
  ordering requirements versus other CPU hotplug states. E.g. the PERF core
  startup callback has to be invoked before the PERF driver startup
  callbacks during a CPU online operation. During a CPU offline operation
  the driver teardown callbacks have to be invoked before the core teardown
  callback. The statically allocated states are described by constants in
  the cpuhp_state enum which can be found in include/linux/cpuhotplug.h.

  Insert the state into the enum at the proper place so the ordering
  requirements are fulfilled. The state constant has to be used for state
  setup and removal.
  Static allocation is also required when the state callbacks are not set
  up at runtime and are part of the initializer of the CPU hotplug state
  array in kernel/cpu.c.

* Dynamic allocation

  When there are no ordering requirements for the state callbacks then
  dynamic allocation is the preferred method. The state number is allocated
  by the setup function and returned to the caller on success.

  Only the PREPARE and ONLINE sections provide a dynamic allocation
  range. The STARTING section does not, as most of the callbacks in that
  section have explicit ordering requirements.

Setup of a CPU hotplug state
----------------------------

The core code provides the following functions to set up a state:

* cpuhp_setup_state(state, name, startup, teardown)
* cpuhp_setup_state_nocalls(state, name, startup, teardown)
* cpuhp_setup_state_cpuslocked(state, name, startup, teardown)
* cpuhp_setup_state_nocalls_cpuslocked(state, name, startup, teardown)

For cases where a driver or a subsystem has multiple instances and the same
CPU hotplug state callbacks need to be invoked for each instance, the CPU
hotplug core provides multi-instance support. The advantage over driver
specific instance lists is that the instance related functions are fully
serialized against CPU hotplug operations and provide automatic invocation
of the state callbacks when instances are added and removed. To set up such
a multi-instance state the following function is available:

* cpuhp_setup_state_multi(state, name, startup, teardown)

The @state argument is either a statically allocated state or one of the
constants for dynamically allocated states - CPUHP_PREPARE_DYN,
CPUHP_ONLINE_DYN - depending on the state section (PREPARE, ONLINE) for
which a dynamic state should be allocated.

The @name argument is used for sysfs output and for instrumentation. The
naming convention is "subsys:mode" or "subsys/driver:mode",
e.g. "perf:mode" or "perf/x86:mode". The common mode names are:

======== =======================================================
prepare  For states in the PREPARE section

dead     For states in the PREPARE section which do not provide
         a startup callback

starting For states in the STARTING section

dying    For states in the STARTING section which do not provide
         a startup callback

online   For states in the ONLINE section

offline  For states in the ONLINE section which do not provide
         a startup callback
======== =======================================================

As the @name argument is only used for sysfs and instrumentation, other mode
descriptors can be used as well if they describe the nature of the state
better than the common ones.

Examples for @name arguments: "perf/online", "perf/x86:prepare",
"RCU/tree:dying", "sched/waitempty"

The @startup argument is a function pointer to the callback which should be
invoked during a CPU online operation. If the usage site does not require a
startup callback, set the pointer to NULL.

The @teardown argument is a function pointer to the callback which should
be invoked during a CPU offline operation. If the usage site does not
require a teardown callback, set the pointer to NULL.
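For illustration, the @startup and @teardown pointers have the following
shape; the ``subsys_cpu_online()``/``subsys_cpu_offline()`` names are
hypothetical placeholders::

   /*
    * Hypothetical callbacks matching the @startup and @teardown pointers.
    * Both take the hotplugged CPU number and return 0 on success or a
    * negative error code on failure (where failure is allowed).
    */
   static int subsys_cpu_online(unsigned int cpu)
   {
           /* Set up the subsystem's per-CPU state for @cpu. */
           return 0;
   }

   static int subsys_cpu_offline(unsigned int cpu)
   {
           /* Undo subsys_cpu_online() for @cpu. */
           return 0;
   }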
The functions differ in the way the installed callbacks are treated:

 * cpuhp_setup_state_nocalls(), cpuhp_setup_state_nocalls_cpuslocked()
   and cpuhp_setup_state_multi() only install the callbacks.

 * cpuhp_setup_state() and cpuhp_setup_state_cpuslocked() install the
   callbacks and invoke the @startup callback (if not NULL) for all online
   CPUs which currently have a state greater than the newly installed
   state. Depending on the state section the callback is either invoked on
   the current CPU (PREPARE section) or on each online CPU (ONLINE
   section) in the context of the CPU's hotplug thread.

   If a callback fails for CPU N then the teardown callback for CPU
   0 .. N-1 is invoked to roll back the operation. The state setup fails,
   the callbacks for the state are not installed and, in case of dynamic
   allocation, the allocated state is freed.

The state setup and the callback invocations are serialized against CPU
hotplug operations. If the setup function has to be called from a CPU
hotplug read locked region, then the _cpuslocked() variants have to be
used. These functions cannot be used from within CPU hotplug callbacks.

The function return values:

  ======== ==================================================================
  0        Statically allocated state was successfully set up.

  >0       Dynamically allocated state was successfully set up.

           The returned number is the state number which was allocated. If
           the state callbacks have to be removed later, e.g. on module
           removal, then this number has to be saved by the caller and used
           as the @state argument for the state remove function. For
           multi-instance states the dynamically allocated state number is
           also required as the @state argument for the instance add/remove
           operations.

  <0       Operation failed.
  ======== ==================================================================

Removal of a CPU hotplug state
------------------------------

To remove a previously set up state, the following functions are provided:

* cpuhp_remove_state(state)
* cpuhp_remove_state_nocalls(state)
* cpuhp_remove_state_nocalls_cpuslocked(state)
* cpuhp_remove_multi_state(state)

The @state argument is either a statically allocated state or the state
number which was allocated in the dynamic range by cpuhp_setup_state*(). If
the state is in the dynamic range, then the state number is freed and
available for dynamic allocation again.

The functions differ in the way the installed callbacks are treated:

 * cpuhp_remove_state_nocalls(), cpuhp_remove_state_nocalls_cpuslocked()
   and cpuhp_remove_multi_state() only remove the callbacks.

 * cpuhp_remove_state() removes the callbacks and invokes the teardown
   callback (if not NULL) for all online CPUs which currently have a state
   greater than the removed state. Depending on the state section the
   callback is either invoked on the current CPU (PREPARE section) or on
   each online CPU (ONLINE section) in the context of the CPU's hotplug
   thread.

   In order to complete the removal, the teardown callback should not fail.

The state removal and the callback invocations are serialized against CPU
hotplug operations. If the remove function has to be called from a CPU
hotplug read locked region, then the _cpuslocked() variants have to be
used. These functions cannot be used from within CPU hotplug callbacks.
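As a minimal sketch of using the _cpuslocked() variants from a region that
already holds the CPU hotplug read lock via cpus_read_lock(), reusing the
hypothetical state and callback names from the Examples section below::

   cpus_read_lock();
   /* ... work that must not race with CPU hotplug ... */
   ret = cpuhp_setup_state_nocalls_cpuslocked(CPUHP_SUBSYS_STARTING,
                                              "subsys:starting",
                                              subsys_cpu_starting,
                                              subsys_cpu_dying);
   cpus_read_unlock();

Removal from such a region works the same way via
cpuhp_remove_state_nocalls_cpuslocked().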
If a multi-instance state is removed then the caller has to remove all
instances first.

Multi-Instance state instance management
-----------------------------------------

Once the multi-instance state is set up, instances can be added to the
state:

 * cpuhp_state_add_instance(state, node)
 * cpuhp_state_add_instance_nocalls(state, node)

The @state argument is either a statically allocated state or the state
number which was allocated in the dynamic range by cpuhp_setup_state_multi().

The @node argument is a pointer to a hlist_node which is embedded in the
instance's data structure. The pointer is handed to the multi-instance
state callbacks and can be used by the callback to retrieve the instance
via container_of(), as shown in the sketch at the end of this section.

The functions differ in the way the installed callbacks are treated:

 * cpuhp_state_add_instance_nocalls() only adds the instance to the
   multi-instance state's node list.

 * cpuhp_state_add_instance() adds the instance and invokes the startup
   callback (if not NULL) associated with @state for all online CPUs which
   currently have a state greater than @state. The callback is only
   invoked for the instance being added. Depending on the state section
   the callback is either invoked on the current CPU (PREPARE section) or
   on each online CPU (ONLINE section) in the context of the CPU's hotplug
   thread.

   If a callback fails for CPU N then the teardown callback for CPU
   0 .. N-1 is invoked to roll back the operation, the function fails and
   the instance is not added to the node list of the multi-instance state.

To remove an instance from the state's node list these functions are
available:

 * cpuhp_state_remove_instance(state, node)
 * cpuhp_state_remove_instance_nocalls(state, node)

The arguments are the same as for the cpuhp_state_add_instance*()
variants above.

The functions differ in the way the installed callbacks are treated:

 * cpuhp_state_remove_instance_nocalls() only removes the instance from the
   state's node list.

 * cpuhp_state_remove_instance() removes the instance and invokes the
   teardown callback (if not NULL) associated with @state for all online
   CPUs which currently have a state greater than @state. The callback is
   only invoked for the instance being removed. Depending on the state
   section the callback is either invoked on the current CPU (PREPARE
   section) or on each online CPU (ONLINE section) in the context of the
   CPU's hotplug thread.

   In order to complete the removal, the teardown callback should not fail.

The node list add/remove operations and the callback invocations are
serialized against CPU hotplug operations. These functions cannot be used
from within CPU hotplug callbacks and CPU hotplug read locked regions.
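As a minimal sketch of how the hlist_node is embedded and how a
multi-instance callback recovers its instance via container_of(); the
``subsys_instance`` structure and ``subsys_instance_online()`` callback are
hypothetical::

   struct subsys_instance {
           struct hlist_node node;   /* handed to the callbacks as @node */
           int id;                   /* per-instance data ... */
   };

   /* Multi-instance callbacks take the CPU number and the instance's @node. */
   static int subsys_instance_online(unsigned int cpu, struct hlist_node *node)
   {
           struct subsys_instance *inst =
                   container_of(node, struct subsys_instance, node);

           /* Bring the per-CPU part of this instance up on @cpu. */
           pr_debug("instance %d online on CPU%u\n", inst->id, cpu);
           return 0;
   }

The instance is then registered with cpuhp_state_add_instance(state,
&inst->node) as shown in the examples below.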
Examples
--------

Setup and teardown a statically allocated state in the STARTING section for
notifications on online and offline operations::

   ret = cpuhp_setup_state(CPUHP_SUBSYS_STARTING, "subsys:starting", subsys_cpu_starting, subsys_cpu_dying);
   if (ret < 0)
        return ret;
   ....
   cpuhp_remove_state(CPUHP_SUBSYS_STARTING);

Setup and teardown a dynamically allocated state in the ONLINE section
for notifications on offline operations::

   state = cpuhp_setup_state(CPUHP_ONLINE_DYN, "subsys:offline", NULL, subsys_cpu_offline);
   if (state < 0)
       return state;
   ....
   cpuhp_remove_state(state);

Setup and teardown a dynamically allocated state in the ONLINE section
for notifications on online operations without invoking the callbacks::

   state = cpuhp_setup_state_nocalls(CPUHP_ONLINE_DYN, "subsys:online", subsys_cpu_online, NULL);
   if (state < 0)
       return state;
   ....
   cpuhp_remove_state_nocalls(state);

Setup, use and teardown a dynamically allocated multi-instance state in the
ONLINE section for notifications on online and offline operations::

   state = cpuhp_setup_state_multi(CPUHP_ONLINE_DYN, "subsys:online", subsys_cpu_online, subsys_cpu_offline);
   if (state < 0)
       return state;
   ....
   ret = cpuhp_state_add_instance(state, &inst1->node);
   if (ret)
        return ret;
   ....
   ret = cpuhp_state_add_instance(state, &inst2->node);
   if (ret)
        return ret;
   ....
   cpuhp_state_remove_instance(state, &inst1->node);
   ....
   cpuhp_state_remove_instance(state, &inst2->node);
   ....
   cpuhp_remove_multi_state(state);


Testing of hotplug states
=========================

One way to verify whether a custom state is working as expected or not is to
shut down a CPU and then put it online again. It is also possible to put the
CPU to a certain state (for instance *CPUHP_AP_ONLINE*) and then go back to
*CPUHP_ONLINE*. This would simulate an error one state after *CPUHP_AP_ONLINE*
which would lead to rollback to the online state.

All registered states are enumerated in ``/sys/devices/system/cpu/hotplug/states`` ::

   $ tail /sys/devices/system/cpu/hotplug/states
   138: mm/vmscan:online
   139: mm/vmstat:online
   140: lib/percpu_cnt:online
   141: acpi/cpu-drv:online
   142: base/cacheinfo:online
   143: virtio/net:online
   144: x86/mce:online
   145: printk:online
   168: sched:active
   169: online

To roll back CPU4 to ``lib/percpu_cnt:online`` and back online just issue::

   $ cat /sys/devices/system/cpu/cpu4/hotplug/state
   169
   $ echo 140 > /sys/devices/system/cpu/cpu4/hotplug/target
   $ cat /sys/devices/system/cpu/cpu4/hotplug/state
   140

It is important to note that the teardown callback of state 140 has been
invoked.
And now get back online::

   $ echo 169 > /sys/devices/system/cpu/cpu4/hotplug/target
   $ cat /sys/devices/system/cpu/cpu4/hotplug/state
   169

With trace events enabled, the individual steps are visible, too::

   #            TASK-PID   CPU#   TIMESTAMP  FUNCTION
   #               | |       |        |         |
          bash-394  [001]  22.976: cpuhp_enter: cpu: 0004 target: 140 step: 169 (cpuhp_kick_ap_work)
       cpuhp/4-31   [004]  22.977: cpuhp_enter: cpu: 0004 target: 140 step: 168 (sched_cpu_deactivate)
       cpuhp/4-31   [004]  22.990: cpuhp_exit:  cpu: 0004  state: 168 step: 168 ret: 0
       cpuhp/4-31   [004]  22.991: cpuhp_enter: cpu: 0004 target: 140 step: 144 (mce_cpu_pre_down)
       cpuhp/4-31   [004]  22.992: cpuhp_exit:  cpu: 0004  state: 144 step: 144 ret: 0
       cpuhp/4-31   [004]  22.993: cpuhp_multi_enter: cpu: 0004 target: 140 step: 143 (virtnet_cpu_down_prep)
       cpuhp/4-31   [004]  22.994: cpuhp_exit:  cpu: 0004  state: 143 step: 143 ret: 0
       cpuhp/4-31   [004]  22.995: cpuhp_enter: cpu: 0004 target: 140 step: 142 (cacheinfo_cpu_pre_down)
       cpuhp/4-31   [004]  22.996: cpuhp_exit:  cpu: 0004  state: 142 step: 142 ret: 0
          bash-394  [001]  22.997: cpuhp_exit:  cpu: 0004  state: 140 step: 169 ret: 0
          bash-394  [005]  95.540: cpuhp_enter: cpu: 0004 target: 169 step: 140 (cpuhp_kick_ap_work)
       cpuhp/4-31   [004]  95.541: cpuhp_enter: cpu: 0004 target: 169 step: 141 (acpi_soft_cpu_online)
       cpuhp/4-31   [004]  95.542: cpuhp_exit:  cpu: 0004  state: 141 step: 141 ret: 0
       cpuhp/4-31   [004]  95.543: cpuhp_enter: cpu: 0004 target: 169 step: 142 (cacheinfo_cpu_online)
       cpuhp/4-31   [004]  95.544: cpuhp_exit:  cpu: 0004  state: 142 step: 142 ret: 0
       cpuhp/4-31   [004]  95.545: cpuhp_multi_enter: cpu: 0004 target: 169 step: 143 (virtnet_cpu_online)
       cpuhp/4-31   [004]  95.546: cpuhp_exit:  cpu: 0004  state: 143 step: 143 ret: 0
       cpuhp/4-31   [004]  95.547: cpuhp_enter: cpu: 0004 target: 169 step: 144 (mce_cpu_online)
       cpuhp/4-31   [004]  95.548: cpuhp_exit:  cpu: 0004  state: 144 step: 144 ret: 0
       cpuhp/4-31   [004]  95.549: cpuhp_enter: cpu: 0004 target: 169 step: 145 (console_cpu_notify)
       cpuhp/4-31   [004]  95.550: cpuhp_exit:  cpu: 0004  state: 145 step: 145 ret: 0
       cpuhp/4-31   [004]  95.551: cpuhp_enter: cpu: 0004 target: 169 step: 168 (sched_cpu_activate)
       cpuhp/4-31   [004]  95.552: cpuhp_exit:  cpu: 0004  state: 168 step: 168 ret: 0
          bash-394  [005]  95.553: cpuhp_exit:  cpu: 0004  state: 169 step: 140 ret: 0

As can be seen, CPU4 went down until timestamp 22.996 and then back up until
95.552. All invoked callbacks including their return codes are visible in the
trace.

Architecture's requirements
===========================

The following functions and configurations are required:

``CONFIG_HOTPLUG_CPU``
  This entry needs to be enabled in Kconfig

``__cpu_up()``
  Arch interface to bring up a CPU

``__cpu_disable()``
  Arch interface to shut down a CPU, no more interrupts can be handled by the
  kernel after the routine returns. This includes the shutdown of the timer.

``__cpu_die()``
  This is actually supposed to ensure death of the CPU. Look at some example
  code in other architectures that implement CPU hotplug. The processor is
  taken down from the ``idle()`` loop for that specific architecture.
  ``__cpu_die()`` typically waits for some per_cpu state to be set, to ensure
  the processor dead routine has actually been called.

User Space Notification
=======================

After a CPU is successfully onlined or offlined, udev events are sent.
A udev rule like::

   SUBSYSTEM=="cpu", DRIVERS=="processor", DEVPATH=="/devices/system/cpu/*", RUN+="the_hotplug_receiver.sh"

will receive all events. A script like::

   #!/bin/sh

   if [ "${ACTION}" = "offline" ]
   then
       echo "CPU ${DEVPATH##*/} offline"

   elif [ "${ACTION}" = "online" ]
   then
       echo "CPU ${DEVPATH##*/} online"

   fi

can process the event further.

Kernel Inline Documentation Reference
=====================================

.. kernel-doc:: include/linux/cpuhotplug.h