summaryrefslogtreecommitdiffstats
path: root/Documentation/watchdog/hpwdt.txt
diff options
context:
space:
mode:
authorDaniel Baumann <daniel.baumann@progress-linux.org>2024-05-06 01:02:30 +0000
committerDaniel Baumann <daniel.baumann@progress-linux.org>2024-05-06 01:02:30 +0000
commit76cb841cb886eef6b3bee341a2266c76578724ad (patch)
treef5892e5ba6cc11949952a6ce4ecbe6d516d6ce58 /Documentation/watchdog/hpwdt.txt
parentInitial commit. (diff)
downloadlinux-76cb841cb886eef6b3bee341a2266c76578724ad.tar.xz
linux-76cb841cb886eef6b3bee341a2266c76578724ad.zip
Adding upstream version 4.19.249.upstream/4.19.249upstream
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'Documentation/watchdog/hpwdt.txt')
-rw-r--r--Documentation/watchdog/hpwdt.txt97
1 files changed, 97 insertions, 0 deletions
diff --git a/Documentation/watchdog/hpwdt.txt b/Documentation/watchdog/hpwdt.txt
new file mode 100644
index 000000000..6d866c537
--- /dev/null
+++ b/Documentation/watchdog/hpwdt.txt
@@ -0,0 +1,97 @@
+Last reviewed: 05/20/2016
+
+ HPE iLO NMI Watchdog Driver
+ NMI sourcing for iLO based ProLiant Servers
+ Documentation and Driver by
+ Thomas Mingarelli
+
+ The HPE iLO NMI Watchdog driver is a kernel module that provides basic
+ watchdog functionality and the added benefit of NMI sourcing. Both the
+ watchdog functionality and the NMI sourcing capability need to be enabled
+ by the user. Remember that the two modes are not dependent on one another.
+ A user can have the NMI sourcing without the watchdog timer and vice-versa.
+ All references to iLO in this document imply it also works on iLO2 and all
+ subsequent generations.
+
+ Watchdog functionality is enabled like any other common watchdog driver. That
+ is, an application needs to be started that kicks off the watchdog timer. A
+ basic application exists in tools/testing/selftests/watchdog/ named
+ watchdog-test.c. Simply compile the C file and kick it off. If the system
+ gets into a bad state and hangs, the HPE ProLiant iLO timer register will
+ not be updated in a timely fashion and a hardware system reset (also known as
+ an Automatic Server Recovery (ASR)) event will occur.
+
+ The hpwdt driver also has three (3) module parameters. They are the following:
+
+ soft_margin - allows the user to set the watchdog timer value.
+ Default value is 30 seconds.
+ allow_kdump - allows the user to save off a kernel dump image after an NMI.
+ Default value is 1/ON
+ nowayout - basic watchdog parameter that does not allow the timer to
+ be restarted or an impending ASR to be escaped.
+ Default value is set when compiling the kernel. If it is set
+ to "Y", then there is no way of disabling the watchdog once
+ it has been started.
+
+ NOTE: More information about watchdog drivers in general, including the ioctl
+ interface to /dev/watchdog can be found in
+ Documentation/watchdog/watchdog-api.txt and Documentation/IPMI.txt.
+
+ The NMI sourcing capability is disabled by default due to the inability to
+ distinguish between "NMI Watchdog Ticks" and "HW generated NMI events" in the
+ Linux kernel. What this means is that the hpwdt nmi handler code is called
+ each time the NMI signal fires off. This could amount to several thousands of
+ NMIs in a matter of seconds. If a user sees the Linux kernel's "dazed and
+ confused" message in the logs or if the system gets into a hung state, then
+ the hpwdt driver can be reloaded.
+
+ 1. If the kernel has not been booted with nmi_watchdog turned off then
+ edit and place the nmi_watchdog=0 at the end of the currently booting
+ kernel line. Depending on your Linux distribution and platform setup:
+ For non-UEFI systems
+ /boot/grub/grub.conf or
+ /boot/grub/menu.lst
+ For UEFI systems
+ /boot/efi/EFI/distroname/grub.conf or
+ /boot/efi/efi/distroname/elilo.conf
+ 2. reboot the sever
+ 3. Once the system comes up perform a modprobe -r hpwdt
+ 4. modprobe /lib/modules/`uname -r`/kernel/drivers/watchdog/hpwdt.ko
+
+ Now, the hpwdt can successfully receive and source the NMI and provide a log
+ message that details the reason for the NMI (as determined by the HPE BIOS).
+
+ Below is a list of NMIs the HPE BIOS understands along with the associated
+ code (reason):
+
+ No source found 00h
+
+ Uncorrectable Memory Error 01h
+
+ ASR NMI 1Bh
+
+ PCI Parity Error 20h
+
+ NMI Button Press 27h
+
+ SB_BUS_NMI 28h
+
+ ILO Doorbell NMI 29h
+
+ ILO IOP NMI 2Ah
+
+ ILO Watchdog NMI 2Bh
+
+ Proc Throt NMI 2Ch
+
+ Front Side Bus NMI 2Dh
+
+ PCI Express Error 2Fh
+
+ DMA controller NMI 30h
+
+ Hypertransport/CSI Error 31h
+
+
+
+ -- Tom Mingarelli