Adding upstream version 4.19.249.upstream/4.19.249

Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
author: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-05-06 01:02:30 +0000
committer: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-05-06 01:02:30 +0000
commit: 76cb841cb886eef6b3bee341a2266c76578724ad (patch)
tree: f5892e5ba6cc11949952a6ce4ecbe6d516d6ce58 /Documentation/s390
parent: Initial commit. (diff)
download: linux-76cb841cb886eef6b3bee341a2266c76578724ad.tar.xz
linux-76cb841cb886eef6b3bee341a2266c76578724ad.zip
14 files changed, 4780 insertions, 0 deletions
diff --git a/Documentation/s390/00-INDEX b/Documentation/s390/00-INDEX
new file mode 100644
index 000000000..317f0378a
--- /dev/null
+++ b/Documentation/s390/00-INDEX
@@ -0,0 +1,28 @@
+00-INDEX
+	- this file.
+3270.ChangeLog
+	- ChangeLog for the UTS Global 3270-support patch (outdated).
+3270.txt
+	- how to use the IBM 3270 display system support.
+cds.txt
+	- s390 common device support (common I/O layer).
+CommonIO
+	- common I/O layer command line parameters, procfs and debugfs	entries
+config3270.sh
+	- example configuration for 3270 devices.
+DASD
+	- information on the DASD disk device driver.
+Debugging390.txt
+	- hints for debugging on s390 systems.
+driver-model.txt
+	- information on s390 devices and the driver model.
+monreader.txt
+	- information on accessing the z/VM monitor stream from Linux.
+qeth.txt
+	- HiperSockets Bridge Port Support.
+s390dbf.txt
+	- information on using the s390 debug feature.
+vfio-ccw.txt
+	  information on the vfio-ccw I/O subchannel driver.
+zfcpdump.txt
+	- information on the s390 SCSI dump tool.
diff --git a/Documentation/s390/3270.ChangeLog b/Documentation/s390/3270.ChangeLog
new file mode 100644
index 000000000..031c36081
--- /dev/null
+++ b/Documentation/s390/3270.ChangeLog
@@ -0,0 +1,44 @@
+ChangeLog for the UTS Global 3270-support patch
+
+Sep 2002:	Get bootup colors right on 3270 console
+	* In tubttybld.c, substantially revise ESC processing so that
+	  ESC sequences (especially coloring ones) and the strings
+	  they affect work as right as 3270 can get them.  Also, set
+	  screen height to omit the two rows used for input area, in
+	  tty3270_open() in tubtty.c.
+
+Sep 2002:	Dynamically get 3270 input buffer
+	* Oversize 3270 screen widths may exceed GEOM_MAXINPLEN columns,
+	  so get input-area buffer dynamically when sizing the device in
+	  tubmakemin() in tuball.c (if it's the console) or tty3270_open()
+	  in tubtty.c (if needed).  Change tubp->tty_input to be a
+	  pointer rather than an array, in tubio.h.
+
+Sep 2002:	Fix tubfs kmalloc()s
+	* Do read and write lengths correctly in fs3270_read()
+	  and fs3270_write(), whilst never asking kmalloc()
+	  for more than 0x800 bytes.  Affects tubfs.c and tubio.h.
+
+Sep 2002:	Recognize 3270 control unit type 3174
+	* Recognize control-unit type 0x3174 as well as 0x327?.
+	  The IBM 2047 device emulates a 3174 control unit.
+	  Modularize control-unit recognition in tuball.c by
+	  adding and invoking new tub3270_is_ours().
+
+Apr 2002:	Fix 3270 console reboot loop
+	* (Belated log entry) Fixed reboot loop if 3270 console,
+	  in tubtty.c:ttu3270_bh().
+
+Feb 6, 2001:
+	* This changelog is new
+	* tub3270 now supports 3270 console:
+		Specify y for CONFIG_3270 and y for CONFIG_3270_CONSOLE.
+		Support for 3215 will not appear if 3270 console support
+		is chosen.
+		NOTE:  The default is 3270 console support, NOT 3215.
+	* the components are remodularized: added source modules are
+	  tubttybld.c and tubttyscl.c, for screen-building code and
+	  scroll-timeout code.
+	* tub3270 source for this (2.4.0) version is #ifdeffed to
+	  build with both 2.4.0 and 2.2.16.2.
+	* color support and minimal other ESC-sequence support is added.
diff --git a/Documentation/s390/3270.txt b/Documentation/s390/3270.txt
new file mode 100644
index 000000000..7c715de99
--- /dev/null
+++ b/Documentation/s390/3270.txt
@@ -0,0 +1,271 @@
+IBM 3270 Display System support
+
+This file describes the driver that supports local channel attachment
+of IBM 3270 devices.  It consists of three sections:
+	* Introduction
+	* Installation
+	* Operation
+
+
+INTRODUCTION.
+
+This paper describes installing and operating 3270 devices under
+Linux/390.  A 3270 device is a block-mode rows-and-columns terminal of
+which I'm sure hundreds of millions were sold by IBM and clonemakers
+twenty and thirty years ago.
+
+You may have 3270s in-house and not know it.  If you're using the
+VM-ESA operating system, define a 3270 to your virtual machine by using
+the command "DEF GRAF <hex-address>"  This paper presumes you will be
+defining four 3270s with the CP/CMS commands
+
+	DEF GRAF 620
+	DEF GRAF 621
+	DEF GRAF 622
+	DEF GRAF 623
+
+Your network connection from VM-ESA allows you to use x3270, tn3270, or
+another 3270 emulator, started from an xterm window on your PC or
+workstation.  With the DEF GRAF command, an application such as xterm,
+and this Linux-390 3270 driver, you have another way of talking to your
+Linux box.
+
+This paper covers installation of the driver and operation of a
+dialed-in x3270.
+
+
+INSTALLATION.
+
+You install the driver by installing a patch, doing a kernel build, and
+running the configuration script (config3270.sh, in this directory).
+
+WARNING:  If you are using 3270 console support, you must rerun the
+configuration script every time you change the console's address (perhaps
+by using the condev= parameter in silo's /boot/parmfile).  More precisely,
+you should rerun the configuration script every time your set of 3270s,
+including the console 3270, changes subchannel identifier relative to
+one another.  ReIPL as soon as possible after running the configuration
+script and the resulting /tmp/mkdev3270.
+
+If you have chosen to make tub3270 a module, you add a line to a
+configuration file under /etc/modprobe.d/.  If you are working on a VM
+virtual machine, you can use DEF GRAF to define virtual 3270 devices.
+
+You may generate both 3270 and 3215 console support, or one or the
+other, or neither.  If you generate both, the console type under VM is
+not changed.  Use #CP Q TERM to see what the current console type is.
+Use #CP TERM CONMODE 3270 to change it to 3270.  If you generate only
+3270 console support, then the driver automatically converts your console
+at boot time to a 3270 if it is a 3215.
+
+In brief, these are the steps:
+	1. Install the tub3270 patch
+	2. (If a module) add a line to a file in /etc/modprobe.d/*.conf
+	3. (If VM) define devices with DEF GRAF
+	4. Reboot
+	5. Configure
+
+To test that everything works, assuming VM and x3270,
+	1. Bring up an x3270 window.
+	2. Use the DIAL command in that window.
+	3. You should immediately see a Linux login screen.
+
+Here are the installation steps in detail:
+
+	1.  The 3270 driver is a part of the official Linux kernel
+	source.  Build a tree with the kernel source and any necessary
+	patches.  Then do
+		make oldconfig
+		(If you wish to disable 3215 console support, edit
+		.config; change CONFIG_TN3215's value to "n";
+		and rerun "make oldconfig".)
+		make image
+		make modules
+		make modules_install
+
+	2. (Perform this step only if you have configured tub3270 as a
+	module.)  Add a line to a file /etc/modprobe.d/*.conf to automatically
+	load the driver when it's needed.  With this line added, you will see
+	login prompts appear on your 3270s as soon as boot is complete (or
+	with emulated 3270s, as soon as you dial into your vm guest using the
+	command "DIAL <vmguestname>").  Since the line-mode major number is
+	227, the line to add should be:
+		alias char-major-227 tub3270
+
+	3. Define graphic devices to your vm guest machine, if you
+	haven't already.  Define them before you reboot (reipl):
+		DEFINE GRAF 620
+		DEFINE GRAF 621
+		DEFINE GRAF 622
+		DEFINE GRAF 623
+
+	4. Reboot.  The reboot process scans hardware devices, including
+	3270s, and this enables the tub3270 driver once loaded to respond
+	correctly to the configuration requests of the next step.  If
+	you have chosen 3270 console support, your console now behaves
+	as a 3270, not a 3215.
+
+	5. Run the 3270 configuration script config3270.  It is
+	distributed in this same directory, Documentation/s390, as
+	config3270.sh.	Inspect the output script it produces,
+	/tmp/mkdev3270, and then run that script.  This will create the
+	necessary character special device files and make the necessary
+	changes to /etc/inittab.
+
+	Then notify /sbin/init that /etc/inittab has changed, by issuing
+	the telinit command with the q operand:
+		cd Documentation/s390
+		sh config3270.sh
+		sh /tmp/mkdev3270
+		telinit q
+
+	This should be sufficient for your first time.	If your 3270
+	configuration has changed and you're reusing config3270, you
+	should follow these steps:
+		Change 3270 configuration
+		Reboot
+		Run config3270 and /tmp/mkdev3270
+		Reboot
+
+Here are the testing steps in detail:
+
+	1. Bring up an x3270 window, or use an actual hardware 3278 or
+	3279, or use the 3270 emulator of your choice.  You would be
+	running the emulator on your PC or workstation.  You would use
+	the command, for example,
+		x3270 vm-esa-domain-name &
+	if you wanted a 3278 Model 4 with 43 rows of 80 columns, the
+	default model number.  The driver does not take advantage of
+	extended attributes.
+
+	The screen you should now see contains a VM logo with input
+	lines near the bottom.  Use TAB to move to the bottom line,
+	probably labeled "COMMAND  ===>".
+
+	2. Use the DIAL command instead of the LOGIN command to connect
+	to one of the virtual 3270s you defined with the DEF GRAF
+	commands:
+		dial my-vm-guest-name
+
+	3. You should immediately see a login prompt from your
+	Linux-390 operating system.  If that does not happen, you would
+	see instead the line "DIALED TO my-vm-guest-name   0620".
+
+	To troubleshoot:  do these things.
+
+	A. Is the driver loaded?  Use the lsmod command (no operands)
+	to find out.  Probably it isn't.  Try loading it manually, with
+	the command "insmod tub3270".  Does that command give error
+	messages?  Ha!  There's your problem.
+
+	B. Is the /etc/inittab file modified as in installation step 3
+	above?  Use the grep command to find out; for instance, issue
+	"grep 3270 /etc/inittab".  Nothing found?  There's your
+	problem!
+
+	C. Are the device special files created, as in installation
+	step 2 above?  Use the ls -l command to find out; for instance,
+	issue "ls -l /dev/3270/tty620".  The output should start with the
+	letter "c" meaning character device and should contain "227, 1"
+	just to the left of the device name.  No such file?  no "c"?
+	Wrong major number?  Wrong minor number?  There's your
+	problem!
+
+	D. Do you get the message
+		 "HCPDIA047E my-vm-guest-name 0620 does not exist"?
+	If so, you must issue the command "DEF GRAF 620" from your VM
+	3215 console and then reboot the system.
+
+
+
+OPERATION.
+
+The driver defines three areas on the 3270 screen:  the log area, the
+input area, and the status area.
+
+The log area takes up all but the bottom two lines of the screen.  The
+driver writes terminal output to it, starting at the top line and going
+down.  When it fills, the status area changes from "Linux Running" to
+"Linux More...".  After a scrolling timeout of (default) 5 sec, the
+screen clears and more output is written, from the top down.
+
+The input area extends from the beginning of the second-to-last screen
+line to the start of the status area.  You type commands in this area
+and hit ENTER to execute them.
+
+The status area initializes to "Linux Running" to give you a warm
+fuzzy feeling.  When the log area fills up and output awaits, it
+changes to "Linux More...".  At this time you can do several things or
+nothing.  If you do nothing, the screen will clear in (default) 5 sec
+and more output will appear.  You may hit ENTER with nothing typed in
+the input area to toggle between "Linux More..." and "Linux Holding",
+which indicates no scrolling will occur.  (If you hit ENTER with "Linux
+Running" and nothing typed, the application receives a newline.)
+
+You may change the scrolling timeout value.  For example, the following
+command line:
+	echo scrolltime=60 > /proc/tty/driver/tty3270
+changes the scrolling timeout value to 60 sec.  Set scrolltime to 0 if
+you wish to prevent scrolling entirely.
+
+Other things you may do when the log area fills up are:  hit PA2 to
+clear the log area and write more output to it, or hit CLEAR to clear
+the log area and the input area and write more output to the log area.
+
+Some of the Program Function (PF) and Program Attention (PA) keys are
+preassigned special functions.  The ones that are not yield an alarm
+when pressed.
+
+PA1 causes a SIGINT to the currently running application.  You may do
+the same thing from the input area, by typing "^C" and hitting ENTER.
+
+PA2 causes the log area to be cleared.  If output awaits, it is then
+written to the log area.
+
+PF3 causes an EOF to be received as input by the application.  You may
+cause an EOF also by typing "^D" and hitting ENTER.
+
+No PF key is preassigned to cause a job suspension, but you may cause a
+job suspension by typing "^Z" and hitting ENTER.  You may wish to
+assign this function to a PF key.  To make PF7 cause job suspension,
+execute the command:
+	echo pf7=^z > /proc/tty/driver/tty3270
+
+If the input you type does not end with the two characters "^n", the
+driver appends a newline character and sends it to the tty driver;
+otherwise the driver strips the "^n" and does not append a newline.
+The IBM 3215 driver behaves similarly.
+
+Pf10 causes the most recent command to be retrieved from the tube's
+command stack (default depth 20) and displayed in the input area.  You
+may hit PF10 again for the next-most-recent command, and so on.  A
+command is entered into the stack only when the input area is not made
+invisible (such as for password entry) and it is not identical to the
+current top entry.  PF10 rotates backward through the command stack;
+PF11 rotates forward.  You may assign the backward function to any PF
+key (or PA key, for that matter), say, PA3, with the command:
+	echo -e pa3=\\033k > /proc/tty/driver/tty3270
+This assigns the string ESC-k to PA3.  Similarly, the string ESC-j
+performs the forward function.  (Rationale:  In bash with vi-mode line
+editing, ESC-k and ESC-j retrieve backward and forward history.
+Suggestions welcome.)
+
+Is a stack size of twenty commands not to your liking?  Change it on
+the fly.  To change to saving the last 100 commands, execute the
+command:
+	echo recallsize=100 > /proc/tty/driver/tty3270
+
+Have a command you issue frequently?  Assign it to a PF or PA key!  Use
+the command
+	echo pf24="mkdir foobar; cd foobar" > /proc/tty/driver/tty3270 
+to execute the commands mkdir foobar and cd foobar immediately when you
+hit PF24.  Want to see the command line first, before you execute it?
+Use the -n option of the echo command:
+	echo -n pf24="mkdir foo; cd foo" > /proc/tty/driver/tty3270
+
+
+
+Happy testing!  I welcome any and all comments about this document, the
+driver, etc etc.
+
+Dick Hitt <rbh00@utsglobal.com>
diff --git a/Documentation/s390/CommonIO b/Documentation/s390/CommonIO
new file mode 100644
index 000000000..6e0f63f34
--- /dev/null
+++ b/Documentation/s390/CommonIO
@@ -0,0 +1,125 @@
+S/390 common I/O-Layer - command line parameters, procfs and debugfs entries
+============================================================================
+
+Command line parameters
+-----------------------
+
+* ccw_timeout_log
+
+  Enable logging of debug information in case of ccw device timeouts.
+
+* cio_ignore = device[,device[,..]]
+
+	device := {all | [!]ipldev | [!]condev | [!]<devno> | [!]<devno>-<devno>}
+
+  The given devices will be ignored by the common I/O-layer; no detection
+  and device sensing will be done on any of those devices. The subchannel to 
+  which the device in question is attached will be treated as if no device was
+  attached.
+
+  An ignored device can be un-ignored later; see the "/proc entries"-section for
+  details.
+
+  The devices must be given either as bus ids (0.x.abcd) or as hexadecimal
+  device numbers (0xabcd or abcd, for 2.4 backward compatibility). If you
+  give a device number 0xabcd, it will be interpreted as 0.0.abcd.
+
+  You can use the 'all' keyword to ignore all devices. The 'ipldev' and 'condev'
+  keywords can be used to refer to the CCW based boot device and CCW console
+  device respectively (these are probably useful only when combined with the '!'
+  operator). The '!' operator will cause the I/O-layer to _not_ ignore a device.
+  The command line is parsed from left to right.
+
+  For example, 
+	cio_ignore=0.0.0023-0.0.0042,0.0.4711
+  will ignore all devices ranging from 0.0.0023 to 0.0.0042 and the device
+  0.0.4711, if detected.
+  As another example,
+	cio_ignore=all,!0.0.4711,!0.0.fd00-0.0.fd02
+  will ignore all devices but 0.0.4711, 0.0.fd00, 0.0.fd01, 0.0.fd02.
+
+  By default, no devices are ignored.
+
+
+/proc entries
+-------------
+
+* /proc/cio_ignore
+
+  Lists the ranges of devices (by bus id) which are ignored by common I/O.
+
+  You can un-ignore certain or all devices by piping to /proc/cio_ignore. 
+  "free all" will un-ignore all ignored devices, 
+  "free <device range>, <device range>, ..." will un-ignore the specified
+  devices.
+
+  For example, if devices 0.0.0023 to 0.0.0042 and 0.0.4711 are ignored,
+  - echo free 0.0.0030-0.0.0032 > /proc/cio_ignore
+    will un-ignore devices 0.0.0030 to 0.0.0032 and will leave devices 0.0.0023
+    to 0.0.002f, 0.0.0033 to 0.0.0042 and 0.0.4711 ignored;
+  - echo free 0.0.0041 > /proc/cio_ignore will furthermore un-ignore device
+    0.0.0041;
+  - echo free all > /proc/cio_ignore will un-ignore all remaining ignored 
+    devices.
+
+  When a device is un-ignored, device recognition and sensing is performed and 
+  the device driver will be notified if possible, so the device will become
+  available to the system. Note that un-ignoring is performed asynchronously.
+
+  You can also add ranges of devices to be ignored by piping to 
+  /proc/cio_ignore; "add <device range>, <device range>, ..." will ignore the
+  specified devices.
+
+  Note: While already known devices can be added to the list of devices to be
+        ignored, there will be no effect on then. However, if such a device
+	disappears and then reappears, it will then be ignored. To make
+	known devices go away, you need the "purge" command (see below).
+
+  For example,
+	"echo add 0.0.a000-0.0.accc, 0.0.af00-0.0.afff > /proc/cio_ignore"
+  will add 0.0.a000-0.0.accc and 0.0.af00-0.0.afff to the list of ignored
+  devices.
+
+  You can remove already known but now ignored devices via
+	"echo purge > /proc/cio_ignore"
+  All devices ignored but still registered and not online (= not in use)
+  will be deregistered and thus removed from the system.
+
+  The devices can be specified either by bus id (0.x.abcd) or, for 2.4 backward
+  compatibility, by the device number in hexadecimal (0xabcd or abcd). Device
+  numbers given as 0xabcd will be interpreted as 0.0.abcd.
+
+* /proc/cio_settle
+
+  A write request to this file is blocked until all queued cio actions are
+  handled. This will allow userspace to wait for pending work affecting
+  device availability after changing cio_ignore or the hardware configuration.
+
+* For some of the information present in the /proc filesystem in 2.4 (namely,
+  /proc/subchannels and /proc/chpids), see driver-model.txt.
+  Information formerly in /proc/irq_count is now in /proc/interrupts.
+
+
+debugfs entries
+---------------
+
+* /sys/kernel/debug/s390dbf/cio_*/ (S/390 debug feature)
+
+  Some views generated by the debug feature to hold various debug outputs.
+
+  - /sys/kernel/debug/s390dbf/cio_crw/sprintf
+    Messages from the processing of pending channel report words (machine check
+    handling).
+
+  - /sys/kernel/debug/s390dbf/cio_msg/sprintf
+    Various debug messages from the common I/O-layer.
+
+  - /sys/kernel/debug/s390dbf/cio_trace/hex_ascii
+    Logs the calling of functions in the common I/O-layer and, if applicable, 
+    which subchannel they were called for, as well as dumps of some data
+    structures (like irb in an error case).
+
+  The level of logging can be changed to be more or less verbose by piping to 
+  /sys/kernel/debug/s390dbf/cio_*/level a number between 0 and 6; see the
+  documentation on the S/390 debug feature (Documentation/s390/s390dbf.txt)
+  for details.
diff --git a/Documentation/s390/DASD b/Documentation/s390/DASD
new file mode 100644
index 000000000..9963f1e9c
--- /dev/null
+++ b/Documentation/s390/DASD
@@ -0,0 +1,73 @@
+DASD device driver
+
+S/390's disk devices (DASDs) are managed by Linux via the DASD device
+driver. It is valid for all types of DASDs and represents them to
+Linux as block devices, namely "dd". Currently the DASD driver uses a
+single major number (254) and 4 minor numbers per volume (1 for the
+physical volume and 3 for partitions). With respect to partitions see
+below. Thus you may have up to 64 DASD devices in your system.
+
+The kernel parameter 'dasd=from-to,...' may be issued arbitrary times
+in the kernel's parameter line or not at all. The 'from' and 'to'
+parameters are to be given in hexadecimal notation without a leading
+0x.
+If you supply kernel parameters the different instances are processed
+in order of appearance and a minor number is reserved for any device
+covered by the supplied range up to 64 volumes. Additional DASDs are
+ignored. If you do not supply the 'dasd=' kernel parameter at all, the 
+DASD driver registers all supported DASDs of your system to a minor
+number in ascending order of the subchannel number.
+
+The driver currently supports ECKD-devices and there are stubs for
+support of the FBA and CKD architectures. For the FBA architecture
+only some smart data structures are missing to make the support
+complete. 
+We performed our testing on 3380 and 3390 type disks of different
+sizes, under VM and on the bare hardware (LPAR), using internal disks
+of the multiprise as well as a RAMAC virtual array. Disks exported by
+an Enterprise Storage Server (Seascape) should work fine as well.
+
+We currently implement one partition per volume, which is the whole
+volume, skipping the first blocks up to the volume label. These are
+reserved for IPL records and IBM's volume label to assure
+accessibility of the DASD from other OSs. In a later stage we will
+provide support of partitions, maybe VTOC oriented or using a kind of
+partition table in the label record.
+
+USAGE
+
+-Low-level format (?CKD only)
+For using an ECKD-DASD as a Linux harddisk you have to low-level
+format the tracks by issuing the BLKDASDFORMAT-ioctl on that
+device. This will erase any data on that volume including IBM volume
+labels, VTOCs etc. The ioctl may take a 'struct format_data *' or
+'NULL' as an argument.  
+typedef struct {
+	int start_unit;
+	int stop_unit;
+	int blksize;
+} format_data_t;
+When a NULL argument is passed to the BLKDASDFORMAT ioctl the whole
+disk is formatted to a blocksize of 1024 bytes. Otherwise start_unit
+and stop_unit are the first and last track to be formatted. If
+stop_unit is -1 it implies that the DASD is formatted from start_unit
+up to the last track. blksize can be any power of two between 512 and
+4096. We recommend no blksize lower than 1024 because the ext2fs uses
+1kB blocks anyway and you gain approx. 50% of capacity increasing your
+blksize from 512 byte to 1kB.
+
+-Make a filesystem
+Then you can mk??fs the filesystem of your choice on that volume or
+partition. For reasons of sanity you should build your filesystem on
+the partition /dev/dd?1 instead of the whole volume. You only lose 3kB	
+but may be sure that you can reuse your data after introduction of a
+real partition table.
+
+BUGS:
+- Performance sometimes is rather low because we don't fully exploit clustering
+
+TODO-List:
+- Add IBM'S Disk layout to genhd
+- Enhance driver to use more than one major number
+- Enable usage as a module
+- Support Cache fast write and DASD fast write (ECKD)
diff --git a/Documentation/s390/Debugging390.txt b/Documentation/s390/Debugging390.txt
new file mode 100644
index 000000000..5ae7f868a
--- /dev/null
+++ b/Documentation/s390/Debugging390.txt
@@ -0,0 +1,2142 @@
+
+		  Debugging on Linux for s/390 & z/Architecture
+				       by
+	  Denis Joseph Barrow (djbarrow@de.ibm.com,barrow_dj@yahoo.com)
+    Copyright (C) 2000-2001 IBM Deutschland Entwicklung GmbH, IBM Corporation
+			Best viewed with fixed width fonts
+
+Overview of Document:
+=====================
+This document is intended to give a good overview of how to debug Linux for
+s/390 and z/Architecture. It is not intended as a complete reference and not a
+tutorial on the fundamentals of C & assembly. It doesn't go into
+390 IO in any detail. It is intended to complement the documents in the
+reference section below & any other worthwhile references you get.
+
+It is intended like the Enterprise Systems Architecture/390 Reference Summary
+to be printed out & used as a quick cheat sheet self help style reference when
+problems occur.
+
+Contents
+========
+Register Set
+Address Spaces on Intel Linux
+Address Spaces on Linux for s/390 & z/Architecture
+The Linux for s/390 & z/Architecture Kernel Task Structure
+Register Usage & Stackframes on Linux for s/390 & z/Architecture
+A sample program with comments
+Compiling programs for debugging on Linux for s/390 & z/Architecture
+Debugging under VM
+s/390 & z/Architecture IO Overview
+Debugging IO on s/390 & z/Architecture under VM
+GDB on s/390 & z/Architecture
+Stack chaining in gdb by hand
+Examining core dumps
+ldd
+Debugging modules
+The proc file system
+SysRq
+References
+Special Thanks
+
+Register Set
+============
+The current architectures have the following registers.
+ 
+16 General propose registers, 32 bit on s/390 and 64 bit on z/Architecture,
+r0-r15 (or gpr0-gpr15), used for arithmetic and addressing.
+
+16 Control registers, 32 bit on s/390 and 64 bit on z/Architecture, cr0-cr15,
+kernel usage only, used for memory management, interrupt control, debugging
+control etc.
+
+16 Access registers (ar0-ar15), 32 bit on both s/390 and z/Architecture,
+normally not used by normal programs but potentially could be used as
+temporary storage. These registers have a 1:1 association with general
+purpose registers and are designed to be used in the so-called access
+register mode to select different address spaces.
+Access register 0 (and access register 1 on z/Architecture, which needs a
+64 bit pointer) is currently used by the pthread library as a pointer to
+the current running threads private area.
+
+16 64 bit floating point registers (fp0-fp15 ) IEEE & HFP floating 
+point format compliant on G5 upwards & a Floating point control reg (FPC) 
+4  64 bit registers (fp0,fp2,fp4 & fp6) HFP only on older machines.
+Note:
+Linux (currently) always uses IEEE & emulates G5 IEEE format on older machines,
+( provided the kernel is configured for this ).
+
+
+The PSW is the most important register on the machine it
+is 64 bit on s/390 & 128 bit on z/Architecture & serves the roles of 
+a program counter (pc), condition code register,memory space designator.
+In IBM standard notation I am counting bit 0 as the MSB.
+It has several advantages over a normal program counter
+in that you can change address translation & program counter 
+in a single instruction. To change address translation,
+e.g. switching address translation off requires that you
+have a logical=physical mapping for the address you are
+currently running at.
+
+      Bit           Value
+s/390 z/Architecture
+0       0     Reserved ( must be 0 ) otherwise specification exception occurs.
+
+1       1     Program Event Recording 1 PER enabled, 
+	      PER is used to facilitate debugging e.g. single stepping.
+
+2-4    2-4    Reserved ( must be 0 ). 
+
+5       5     Dynamic address translation 1=DAT on.
+
+6       6     Input/Output interrupt Mask
+
+7	7     External interrupt Mask used primarily for interprocessor
+	      signalling and clock interrupts.
+
+8-11  8-11    PSW Key used for complex memory protection mechanism
+	      (not used under linux)
+
+12      12    1 on s/390 0 on z/Architecture
+
+13      13    Machine Check Mask 1=enable machine check interrupts
+
+14	14    Wait State. Set this to 1 to stop the processor except for
+	      interrupts and give  time to other LPARS. Used in CPU idle in
+	      the kernel to increase overall usage of processor resources.
+
+15      15    Problem state ( if set to 1 certain instructions are disabled )
+	      all linux user programs run with this bit 1 
+	      ( useful info for debugging under VM ).
+
+16-17 16-17   Address Space Control
+
+	      00 Primary Space Mode:
+	      The register CR1 contains the primary address-space control ele-
+	      ment (PASCE), which points to the primary space region/segment
+	      table origin.
+
+	      01 Access register mode
+
+	      10 Secondary Space Mode:
+	      The register CR7 contains the secondary address-space control
+	      element (SASCE), which points to the secondary space region or
+	      segment table origin.
+
+	      11 Home Space Mode:
+	      The register CR13 contains the home space address-space control
+	      element (HASCE), which points to the home space region/segment
+	      table origin.
+
+	      See "Address Spaces on Linux for s/390 & z/Architecture" below
+	      for more information about address space usage in Linux.
+
+18-19 18-19   Condition codes (CC)
+
+20    20      Fixed point overflow mask if 1=FPU exceptions for this event 
+              occur ( normally 0 ) 
+
+21    21      Decimal overflow mask if 1=FPU exceptions for this event occur 
+              ( normally 0 )
+
+22    22      Exponent underflow mask if 1=FPU exceptions for this event occur 
+              ( normally 0 )
+
+23    23      Significance Mask if 1=FPU exceptions for this event occur 
+              ( normally 0 )
+
+24-31 24-30   Reserved Must be 0.
+
+      31      Extended Addressing Mode
+      32      Basic Addressing Mode
+              Used to set addressing mode
+	      PSW 31   PSW 32
+                0         0        24 bit
+                0         1        31 bit
+                1         1        64 bit
+
+32             1=31 bit addressing mode 0=24 bit addressing mode (for backward 
+               compatibility), linux always runs with this bit set to 1
+
+33-64          Instruction address.
+      33-63    Reserved must be 0
+      64-127   Address
+               In 24 bits mode bits 64-103=0 bits 104-127 Address 
+               In 31 bits mode bits 64-96=0 bits 97-127 Address
+               Note: unlike 31 bit mode on s/390 bit 96 must be zero
+	       when loading the address with LPSWE otherwise a 
+               specification exception occurs, LPSW is fully backward
+               compatible.
+
+
+Prefix Page(s)
+--------------
+This per cpu memory area is too intimately tied to the processor not to mention.
+It exists between the real addresses 0-4096 on s/390 and between 0-8192 on
+z/Architecture and is exchanged with one page on s/390 or two pages on
+z/Architecture in absolute storage by the set prefix instruction during Linux
+startup.
+This page is mapped to a different prefix for each processor in an SMP
+configuration (assuming the OS designer is sane of course).
+Bytes 0-512 (200 hex) on s/390 and 0-512, 4096-4544, 4604-5119 currently on
+z/Architecture are used by the processor itself for holding such information
+as exception indications and entry points for exceptions.
+Bytes after 0xc00 hex are used by linux for per processor globals on s/390 and
+z/Architecture (there is a gap on z/Architecture currently between 0xc00 and
+0x1000, too, which is used by Linux).
+The closest thing to this on traditional architectures is the interrupt
+vector table. This is a good thing & does simplify some of the kernel coding
+however it means that we now cannot catch stray NULL pointers in the
+kernel without hard coded checks.
+
+
+
+Address Spaces on Intel Linux
+=============================
+
+The traditional Intel Linux is approximately mapped as follows forgive
+the ascii art.
+0xFFFFFFFF 4GB Himem		*****************
+				*		*
+				* Kernel Space	*
+				*		*
+				*****************	  ****************
+User Space Himem		*  User Stack	*	  *		 *
+(typically 0xC0000000 3GB )	*****************	  *		 *
+				*  Shared Libs	*	  * Next Process *
+				*****************	  *	to	 *
+				*		*   <==   *	Run	 *  <==
+				*  User Program *	  *		 *
+				*   Data BSS	*	  *		 *
+				*    Text	*	  *		 *
+				*   Sections	*	  *		 *
+0x00000000			*****************	  ****************
+
+Now it is easy to see that on Intel it is quite easy to recognise a kernel
+address as being one greater than user space himem (in this case 0xC0000000),
+and addresses of less than this are the ones in the current running program on
+this processor (if an smp box).
+If using the virtual machine ( VM ) as a debugger it is quite difficult to
+know which user process is running as the address space you are looking at
+could be from any process in the run queue.
+
+The limitation of Intels addressing technique is that the linux
+kernel uses a very simple real address to virtual addressing technique
+of Real Address=Virtual Address-User Space Himem.
+This means that on Intel the kernel linux can typically only address
+Himem=0xFFFFFFFF-0xC0000000=1GB & this is all the RAM these machines
+can typically use.
+They can lower User Himem to 2GB or lower & thus be
+able to use 2GB of RAM however this shrinks the maximum size
+of User Space from 3GB to 2GB they have a no win limit of 4GB unless
+they go to 64 Bit.
+
+
+On 390 our limitations & strengths make us slightly different.
+For backward compatibility we are only allowed use 31 bits (2GB)
+of our 32 bit addresses, however, we use entirely separate address 
+spaces for the user & kernel.
+
+This means we can support 2GB of non Extended RAM on s/390, & more
+with the Extended memory management swap device & 
+currently 4TB of physical memory currently on z/Architecture.
+
+
+Address Spaces on Linux for s/390 & z/Architecture
+==================================================
+
+Our addressing scheme is basically as follows:
+
+				   Primary Space	       Home Space
+Himem 0x7fffffff 2GB on s/390    *****************          ****************
+currently 0x3ffffffffff (2^42)-1 *  User Stack   *          *              *
+on z/Architecture.		 *****************          *              *
+				 *  Shared Libs  *	    *		   *
+				 *****************	    *		   *
+			         *               *          *    Kernel    *  
+		                 *  User Program *          *              *
+		                 *   Data BSS    *          *              *
+                                 *    Text       *          *              *
+            			 *   Sections    *          *              *
+0x00000000                       *****************          ****************
+
+This also means that we need to look at the PSW problem state bit and the
+addressing mode to decide whether we are looking at user or kernel space.
+
+User space runs in primary address mode (or access register mode within
+the vdso code).
+
+The kernel usually also runs in home space mode, however when accessing
+user space the kernel switches to primary or secondary address mode if
+the mvcos instruction is not available or if a compare-and-swap (futex)
+instruction on a user space address is performed.
+
+When also looking at the ASCE control registers, this means:
+
+User space:
+- runs in primary or access register mode
+- cr1 contains the user asce
+- cr7 contains the user asce
+- cr13 contains the kernel asce
+
+Kernel space:
+- runs in home space mode
+- cr1 contains the user or kernel asce
+  -> the kernel asce is loaded when a uaccess requires primary or
+     secondary address mode
+- cr7 contains the user or kernel asce, (changed with set_fs())
+- cr13 contains the kernel asce
+
+In case of uaccess the kernel changes to:
+- primary space mode in case of a uaccess (copy_to_user) and uses
+  e.g. the mvcp instruction to access user space. However the kernel
+  will stay in home space mode if the mvcos instruction is available
+- secondary space mode in case of futex atomic operations, so that the
+  instructions come from primary address space and data from secondary
+  space
+
+In case of KVM, the kernel runs in home space mode, but cr1 gets switched
+to contain the gmap asce before the SIE instruction gets executed. When
+the SIE instruction is finished, cr1 will be switched back to contain the
+user asce.
+
+
+Virtual Addresses on s/390 & z/Architecture
+===========================================
+
+A virtual address on s/390 is made up of 3 parts
+The SX (segment index, roughly corresponding to the PGD & PMD in Linux
+terminology) being bits 1-11.
+The PX (page index, corresponding to the page table entry (pte) in Linux
+terminology) being bits 12-19.
+The remaining bits BX (the byte index are the offset in the page )
+i.e. bits 20 to 31.
+
+On z/Architecture in linux we currently make up an address from 4 parts.
+The region index bits (RX) 0-32 we currently use bits 22-32
+The segment index (SX) being bits 33-43
+The page index (PX) being bits  44-51
+The byte index (BX) being bits  52-63
+
+Notes:
+1) s/390 has no PMD so the PMD is really the PGD also.
+A lot of this stuff is defined in pgtable.h.
+
+2) Also seeing as s/390's page indexes are only 1k  in size 
+(bits 12-19 x 4 bytes per pte ) we use 1 ( page 4k )
+to make the best use of memory by updating 4 segment indices 
+entries each time we mess with a PMD & use offsets 
+0,1024,2048 & 3072 in this page as for our segment indexes.
+On z/Architecture our page indexes are now 2k in size
+( bits 12-19 x 8 bytes per pte ) we do a similar trick
+but only mess with 2 segment indices each time we mess with
+a PMD.
+
+3) As z/Architecture supports up to a massive 5-level page table lookup we
+can only use 3 currently on Linux ( as this is all the generic kernel
+currently supports ) however this may change in future
+this allows us to access ( according to my sums )
+4TB of virtual storage per process i.e.
+4096*512(PTES)*1024(PMDS)*2048(PGD) = 4398046511104 bytes,
+enough for another 2 or 3 of years I think :-).
+to do this we use a region-third-table designation type in
+our address space control registers.
+ 
+
+The Linux for s/390 & z/Architecture Kernel Task Structure
+==========================================================
+Each process/thread under Linux for S390 has its own kernel task_struct
+defined in linux/include/linux/sched.h
+The S390 on initialisation & resuming of a process on a cpu sets
+the __LC_KERNEL_STACK variable in the spare prefix area for this cpu
+(which we use for per-processor globals).
+
+The kernel stack pointer is intimately tied with the task structure for
+each processor as follows.
+
+                      s/390
+            ************************
+            *  1 page kernel stack *
+	    *        ( 4K )        *
+            ************************
+            *   1 page task_struct *        
+            *        ( 4K )        *
+8K aligned  ************************ 
+
+                 z/Architecture
+            ************************
+            *  2 page kernel stack *
+	    *        ( 8K )        *
+            ************************
+            *  2 page task_struct  *        
+            *        ( 8K )        *
+16K aligned ************************ 
+
+What this means is that we don't need to dedicate any register or global
+variable to point to the current running process & can retrieve it with the
+following very simple construct for s/390 & one very similar for z/Architecture.
+
+static inline struct task_struct * get_current(void)
+{
+        struct task_struct *current;
+        __asm__("lhi   %0,-8192\n\t"
+                "nr    %0,15"
+                : "=r" (current) );
+        return current;
+}
+
+i.e. just anding the current kernel stack pointer with the mask -8192.
+Thankfully because Linux doesn't have support for nested IO interrupts
+& our devices have large buffers can survive interrupts being shut for 
+short amounts of time we don't need a separate stack for interrupts.
+
+
+
+
+Register Usage & Stackframes on Linux for s/390 & z/Architecture
+=================================================================
+Overview:
+---------
+This is the code that gcc produces at the top & the bottom of
+each function. It usually is fairly consistent & similar from 
+function to function & if you know its layout you can probably
+make some headway in finding the ultimate cause of a problem
+after a crash without a source level debugger.
+
+Note: To follow stackframes requires a knowledge of C or Pascal &
+limited knowledge of one assembly language.
+
+It should be noted that there are some differences between the
+s/390 and z/Architecture stack layouts as the z/Architecture stack layout
+didn't have to maintain compatibility with older linkage formats.
+
+Glossary:
+---------
+alloca:
+This is a built in compiler function for runtime allocation
+of extra space on the callers stack which is obviously freed
+up on function exit ( e.g. the caller may choose to allocate nothing
+of a buffer of 4k if required for temporary purposes ), it generates 
+very efficient code ( a few cycles  ) when compared to alternatives 
+like malloc.
+
+automatics: These are local variables on the stack,
+i.e they aren't in registers & they aren't static.
+
+back-chain:
+This is a pointer to the stack pointer before entering a
+framed functions ( see frameless function ) prologue got by 
+dereferencing the address of the current stack pointer,
+ i.e. got by accessing the 32 bit value at the stack pointers
+current location.
+
+base-pointer:
+This is a pointer to the back of the literal pool which
+is an area just behind each procedure used to store constants
+in each function.
+
+call-clobbered: The caller probably needs to save these registers if there 
+is something of value in them, on the stack or elsewhere before making a 
+call to another procedure so that it can restore it later.
+
+epilogue:
+The code generated by the compiler to return to the caller.
+
+frameless-function
+A frameless function in Linux for s390 & z/Architecture is one which doesn't 
+need more than the register save area (96 bytes on s/390, 160 on z/Architecture)
+given to it by the caller.
+A frameless function never:
+1) Sets up a back chain.
+2) Calls alloca.
+3) Calls other normal functions
+4) Has automatics.
+
+GOT-pointer:
+This is a pointer to the global-offset-table in ELF
+( Executable Linkable Format, Linux'es most common executable format ),
+all globals & shared library objects are found using this pointer.
+
+lazy-binding
+ELF shared libraries are typically only loaded when routines in the shared
+library are actually first called at runtime. This is lazy binding.
+
+procedure-linkage-table
+This is a table found from the GOT which contains pointers to routines
+in other shared libraries which can't be called to by easier means.
+
+prologue:
+The code generated by the compiler to set up the stack frame.
+
+outgoing-args:
+This is extra area allocated on the stack of the calling function if the
+parameters for the callee's cannot all be put in registers, the same
+area can be reused by each function the caller calls.
+
+routine-descriptor:
+A COFF  executable format based concept of a procedure reference 
+actually being 8 bytes or more as opposed to a simple pointer to the routine.
+This is typically defined as follows
+Routine Descriptor offset 0=Pointer to Function
+Routine Descriptor offset 4=Pointer to Table of Contents
+The table of contents/TOC is roughly equivalent to a GOT pointer.
+& it means that shared libraries etc. can be shared between several
+environments each with their own TOC.
+
+ 
+static-chain: This is used in nested functions a concept adopted from pascal 
+by gcc not used in ansi C or C++ ( although quite useful ), basically it
+is a pointer used to reference local variables of enclosing functions.
+You might come across this stuff once or twice in your lifetime.
+
+e.g.
+The function below should return 11 though gcc may get upset & toss warnings 
+about unused variables.
+int FunctionA(int a)
+{
+	int b;
+	FunctionC(int c)
+	{
+		b=c+1;
+	}
+	FunctionC(10);
+	return(b);
+}
+
+
+s/390 & z/Architecture Register usage
+=====================================
+r0       used by syscalls/assembly                  call-clobbered
+r1	 used by syscalls/assembly                  call-clobbered
+r2       argument 0 / return value 0                call-clobbered
+r3       argument 1 / return value 1 (if long long) call-clobbered
+r4       argument 2                                 call-clobbered
+r5       argument 3                                 call-clobbered
+r6	 argument 4				    saved
+r7       pointer-to arguments 5 to ...              saved      
+r8       this & that                                saved
+r9       this & that                                saved
+r10      static-chain ( if nested function )        saved
+r11      frame-pointer ( if function used alloca )  saved
+r12      got-pointer                                saved
+r13      base-pointer                               saved
+r14      return-address                             saved
+r15      stack-pointer                              saved
+
+f0       argument 0 / return value ( float/double ) call-clobbered
+f2       argument 1                                 call-clobbered
+f4       z/Architecture argument 2                  saved
+f6       z/Architecture argument 3                  saved
+The remaining floating points
+f1,f3,f5 f7-f15 are call-clobbered.
+
+Notes:
+------
+1) The only requirement is that registers which are used
+by the callee are saved, e.g. the compiler is perfectly
+capable of using r11 for purposes other than a frame a
+frame pointer if a frame pointer is not needed.
+2) In functions with variable arguments e.g. printf the calling procedure 
+is identical to one without variable arguments & the same number of 
+parameters. However, the prologue of this function is somewhat more
+hairy owing to it having to move these parameters to the stack to
+get va_start, va_arg & va_end to work.
+3) Access registers are currently unused by gcc but are used in
+the kernel. Possibilities exist to use them at the moment for
+temporary storage but it isn't recommended.
+4) Only 4 of the floating point registers are used for
+parameter passing as older machines such as G3 only have only 4
+& it keeps the stack frame compatible with other compilers.
+However with IEEE floating point emulation under linux on the
+older machines you are free to use the other 12.
+5) A long long or double parameter cannot be have the 
+first 4 bytes in a register & the second four bytes in the 
+outgoing args area. It must be purely in the outgoing args
+area if crossing this boundary.
+6) Floating point parameters are mixed with outgoing args
+on the outgoing args area in the order the are passed in as parameters.
+7) Floating point arguments 2 & 3 are saved in the outgoing args area for 
+z/Architecture
+
+
+Stack Frame Layout
+------------------
+s/390     z/Architecture
+0         0             back chain ( a 0 here signifies end of back chain )
+4         8             eos ( end of stack, not used on Linux for S390 used in other linkage formats )
+8         16            glue used in other s/390 linkage formats for saved routine descriptors etc.
+12        24            glue used in other s/390 linkage formats for saved routine descriptors etc.
+16        32            scratch area
+20        40            scratch area
+24        48            saved r6 of caller function
+28        56            saved r7 of caller function
+32        64            saved r8 of caller function
+36        72            saved r9 of caller function
+40        80            saved r10 of caller function
+44        88            saved r11 of caller function
+48        96            saved r12 of caller function
+52        104           saved r13 of caller function
+56        112           saved r14 of caller function
+60        120           saved r15 of caller function
+64        128           saved f4 of caller function
+72        132           saved f6 of caller function
+80                      undefined
+96        160           outgoing args passed from caller to callee
+96+x      160+x         possible stack alignment ( 8 bytes desirable )
+96+x+y    160+x+y       alloca space of caller ( if used )
+96+x+y+z  160+x+y+z     automatics of caller ( if used )
+0                       back-chain
+
+A sample program with comments.
+===============================
+
+Comments on the function test
+-----------------------------
+1) It didn't need to set up a pointer to the constant pool gpr13 as it is not
+used ( :-( ).
+2) This is a frameless function & no stack is bought.
+3) The compiler was clever enough to recognise that it could return the
+value in r2 as well as use it for the passed in parameter ( :-) ).
+4) The basr ( branch relative & save ) trick works as follows the instruction 
+has a special case with r0,r0 with some instruction operands is understood as 
+the literal value 0, some risc architectures also do this ). So now
+we are branching to the next address & the address new program counter is
+in r13,so now we subtract the size of the function prologue we have executed
++ the size of the literal pool to get to the top of the literal pool
+0040037c int test(int b)
+{                                                          # Function prologue below
+  40037c:	90 de f0 34 	stm	%r13,%r14,52(%r15) # Save registers r13 & r14
+  400380:	0d d0       	basr	%r13,%r0           # Set up pointer to constant pool using
+  400382:	a7 da ff fa 	ahi	%r13,-6            # basr trick
+	return(5+b);
+	                                                   # Huge main program
+  400386:	a7 2a 00 05 	ahi	%r2,5              # add 5 to r2
+
+                                                           # Function epilogue below 
+  40038a:	98 de f0 34 	lm	%r13,%r14,52(%r15) # restore registers r13 & 14
+  40038e:	07 fe       	br	%r14               # return
+}
+
+Comments on the function main
+-----------------------------
+1) The compiler did this function optimally ( 8-) )
+
+Literal pool for main.
+400390:	ff ff ff ec 	.long 0xffffffec
+main(int argc,char *argv[])
+{                                                          # Function prologue below
+  400394:	90 bf f0 2c 	stm	%r11,%r15,44(%r15) # Save necessary registers
+  400398:	18 0f       	lr	%r0,%r15           # copy stack pointer to r0
+  40039a:	a7 fa ff a0 	ahi	%r15,-96           # Make area for callee saving 
+  40039e:	0d d0       	basr	%r13,%r0           # Set up r13 to point to
+  4003a0:	a7 da ff f0 	ahi	%r13,-16           # literal pool
+  4003a4:	50 00 f0 00 	st	%r0,0(%r15)        # Save backchain
+
+	return(test(5));                                   # Main Program Below
+  4003a8:	58 e0 d0 00 	l	%r14,0(%r13)       # load relative address of test from
+						           # literal pool
+  4003ac:	a7 28 00 05 	lhi	%r2,5              # Set first parameter to 5
+  4003b0:	4d ee d0 00 	bas	%r14,0(%r14,%r13)  # jump to test setting r14 as return
+							   # address using branch & save instruction.
+
+							   # Function Epilogue below
+  4003b4:	98 bf f0 8c 	lm	%r11,%r15,140(%r15)# Restore necessary registers.
+  4003b8:	07 fe       	br	%r14               # return to do program exit 
+}
+
+
+Compiler updates
+----------------
+
+main(int argc,char *argv[])
+{
+  4004fc:	90 7f f0 1c       	stm	%r7,%r15,28(%r15)
+  400500:	a7 d5 00 04       	bras	%r13,400508 <main+0xc>
+  400504:	00 40 04 f4       	.long	0x004004f4 
+  # compiler now puts constant pool in code to so it saves an instruction 
+  400508:	18 0f             	lr	%r0,%r15
+  40050a:	a7 fa ff a0       	ahi	%r15,-96
+  40050e:	50 00 f0 00       	st	%r0,0(%r15)
+	return(test(5));
+  400512:	58 10 d0 00       	l	%r1,0(%r13)
+  400516:	a7 28 00 05       	lhi	%r2,5
+  40051a:	0d e1             	basr	%r14,%r1
+  # compiler adds 1 extra instruction to epilogue this is done to
+  # avoid processor pipeline stalls owing to data dependencies on g5 &
+  # above as register 14 in the old code was needed directly after being loaded 
+  # by the lm	%r11,%r15,140(%r15) for the br %14.
+  40051c:	58 40 f0 98       	l	%r4,152(%r15)
+  400520:	98 7f f0 7c       	lm	%r7,%r15,124(%r15)
+  400524:	07 f4             	br	%r4
+}
+
+
+Hartmut ( our compiler developer ) also has been threatening to take out the
+stack backchain in optimised code as this also causes pipeline stalls, you
+have been warned.
+
+64 bit z/Architecture code disassembly
+--------------------------------------
+
+If you understand the stuff above you'll understand the stuff
+below too so I'll avoid repeating myself & just say that 
+some of the instructions have g's on the end of them to indicate
+they are 64 bit & the stack offsets are a bigger, 
+the only other difference you'll find between 32 & 64 bit is that
+we now use f4 & f6 for floating point arguments on 64 bit.
+00000000800005b0 <test>:
+int test(int b)
+{
+	return(5+b);
+    800005b0:	a7 2a 00 05       	ahi	%r2,5
+    800005b4:	b9 14 00 22       	lgfr	%r2,%r2 # downcast to integer
+    800005b8:	07 fe             	br	%r14
+    800005ba:	07 07             	bcr	0,%r7
+
+
+}
+
+00000000800005bc <main>:
+main(int argc,char *argv[])
+{ 
+    800005bc:	eb bf f0 58 00 24 	stmg	%r11,%r15,88(%r15)
+    800005c2:	b9 04 00 1f       	lgr	%r1,%r15
+    800005c6:	a7 fb ff 60       	aghi	%r15,-160
+    800005ca:	e3 10 f0 00 00 24 	stg	%r1,0(%r15)
+	return(test(5));
+    800005d0:	a7 29 00 05       	lghi	%r2,5
+    # brasl allows jumps > 64k & is overkill here bras would do fune
+    800005d4:	c0 e5 ff ff ff ee 	brasl	%r14,800005b0 <test> 
+    800005da:	e3 40 f1 10 00 04 	lg	%r4,272(%r15)
+    800005e0:	eb bf f0 f8 00 04 	lmg	%r11,%r15,248(%r15)
+    800005e6:	07 f4             	br	%r4
+}
+
+
+
+Compiling programs for debugging on Linux for s/390 & z/Architecture
+====================================================================
+-gdwarf-2 now works it should be considered the default debugging
+format for s/390 & z/Architecture as it is more reliable for debugging
+shared libraries,  normal -g debugging works much better now
+Thanks to the IBM java compiler developers bug reports. 
+
+This is typically done adding/appending the flags -g or -gdwarf-2 to the 
+CFLAGS & LDFLAGS variables Makefile of the program concerned.
+
+If using gdb & you would like accurate displays of registers &
+ stack traces compile without optimisation i.e make sure
+that there is no -O2 or similar on the CFLAGS line of the Makefile &
+the emitted gcc commands, obviously this will produce worse code 
+( not advisable for shipment ) but it is an  aid to the debugging process.
+
+This aids debugging because the compiler will copy parameters passed in
+in registers onto the stack so backtracing & looking at passed in
+parameters will work, however some larger programs which use inline functions
+will not compile without optimisation.
+
+Debugging with optimisation has since much improved after fixing
+some bugs, please make sure you are using gdb-5.0 or later developed 
+after Nov'2000.
+
+
+
+Debugging under VM
+==================
+
+Notes
+-----
+Addresses & values in the VM debugger are always hex never decimal
+Address ranges are of the format <HexValue1>-<HexValue2> or
+<HexValue1>.<HexValue2>
+For example, the address range	0x2000 to 0x3000 can be described as 2000-3000
+or 2000.1000
+
+The VM Debugger is case insensitive.
+
+VM's strengths are usually other debuggers weaknesses you can get at any
+resource no matter how sensitive e.g. memory management resources, change
+address translation in the PSW. For kernel hacking you will reap dividends if
+you get good at it.
+
+The VM Debugger displays operators but not operands, and also the debugger
+displays useful information on the same line as the author of the code probably
+felt that it was a good idea not to go over the 80 columns on the screen.
+This isn't as unintuitive as it may seem as the s/390 instructions are easy to
+decode mentally and you can make a good guess at a lot of them as all the
+operands are nibble (half byte aligned).
+So if you have an objdump listing by hand, it is quite easy to follow, and if
+you don't have an objdump listing keep a copy of the s/390 Reference Summary
+or alternatively the s/390 principles of operation next to you.
+e.g. even I can guess that 
+0001AFF8' LR    180F        CC 0
+is a ( load register ) lr r0,r15 
+
+Also it is very easy to tell the length of a 390 instruction from the 2 most
+significant bits in the instruction (not that this info is really useful except
+if you are trying to make sense of a hexdump of code).
+Here is a table
+Bits                    Instruction Length
+------------------------------------------
+00                          2 Bytes
+01                          4 Bytes
+10                          4 Bytes
+11                          6 Bytes
+
+The debugger also displays other useful info on the same line such as the
+addresses being operated on destination addresses of branches & condition codes.
+e.g.  
+00019736' AHI   A7DAFF0E    CC 1
+000198BA' BRC   A7840004 -> 000198C2'   CC 0
+000198CE' STM   900EF068 >> 0FA95E78    CC 2
+
+
+
+Useful VM debugger commands
+---------------------------
+
+I suppose I'd better mention this before I start
+to list the current active traces do 
+Q TR
+there can be a maximum of 255 of these per set
+( more about trace sets later ).
+To stop traces issue a
+TR END.
+To delete a particular breakpoint issue
+TR DEL <breakpoint number>
+
+The PA1 key drops to CP mode so you can issue debugger commands,
+Doing alt c (on my 3270 console at least ) clears the screen. 
+hitting b <enter> comes back to the running operating system
+from cp mode ( in our case linux ).
+It is typically useful to add shortcuts to your profile.exec file
+if you have one ( this is roughly equivalent to autoexec.bat in DOS ).
+file here are a few from mine.
+/* this gives me command history on issuing f12 */
+set pf12 retrieve 
+/* this continues */
+set pf8 imm b
+/* goes to trace set a */
+set pf1 imm tr goto a
+/* goes to trace set b */
+set pf2 imm tr goto b
+/* goes to trace set c */
+set pf3 imm tr goto c
+
+
+
+Instruction Tracing
+-------------------
+Setting a simple breakpoint
+TR I PSWA <address>
+To debug a particular function try
+TR I R <function address range>
+TR I on its own will single step.
+TR I DATA <MNEMONIC> <OPTIONAL RANGE> will trace for particular mnemonics
+e.g.
+TR I DATA 4D R 0197BC.4000
+will trace for BAS'es ( opcode 4D ) in the range 0197BC.4000
+if you were inclined you could add traces for all branch instructions &
+suffix them with the run prefix so you would have a backtrace on screen 
+when a program crashes.
+TR BR <INTO OR FROM> will trace branches into or out of an address.
+e.g.
+TR BR INTO 0 is often quite useful if a program is getting awkward & deciding
+to branch to 0 & crashing as this will stop at the address before in jumps to 0.
+TR I R <address range> RUN cmd d g
+single steps a range of addresses but stays running &
+displays the gprs on each step.
+
+
+
+Displaying & modifying Registers
+--------------------------------
+D G will display all the gprs
+Adding a extra G to all the commands is necessary to access the full 64 bit 
+content in VM on z/Architecture. Obviously this isn't required for access
+registers as these are still 32 bit.
+e.g. DGG instead of DG 
+D X will display all the control registers
+D AR will display all the access registers
+D AR4-7 will display access registers 4 to 7
+CPU ALL D G will display the GRPS of all CPUS in the configuration
+D PSW will display the current PSW
+st PSW 2000 will put the value 2000 into the PSW &
+cause crash your machine.
+D PREFIX displays the prefix offset
+
+
+Displaying Memory
+-----------------
+To display memory mapped using the current PSW's mapping try
+D <range>
+To make VM display a message each time it hits a particular address and
+continue try
+D I<range> will disassemble/display a range of instructions.
+ST addr 32 bit word will store a 32 bit aligned address
+D T<range> will display the EBCDIC in an address (if you are that way inclined)
+D R<range> will display real addresses ( without DAT ) but with prefixing.
+There are other complex options to display if you need to get at say home space
+but are in primary space the easiest thing to do is to temporarily
+modify the PSW to the other addressing mode, display the stuff & then
+restore it.
+
+
+ 
+Hints
+-----
+If you want to issue a debugger command without halting your virtual machine
+with the PA1 key try prefixing the command with #CP e.g.
+#cp tr i pswa 2000
+also suffixing most debugger commands with RUN will cause them not
+to stop just display the mnemonic at the current instruction on the console.
+If you have several breakpoints you want to put into your program &
+you get fed up of cross referencing with System.map
+you can do the following trick for several symbols.
+grep do_signal System.map 
+which emits the following among other things
+0001f4e0 T do_signal 
+now you can do
+
+TR I PSWA 0001f4e0 cmd msg * do_signal
+This sends a message to your own console each time do_signal is entered.
+( As an aside I wrote a perl script once which automatically generated a REXX
+script with breakpoints on every kernel procedure, this isn't a good idea
+because there are thousands of these routines & VM can only set 255 breakpoints
+at a time so you nearly had to spend as long pruning the file down as you would 
+entering the msgs by hand), however, the trick might be useful for a single
+object file. In the 3270 terminal emulator x3270 there is a very useful option
+in the file menu called "Save Screen In File" - this is very good for keeping a
+copy of traces.
+
+From CMS help <command name> will give you online help on a particular command. 
+e.g. 
+HELP DISPLAY
+
+Also CP has a file called profile.exec which automatically gets called
+on startup of CMS ( like autoexec.bat ), keeping on a DOS analogy session
+CP has a feature similar to doskey, it may be useful for you to
+use profile.exec to define some keystrokes. 
+e.g.
+SET PF9 IMM B
+This does a single step in VM on pressing F8. 
+SET PF10  ^
+This sets up the ^ key.
+which can be used for ^c (ctrl-c),^z (ctrl-z) which can't be typed directly
+into some 3270 consoles.
+SET PF11 ^-
+This types the starting keystrokes for a sysrq see SysRq below.
+SET PF12 RETRIEVE
+This retrieves command history on pressing F12.
+
+
+Sometimes in VM the display is set up to scroll automatically this
+can be very annoying if there are messages you wish to look at
+to stop this do
+TERM MORE 255 255
+This will nearly stop automatic screen updates, however it will
+cause a denial of service if lots of messages go to the 3270 console,
+so it would be foolish to use this as the default on a production machine.
+ 
+
+Tracing particular processes
+----------------------------
+The kernel's text segment is intentionally at an address in memory that it will
+very seldom collide with text segments of user programs ( thanks Martin ),
+this simplifies debugging the kernel.
+However it is quite common for user processes to have addresses which collide
+this can make debugging a particular process under VM painful under normal
+circumstances as the process may change when doing a 
+TR I R <address range>.
+Thankfully after reading VM's online help I figured out how to debug
+I particular process.
+
+Your first problem is to find the STD ( segment table designation )
+of the program you wish to debug.
+There are several ways you can do this here are a few
+1) objdump --syms <program to be debugged> | grep main
+To get the address of main in the program.
+tr i pswa <address of main>
+Start the program, if VM drops to CP on what looks like the entry
+point of the main function this is most likely the process you wish to debug.
+Now do a D X13 or D XG13 on z/Architecture.
+On 31 bit the STD is bits 1-19 ( the STO segment table origin ) 
+& 25-31 ( the STL segment table length ) of CR13.
+now type
+TR I R STD <CR13's value> 0.7fffffff
+e.g.
+TR I R STD 8F32E1FF 0.7fffffff
+Another very useful variation is
+TR STORE INTO STD <CR13's value> <address range>
+for finding out when a particular variable changes.
+
+An alternative way of finding the STD of a currently running process 
+is to do the following, ( this method is more complex but
+could be quite convenient if you aren't updating the kernel much &
+so your kernel structures will stay constant for a reasonable period of
+time ).
+
+grep task /proc/<pid>/status
+from this you should see something like
+task: 0f160000 ksp: 0f161de8 pt_regs: 0f161f68
+This now gives you a pointer to the task structure.
+Now make CC:="s390-gcc -g" kernel/sched.s
+To get the task_struct stabinfo.
+( task_struct is defined in include/linux/sched.h ).
+Now we want to look at
+task->active_mm->pgd
+on my machine the active_mm in the task structure stab is
+active_mm:(4,12),672,32
+its offset is 672/8=84=0x54
+the pgd member in the mm_struct stab is
+pgd:(4,6)=*(29,5),96,32
+so its offset is 96/8=12=0xc
+
+so we'll
+hexdump -s 0xf160054 /dev/mem | more
+i.e. task_struct+active_mm offset
+to look at the active_mm member
+f160054 0fee cc60 0019 e334 0000 0000 0000 0011
+hexdump -s 0x0feecc6c /dev/mem | more
+i.e. active_mm+pgd offset
+feecc6c 0f2c 0000 0000 0001 0000 0001 0000 0010
+we get something like
+now do 
+TR I R STD <pgd|0x7f> 0.7fffffff
+i.e. the 0x7f is added because the pgd only
+gives the page table origin & we need to set the low bits
+to the maximum possible segment table length.
+TR I R STD 0f2c007f 0.7fffffff
+on z/Architecture you'll probably need to do
+TR I R STD <pgd|0x7> 0.ffffffffffffffff
+to set the TableType to 0x1 & the Table length to 3.
+
+
+
+Tracing Program Exceptions
+--------------------------
+If you get a crash which says something like
+illegal operation or specification exception followed by a register dump
+You can restart linux & trace these using the tr prog <range or value> trace
+option.
+
+
+The most common ones you will normally be tracing for is
+1=operation exception
+2=privileged operation exception
+4=protection exception
+5=addressing exception
+6=specification exception
+10=segment translation exception
+11=page translation exception
+
+The full list of these is on page 22 of the current s/390 Reference Summary.
+e.g.
+tr prog 10 will trace segment translation exceptions.
+tr prog on its own will trace all program interruption codes.
+
+Trace Sets
+----------
+On starting VM you are initially in the INITIAL trace set.
+You can do a Q TR to verify this.
+If you have a complex tracing situation where you wish to wait for instance 
+till a driver is open before you start tracing IO, but know in your
+heart that you are going to have to make several runs through the code till you
+have a clue whats going on. 
+
+What you can do is
+TR I PSWA <Driver open address>
+hit b to continue till breakpoint
+reach the breakpoint
+now do your
+TR GOTO B 
+TR IO 7c08-7c09 inst int run 
+or whatever the IO channels you wish to trace are & hit b
+
+To got back to the initial trace set do
+TR GOTO INITIAL
+& the TR I PSWA <Driver open address> will be the only active breakpoint again.
+
+
+Tracing linux syscalls under VM
+-------------------------------
+Syscalls are implemented on Linux for S390 by the Supervisor call instruction
+(SVC). There 256 possibilities of these as the instruction is made up of a 0xA
+opcode and the second byte being the syscall number. They are traced using the
+simple command:
+TR SVC  <Optional value or range>
+the syscalls are defined in linux/arch/s390/include/asm/unistd.h
+e.g. to trace all file opens just do
+TR SVC 5 ( as this is the syscall number of open )
+
+
+SMP Specific commands
+---------------------
+To find out how many cpus you have
+Q CPUS displays all the CPU's available to your virtual machine
+To find the cpu that the current cpu VM debugger commands are being directed at
+do Q CPU to change the current cpu VM debugger commands are being directed at do
+CPU <desired cpu no>
+
+On a SMP guest issue a command to all CPUs try prefixing the command with cpu
+all. To issue a command to a particular cpu try cpu <cpu number> e.g.
+CPU 01 TR I R 2000.3000
+If you are running on a guest with several cpus & you have a IO related problem
+& cannot follow the flow of code but you know it isn't smp related.
+from the bash prompt issue
+shutdown -h now or halt.
+do a Q CPUS to find out how many cpus you have
+detach each one of them from cp except cpu 0 
+by issuing a 
+DETACH CPU 01-(number of cpus in configuration)
+& boot linux again.
+TR SIGP will trace inter processor signal processor instructions.
+DEFINE CPU 01-(number in configuration) 
+will get your guests cpus back.
+
+
+Help for displaying ascii textstrings
+-------------------------------------
+On the very latest VM Nucleus'es VM can now display ascii
+( thanks Neale for the hint ) by doing
+D TX<lowaddr>.<len>
+e.g.
+D TX0.100
+
+Alternatively
+=============
+Under older VM debuggers (I love EBDIC too) you can use following little
+program which converts a command line of hex digits to ascii text. It can be
+compiled under linux and you can copy the hex digits from your x3270 terminal
+to your xterm if you are debugging from a linuxbox.
+
+This is quite useful when looking at a parameter passed in as a text string
+under VM ( unless you are good at decoding ASCII in your head ).
+
+e.g. consider tracing an open syscall
+TR SVC 5
+We have stopped at a breakpoint
+000151B0' SVC   0A05     -> 0001909A'   CC 0
+
+D 20.8 to check the SVC old psw in the prefix area and see was it from userspace
+(for the layout of the prefix area consult the "Fixed Storage Locations"
+chapter of the s/390 Reference Summary if you have it available).
+V00000020  070C2000 800151B2
+The problem state bit wasn't set &  it's also too early in the boot sequence
+for it to be a userspace SVC if it was we would have to temporarily switch the 
+psw to user space addressing so we could get at the first parameter of the open
+in gpr2.
+Next do a 
+D G2
+GPR  2 =  00014CB4
+Now display what gpr2 is pointing to
+D 00014CB4.20
+V00014CB4  2F646576 2F636F6E 736F6C65 00001BF5
+V00014CC4  FC00014C B4001001 E0001000 B8070707
+Now copy the text till the first 00 hex ( which is the end of the string
+to an xterm & do hex2ascii on it.
+hex2ascii 2F646576 2F636F6E 736F6C65 00 
+outputs
+Decoded Hex:=/ d e v / c o n s o l e 0x00 
+We were opening the console device,
+
+You can compile the code below yourself for practice :-),
+/*
+ *    hex2ascii.c
+ *    a useful little tool for converting a hexadecimal command line to ascii
+ *
+ *    Author(s): Denis Joseph Barrow (djbarrow@de.ibm.com,barrow_dj@yahoo.com)
+ *    (C) 2000 IBM Deutschland Entwicklung GmbH, IBM Corporation.
+ */   
+#include <stdio.h>
+
+int main(int argc,char *argv[])
+{
+  int cnt1,cnt2,len,toggle=0;
+  int startcnt=1;
+  unsigned char c,hex;
+  
+  if(argc>1&&(strcmp(argv[1],"-a")==0))
+     startcnt=2;
+  printf("Decoded Hex:=");
+  for(cnt1=startcnt;cnt1<argc;cnt1++)
+  {
+    len=strlen(argv[cnt1]);
+    for(cnt2=0;cnt2<len;cnt2++)
+    {
+       c=argv[cnt1][cnt2];
+       if(c>='0'&&c<='9')
+	  c=c-'0';
+       if(c>='A'&&c<='F')
+	  c=c-'A'+10;
+       if(c>='a'&&c<='f')
+	  c=c-'a'+10;
+       switch(toggle)
+       {
+	  case 0:
+	     hex=c<<4;
+	     toggle=1;
+	  break;
+	  case 1:
+	     hex+=c;
+	     if(hex<32||hex>127)
+	     {
+		if(startcnt==1)
+		   printf("0x%02X ",(int)hex);
+		else
+		   printf(".");
+	     }
+	     else
+	     {
+	       printf("%c",hex);
+	       if(startcnt==1)
+		  printf(" ");
+	     }
+	     toggle=0;
+	  break;
+       }
+    }
+  }
+  printf("\n");
+}
+
+
+
+
+Stack tracing under VM
+----------------------
+A basic backtrace
+-----------------
+
+Here are the tricks I use 9 out of 10 times it works pretty well,
+
+When your backchain reaches a dead end
+--------------------------------------
+This can happen when an exception happens in the kernel and the kernel is
+entered twice. If you reach the NULL pointer at the end of the back chain you
+should be able to sniff further back if you follow the following tricks.
+1) A kernel address should be easy to recognise since it is in
+primary space & the problem state bit isn't set & also
+The Hi bit of the address is set.
+2) Another backchain should also be easy to recognise since it is an 
+address pointing to another address approximately 100 bytes or 0x70 hex
+behind the current stackpointer.
+
+
+Here is some practice.
+boot the kernel & hit PA1 at some random time
+d g to display the gprs, this should display something like
+GPR  0 =  00000001  00156018  0014359C  00000000
+GPR  4 =  00000001  001B8888  000003E0  00000000
+GPR  8 =  00100080  00100084  00000000  000FE000
+GPR 12 =  00010400  8001B2DC  8001B36A  000FFED8
+Note that GPR14 is a return address but as we are real men we are going to
+trace the stack.
+display 0x40 bytes after the stack pointer.
+
+V000FFED8  000FFF38 8001B838 80014C8E 000FFF38
+V000FFEE8  00000000 00000000 000003E0 00000000
+V000FFEF8  00100080 00100084 00000000 000FE000
+V000FFF08  00010400 8001B2DC 8001B36A 000FFED8
+
+
+Ah now look at whats in sp+56 (sp+0x38) this is 8001B36A our saved r14 if
+you look above at our stackframe & also agrees with GPR14.
+
+now backchain 
+d 000FFF38.40
+we now are taking the contents of SP to get our first backchain.
+
+V000FFF38  000FFFA0 00000000 00014995 00147094
+V000FFF48  00147090 001470A0 000003E0 00000000
+V000FFF58  00100080 00100084 00000000 001BF1D0
+V000FFF68  00010400 800149BA 80014CA6 000FFF38
+
+This displays a 2nd return address of 80014CA6
+
+now do d 000FFFA0.40 for our 3rd backchain
+
+V000FFFA0  04B52002 0001107F 00000000 00000000
+V000FFFB0  00000000 00000000 FF000000 0001107F
+V000FFFC0  00000000 00000000 00000000 00000000
+V000FFFD0  00010400 80010802 8001085A 000FFFA0
+
+
+our 3rd return address is 8001085A
+
+as the 04B52002 looks suspiciously like rubbish it is fair to assume that the
+kernel entry routines for the sake of optimisation don't set up a backchain.
+
+now look at System.map to see if the addresses make any sense.
+
+grep -i 0001b3 System.map
+outputs among other things
+0001b304 T cpu_idle 
+so 8001B36A
+is cpu_idle+0x66 ( quiet the cpu is asleep, don't wake it )
+
+
+grep -i 00014 System.map 
+produces among other things
+00014a78 T start_kernel  
+so 0014CA6 is start_kernel+some hex number I can't add in my head.
+
+grep -i 00108 System.map 
+this produces
+00010800 T _stext
+so   8001085A is _stext+0x5a
+
+Congrats you've done your first backchain.
+
+
+
+s/390 & z/Architecture IO Overview
+==================================
+
+I am not going to give a course in 390 IO architecture as this would take me
+quite a while and I'm no expert. Instead I'll give a 390 IO architecture
+summary for Dummies. If you have the s/390 principles of operation available
+read this instead. If nothing else you may find a few useful keywords in here
+and be able to use them on a web search engine to find more useful information.
+
+Unlike other bus architectures modern 390 systems do their IO using mostly
+fibre optics and devices such as tapes and disks can be shared between several
+mainframes. Also S390 can support up to 65536 devices while a high end PC based
+system might be choking with around 64.
+
+Here is some of the common IO terminology:
+
+Subchannel:
+This is the logical number most IO commands use to talk to an IO device. There
+can be up to 0x10000 (65536) of these in a configuration, typically there are a
+few hundred. Under VM for simplicity they are allocated contiguously, however
+on the native hardware they are not. They typically stay consistent between
+boots provided no new hardware is inserted or removed.
+Under Linux for s390 we use these as IRQ's and also when issuing an IO command
+(CLEAR SUBCHANNEL, HALT SUBCHANNEL, MODIFY SUBCHANNEL, RESUME SUBCHANNEL,
+START SUBCHANNEL, STORE SUBCHANNEL and TEST SUBCHANNEL). We use this as the ID
+of the device we wish to talk to. The most important of these instructions are
+START SUBCHANNEL (to start IO), TEST SUBCHANNEL (to check whether the IO
+completed successfully) and HALT SUBCHANNEL (to kill IO). A subchannel can have
+up to 8 channel paths to a device, this offers redundancy if one is not
+available.
+
+Device Number:
+This number remains static and is closely tied to the hardware. There are 65536
+of these, made up of a CHPID (Channel Path ID, the most significant 8 bits) and
+another lsb 8 bits. These remain static even if more devices are inserted or
+removed from the hardware. There is a 1 to 1 mapping between subchannels and
+device numbers, provided devices aren't inserted or removed.
+
+Channel Control Words:
+CCWs are linked lists of instructions initially pointed to by an operation
+request block (ORB), which is initially given to Start Subchannel (SSCH)
+command along with the subchannel number for the IO subsystem to process
+while the CPU continues executing normal code.
+CCWs come in two flavours, Format 0 (24 bit for backward compatibility) and
+Format 1 (31 bit). These are typically used to issue read and write (and many
+other) instructions. They consist of a length field and an absolute address
+field.
+Each IO typically gets 1 or 2 interrupts, one for channel end (primary status)
+when the channel is idle, and the second for device end (secondary status).
+Sometimes you get both concurrently. You check how the IO went on by issuing a
+TEST SUBCHANNEL at each interrupt, from which you receive an Interruption
+response block (IRB). If you get channel and device end status in the IRB
+without channel checks etc. your IO probably went okay. If you didn't you
+probably need to examine the IRB, extended status word etc.
+If an error occurs, more sophisticated control units have a facility known as
+concurrent sense. This means that if an error occurs Extended sense information
+will be presented in the Extended status word in the IRB. If not you have to
+issue a subsequent SENSE CCW command after the test subchannel.
+
+
+TPI (Test pending interrupt) can also be used for polled IO, but in
+multitasking multiprocessor systems it isn't recommended except for
+checking special cases (i.e. non looping checks for pending IO etc.).
+
+Store Subchannel and Modify Subchannel can be used to examine and modify
+operating characteristics of a subchannel (e.g. channel paths).
+
+Other IO related Terms:
+Sysplex: S390's Clustering Technology
+QDIO: S390's new high speed IO architecture to support devices such as gigabit
+ethernet, this architecture is also designed to be forward compatible with
+upcoming 64 bit machines.
+
+
+General Concepts 
+
+Input Output Processors (IOP's) are responsible for communicating between
+the mainframe CPU's & the channel & relieve the mainframe CPU's from the
+burden of communicating with IO devices directly, this allows the CPU's to 
+concentrate on data processing. 
+
+IOP's can use one or more links ( known as channel paths ) to talk to each 
+IO device. It first checks for path availability & chooses an available one,
+then starts ( & sometimes terminates IO ).
+There are two types of channel path: ESCON & the Parallel IO interface.
+
+IO devices are attached to control units, control units provide the
+logic to interface the channel paths & channel path IO protocols to 
+the IO devices, they can be integrated with the devices or housed separately
+& often talk to several similar devices ( typical examples would be raid 
+controllers or a control unit which connects to 1000 3270 terminals ).
+
+
+    +---------------------------------------------------------------+
+    | +-----+ +-----+ +-----+ +-----+  +----------+  +----------+   |
+    | | CPU | | CPU | | CPU | | CPU |  |  Main    |  | Expanded |   |
+    | |     | |     | |     | |     |  |  Memory  |  |  Storage |   |
+    | +-----+ +-----+ +-----+ +-----+  +----------+  +----------+   | 
+    |---------------------------------------------------------------+
+    |   IOP        |      IOP      |       IOP                      |
+    |---------------------------------------------------------------
+    | C | C | C | C | C | C | C | C | C | C | C | C | C | C | C | C | 
+    ----------------------------------------------------------------
+         ||                                              ||
+         ||  Bus & Tag Channel Path                      || ESCON
+         ||  ======================                      || Channel
+         ||  ||                  ||                      || Path
+    +----------+               +----------+         +----------+
+    |          |               |          |         |          |
+    |    CU    |               |    CU    |         |    CU    |
+    |          |               |          |         |          |
+    +----------+               +----------+         +----------+
+       |      |                     |                |       |
++----------+ +----------+      +----------+   +----------+ +----------+
+|I/O Device| |I/O Device|      |I/O Device|   |I/O Device| |I/O Device|
++----------+ +----------+      +----------+   +----------+ +----------+
+  CPU = Central Processing Unit    
+  C = Channel                      
+  IOP = IP Processor               
+  CU = Control Unit
+
+The 390 IO systems come in 2 flavours the current 390 machines support both
+
+The Older 360 & 370 Interface,sometimes called the Parallel I/O interface,
+sometimes called Bus-and Tag & sometimes Original Equipment Manufacturers
+Interface (OEMI).
+
+This byte wide Parallel channel path/bus has parity & data on the "Bus" cable 
+and control lines on the "Tag" cable. These can operate in byte multiplex mode
+for sharing between several slow devices or burst mode and monopolize the
+channel for the whole burst. Up to 256 devices can be addressed on one of these
+cables. These cables are about one inch in diameter. The maximum unextended
+length supported by these cables is 125 Meters but this can be extended up to
+2km with a fibre optic channel extended such as a 3044. The maximum burst speed
+supported is 4.5 megabytes per second. However, some really old processors
+support only transfer rates of 3.0, 2.0 & 1.0 MB/sec.
+One of these paths can be daisy chained to up to 8 control units.
+
+
+ESCON if fibre optic it is also called FICON 
+Was introduced by IBM in 1990. Has 2 fibre optic cables and uses either leds or
+lasers for communication at a signaling rate of up to 200 megabits/sec. As
+10bits are transferred for every 8 bits info this drops to 160 megabits/sec
+and to 18.6 Megabytes/sec once control info and CRC are added. ESCON only
+operates in burst mode.
+ 
+ESCONs typical max cable length is 3km for the led version and 20km for the
+laser version known as XDF (extended distance facility). This can be further
+extended by using an ESCON director which triples the above mentioned ranges.
+Unlike Bus & Tag as ESCON is serial it uses a packet switching architecture,
+the standard Bus & Tag control protocol is however present within the packets.
+Up to 256 devices can be attached to each control unit that uses one of these
+interfaces.
+
+Common 390 Devices include:
+Network adapters typically OSA2,3172's,2116's & OSA-E gigabit ethernet adapters,
+Consoles 3270 & 3215 (a teletype emulated under linux for a line mode console).
+DASD's direct access storage devices ( otherwise known as hard disks ).
+Tape Drives.
+CTC ( Channel to Channel Adapters ),
+ESCON or Parallel Cables used as a very high speed serial link
+between 2 machines.
+
+
+Debugging IO on s/390 & z/Architecture under VM
+===============================================
+
+Now we are ready to go on with IO tracing commands under VM
+
+A few self explanatory queries:
+Q OSA
+Q CTC
+Q DISK ( This command is CMS specific )
+Q DASD
+
+
+
+
+
+
+Q OSA on my machine returns
+OSA  7C08 ON OSA   7C08 SUBCHANNEL = 0000
+OSA  7C09 ON OSA   7C09 SUBCHANNEL = 0001
+OSA  7C14 ON OSA   7C14 SUBCHANNEL = 0002
+OSA  7C15 ON OSA   7C15 SUBCHANNEL = 0003
+
+If you have a guest with certain privileges you may be able to see devices
+which don't belong to you. To avoid this, add the option V.
+e.g.
+Q V OSA
+
+Now using the device numbers returned by this command we will
+Trace the io starting up on the first device 7c08 & 7c09
+In our simplest case we can trace the 
+start subchannels
+like TR SSCH 7C08-7C09
+or the halt subchannels
+or TR HSCH 7C08-7C09
+MSCH's ,STSCH's I think you can guess the rest
+
+A good trick is tracing all the IO's and CCWS and spooling them into the reader
+of another VM guest so he can ftp the logfile back to his own machine. I'll do
+a small bit of this and give you a look at the output.
+
+1) Spool stdout to VM reader
+SP PRT TO (another vm guest ) or * for the local vm guest
+2) Fill the reader with the trace
+TR IO 7c08-7c09 INST INT CCW PRT RUN
+3) Start up linux 
+i 00c  
+4) Finish the trace
+TR END
+5) close the reader
+C PRT
+6) list reader contents
+RDRLIST
+7) copy it to linux4's minidisk 
+RECEIVE / LOG TXT A1 ( replace
+8)
+filel & press F11 to look at it
+You should see something like:
+
+00020942' SSCH  B2334000    0048813C    CC 0    SCH 0000    DEV 7C08
+          CPA 000FFDF0   PARM 00E2C9C4    KEY 0  FPI C0  LPM 80
+          CCW    000FFDF0  E4200100 00487FE8   0000  E4240100 ........
+          IDAL                                      43D8AFE8
+          IDAL                                      0FB76000
+00020B0A'   I/O DEV 7C08 -> 000197BC'   SCH 0000   PARM 00E2C9C4
+00021628' TSCH  B2354000 >> 00488164    CC 0    SCH 0000    DEV 7C08
+          CCWA 000FFDF8   DEV STS 0C  SCH STS 00  CNT 00EC
+           KEY 0   FPI C0  CC 0   CTLS 4007
+00022238' STSCH B2344000 >> 00488108    CC 0    SCH 0000    DEV 7C08
+
+If you don't like messing up your readed ( because you possibly booted from it )
+you can alternatively spool it to another readers guest.
+
+
+Other common VM device related commands
+---------------------------------------------
+These commands are listed only because they have
+been of use to me in the past & may be of use to
+you too. For more complete info on each of the commands
+use type HELP <command> from CMS.
+detaching devices
+DET <devno range>
+ATT <devno range> <guest> 
+attach a device to guest * for your own guest
+READY <devno> cause VM to issue a fake interrupt.
+
+The VARY command is normally only available to VM administrators.
+VARY ON PATH <path> TO <devno range>
+VARY OFF PATH <PATH> FROM <devno range>
+This is used to switch on or off channel paths to devices.
+
+Q CHPID <channel path ID>
+This displays state of devices using this channel path
+D SCHIB <subchannel>
+This displays the subchannel information SCHIB block for the device.
+this I believe is also only available to administrators.
+DEFINE CTC <devno>
+defines a virtual CTC channel to channel connection
+2 need to be defined on each guest for the CTC driver to use.
+COUPLE  devno userid remote devno
+Joins a local virtual device to a remote virtual device
+( commonly used for the CTC driver ).
+
+Building a VM ramdisk under CMS which linux can use
+def vfb-<blocksize> <subchannel> <number blocks>
+blocksize is commonly 4096 for linux.
+Formatting it
+format <subchannel> <driver letter e.g. x> (blksize <blocksize>
+
+Sharing a disk between multiple guests
+LINK userid devno1 devno2 mode password
+
+
+
+GDB on S390
+===========
+N.B. if compiling for debugging gdb works better without optimisation 
+( see Compiling programs for debugging )
+
+invocation
+----------
+gdb <victim program> <optional corefile>
+
+Online help
+-----------
+help: gives help on commands
+e.g.
+help
+help display
+Note gdb's online help is very good use it.
+
+
+Assembly
+--------
+info registers: displays registers other than floating point.
+info all-registers: displays floating points as well.
+disassemble: disassembles
+e.g.
+disassemble without parameters will disassemble the current function
+disassemble $pc $pc+10 
+
+Viewing & modifying variables
+-----------------------------
+print or p: displays variable or register
+e.g. p/x $sp will display the stack pointer
+
+display: prints variable or register each time program stops
+e.g.
+display/x $pc will display the program counter
+display argc
+
+undisplay : undo's display's
+
+info breakpoints: shows all current breakpoints
+
+info stack: shows stack back trace (if this doesn't work too well, I'll show
+you the stacktrace by hand below).
+
+info locals: displays local variables.
+
+info args: display current procedure arguments.
+
+set args: will set argc & argv each time the victim program is invoked.
+
+set <variable>=value
+set argc=100
+set $pc=0
+
+
+
+Modifying execution
+-------------------
+step: steps n lines of sourcecode
+step steps 1 line.
+step 100 steps 100 lines of code.
+
+next: like step except this will not step into subroutines
+
+stepi: steps a single machine code instruction.
+e.g. stepi 100
+
+nexti: steps a single machine code instruction but will not step into
+subroutines.
+
+finish: will run until exit of the current routine
+
+run: (re)starts a program
+
+cont: continues a program
+
+quit: exits gdb.
+
+
+breakpoints
+------------
+
+break
+sets a breakpoint
+e.g.
+
+break main
+
+break *$pc
+
+break *0x400618
+
+Here's a really useful one for large programs
+rbr
+Set a breakpoint for all functions matching REGEXP
+e.g.
+rbr 390
+will set a breakpoint with all functions with 390 in their name.
+
+info breakpoints
+lists all breakpoints
+
+delete: delete breakpoint by number or delete them all
+e.g.
+delete 1 will delete the first breakpoint
+delete will delete them all
+
+watch: This will set a watchpoint ( usually hardware assisted ),
+This will watch a variable till it changes
+e.g.
+watch cnt, will watch the variable cnt till it changes.
+As an aside unfortunately gdb's, architecture independent watchpoint code
+is inconsistent & not very good, watchpoints usually work but not always.
+
+info watchpoints: Display currently active watchpoints
+
+condition: ( another useful one )
+Specify breakpoint number N to break only if COND is true.
+Usage is `condition N COND', where N is an integer and COND is an
+expression to be evaluated whenever breakpoint N is reached.
+
+
+
+User defined functions/macros
+-----------------------------
+define: ( Note this is very very useful,simple & powerful )
+usage define <name> <list of commands> end
+
+examples which you should consider putting into .gdbinit in your home directory
+define d
+stepi
+disassemble $pc $pc+10
+end
+
+define e
+nexti
+disassemble $pc $pc+10
+end
+
+
+Other hard to classify stuff
+----------------------------
+signal n:
+sends the victim program a signal.
+e.g. signal 3 will send a SIGQUIT.
+
+info signals:
+what gdb does when the victim receives certain signals.
+
+list:
+e.g.
+list lists current function source
+list 1,10 list first 10 lines of current file.
+list test.c:1,10
+
+
+directory:
+Adds directories to be searched for source if gdb cannot find the source.
+(note it is a bit sensitive about slashes)
+e.g. To add the root of the filesystem to the searchpath do
+directory //
+
+
+call <function>
+This calls a function in the victim program, this is pretty powerful
+e.g.
+(gdb) call printf("hello world")
+outputs:
+$1 = 11 
+
+You might now be thinking that the line above didn't work, something extra had
+to be done.
+(gdb) call fflush(stdout)
+hello world$2 = 0
+As an aside the debugger also calls malloc & free under the hood 
+to make space for the "hello world" string.
+
+
+
+hints
+-----
+1) command completion works just like bash 
+( if you are a bad typist like me this really helps )
+e.g. hit br <TAB> & cursor up & down :-).
+
+2) if you have a debugging problem that takes a few steps to recreate
+put the steps into a file called .gdbinit in your current working directory
+if you have defined a few extra useful user defined commands put these in 
+your home directory & they will be read each time gdb is launched.
+
+A typical .gdbinit file might be.
+break main
+run
+break runtime_exception
+cont 
+
+
+stack chaining in gdb by hand
+-----------------------------
+This is done using a the same trick described for VM 
+p/x (*($sp+56))&0x7fffffff get the first backchain.
+
+For z/Architecture
+Replace 56 with 112 & ignore the &0x7fffffff
+in the macros below & do nasty casts to longs like the following
+as gdb unfortunately deals with printed arguments as ints which
+messes up everything.
+i.e. here is a 3rd backchain dereference
+p/x *(long *)(***(long ***)$sp+112)
+
+
+this outputs 
+$5 = 0x528f18 
+on my machine.
+Now you can use 
+info symbol (*($sp+56))&0x7fffffff 
+you might see something like.
+rl_getc + 36 in section .text  telling you what is located at address 0x528f18
+Now do.
+p/x (*(*$sp+56))&0x7fffffff 
+This outputs
+$6 = 0x528ed0
+Now do.
+info symbol (*(*$sp+56))&0x7fffffff
+rl_read_key + 180 in section .text
+now do
+p/x (*(**$sp+56))&0x7fffffff
+& so on.
+
+Disassembling instructions without debug info
+---------------------------------------------
+gdb typically complains if there is a lack of debugging
+symbols in the disassemble command with 
+"No function contains specified address." To get around
+this do 
+x/<number lines to disassemble>xi <address>
+e.g.
+x/20xi 0x400730
+
+
+
+Note: Remember gdb has history just like bash you don't need to retype the
+whole line just use the up & down arrows.
+
+
+
+For more info
+-------------
+From your linuxbox do 
+man gdb or info gdb.
+
+core dumps
+----------
+What a core dump ?,
+A core dump is a file generated by the kernel (if allowed) which contains the
+registers and all active pages of the program which has crashed.
+From this file gdb will allow you to look at the registers, stack trace and
+memory of the program as if it just crashed on your system. It is usually
+called core and created in the current working directory.
+This is very useful in that a customer can mail a core dump to a technical
+support department and the technical support department can reconstruct what
+happened. Provided they have an identical copy of this program with debugging
+symbols compiled in and the source base of this build is available.
+In short it is far more useful than something like a crash log could ever hope
+to be.
+
+Why have I never seen one ?.
+Probably because you haven't used the command 
+ulimit -c unlimited in bash
+to allow core dumps, now do 
+ulimit -a 
+to verify that the limit was accepted.
+
+A sample core dump
+To create this I'm going to do
+ulimit -c unlimited
+gdb 
+to launch gdb (my victim app. ) now be bad & do the following from another 
+telnet/xterm session to the same machine
+ps -aux | grep gdb
+kill -SIGSEGV <gdb's pid>
+or alternatively use killall -SIGSEGV gdb if you have the killall command.
+Now look at the core dump.
+./gdb core
+Displays the following
+GNU gdb 4.18
+Copyright 1998 Free Software Foundation, Inc.
+GDB is free software, covered by the GNU General Public License, and you are
+welcome to change it and/or distribute copies of it under certain conditions.
+Type "show copying" to see the conditions.
+There is absolutely no warranty for GDB.  Type "show warranty" for details.
+This GDB was configured as "s390-ibm-linux"...
+Core was generated by `./gdb'.
+Program terminated with signal 11, Segmentation fault.
+Reading symbols from /usr/lib/libncurses.so.4...done.
+Reading symbols from /lib/libm.so.6...done.
+Reading symbols from /lib/libc.so.6...done.
+Reading symbols from /lib/ld-linux.so.2...done.
+#0  0x40126d1a in read () from /lib/libc.so.6
+Setting up the environment for debugging gdb.
+Breakpoint 1 at 0x4dc6f8: file utils.c, line 471.
+Breakpoint 2 at 0x4d87a4: file top.c, line 2609.
+(top-gdb) info stack
+#0  0x40126d1a in read () from /lib/libc.so.6
+#1  0x528f26 in rl_getc (stream=0x7ffffde8) at input.c:402
+#2  0x528ed0 in rl_read_key () at input.c:381
+#3  0x5167e6 in readline_internal_char () at readline.c:454
+#4  0x5168ee in readline_internal_charloop () at readline.c:507
+#5  0x51692c in readline_internal () at readline.c:521
+#6  0x5164fe in readline (prompt=0x7ffff810)
+    at readline.c:349
+#7  0x4d7a8a in command_line_input (prompt=0x564420 "(gdb) ", repeat=1,
+    annotation_suffix=0x4d6b44 "prompt") at top.c:2091
+#8  0x4d6cf0 in command_loop () at top.c:1345
+#9  0x4e25bc in main (argc=1, argv=0x7ffffdf4) at main.c:635
+
+
+LDD
+===
+This is a program which lists the shared libraries which a library needs,
+Note you also get the relocations of the shared library text segments which
+help when using objdump --source.
+e.g.
+ ldd ./gdb
+outputs
+libncurses.so.4 => /usr/lib/libncurses.so.4 (0x40018000)
+libm.so.6 => /lib/libm.so.6 (0x4005e000)
+libc.so.6 => /lib/libc.so.6 (0x40084000)
+/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
+
+
+Debugging shared libraries
+==========================
+Most programs use shared libraries, however it can be very painful
+when you single step instruction into a function like printf for the 
+first time & you end up in functions like _dl_runtime_resolve this is
+the ld.so doing lazy binding, lazy binding is a concept in ELF where 
+shared library functions are not loaded into memory unless they are 
+actually used, great for saving memory but a pain to debug.
+To get around this either relink the program -static or exit gdb type 
+export LD_BIND_NOW=true this will stop lazy binding & restart the gdb'ing 
+the program in question.
+ 
+
+
+Debugging modules
+=================
+As modules are dynamically loaded into the kernel their address can be
+anywhere to get around this use the -m option with insmod to emit a load
+map which can be piped into a file if required.
+
+The proc file system
+====================
+What is it ?.
+It is a filesystem created by the kernel with files which are created on demand
+by the kernel if read, or can be used to modify kernel parameters,
+it is a powerful concept.
+
+e.g.
+
+cat /proc/sys/net/ipv4/ip_forward 
+On my machine outputs 
+0 
+telling me ip_forwarding is not on to switch it on I can do
+echo 1 >  /proc/sys/net/ipv4/ip_forward
+cat it again
+cat /proc/sys/net/ipv4/ip_forward 
+On my machine now outputs
+1
+IP forwarding is on.
+There is a lot of useful info in here best found by going in and having a look
+around, so I'll take you through some entries I consider important.
+
+All the processes running on the machine have their own entry defined by
+/proc/<pid>
+So lets have a look at the init process
+cd /proc/1
+
+cat cmdline
+emits
+init [2]
+
+cd /proc/1/fd
+This contains numerical entries of all the open files,
+some of these you can cat e.g. stdout (2)
+
+cat /proc/29/maps
+on my machine emits
+
+00400000-00478000 r-xp 00000000 5f:00 4103       /bin/bash
+00478000-0047e000 rw-p 00077000 5f:00 4103       /bin/bash
+0047e000-00492000 rwxp 00000000 00:00 0
+40000000-40015000 r-xp 00000000 5f:00 14382      /lib/ld-2.1.2.so
+40015000-40016000 rw-p 00014000 5f:00 14382      /lib/ld-2.1.2.so
+40016000-40017000 rwxp 00000000 00:00 0
+40017000-40018000 rw-p 00000000 00:00 0
+40018000-4001b000 r-xp 00000000 5f:00 14435      /lib/libtermcap.so.2.0.8
+4001b000-4001c000 rw-p 00002000 5f:00 14435      /lib/libtermcap.so.2.0.8
+4001c000-4010d000 r-xp 00000000 5f:00 14387      /lib/libc-2.1.2.so
+4010d000-40111000 rw-p 000f0000 5f:00 14387      /lib/libc-2.1.2.so
+40111000-40114000 rw-p 00000000 00:00 0
+40114000-4011e000 r-xp 00000000 5f:00 14408      /lib/libnss_files-2.1.2.so
+4011e000-4011f000 rw-p 00009000 5f:00 14408      /lib/libnss_files-2.1.2.so
+7fffd000-80000000 rwxp ffffe000 00:00 0
+
+
+Showing us the shared libraries init uses where they are in memory
+& memory access permissions for each virtual memory area.
+
+/proc/1/cwd is a softlink to the current working directory.
+/proc/1/root is the root of the filesystem for this process. 
+
+/proc/1/mem is the current running processes memory which you
+can read & write to like a file.
+strace uses this sometimes as it is a bit faster than the
+rather inefficient ptrace interface for peeking at DATA.
+
+
+cat status 
+
+Name:   init
+State:  S (sleeping)
+Pid:    1
+PPid:   0
+Uid:    0       0       0       0
+Gid:    0       0       0       0
+Groups:
+VmSize:      408 kB
+VmLck:         0 kB
+VmRSS:       208 kB
+VmData:       24 kB
+VmStk:         8 kB
+VmExe:       368 kB
+VmLib:         0 kB
+SigPnd: 0000000000000000
+SigBlk: 0000000000000000
+SigIgn: 7fffffffd7f0d8fc
+SigCgt: 00000000280b2603
+CapInh: 00000000fffffeff
+CapPrm: 00000000ffffffff
+CapEff: 00000000fffffeff
+
+User PSW:    070de000 80414146
+task: 004b6000 tss: 004b62d8 ksp: 004b7ca8 pt_regs: 004b7f68
+User GPRS:
+00000400  00000000  0000000b  7ffffa90
+00000000  00000000  00000000  0045d9f4
+0045cafc  7ffffa90  7fffff18  0045cb08
+00010400  804039e8  80403af8  7ffff8b0
+User ACRS:
+00000000  00000000  00000000  00000000
+00000001  00000000  00000000  00000000
+00000000  00000000  00000000  00000000
+00000000  00000000  00000000  00000000
+Kernel BackChain  CallChain    BackChain  CallChain
+       004b7ca8   8002bd0c     004b7d18   8002b92c
+       004b7db8   8005cd50     004b7e38   8005d12a
+       004b7f08   80019114                     
+Showing among other things memory usage & status of some signals &
+the processes'es registers from the kernel task_structure
+as well as a backchain which may be useful if a process crashes
+in the kernel for some unknown reason.
+
+Some driver debugging techniques
+================================
+debug feature
+-------------
+Some of our drivers now support a "debug feature" in
+/proc/s390dbf see s390dbf.txt in the linux/Documentation directory
+for more info.
+e.g. 
+to switch on the lcs "debug feature"
+echo 5 > /proc/s390dbf/lcs/level
+& then after the error occurred.
+cat /proc/s390dbf/lcs/sprintf >/logfile
+the logfile now contains some information which may help
+tech support resolve a problem in the field.
+
+
+
+high level debugging network drivers
+------------------------------------
+ifconfig is a quite useful command
+it gives the current state of network drivers.
+
+If you suspect your network device driver is dead
+one way to check is type 
+ifconfig <network device> 
+e.g. tr0
+You should see something like
+tr0       Link encap:16/4 Mbps Token Ring (New)  HWaddr 00:04:AC:20:8E:48
+          inet addr:9.164.185.132  Bcast:9.164.191.255  Mask:255.255.224.0
+          UP BROADCAST RUNNING MULTICAST  MTU:2000  Metric:1
+          RX packets:246134 errors:0 dropped:0 overruns:0 frame:0
+          TX packets:5 errors:0 dropped:0 overruns:0 carrier:0
+          collisions:0 txqueuelen:100
+
+if the device doesn't say up
+try
+/etc/rc.d/init.d/network start 
+( this starts the network stack & hopefully calls ifconfig tr0 up ).
+ifconfig looks at the output of /proc/net/dev and presents it in a more
+presentable form.
+Now ping the device from a machine in the same subnet.
+if the RX packets count & TX packets counts don't increment you probably
+have problems.
+next 
+cat /proc/net/arp
+Do you see any hardware addresses in the cache if not you may have problems.
+Next try
+ping -c 5 <broadcast_addr> i.e. the Bcast field above in the output of
+ifconfig. Do you see any replies from machines other than the local machine
+if not you may have problems. also if the TX packets count in ifconfig
+hasn't incremented either you have serious problems in your driver 
+(e.g. the txbusy field of the network device being stuck on ) 
+or you may have multiple network devices connected.
+
+
+chandev
+-------
+There is a new device layer for channel devices, some
+drivers e.g. lcs are registered with this layer.
+If the device uses the channel device layer you'll be
+able to find what interrupts it uses & the current state 
+of the device.
+See the manpage chandev.8 &type cat /proc/chandev for more info.
+
+
+SysRq
+=====
+This is now supported by linux for s/390 & z/Architecture.
+To enable it do compile the kernel with 
+Kernel Hacking -> Magic SysRq Key Enabled
+echo "1" > /proc/sys/kernel/sysrq
+also type
+echo "8" >/proc/sys/kernel/printk
+To make printk output go to console.
+On 390 all commands are prefixed with
+^-
+e.g.
+^-t will show tasks.
+^-? or some unknown command will display help.
+The sysrq key reading is very picky ( I have to type the keys in an
+ xterm session & paste them  into the x3270 console )
+& it may be wise to predefine the keys as described in the VM hints above
+
+This is particularly useful for syncing disks unmounting & rebooting
+if the machine gets partially hung.
+
+Read Documentation/admin-guide/sysrq.rst for more info
+
+References:
+===========
+Enterprise Systems Architecture Reference Summary
+Enterprise Systems Architecture Principles of Operation
+Hartmut Penners s390 stack frame sheet.
+IBM Mainframe Channel Attachment a technology brief from a CISCO webpage
+Various bits of man & info pages of Linux.
+Linux & GDB source.
+Various info & man pages.
+CMS Help on tracing commands.
+Linux for s/390 Elf Application Binary Interface
+Linux for z/Series Elf Application Binary Interface ( Both Highly Recommended )
+z/Architecture Principles of Operation SA22-7832-00
+Enterprise Systems Architecture/390 Reference Summary SA22-7209-01 & the
+Enterprise Systems Architecture/390 Principles of Operation SA22-7201-05
+
+Special Thanks
+==============
+Special thanks to Neale Ferguson who maintains a much
+prettier HTML version of this page at
+http://linuxvm.org/penguinvm/
+Bob Grainger Stefan Bader & others for reporting bugs
diff --git a/Documentation/s390/cds.txt b/Documentation/s390/cds.txt
new file mode 100644
index 000000000..480a78ef5
--- /dev/null
+++ b/Documentation/s390/cds.txt
@@ -0,0 +1,472 @@
+Linux for S/390 and zSeries
+
+Common Device Support (CDS)
+Device Driver I/O Support Routines
+
+Authors : Ingo Adlung
+	  Cornelia Huck
+
+Copyright, IBM Corp. 1999-2002
+
+Introduction
+
+This document describes the common device support routines for Linux/390.
+Different than other hardware architectures, ESA/390 has defined a unified
+I/O access method. This gives relief to the device drivers as they don't
+have to deal with different bus types, polling versus interrupt
+processing, shared versus non-shared interrupt processing, DMA versus port
+I/O (PIO), and other hardware features more. However, this implies that
+either every single device driver needs to implement the hardware I/O
+attachment functionality itself, or the operating system provides for a
+unified method to access the hardware, providing all the functionality that
+every single device driver would have to provide itself.
+
+The document does not intend to explain the ESA/390 hardware architecture in
+every detail.This information can be obtained from the ESA/390 Principles of
+Operation manual (IBM Form. No. SA22-7201).
+
+In order to build common device support for ESA/390 I/O interfaces, a
+functional layer was introduced that provides generic I/O access methods to
+the hardware. 
+
+The common device support layer comprises the I/O support routines defined 
+below. Some of them implement common Linux device driver interfaces, while 
+some of them are ESA/390 platform specific.
+
+Note:
+In order to write a driver for S/390, you also need to look into the interface
+described in Documentation/s390/driver-model.txt.
+
+Note for porting drivers from 2.4:
+The major changes are:
+* The functions use a ccw_device instead of an irq (subchannel).
+* All drivers must define a ccw_driver (see driver-model.txt) and the associated
+  functions.
+* request_irq() and free_irq() are no longer done by the driver.
+* The oper_handler is (kindof) replaced by the probe() and set_online() functions
+  of the ccw_driver.
+* The not_oper_handler is (kindof) replaced by the remove() and set_offline()
+  functions of the ccw_driver.
+* The channel device layer is gone.
+* The interrupt handlers must be adapted to use a ccw_device as argument.
+  Moreover, they don't return a devstat, but an irb.
+* Before initiating an io, the options must be set via ccw_device_set_options().
+* Instead of calling read_dev_chars()/read_conf_data(), the driver issues
+  the channel program and handles the interrupt itself.
+
+ccw_device_get_ciw()
+   get commands from extended sense data.
+
+ccw_device_start()	
+ccw_device_start_timeout()
+ccw_device_start_key()
+ccw_device_start_key_timeout()
+   initiate an I/O request.
+
+ccw_device_resume()
+   resume channel program execution.
+
+ccw_device_halt()	
+   terminate the current I/O request processed on the device.
+
+do_IRQ()	
+   generic interrupt routine. This function is called by the interrupt entry
+   routine whenever an I/O interrupt is presented to the system. The do_IRQ()
+   routine determines the interrupt status and calls the device specific
+   interrupt handler according to the rules (flags) defined during I/O request
+   initiation with do_IO().
+
+The next chapters describe the functions other than do_IRQ() in more details.
+The do_IRQ() interface is not described, as it is called from the Linux/390
+first level interrupt handler only and does not comprise a device driver
+callable interface. Instead, the functional description of do_IO() also
+describes the input to the device specific interrupt handler.
+
+Note: All explanations apply also to the 64 bit architecture s390x.
+
+
+Common Device Support (CDS) for Linux/390 Device Drivers
+
+General Information
+
+The following chapters describe the I/O related interface routines the
+Linux/390 common device support (CDS) provides to allow for device specific
+driver implementations on the IBM ESA/390 hardware platform. Those interfaces
+intend to provide the functionality required by every device driver
+implementation to allow to drive a specific hardware device on the ESA/390
+platform. Some of the interface routines are specific to Linux/390 and some
+of them can be found on other Linux platforms implementations too.
+Miscellaneous function prototypes, data declarations, and macro definitions
+can be found in the architecture specific C header file
+linux/arch/s390/include/asm/irq.h.
+
+Overview of CDS interface concepts
+
+Different to other hardware platforms, the ESA/390 architecture doesn't define
+interrupt lines managed by a specific interrupt controller and bus systems
+that may or may not allow for shared interrupts, DMA processing, etc.. Instead,
+the ESA/390 architecture has implemented a so called channel subsystem, that
+provides a unified view of the devices physically attached to the systems.
+Though the ESA/390 hardware platform knows about a huge variety of different
+peripheral attachments like disk devices (aka. DASDs), tapes, communication
+controllers, etc. they can all be accessed by a well defined access method and
+they are presenting I/O completion a unified way : I/O interruptions. Every
+single device is uniquely identified to the system by a so called subchannel,
+where the ESA/390 architecture allows for 64k devices be attached.
+
+Linux, however, was first built on the Intel PC architecture, with its two
+cascaded 8259 programmable interrupt controllers (PICs), that allow for a
+maximum of 15 different interrupt lines. All devices attached to such a system
+share those 15 interrupt levels. Devices attached to the ISA bus system must
+not share interrupt levels (aka. IRQs), as the ISA bus bases on edge triggered
+interrupts. MCA, EISA, PCI and other bus systems base on level triggered
+interrupts, and therewith allow for shared IRQs. However, if multiple devices
+present their hardware status by the same (shared) IRQ, the operating system
+has to call every single device driver registered on this IRQ in order to
+determine the device driver owning the device that raised the interrupt.
+
+Up to kernel 2.4, Linux/390 used to provide interfaces via the IRQ (subchannel).
+For internal use of the common I/O layer, these are still there. However, 
+device drivers should use the new calling interface via the ccw_device only.
+
+During its startup the Linux/390 system checks for peripheral devices. Each
+of those devices is uniquely defined by a so called subchannel by the ESA/390
+channel subsystem. While the subchannel numbers are system generated, each
+subchannel also takes a user defined attribute, the so called device number.
+Both subchannel number and device number cannot exceed 65535. During sysfs
+initialisation, the information about control unit type and device types that 
+imply specific I/O commands (channel command words - CCWs) in order to operate
+the device are gathered. Device drivers can retrieve this set of hardware
+information during their initialization step to recognize the devices they
+support using the information saved in the struct ccw_device given to them.
+This methods implies that Linux/390 doesn't require to probe for free (not
+armed) interrupt request lines (IRQs) to drive its devices with. Where
+applicable, the device drivers can use issue the READ DEVICE CHARACTERISTICS
+ccw to retrieve device characteristics in its online routine.
+
+In order to allow for easy I/O initiation the CDS layer provides a
+ccw_device_start() interface that takes a device specific channel program (one
+or more CCWs) as input sets up the required architecture specific control blocks
+and initiates an I/O request on behalf of the device driver. The
+ccw_device_start() routine allows to specify whether it expects the CDS layer
+to notify the device driver for every interrupt it observes, or with final status
+only. See ccw_device_start() for more details. A device driver must never issue
+ESA/390 I/O commands itself, but must use the Linux/390 CDS interfaces instead.
+
+For long running I/O request to be canceled, the CDS layer provides the
+ccw_device_halt() function. Some devices require to initially issue a HALT
+SUBCHANNEL (HSCH) command without having pending I/O requests. This function is
+also covered by ccw_device_halt().
+
+
+get_ciw() - get command information word
+
+This call enables a device driver to get information about supported commands
+from the extended SenseID data.
+
+struct ciw *
+ccw_device_get_ciw(struct ccw_device *cdev, __u32 cmd);
+
+cdev - The ccw_device for which the command is to be retrieved.
+cmd  - The command type to be retrieved.
+
+ccw_device_get_ciw() returns:
+NULL    - No extended data available, invalid device or command not found.
+!NULL   - The command requested.
+
+
+ccw_device_start() - Initiate I/O Request
+
+The ccw_device_start() routines is the I/O request front-end processor. All
+device driver I/O requests must be issued using this routine. A device driver
+must not issue ESA/390 I/O commands itself. Instead the ccw_device_start()
+routine provides all interfaces required to drive arbitrary devices.
+
+This description also covers the status information passed to the device
+driver's interrupt handler as this is related to the rules (flags) defined
+with the associated I/O request when calling ccw_device_start().
+
+int ccw_device_start(struct ccw_device *cdev,
+		     struct ccw1 *cpa,
+		     unsigned long intparm,
+		     __u8 lpm,
+		     unsigned long flags);
+int ccw_device_start_timeout(struct ccw_device *cdev,
+			     struct ccw1 *cpa,
+			     unsigned long intparm,
+			     __u8 lpm,
+			     unsigned long flags,
+			     int expires);
+int ccw_device_start_key(struct ccw_device *cdev,
+			 struct ccw1 *cpa,
+			 unsigned long intparm,
+			 __u8 lpm,
+			 __u8 key,
+			 unsigned long flags);
+int ccw_device_start_key_timeout(struct ccw_device *cdev,
+				 struct ccw1 *cpa,
+				 unsigned long intparm,
+				 __u8 lpm,
+				 __u8 key,
+				 unsigned long flags,
+				 int expires);
+
+cdev         : ccw_device the I/O is destined for
+cpa          : logical start address of channel program
+user_intparm : user specific interrupt information; will be presented
+	       back to the device driver's interrupt handler. Allows a
+               device driver to associate the interrupt with a
+               particular I/O request.
+lpm          : defines the channel path to be used for a specific I/O
+               request. A value of 0 will make cio use the opm.
+key	     : the storage key to use for the I/O (useful for operating on a
+	       storage with a storage key != default key)
+flag         : defines the action to be performed for I/O processing
+expires      : timeout value in jiffies. The common I/O layer will terminate
+	       the running program after this and call the interrupt handler
+	       with ERR_PTR(-ETIMEDOUT) as irb.
+
+Possible flag values are :
+
+DOIO_ALLOW_SUSPEND       - channel program may become suspended
+DOIO_DENY_PREFETCH       - don't allow for CCW prefetch; usually
+                           this implies the channel program might
+                           become modified
+DOIO_SUPPRESS_INTER     - don't call the handler on intermediate status
+
+The cpa parameter points to the first format 1 CCW of a channel program :
+
+struct ccw1 {
+      __u8  cmd_code;/* command code */
+      __u8  flags;   /* flags, like IDA addressing, etc. */
+      __u16 count;   /* byte count */
+      __u32 cda;     /* data address */
+} __attribute__ ((packed,aligned(8)));
+
+with the following CCW flags values defined :
+
+CCW_FLAG_DC        - data chaining
+CCW_FLAG_CC        - command chaining
+CCW_FLAG_SLI       - suppress incorrect length
+CCW_FLAG_SKIP      - skip
+CCW_FLAG_PCI       - PCI
+CCW_FLAG_IDA       - indirect addressing
+CCW_FLAG_SUSPEND   - suspend
+
+
+Via ccw_device_set_options(), the device driver may specify the following
+options for the device:
+
+DOIO_EARLY_NOTIFICATION  - allow for early interrupt notification
+DOIO_REPORT_ALL          - report all interrupt conditions
+
+
+The ccw_device_start() function returns :
+
+      0 - successful completion or request successfully initiated
+-EBUSY	- The device is currently processing a previous I/O request, or there is
+          a status pending at the device.
+-ENODEV - cdev is invalid, the device is not operational or the ccw_device is
+          not online.
+
+When the I/O request completes, the CDS first level interrupt handler will
+accumulate the status in a struct irb and then call the device interrupt handler.
+The intparm field will contain the value the device driver has associated with a 
+particular I/O request. If a pending device status was recognized, 
+intparm will be set to 0 (zero). This may happen during I/O initiation or delayed
+by an alert status notification. In any case this status is not related to the
+current (last) I/O request. In case of a delayed status notification no special
+interrupt will be presented to indicate I/O completion as the I/O request was
+never started, even though ccw_device_start() returned with successful completion.
+
+The irb may contain an error value, and the device driver should check for this
+first:
+
+-ETIMEDOUT: the common I/O layer terminated the request after the specified
+            timeout value
+-EIO:       the common I/O layer terminated the request due to an error state
+
+If the concurrent sense flag in the extended status word (esw) in the irb is
+set, the field erw.scnt in the esw describes the number of device specific
+sense bytes available in the extended control word irb->scsw.ecw[]. No device
+sensing by the device driver itself is required.
+
+The device interrupt handler can use the following definitions to investigate
+the primary unit check source coded in sense byte 0 :
+
+SNS0_CMD_REJECT         0x80
+SNS0_INTERVENTION_REQ   0x40
+SNS0_BUS_OUT_CHECK      0x20
+SNS0_EQUIPMENT_CHECK    0x10
+SNS0_DATA_CHECK         0x08
+SNS0_OVERRUN            0x04
+SNS0_INCOMPL_DOMAIN     0x01
+
+Depending on the device status, multiple of those values may be set together.
+Please refer to the device specific documentation for details.
+
+The irb->scsw.cstat field provides the (accumulated) subchannel status :
+
+SCHN_STAT_PCI            - program controlled interrupt
+SCHN_STAT_INCORR_LEN     - incorrect length
+SCHN_STAT_PROG_CHECK     - program check
+SCHN_STAT_PROT_CHECK     - protection check
+SCHN_STAT_CHN_DATA_CHK   - channel data check
+SCHN_STAT_CHN_CTRL_CHK   - channel control check
+SCHN_STAT_INTF_CTRL_CHK  - interface control check
+SCHN_STAT_CHAIN_CHECK    - chaining check
+
+The irb->scsw.dstat field provides the (accumulated) device status :
+
+DEV_STAT_ATTENTION   - attention
+DEV_STAT_STAT_MOD    - status modifier
+DEV_STAT_CU_END      - control unit end
+DEV_STAT_BUSY        - busy
+DEV_STAT_CHN_END     - channel end
+DEV_STAT_DEV_END     - device end
+DEV_STAT_UNIT_CHECK  - unit check
+DEV_STAT_UNIT_EXCEP  - unit exception
+
+Please see the ESA/390 Principles of Operation manual for details on the
+individual flag meanings.
+
+Usage Notes :
+
+ccw_device_start() must be called disabled and with the ccw device lock held.
+
+The device driver is allowed to issue the next ccw_device_start() call from
+within its interrupt handler already. It is not required to schedule a
+bottom-half, unless a non deterministically long running error recovery procedure
+or similar needs to be scheduled. During I/O processing the Linux/390 generic
+I/O device driver support has already obtained the IRQ lock, i.e. the handler
+must not try to obtain it again when calling ccw_device_start() or we end in a
+deadlock situation!
+
+If a device driver relies on an I/O request to be completed prior to start the
+next it can reduce I/O processing overhead by chaining a NoOp I/O command
+CCW_CMD_NOOP to the end of the submitted CCW chain. This will force Channel-End
+and Device-End status to be presented together, with a single interrupt.
+However, this should be used with care as it implies the channel will remain
+busy, not being able to process I/O requests for other devices on the same
+channel. Therefore e.g. read commands should never use this technique, as the
+result will be presented by a single interrupt anyway.
+
+In order to minimize I/O overhead, a device driver should use the
+DOIO_REPORT_ALL  only if the device can report intermediate interrupt
+information prior to device-end the device driver urgently relies on. In this
+case all I/O interruptions are presented to the device driver until final
+status is recognized.
+
+If a device is able to recover from asynchronously presented I/O errors, it can
+perform overlapping I/O using the DOIO_EARLY_NOTIFICATION flag. While some
+devices always report channel-end and device-end together, with a single
+interrupt, others present primary status (channel-end) when the channel is
+ready for the next I/O request and secondary status (device-end) when the data
+transmission has been completed at the device.
+
+Above flag allows to exploit this feature, e.g. for communication devices that
+can handle lost data on the network to allow for enhanced I/O processing.
+
+Unless the channel subsystem at any time presents a secondary status interrupt,
+exploiting this feature will cause only primary status interrupts to be
+presented to the device driver while overlapping I/O is performed. When a
+secondary status without error (alert status) is presented, this indicates
+successful completion for all overlapping ccw_device_start() requests that have
+been issued since the last secondary (final) status.
+
+Channel programs that intend to set the suspend flag on a channel command word 
+(CCW)  must start the I/O operation with the DOIO_ALLOW_SUSPEND option or the 
+suspend flag will cause a channel program check. At the time the channel program 
+becomes suspended an intermediate interrupt will be generated by the channel 
+subsystem.
+
+ccw_device_resume() - Resume Channel Program Execution 
+
+If a device driver chooses to suspend the current channel program execution by 
+setting the CCW suspend flag on a particular CCW, the channel program execution 
+is suspended. In order to resume channel program execution the CIO layer 
+provides the ccw_device_resume() routine. 
+
+int ccw_device_resume(struct ccw_device *cdev);
+
+cdev - ccw_device the resume operation is requested for
+
+The ccw_device_resume() function returns:
+
+        0  - suspended channel program is resumed
+-EBUSY     - status pending
+-ENODEV    - cdev invalid or not-operational subchannel 
+-EINVAL    - resume function not applicable  
+-ENOTCONN  - there is no I/O request pending for completion 
+
+Usage Notes:
+Please have a look at the ccw_device_start() usage notes for more details on
+suspended channel programs.
+
+ccw_device_halt() - Halt I/O Request Processing
+
+Sometimes a device driver might need a possibility to stop the processing of
+a long-running channel program or the device might require to initially issue
+a halt subchannel (HSCH) I/O command. For those purposes the ccw_device_halt()
+command is provided.
+
+ccw_device_halt() must be called disabled and with the ccw device lock held.
+
+int ccw_device_halt(struct ccw_device *cdev,
+                    unsigned long intparm);
+
+cdev    : ccw_device the halt operation is requested for
+intparm : interruption parameter; value is only used if no I/O
+          is outstanding, otherwise the intparm associated with
+          the I/O request is returned
+
+The ccw_device_halt() function returns :
+
+      0 - request successfully initiated
+-EBUSY  - the device is currently busy, or status pending.
+-ENODEV - cdev invalid.
+-EINVAL - The device is not operational or the ccw device is not online.
+
+Usage Notes :
+
+A device driver may write a never-ending channel program by writing a channel
+program that at its end loops back to its beginning by means of a transfer in
+channel (TIC)   command (CCW_CMD_TIC). Usually this is performed by network
+device drivers by setting the PCI CCW flag (CCW_FLAG_PCI). Once this CCW is
+executed a program controlled interrupt (PCI) is generated. The device driver
+can then perform an appropriate action. Prior to interrupt of an outstanding
+read to a network device (with or without PCI flag) a ccw_device_halt()
+is required to end the pending operation.
+
+ccw_device_clear() - Terminage I/O Request Processing
+
+In order to terminate all I/O processing at the subchannel, the clear subchannel
+(CSCH) command is used. It can be issued via ccw_device_clear().
+
+ccw_device_clear() must be called disabled and with the ccw device lock held.
+
+int ccw_device_clear(struct ccw_device *cdev, unsigned long intparm);
+
+cdev:	 ccw_device the clear operation is requested for
+intparm: interruption parameter (see ccw_device_halt())
+
+The ccw_device_clear() function returns:
+
+      0 - request successfully initiated
+-ENODEV - cdev invalid
+-EINVAL - The device is not operational or the ccw device is not online.
+
+Miscellaneous Support Routines
+
+This chapter describes various routines to be used in a Linux/390 device
+driver programming environment.
+
+get_ccwdev_lock()
+
+Get the address of the device specific lock. This is then used in
+spin_lock() / spin_unlock() calls.
+
+
+__u8 ccw_device_get_path_mask(struct ccw_device *cdev);
+
+Get the mask of the path currently available for cdev.
diff --git a/Documentation/s390/config3270.sh b/Documentation/s390/config3270.sh
new file mode 100644
index 000000000..515e2f431
--- /dev/null
+++ b/Documentation/s390/config3270.sh
@@ -0,0 +1,76 @@
+#!/bin/sh
+#
+# config3270 -- Autoconfigure /dev/3270/* and /etc/inittab
+#
+#       Usage:
+#               config3270
+#
+#       Output:
+#               /tmp/mkdev3270
+#
+#       Operation:
+#               1. Run this script
+#               2. Run the script it produces: /tmp/mkdev3270
+#               3. Issue "telinit q" or reboot, as appropriate.
+#
+P=/proc/tty/driver/tty3270
+ROOT=
+D=$ROOT/dev
+SUBD=3270
+TTY=$SUBD/tty
+TUB=$SUBD/tub
+SCR=$ROOT/tmp/mkdev3270
+SCRTMP=$SCR.a
+GETTYLINE=:2345:respawn:/sbin/mingetty
+INITTAB=$ROOT/etc/inittab
+NINITTAB=$ROOT/etc/NEWinittab
+OINITTAB=$ROOT/etc/OLDinittab
+ADDNOTE=\\"# Additional mingettys for the 3270/tty* driver, tub3270 ---\\"
+
+if ! ls $P > /dev/null 2>&1; then
+	modprobe tub3270 > /dev/null 2>&1
+fi
+ls $P > /dev/null 2>&1 || exit 1
+
+# Initialize two files, one for /dev/3270 commands and one
+# to replace the /etc/inittab file (old one saved in OLDinittab)
+echo "#!/bin/sh" > $SCR || exit 1
+echo " " >> $SCR
+echo "# Script built by /sbin/config3270" >> $SCR
+if [ ! -d /dev/dasd ]; then
+	echo rm -rf "$D/$SUBD/*" >> $SCR
+fi
+echo "grep -v $TTY $INITTAB > $NINITTAB" > $SCRTMP || exit 1
+echo "echo $ADDNOTE >> $NINITTAB" >> $SCRTMP
+if [ ! -d /dev/dasd ]; then
+	echo mkdir -p $D/$SUBD >> $SCR
+fi
+
+# Now query the tub3270 driver for 3270 device information
+# and add appropriate mknod and mingetty lines to our files
+echo what=config > $P
+while read devno maj min;do
+	if [ $min = 0 ]; then
+		fsmaj=$maj
+		if [ ! -d /dev/dasd ]; then
+			echo mknod $D/$TUB c $fsmaj 0 >> $SCR
+			echo chmod 666 $D/$TUB >> $SCR
+		fi
+	elif [ $maj = CONSOLE ]; then
+		if [ ! -d /dev/dasd ]; then
+			echo mknod $D/$TUB$devno c $fsmaj $min >> $SCR
+		fi
+	else
+		if [ ! -d /dev/dasd ]; then
+			echo mknod $D/$TTY$devno c $maj $min >>$SCR
+			echo mknod $D/$TUB$devno c $fsmaj $min >> $SCR
+		fi
+		echo "echo t$min$GETTYLINE $TTY$devno >> $NINITTAB" >> $SCRTMP
+	fi
+done < $P
+
+echo mv $INITTAB $OINITTAB >> $SCRTMP || exit 1
+echo mv $NINITTAB $INITTAB >> $SCRTMP
+cat $SCRTMP >> $SCR
+rm $SCRTMP
+exit 0
diff --git a/Documentation/s390/driver-model.txt b/Documentation/s390/driver-model.txt
new file mode 100644
index 000000000..ed265cf54
--- /dev/null
+++ b/Documentation/s390/driver-model.txt
@@ -0,0 +1,287 @@
+S/390 driver model interfaces
+-----------------------------
+
+1. CCW devices
+--------------
+
+All devices which can be addressed by means of ccws are called 'CCW devices' -
+even if they aren't actually driven by ccws.
+
+All ccw devices are accessed via a subchannel, this is reflected in the 
+structures under devices/:
+
+devices/
+     - system/
+     - css0/
+           - 0.0.0000/0.0.0815/
+	   - 0.0.0001/0.0.4711/
+	   - 0.0.0002/
+	   - 0.1.0000/0.1.1234/
+	   ...
+	   - defunct/
+
+In this example, device 0815 is accessed via subchannel 0 in subchannel set 0,
+device 4711 via subchannel 1 in subchannel set 0, and subchannel 2 is a non-I/O
+subchannel. Device 1234 is accessed via subchannel 0 in subchannel set 1.
+
+The subchannel named 'defunct' does not represent any real subchannel on the
+system; it is a pseudo subchannel where disconnected ccw devices are moved to
+if they are displaced by another ccw device becoming operational on their
+former subchannel. The ccw devices will be moved again to a proper subchannel
+if they become operational again on that subchannel.
+
+You should address a ccw device via its bus id (e.g. 0.0.4711); the device can
+be found under bus/ccw/devices/.
+
+All ccw devices export some data via sysfs.
+
+cutype:	    The control unit type / model.
+
+devtype:    The device type / model, if applicable.
+
+availability: Can be 'good' or 'boxed'; 'no path' or 'no device' for
+	      disconnected devices.
+
+online:     An interface to set the device online and offline.
+	    In the special case of the device being disconnected (see the
+	    notify function under 1.2), piping 0 to online will forcibly delete
+	    the device.
+
+The device drivers can add entries to export per-device data and interfaces.
+
+There is also some data exported on a per-subchannel basis (see under
+bus/css/devices/):
+
+chpids:	    Via which chpids the device is connected.
+
+pimpampom:  The path installed, path available and path operational masks.
+
+There also might be additional data, for example for block devices.
+
+
+1.1 Bringing up a ccw device
+----------------------------
+
+This is done in several steps.
+
+a. Each driver can provide one or more parameter interfaces where parameters can
+   be specified. These interfaces are also in the driver's responsibility.
+b. After a. has been performed, if necessary, the device is finally brought up
+   via the 'online' interface.
+
+
+1.2 Writing a driver for ccw devices
+------------------------------------
+
+The basic struct ccw_device and struct ccw_driver data structures can be found
+under include/asm/ccwdev.h.
+
+struct ccw_device {
+        spinlock_t *ccwlock;
+        struct ccw_device_private *private;
+	struct ccw_device_id id;	
+
+	struct ccw_driver *drv;		
+	struct device dev;		
+	int online;
+
+	void (*handler) (struct ccw_device *dev, unsigned long intparm,
+                         struct irb *irb);
+};
+
+struct ccw_driver {
+	struct module *owner;		
+	struct ccw_device_id *ids;	
+	int (*probe) (struct ccw_device *); 
+	int (*remove) (struct ccw_device *);
+	int (*set_online) (struct ccw_device *);
+	int (*set_offline) (struct ccw_device *);
+	int (*notify) (struct ccw_device *, int);
+	struct device_driver driver;
+	char *name;
+};
+
+The 'private' field contains data needed for internal i/o operation only, and
+is not available to the device driver.
+
+Each driver should declare in a MODULE_DEVICE_TABLE into which CU types/models
+and/or device types/models it is interested. This information can later be found
+in the struct ccw_device_id fields:
+
+struct ccw_device_id {
+	__u16	match_flags;	
+
+	__u16	cu_type;	
+	__u16	dev_type;	
+	__u8	cu_model;	
+	__u8	dev_model;	
+
+	unsigned long driver_info;
+};
+
+The functions in ccw_driver should be used in the following way:
+probe:   This function is called by the device layer for each device the driver
+	 is interested in. The driver should only allocate private structures
+	 to put in dev->driver_data and create attributes (if needed). Also,
+	 the interrupt handler (see below) should be set here.
+
+int (*probe) (struct ccw_device *cdev); 
+
+Parameters:  cdev     - the device to be probed.
+
+
+remove:  This function is called by the device layer upon removal of the driver,
+	 the device or the module. The driver should perform cleanups here.
+
+int (*remove) (struct ccw_device *cdev);
+
+Parameters:   cdev    - the device to be removed.
+
+
+set_online: This function is called by the common I/O layer when the device is
+	    activated via the 'online' attribute. The driver should finally
+	    setup and activate the device here.
+
+int (*set_online) (struct ccw_device *);
+
+Parameters:   cdev	- the device to be activated. The common layer has
+			  verified that the device is not already online.
+
+
+set_offline: This function is called by the common I/O layer when the device is
+	     de-activated via the 'online' attribute. The driver should shut
+	     down the device, but not de-allocate its private data.
+
+int (*set_offline) (struct ccw_device *);
+
+Parameters:   cdev       - the device to be deactivated. The common layer has
+			   verified that the device is online.
+
+
+notify: This function is called by the common I/O layer for some state changes
+	of the device.
+	Signalled to the driver are:
+	* In online state, device detached (CIO_GONE) or last path gone
+	  (CIO_NO_PATH). The driver must return !0 to keep the device; for
+	  return code 0, the device will be deleted as usual (also when no
+	  notify function is registered). If the driver wants to keep the
+	  device, it is moved into disconnected state.
+	* In disconnected state, device operational again (CIO_OPER). The
+	  common I/O layer performs some sanity checks on device number and
+	  Device / CU to be reasonably sure if it is still the same device.
+	  If not, the old device is removed and a new one registered. By the
+	  return code of the notify function the device driver signals if it
+	  wants the device back: !0 for keeping, 0 to make the device being
+	  removed and re-registered.
+	
+int (*notify) (struct ccw_device *, int);
+
+Parameters:   cdev    - the device whose state changed.
+	      event   - the event that happened. This can be one of CIO_GONE,
+		        CIO_NO_PATH or CIO_OPER.
+
+The handler field of the struct ccw_device is meant to be set to the interrupt
+handler for the device. In order to accommodate drivers which use several 
+distinct handlers (e.g. multi subchannel devices), this is a member of ccw_device
+instead of ccw_driver.
+The handler is registered with the common layer during set_online() processing
+before the driver is called, and is deregistered during set_offline() after the
+driver has been called. Also, after registering / before deregistering, path 
+grouping resp. disbanding of the path group (if applicable) are performed.
+
+void (*handler) (struct ccw_device *dev, unsigned long intparm, struct irb *irb);
+
+Parameters:	dev	- the device the handler is called for
+		intparm - the intparm which allows the device driver to identify
+                          the i/o the interrupt is associated with, or to recognize
+                          the interrupt as unsolicited.
+                irb     - interruption response block which contains the accumulated
+                          status.
+
+The device driver is called from the common ccw_device layer and can retrieve 
+information about the interrupt from the irb parameter.
+
+
+1.3 ccwgroup devices
+--------------------
+
+The ccwgroup mechanism is designed to handle devices consisting of multiple ccw
+devices, like lcs or ctc.
+
+The ccw driver provides a 'group' attribute. Piping bus ids of ccw devices to
+this attributes creates a ccwgroup device consisting of these ccw devices (if
+possible). This ccwgroup device can be set online or offline just like a normal
+ccw device.
+
+Each ccwgroup device also provides an 'ungroup' attribute to destroy the device
+again (only when offline). This is a generic ccwgroup mechanism (the driver does
+not need to implement anything beyond normal removal routines).
+
+A ccw device which is a member of a ccwgroup device carries a pointer to the
+ccwgroup device in the driver_data of its device struct. This field must not be
+touched by the driver - it should use the ccwgroup device's driver_data for its
+private data.
+
+To implement a ccwgroup driver, please refer to include/asm/ccwgroup.h. Keep in
+mind that most drivers will need to implement both a ccwgroup and a ccw
+driver.
+
+
+2. Channel paths
+-----------------
+
+Channel paths show up, like subchannels, under the channel subsystem root (css0)
+and are called 'chp0.<chpid>'. They have no driver and do not belong to any bus.
+Please note, that unlike /proc/chpids in 2.4, the channel path objects reflect
+only the logical state and not the physical state, since we cannot track the
+latter consistently due to lacking machine support (we don't need to be aware
+of it anyway).
+
+status - Can be 'online' or 'offline'.
+	 Piping 'on' or 'off' sets the chpid logically online/offline.
+	 Piping 'on' to an online chpid triggers path reprobing for all devices
+	 the chpid connects to. This can be used to force the kernel to re-use
+	 a channel path the user knows to be online, but the machine hasn't
+	 created a machine check for.
+
+type - The physical type of the channel path.
+
+shared - Whether the channel path is shared.
+
+cmg - The channel measurement group.
+
+3. System devices
+-----------------
+
+3.1 xpram 
+---------
+
+xpram shows up under devices/system/ as 'xpram'.
+
+3.2 cpus
+--------
+
+For each cpu, a directory is created under devices/system/cpu/. Each cpu has an
+attribute 'online' which can be 0 or 1.
+
+
+4. Other devices
+----------------
+
+4.1 Netiucv
+-----------
+
+The netiucv driver creates an attribute 'connection' under
+bus/iucv/drivers/netiucv. Piping to this attribute creates a new netiucv
+connection to the specified host.
+
+Netiucv connections show up under devices/iucv/ as "netiucv<ifnum>". The interface
+number is assigned sequentially to the connections defined via the 'connection'
+attribute.
+
+user			  - shows the connection partner.
+
+buffer			  - maximum buffer size.
+			    Pipe to it to change buffer size.
+
+
diff --git a/Documentation/s390/monreader.txt b/Documentation/s390/monreader.txt
new file mode 100644
index 000000000..d3729585f
--- /dev/null
+++ b/Documentation/s390/monreader.txt
@@ -0,0 +1,197 @@
+
+Date  : 2004-Nov-26
+Author: Gerald Schaefer (geraldsc@de.ibm.com)
+
+
+             Linux API for read access to z/VM Monitor Records
+             =================================================
+
+
+Description
+===========
+This item delivers a new Linux API in the form of a misc char device that is
+usable from user space and allows read access to the z/VM Monitor Records
+collected by the *MONITOR System Service of z/VM.
+
+
+User Requirements
+=================
+The z/VM guest on which you want to access this API needs to be configured in
+order to allow IUCV connections to the *MONITOR service, i.e. it needs the
+IUCV *MONITOR statement in its user entry. If the monitor DCSS to be used is
+restricted (likely), you also need the NAMESAVE <DCSS NAME> statement.
+This item will use the IUCV device driver to access the z/VM services, so you
+need a kernel with IUCV support. You also need z/VM version 4.4 or 5.1.
+
+There are two options for being able to load the monitor DCSS (examples assume
+that the monitor DCSS begins at 144 MB and ends at 152 MB). You can query the
+location of the monitor DCSS with the Class E privileged CP command Q NSS MAP
+(the values BEGPAG and ENDPAG are given in units of 4K pages).
+
+See also "CP Command and Utility Reference" (SC24-6081-00) for more information
+on the DEF STOR and Q NSS MAP commands, as well as "Saved Segments Planning
+and Administration" (SC24-6116-00) for more information on DCSSes.
+
+1st option:
+-----------
+You can use the CP command DEF STOR CONFIG to define a "memory hole" in your
+guest virtual storage around the address range of the DCSS.
+
+Example: DEF STOR CONFIG 0.140M 200M.200M
+
+This defines two blocks of storage, the first is 140MB in size an begins at
+address 0MB, the second is 200MB in size and begins at address 200MB,
+resulting in a total storage of 340MB. Note that the first block should
+always start at 0 and be at least 64MB in size.
+
+2nd option:
+-----------
+Your guest virtual storage has to end below the starting address of the DCSS
+and you have to specify the "mem=" kernel parameter in your parmfile with a
+value greater than the ending address of the DCSS.
+
+Example: DEF STOR 140M
+
+This defines 140MB storage size for your guest, the parameter "mem=160M" is
+added to the parmfile.
+
+
+User Interface
+==============
+The char device is implemented as a kernel module named "monreader",
+which can be loaded via the modprobe command, or it can be compiled into the
+kernel instead. There is one optional module (or kernel) parameter, "mondcss",
+to specify the name of the monitor DCSS. If the module is compiled into the
+kernel, the kernel parameter "monreader.mondcss=<DCSS NAME>" can be specified
+in the parmfile.
+
+The default name for the DCSS is "MONDCSS" if none is specified. In case that
+there are other users already connected to the *MONITOR service (e.g.
+Performance Toolkit), the monitor DCSS is already defined and you have to use
+the same DCSS. The CP command Q MONITOR (Class E privileged) shows the name
+of the monitor DCSS, if already defined, and the users connected to the
+*MONITOR service.
+Refer to the "z/VM Performance" book (SC24-6109-00) on how to create a monitor
+DCSS if your z/VM doesn't have one already, you need Class E privileges to
+define and save a DCSS.
+
+Example:
+--------
+modprobe monreader mondcss=MYDCSS
+
+This loads the module and sets the DCSS name to "MYDCSS".
+
+NOTE:
+-----
+This API provides no interface to control the *MONITOR service, e.g. specify
+which data should be collected. This can be done by the CP command MONITOR
+(Class E privileged), see "CP Command and Utility Reference".
+
+Device nodes with udev:
+-----------------------
+After loading the module, a char device will be created along with the device
+node /<udev directory>/monreader.
+
+Device nodes without udev:
+--------------------------
+If your distribution does not support udev, a device node will not be created
+automatically and you have to create it manually after loading the module.
+Therefore you need to know the major and minor numbers of the device. These
+numbers can be found in /sys/class/misc/monreader/dev.
+Typing cat /sys/class/misc/monreader/dev will give an output of the form
+<major>:<minor>. The device node can be created via the mknod command, enter
+mknod <name> c <major> <minor>, where <name> is the name of the device node
+to be created.
+
+Example:
+--------
+# modprobe monreader
+# cat /sys/class/misc/monreader/dev
+10:63
+# mknod /dev/monreader c 10 63
+
+This loads the module with the default monitor DCSS (MONDCSS) and creates a
+device node.
+
+File operations:
+----------------
+The following file operations are supported: open, release, read, poll.
+There are two alternative methods for reading: either non-blocking read in
+conjunction with polling, or blocking read without polling. IOCTLs are not
+supported.
+
+Read:
+-----
+Reading from the device provides a 12 Byte monitor control element (MCE),
+followed by a set of one or more contiguous monitor records (similar to the
+output of the CMS utility MONWRITE without the 4K control blocks). The MCE
+contains information on the type of the following record set (sample/event
+data), the monitor domains contained within it and the start and end address
+of the record set in the monitor DCSS. The start and end address can be used
+to determine the size of the record set, the end address is the address of the
+last byte of data. The start address is needed to handle "end-of-frame" records
+correctly (domain 1, record 13), i.e. it can be used to determine the record
+start offset relative to a 4K page (frame) boundary.
+
+See "Appendix A: *MONITOR" in the "z/VM Performance" document for a description
+of the monitor control element layout. The layout of the monitor records can
+be found here (z/VM 5.1): http://www.vm.ibm.com/pubs/mon510/index.html
+
+The layout of the data stream provided by the monreader device is as follows:
+...
+<0 byte read>
+<first MCE>              \
+<first set of records>    |
+...                       |- data set
+<last MCE>                |
+<last set of records>    /
+<0 byte read>
+...
+
+There may be more than one combination of MCE and corresponding record set
+within one data set and the end of each data set is indicated by a successful
+read with a return value of 0 (0 byte read).
+Any received data must be considered invalid until a complete set was
+read successfully, including the closing 0 byte read. Therefore you should
+always read the complete set into a buffer before processing the data.
+
+The maximum size of a data set can be as large as the size of the
+monitor DCSS, so design the buffer adequately or use dynamic memory allocation.
+The size of the monitor DCSS will be printed into syslog after loading the
+module. You can also use the (Class E privileged) CP command Q NSS MAP to
+list all available segments and information about them.
+
+As with most char devices, error conditions are indicated by returning a
+negative value for the number of bytes read. In this case, the errno variable
+indicates the error condition:
+
+EIO: reply failed, read data is invalid and the application
+     should discard the data read since the last successful read with 0 size.
+EFAULT: copy_to_user failed, read data is invalid and the application should
+        discard the data read since the last successful read with 0 size.
+EAGAIN: occurs on a non-blocking read if there is no data available at the
+        moment. There is no data missing or corrupted, just try again or rather
+        use polling for non-blocking reads.
+EOVERFLOW: message limit reached, the data read since the last successful
+           read with 0 size is valid but subsequent records may be missing.
+
+In the last case (EOVERFLOW) there may be missing data, in the first two cases
+(EIO, EFAULT) there will be missing data. It's up to the application if it will
+continue reading subsequent data or rather exit.
+
+Open:
+-----
+Only one user is allowed to open the char device. If it is already in use, the
+open function will fail (return a negative value) and set errno to EBUSY.
+The open function may also fail if an IUCV connection to the *MONITOR service
+cannot be established. In this case errno will be set to EIO and an error
+message with an IPUSER SEVER code will be printed into syslog. The IPUSER SEVER
+codes are described in the "z/VM Performance" book, Appendix A.
+
+NOTE:
+-----
+As soon as the device is opened, incoming messages will be accepted and they
+will account for the message limit, i.e. opening the device without reading
+from it will provoke the "message limit reached" error (EOVERFLOW error code)
+eventually.
+
diff --git a/Documentation/s390/qeth.txt b/Documentation/s390/qeth.txt
new file mode 100644
index 000000000..aa06fcf5f
--- /dev/null
+++ b/Documentation/s390/qeth.txt
@@ -0,0 +1,50 @@
+IBM s390 QDIO Ethernet Driver
+
+OSA and HiperSockets Bridge Port Support
+
+Uevents
+
+To generate the events the device must be assigned a role of either
+a primary or a secondary Bridge Port. For more information, see
+"z/VM Connectivity, SC24-6174".
+
+When run on an OSA or HiperSockets Bridge Capable Port hardware, and the state
+of some configured Bridge Port device on the channel changes, a udev
+event with ACTION=CHANGE is emitted on behalf of the corresponding
+ccwgroup device. The event has the following attributes:
+
+BRIDGEPORT=statechange -  indicates that the Bridge Port device changed
+  its state.
+
+ROLE={primary|secondary|none} - the role assigned to the port.
+
+STATE={active|standby|inactive} - the newly assumed state of the port.
+
+When run on HiperSockets Bridge Capable Port hardware with host address
+notifications enabled, a udev event with ACTION=CHANGE is emitted.
+It is emitted on behalf of the corresponding ccwgroup device when a host
+or a VLAN is registered or unregistered on the network served by the device.
+The event has the following attributes:
+
+BRIDGEDHOST={reset|register|deregister|abort} - host address
+  notifications are started afresh, a new host or VLAN is registered or
+  deregistered on the Bridge Port HiperSockets channel, or address
+  notifications are aborted.
+
+VLAN=numeric-vlan-id - VLAN ID on which the event occurred. Not included
+  if no VLAN is involved in the event.
+
+MAC=xx:xx:xx:xx:xx:xx - MAC address of the host that is being registered
+  or deregistered from the HiperSockets channel. Not reported if the
+  event reports the creation or destruction of a VLAN.
+
+NTOK_BUSID=x.y.zzzz - device bus ID (CSSID, SSID and device number).
+
+NTOK_IID=xx - device IID.
+
+NTOK_CHPID=xx - device CHPID.
+
+NTOK_CHID=xxxx - device channel ID.
+
+Note that the NTOK_* attributes refer to devices other than  the one
+connected to the system on which the OS is running.
diff --git a/Documentation/s390/s390dbf.txt b/Documentation/s390/s390dbf.txt
new file mode 100644
index 000000000..61329fd62
--- /dev/null
+++ b/Documentation/s390/s390dbf.txt
@@ -0,0 +1,667 @@
+S390 Debug Feature
+==================
+
+files: arch/s390/kernel/debug.c
+       arch/s390/include/asm/debug.h
+
+Description:
+------------
+The goal of this feature is to provide a kernel debug logging API 
+where log records can be stored efficiently in memory, where each component 
+(e.g. device drivers) can have one separate debug log.
+One purpose of this is to inspect the debug logs after a production system crash
+in order to analyze the reason for the crash.
+If the system still runs but only a subcomponent which uses dbf fails,
+it is possible to look at the debug logs on a live system via the Linux
+debugfs filesystem.
+The debug feature may also very useful for kernel and driver development.
+
+Design:
+-------
+Kernel components (e.g. device drivers) can register themselves at the debug 
+feature with the function call debug_register(). This function initializes a 
+debug log for the caller. For each debug log exists a number of debug areas 
+where exactly one is active at one time.  Each debug area consists of contiguous
+pages in memory. In the debug areas there are stored debug entries (log records)
+which are written by event- and exception-calls. 
+
+An event-call writes the specified debug entry to the active debug
+area and updates the log pointer for the active area. If the end 
+of the active debug area is reached, a wrap around is done (ring buffer) 
+and the next debug entry will be written at the beginning of the active 
+debug area.
+
+An exception-call writes the specified debug entry to the log and
+switches to the next debug area. This is done in order to be sure
+that the records which describe the origin of the exception are not
+overwritten when a wrap around for the current area occurs.
+
+The debug areas themselves are also ordered in form of a ring buffer.
+When an exception is thrown in the last debug area, the following debug 
+entries are then written again in the very first area.
+
+There are three versions for the event- and exception-calls: One for
+logging raw data, one for text and one for numbers.
+
+Each debug entry contains the following data:
+
+- Timestamp
+- Cpu-Number of calling task
+- Level of debug entry (0...6)
+- Return Address to caller
+- Flag, if entry is an exception or not
+
+The debug logs can be inspected in a live system through entries in
+the debugfs-filesystem. Under the toplevel directory "s390dbf" there is
+a directory for each registered component, which is named like the
+corresponding component. The debugfs normally should be mounted to
+/sys/kernel/debug therefore the debug feature can be accessed under
+/sys/kernel/debug/s390dbf.
+
+The content of the directories are files which represent different views
+to the debug log. Each component can decide which views should be
+used through registering them with the function debug_register_view().
+Predefined views for hex/ascii, sprintf and raw binary data are provided.
+It is also possible to define other views. The content of
+a view can be inspected simply by reading the corresponding debugfs file.
+
+All debug logs have an actual debug level (range from 0 to 6).
+The default level is 3. Event and Exception functions have a 'level'
+parameter. Only debug entries with a level that is lower or equal
+than the actual level are written to the log. This means, when
+writing events, high priority log entries should have a low level
+value whereas low priority entries should have a high one.
+The actual debug level can be changed with the help of the debugfs-filesystem
+through writing a number string "x" to the 'level' debugfs file which is
+provided for every debug log. Debugging can be switched off completely
+by using "-" on the 'level' debugfs file.
+
+Example:
+
+> echo "-" > /sys/kernel/debug/s390dbf/dasd/level
+
+It is also possible to deactivate the debug feature globally for every
+debug log. You can change the behavior using  2 sysctl parameters in
+/proc/sys/s390dbf:
+There are currently 2 possible triggers, which stop the debug feature
+globally. The first possibility is to use the "debug_active" sysctl. If
+set to 1 the debug feature is running. If "debug_active" is set to 0 the
+debug feature is turned off.
+The second trigger which stops the debug feature is a kernel oops.
+That prevents the debug feature from overwriting debug information that
+happened before the oops. After an oops you can reactivate the debug feature
+by piping 1 to /proc/sys/s390dbf/debug_active. Nevertheless, its not
+suggested to use an oopsed kernel in a production environment.
+If you want to disallow the deactivation of the debug feature, you can use
+the "debug_stoppable" sysctl. If you set "debug_stoppable" to 0 the debug
+feature cannot be stopped. If the debug feature is already stopped, it
+will stay deactivated.
+
+Kernel Interfaces:
+------------------
+
+----------------------------------------------------------------------------
+debug_info_t *debug_register(char *name, int pages, int nr_areas,
+                             int buf_size);
+
+Parameter:    name:        Name of debug log (e.g. used for debugfs entry)
+              pages:       number of pages, which will be allocated per area
+              nr_areas:    number of debug areas
+              buf_size:    size of data area in each debug entry
+
+Return Value: Handle for generated debug area   
+              NULL if register failed 
+
+Description:  Allocates memory for a debug log     
+              Must not be called within an interrupt handler 
+
+----------------------------------------------------------------------------
+debug_info_t *debug_register_mode(char *name, int pages, int nr_areas,
+				  int buf_size, mode_t mode, uid_t uid,
+				  gid_t gid);
+
+Parameter:    name:	   Name of debug log (e.g. used for debugfs entry)
+	      pages:	   Number of pages, which will be allocated per area
+	      nr_areas:    Number of debug areas
+	      buf_size:    Size of data area in each debug entry
+	      mode:	   File mode for debugfs files. E.g. S_IRWXUGO
+	      uid:	   User ID for debugfs files. Currently only 0 is
+			   supported.
+	      gid:	   Group ID for debugfs files. Currently only 0 is
+			   supported.
+
+Return Value: Handle for generated debug area
+	      NULL if register failed
+
+Description:  Allocates memory for a debug log
+	      Must not be called within an interrupt handler
+
+---------------------------------------------------------------------------
+void debug_unregister (debug_info_t * id);
+
+Parameter:     id:   handle for debug log  
+
+Return Value:  none 
+
+Description:   frees memory for a debug log and removes all registered debug
+	       views.
+               Must not be called within an interrupt handler 
+
+---------------------------------------------------------------------------
+void debug_set_level (debug_info_t * id, int new_level);
+
+Parameter:     id:        handle for debug log  
+               new_level: new debug level 
+
+Return Value:  none 
+
+Description:   Sets new actual debug level if new_level is valid. 
+
+---------------------------------------------------------------------------
+bool debug_level_enabled (debug_info_t * id, int level);
+
+Parameter:    id:	  handle for debug log
+	      level:	  debug level
+
+Return Value: True if level is less or equal to the current debug level.
+
+Description:  Returns true if debug events for the specified level would be
+	      logged. Otherwise returns false.
+---------------------------------------------------------------------------
+void debug_stop_all(void);
+
+Parameter:     none
+
+Return Value:  none
+
+Description:   stops the debug feature if stopping is allowed. Currently
+               used in case of a kernel oops.
+
+---------------------------------------------------------------------------
+debug_entry_t* debug_event (debug_info_t* id, int level, void* data, 
+                            int length);
+
+Parameter:     id:     handle for debug log  
+               level:  debug level           
+               data:   pointer to data for debug entry  
+               length: length of data in bytes       
+
+Return Value:  Address of written debug entry 
+
+Description:   writes debug entry to active debug area (if level <= actual 
+               debug level)    
+
+---------------------------------------------------------------------------
+debug_entry_t* debug_int_event (debug_info_t * id, int level, 
+                                unsigned int data);
+debug_entry_t* debug_long_event(debug_info_t * id, int level,
+                                unsigned long data);
+
+Parameter:     id:     handle for debug log  
+               level:  debug level           
+               data:   integer value for debug entry           
+
+Return Value:  Address of written debug entry 
+
+Description:   writes debug entry to active debug area (if level <= actual 
+               debug level)    
+
+---------------------------------------------------------------------------
+debug_entry_t* debug_text_event (debug_info_t * id, int level, 
+                                 const char* data);
+
+Parameter:     id:     handle for debug log  
+               level:  debug level           
+               data:   string for debug entry  
+
+Return Value:  Address of written debug entry 
+
+Description:   writes debug entry in ascii format to active debug area 
+               (if level <= actual debug level)     
+
+---------------------------------------------------------------------------
+debug_entry_t* debug_sprintf_event (debug_info_t * id, int level, 
+                                    char* string,...);
+
+Parameter:     id:    handle for debug log 
+               level: debug level
+               string: format string for debug entry 
+               ...: varargs used as in sprintf()
+
+Return Value:  Address of written debug entry
+
+Description:   writes debug entry with format string and varargs (longs) to 
+               active debug area (if level $<=$ actual debug level). 
+               floats and long long datatypes cannot be used as varargs.
+
+---------------------------------------------------------------------------
+
+debug_entry_t* debug_exception (debug_info_t* id, int level, void* data, 
+                                int length);
+
+Parameter:     id:     handle for debug log  
+               level:  debug level           
+               data:   pointer to data for debug entry  
+               length: length of data in bytes       
+
+Return Value:  Address of written debug entry 
+
+Description:   writes debug entry to active debug area (if level <= actual 
+               debug level) and switches to next debug area  
+
+---------------------------------------------------------------------------
+debug_entry_t* debug_int_exception (debug_info_t * id, int level, 
+                                    unsigned int data);
+debug_entry_t* debug_long_exception(debug_info_t * id, int level,
+                                    unsigned long data);
+
+Parameter:     id:     handle for debug log  
+               level:  debug level           
+               data:   integer value for debug entry           
+
+Return Value:  Address of written debug entry 
+
+Description:   writes debug entry to active debug area (if level <= actual 
+               debug level) and switches to next debug area  
+
+---------------------------------------------------------------------------
+debug_entry_t* debug_text_exception (debug_info_t * id, int level, 
+                                     const char* data);
+
+Parameter:     id:     handle for debug log  
+               level:  debug level           
+               data:   string for debug entry  
+
+Return Value:  Address of written debug entry 
+
+Description:   writes debug entry in ascii format to active debug area 
+               (if level <= actual debug level) and switches to next debug 
+               area  
+
+---------------------------------------------------------------------------
+debug_entry_t* debug_sprintf_exception (debug_info_t * id, int level,
+                                        char* string,...);
+
+Parameter:     id:    handle for debug log  
+               level: debug level  
+               string: format string for debug entry  
+               ...: varargs used as in sprintf()
+
+Return Value:  Address of written debug entry 
+
+Description:   writes debug entry with format string and varargs (longs) to 
+               active debug area (if level $<=$ actual debug level) and
+               switches to next debug area. 
+               floats and long long datatypes cannot be used as varargs.
+
+---------------------------------------------------------------------------
+
+int debug_register_view (debug_info_t * id, struct debug_view *view);
+
+Parameter:     id:    handle for debug log  
+               view:  pointer to debug view struct 
+
+Return Value:  0  : ok 
+               < 0: Error 
+
+Description:   registers new debug view and creates debugfs dir entry
+
+---------------------------------------------------------------------------
+int debug_unregister_view (debug_info_t * id, struct debug_view *view); 
+
+Parameter:     id:    handle for debug log  
+               view:  pointer to debug view struct 
+
+Return Value:  0  : ok 
+               < 0: Error 
+
+Description:   unregisters debug view and removes debugfs dir entry
+
+
+
+Predefined views:
+-----------------
+
+extern struct debug_view debug_hex_ascii_view;
+extern struct debug_view debug_raw_view;
+extern struct debug_view debug_sprintf_view;
+
+Examples
+--------
+
+/*
+ * hex_ascii- + raw-view Example
+ */
+
+#include <linux/init.h>
+#include <asm/debug.h>
+
+static debug_info_t* debug_info;
+
+static int init(void)
+{
+    /* register 4 debug areas with one page each and 4 byte data field */
+
+    debug_info = debug_register ("test", 1, 4, 4 );
+    debug_register_view(debug_info,&debug_hex_ascii_view);
+    debug_register_view(debug_info,&debug_raw_view);
+
+    debug_text_event(debug_info, 4 , "one ");
+    debug_int_exception(debug_info, 4, 4711);
+    debug_event(debug_info, 3, &debug_info, 4);
+
+    return 0;
+}
+
+static void cleanup(void)
+{
+    debug_unregister (debug_info);
+}
+
+module_init(init);
+module_exit(cleanup);
+
+---------------------------------------------------------------------------
+
+/*
+ * sprintf-view Example
+ */
+
+#include <linux/init.h>
+#include <asm/debug.h>
+
+static debug_info_t* debug_info;
+
+static int init(void)
+{
+    /* register 4 debug areas with one page each and data field for */
+    /* format string pointer + 2 varargs (= 3 * sizeof(long))       */
+
+    debug_info = debug_register ("test", 1, 4, sizeof(long) * 3);
+    debug_register_view(debug_info,&debug_sprintf_view);
+
+    debug_sprintf_event(debug_info, 2 , "first event in %s:%i\n",__FILE__,__LINE__);
+    debug_sprintf_exception(debug_info, 1, "pointer to debug info: %p\n",&debug_info);
+
+    return 0;
+}
+
+static void cleanup(void)
+{
+    debug_unregister (debug_info);
+}
+
+module_init(init);
+module_exit(cleanup);
+
+
+
+Debugfs Interface
+----------------
+Views to the debug logs can be investigated through reading the corresponding 
+debugfs-files:
+
+Example:
+
+> ls /sys/kernel/debug/s390dbf/dasd
+flush  hex_ascii  level pages raw
+> cat /sys/kernel/debug/s390dbf/dasd/hex_ascii | sort -k2,2 -s
+00 00974733272:680099 2 - 02 0006ad7e  07 ea 4a 90 | ....
+00 00974733272:682210 2 - 02 0006ade6  46 52 45 45 | FREE
+00 00974733272:682213 2 - 02 0006adf6  07 ea 4a 90 | ....
+00 00974733272:682281 1 * 02 0006ab08  41 4c 4c 43 | EXCP 
+01 00974733272:682284 2 - 02 0006ab16  45 43 4b 44 | ECKD
+01 00974733272:682287 2 - 02 0006ab28  00 00 00 04 | ....
+01 00974733272:682289 2 - 02 0006ab3e  00 00 00 20 | ... 
+01 00974733272:682297 2 - 02 0006ad7e  07 ea 4a 90 | ....
+01 00974733272:684384 2 - 00 0006ade6  46 52 45 45 | FREE
+01 00974733272:684388 2 - 00 0006adf6  07 ea 4a 90 | ....
+
+See section about predefined views for explanation of the above output!
+
+Changing the debug level
+------------------------
+
+Example:
+
+
+> cat /sys/kernel/debug/s390dbf/dasd/level
+3
+> echo "5" > /sys/kernel/debug/s390dbf/dasd/level
+> cat /sys/kernel/debug/s390dbf/dasd/level
+5
+
+Flushing debug areas
+--------------------
+Debug areas can be flushed with piping the number of the desired
+area (0...n) to the debugfs file "flush". When using "-" all debug areas
+are flushed.
+
+Examples:
+
+1. Flush debug area 0:
+> echo "0" > /sys/kernel/debug/s390dbf/dasd/flush
+
+2. Flush all debug areas:
+> echo "-" > /sys/kernel/debug/s390dbf/dasd/flush
+
+Changing the size of debug areas
+------------------------------------
+It is possible the change the size of debug areas through piping
+the number of pages to the debugfs file "pages". The resize request will
+also flush the debug areas.
+
+Example:
+
+Define 4 pages for the debug areas of debug feature "dasd":
+> echo "4" > /sys/kernel/debug/s390dbf/dasd/pages
+
+Stooping the debug feature
+--------------------------
+Example:
+
+1. Check if stopping is allowed
+> cat /proc/sys/s390dbf/debug_stoppable
+2. Stop debug feature
+> echo 0 > /proc/sys/s390dbf/debug_active
+
+lcrash Interface
+----------------
+It is planned that the dump analysis tool lcrash gets an additional command
+'s390dbf' to display all the debug logs. With this tool it will be possible 
+to investigate the debug logs on a live system and with a memory dump after 
+a system crash.
+
+Investigating raw memory
+------------------------
+One last possibility to investigate the debug logs at a live
+system and after a system crash is to look at the raw memory
+under VM or at the Service Element.
+It is possible to find the anker of the debug-logs through
+the 'debug_area_first' symbol in the System map. Then one has
+to follow the correct pointers of the data-structures defined
+in debug.h and find the debug-areas in memory.
+Normally modules which use the debug feature will also have
+a global variable with the pointer to the debug-logs. Following
+this pointer it will also be possible to find the debug logs in
+memory.
+
+For this method it is recommended to use '16 * x + 4' byte (x = 0..n)
+for the length of the data field in debug_register() in
+order to see the debug entries well formatted.
+
+
+Predefined Views
+----------------
+
+There are three predefined views: hex_ascii, raw and sprintf. 
+The hex_ascii view shows the data field in hex and ascii representation 
+(e.g. '45 43 4b 44 | ECKD'). 
+The raw view returns a bytestream as the debug areas are stored in memory.
+
+The sprintf view formats the debug entries in the same way as the sprintf
+function would do. The sprintf event/exception functions write to the
+debug entry a pointer to the format string (size = sizeof(long)) 
+and for each vararg a long value. So e.g. for a debug entry with a format 
+string plus two varargs one would need to allocate a (3 * sizeof(long)) 
+byte data area in the debug_register() function.
+
+IMPORTANT: Using "%s" in sprintf event functions is dangerous. You can only
+use "%s" in the sprintf event functions, if the memory for the passed string is
+available as long as the debug feature exists. The reason behind this is that
+due to performance considerations only a pointer to the string is stored in
+the debug feature. If you log a string that is freed afterwards, you will get
+an OOPS when inspecting the debug feature, because then the debug feature will
+access the already freed memory.
+
+NOTE: If using the sprintf view do NOT use other event/exception functions
+than the sprintf-event and -exception functions.
+
+The format of the hex_ascii and sprintf view is as follows:
+- Number of area
+- Timestamp (formatted as seconds and microseconds since 00:00:00 Coordinated 
+  Universal Time (UTC), January 1, 1970)
+- level of debug entry
+- Exception flag (* = Exception)
+- Cpu-Number of calling task
+- Return Address to caller
+- data field
+
+The format of the raw view is:
+- Header as described in debug.h
+- datafield 
+
+A typical line of the hex_ascii view will look like the following (first line 
+is only for explanation and will not be displayed when 'cating' the view):
+
+area  time           level exception cpu caller    data (hex + ascii)
+--------------------------------------------------------------------------
+00    00964419409:440690 1 -         00  88023fe   
+
+
+Defining views
+--------------
+
+Views are specified with the 'debug_view' structure. There are defined
+callback functions which are used for reading and writing the debugfs files:
+
+struct debug_view {
+        char name[DEBUG_MAX_PROCF_LEN];  
+        debug_prolog_proc_t* prolog_proc; 
+        debug_header_proc_t* header_proc;
+        debug_format_proc_t* format_proc;
+        debug_input_proc_t*  input_proc;
+	void*                private_data;
+};
+
+where
+
+typedef int (debug_header_proc_t) (debug_info_t* id,
+                                   struct debug_view* view,
+                                   int area,
+                                   debug_entry_t* entry,
+                                   char* out_buf);
+
+typedef int (debug_format_proc_t) (debug_info_t* id,
+                                   struct debug_view* view, char* out_buf,
+                                   const char* in_buf);
+typedef int (debug_prolog_proc_t) (debug_info_t* id,
+                                   struct debug_view* view,
+                                   char* out_buf);
+typedef int (debug_input_proc_t) (debug_info_t* id,
+                                  struct debug_view* view,
+                                  struct file* file, const char* user_buf,
+                                  size_t in_buf_size, loff_t* offset);
+
+
+The "private_data" member can be used as pointer to view specific data.
+It is not used by the debug feature itself.
+
+The output when reading a debugfs file is structured like this:
+
+"prolog_proc output"
+
+"header_proc output 1"  "format_proc output 1"
+"header_proc output 2"  "format_proc output 2"
+"header_proc output 3"  "format_proc output 3"
+...
+
+When a view is read from the debugfs, the Debug Feature calls the
+'prolog_proc' once for writing the prolog.
+Then 'header_proc' and 'format_proc' are called for each 
+existing debug entry.
+
+The input_proc can be used to implement functionality when it is written to 
+the view (e.g. like with 'echo "0" > /sys/kernel/debug/s390dbf/dasd/level).
+
+For header_proc there can be used the default function
+debug_dflt_header_fn() which is defined in debug.h.
+and which produces the same header output as the predefined views.
+E.g:
+00 00964419409:440761 2 - 00 88023ec
+
+In order to see how to use the callback functions check the implementation
+of the default views!
+
+Example
+
+#include <asm/debug.h>
+
+#define UNKNOWNSTR "data: %08x"
+
+const char* messages[] =
+{"This error...........\n",
+ "That error...........\n",
+ "Problem..............\n",
+ "Something went wrong.\n",
+ "Everything ok........\n",
+ NULL
+};
+
+static int debug_test_format_fn(
+   debug_info_t * id, struct debug_view *view, 
+   char *out_buf, const char *in_buf
+)
+{
+  int i, rc = 0;
+
+  if(id->buf_size >= 4) {
+     int msg_nr = *((int*)in_buf);
+     if(msg_nr < sizeof(messages)/sizeof(char*) - 1)
+        rc += sprintf(out_buf, "%s", messages[msg_nr]);	
+     else
+        rc += sprintf(out_buf, UNKNOWNSTR, msg_nr);
+  }
+ out:
+   return rc;
+}
+
+struct debug_view debug_test_view = {
+  "myview",                 /* name of view */
+  NULL,                     /* no prolog */
+  &debug_dflt_header_fn,    /* default header for each entry */
+  &debug_test_format_fn,    /* our own format function */
+  NULL,                     /* no input function */
+  NULL                      /* no private data */
+};
+
+=====
+test:
+=====
+debug_info_t *debug_info;
+...
+debug_info = debug_register ("test", 0, 4, 4 ));
+debug_register_view(debug_info, &debug_test_view);
+for(i = 0; i < 10; i ++) debug_int_event(debug_info, 1, i);
+
+> cat /sys/kernel/debug/s390dbf/test/myview
+00 00964419734:611402 1 - 00 88042ca   This error...........
+00 00964419734:611405 1 - 00 88042ca   That error...........
+00 00964419734:611408 1 - 00 88042ca   Problem..............
+00 00964419734:611411 1 - 00 88042ca   Something went wrong.
+00 00964419734:611414 1 - 00 88042ca   Everything ok........
+00 00964419734:611417 1 - 00 88042ca   data: 00000005
+00 00964419734:611419 1 - 00 88042ca   data: 00000006
+00 00964419734:611422 1 - 00 88042ca   data: 00000007
+00 00964419734:611425 1 - 00 88042ca   data: 00000008
+00 00964419734:611428 1 - 00 88042ca   data: 00000009
diff --git a/Documentation/s390/vfio-ccw.txt b/Documentation/s390/vfio-ccw.txt
new file mode 100644
index 000000000..2be11ad86
--- /dev/null
+++ b/Documentation/s390/vfio-ccw.txt
@@ -0,0 +1,300 @@
+vfio-ccw: the basic infrastructure
+==================================
+
+Introduction
+------------
+
+Here we describe the vfio support for I/O subchannel devices for
+Linux/s390. Motivation for vfio-ccw is to passthrough subchannels to a
+virtual machine, while vfio is the means.
+
+Different than other hardware architectures, s390 has defined a unified
+I/O access method, which is so called Channel I/O. It has its own access
+patterns:
+- Channel programs run asynchronously on a separate (co)processor.
+- The channel subsystem will access any memory designated by the caller
+  in the channel program directly, i.e. there is no iommu involved.
+Thus when we introduce vfio support for these devices, we realize it
+with a mediated device (mdev) implementation. The vfio mdev will be
+added to an iommu group, so as to make itself able to be managed by the
+vfio framework. And we add read/write callbacks for special vfio I/O
+regions to pass the channel programs from the mdev to its parent device
+(the real I/O subchannel device) to do further address translation and
+to perform I/O instructions.
+
+This document does not intend to explain the s390 I/O architecture in
+every detail. More information/reference could be found here:
+- A good start to know Channel I/O in general:
+  https://en.wikipedia.org/wiki/Channel_I/O
+- s390 architecture:
+  s390 Principles of Operation manual (IBM Form. No. SA22-7832)
+- The existing QEMU code which implements a simple emulated channel
+  subsystem could also be a good reference. It makes it easier to follow
+  the flow.
+  qemu/hw/s390x/css.c
+
+For vfio mediated device framework:
+- Documentation/vfio-mediated-device.txt
+
+Motivation of vfio-ccw
+----------------------
+
+Typically, a guest virtualized via QEMU/KVM on s390 only sees
+paravirtualized virtio devices via the "Virtio Over Channel I/O
+(virtio-ccw)" transport. This makes virtio devices discoverable via
+standard operating system algorithms for handling channel devices.
+
+However this is not enough. On s390 for the majority of devices, which
+use the standard Channel I/O based mechanism, we also need to provide
+the functionality of passing through them to a QEMU virtual machine.
+This includes devices that don't have a virtio counterpart (e.g. tape
+drives) or that have specific characteristics which guests want to
+exploit.
+
+For passing a device to a guest, we want to use the same interface as
+everybody else, namely vfio. We implement this vfio support for channel
+devices via the vfio mediated device framework and the subchannel device
+driver "vfio_ccw".
+
+Access patterns of CCW devices
+------------------------------
+
+s390 architecture has implemented a so called channel subsystem, that
+provides a unified view of the devices physically attached to the
+systems. Though the s390 hardware platform knows about a huge variety of
+different peripheral attachments like disk devices (aka. DASDs), tapes,
+communication controllers, etc. They can all be accessed by a well
+defined access method and they are presenting I/O completion a unified
+way: I/O interruptions.
+
+All I/O requires the use of channel command words (CCWs). A CCW is an
+instruction to a specialized I/O channel processor. A channel program is
+a sequence of CCWs which are executed by the I/O channel subsystem.  To
+issue a channel program to the channel subsystem, it is required to
+build an operation request block (ORB), which can be used to point out
+the format of the CCW and other control information to the system. The
+operating system signals the I/O channel subsystem to begin executing
+the channel program with a SSCH (start sub-channel) instruction. The
+central processor is then free to proceed with non-I/O instructions
+until interrupted. The I/O completion result is received by the
+interrupt handler in the form of interrupt response block (IRB).
+
+Back to vfio-ccw, in short:
+- ORBs and channel programs are built in guest kernel (with guest
+  physical addresses).
+- ORBs and channel programs are passed to the host kernel.
+- Host kernel translates the guest physical addresses to real addresses
+  and starts the I/O with issuing a privileged Channel I/O instruction
+  (e.g SSCH).
+- channel programs run asynchronously on a separate processor.
+- I/O completion will be signaled to the host with I/O interruptions.
+  And it will be copied as IRB to user space to pass it back to the
+  guest.
+
+Physical vfio ccw device and its child mdev
+-------------------------------------------
+
+As mentioned above, we realize vfio-ccw with a mdev implementation.
+
+Channel I/O does not have IOMMU hardware support, so the physical
+vfio-ccw device does not have an IOMMU level translation or isolation.
+
+Subchannel I/O instructions are all privileged instructions. When
+handling the I/O instruction interception, vfio-ccw has the software
+policing and translation how the channel program is programmed before
+it gets sent to hardware.
+
+Within this implementation, we have two drivers for two types of
+devices:
+- The vfio_ccw driver for the physical subchannel device.
+  This is an I/O subchannel driver for the real subchannel device.  It
+  realizes a group of callbacks and registers to the mdev framework as a
+  parent (physical) device. As a consequence, mdev provides vfio_ccw a
+  generic interface (sysfs) to create mdev devices. A vfio mdev could be
+  created by vfio_ccw then and added to the mediated bus. It is the vfio
+  device that added to an IOMMU group and a vfio group.
+  vfio_ccw also provides an I/O region to accept channel program
+  request from user space and store I/O interrupt result for user
+  space to retrieve. To notify user space an I/O completion, it offers
+  an interface to setup an eventfd fd for asynchronous signaling.
+
+- The vfio_mdev driver for the mediated vfio ccw device.
+  This is provided by the mdev framework. It is a vfio device driver for
+  the mdev that created by vfio_ccw.
+  It realizes a group of vfio device driver callbacks, adds itself to a
+  vfio group, and registers itself to the mdev framework as a mdev
+  driver.
+  It uses a vfio iommu backend that uses the existing map and unmap
+  ioctls, but rather than programming them into an IOMMU for a device,
+  it simply stores the translations for use by later requests. This
+  means that a device programmed in a VM with guest physical addresses
+  can have the vfio kernel convert that address to process virtual
+  address, pin the page and program the hardware with the host physical
+  address in one step.
+  For a mdev, the vfio iommu backend will not pin the pages during the
+  VFIO_IOMMU_MAP_DMA ioctl. Mdev framework will only maintain a database
+  of the iova<->vaddr mappings in this operation. And they export a
+  vfio_pin_pages and a vfio_unpin_pages interfaces from the vfio iommu
+  backend for the physical devices to pin and unpin pages by demand.
+
+Below is a high Level block diagram.
+
+ +-------------+
+ |             |
+ | +---------+ | mdev_register_driver() +--------------+
+ | |  Mdev   | +<-----------------------+              |
+ | |  bus    | |                        | vfio_mdev.ko |
+ | | driver  | +----------------------->+              |<-> VFIO user
+ | +---------+ |    probe()/remove()    +--------------+    APIs
+ |             |
+ |  MDEV CORE  |
+ |   MODULE    |
+ |   mdev.ko   |
+ | +---------+ | mdev_register_device() +--------------+
+ | |Physical | +<-----------------------+              |
+ | | device  | |                        |  vfio_ccw.ko |<-> subchannel
+ | |interface| +----------------------->+              |     device
+ | +---------+ |       callback         +--------------+
+ +-------------+
+
+The process of how these work together.
+1. vfio_ccw.ko drives the physical I/O subchannel, and registers the
+   physical device (with callbacks) to mdev framework.
+   When vfio_ccw probing the subchannel device, it registers device
+   pointer and callbacks to the mdev framework. Mdev related file nodes
+   under the device node in sysfs would be created for the subchannel
+   device, namely 'mdev_create', 'mdev_destroy' and
+   'mdev_supported_types'.
+2. Create a mediated vfio ccw device.
+   Use the 'mdev_create' sysfs file, we need to manually create one (and
+   only one for our case) mediated device.
+3. vfio_mdev.ko drives the mediated ccw device.
+   vfio_mdev is also the vfio device drvier. It will probe the mdev and
+   add it to an iommu_group and a vfio_group. Then we could pass through
+   the mdev to a guest.
+
+vfio-ccw I/O region
+-------------------
+
+An I/O region is used to accept channel program request from user
+space and store I/O interrupt result for user space to retrieve. The
+definition of the region is:
+
+struct ccw_io_region {
+#define ORB_AREA_SIZE 12
+	__u8	orb_area[ORB_AREA_SIZE];
+#define SCSW_AREA_SIZE 12
+	__u8	scsw_area[SCSW_AREA_SIZE];
+#define IRB_AREA_SIZE 96
+	__u8	irb_area[IRB_AREA_SIZE];
+	__u32	ret_code;
+} __packed;
+
+While starting an I/O request, orb_area should be filled with the
+guest ORB, and scsw_area should be filled with the SCSW of the Virtual
+Subchannel.
+
+irb_area stores the I/O result.
+
+ret_code stores a return code for each access of the region.
+
+vfio-ccw operation details
+--------------------------
+
+vfio-ccw follows what vfio-pci did on the s390 platform and uses
+vfio-iommu-type1 as the vfio iommu backend.
+
+* CCW translation APIs
+  A group of APIs (start with 'cp_') to do CCW translation. The CCWs
+  passed in by a user space program are organized with their guest
+  physical memory addresses. These APIs will copy the CCWs into kernel
+  space, and assemble a runnable kernel channel program by updating the
+  guest physical addresses with their corresponding host physical addresses.
+  Note that we have to use IDALs even for direct-access CCWs, as the
+  referenced memory can be located anywhere, including above 2G.
+
+* vfio_ccw device driver
+  This driver utilizes the CCW translation APIs and introduces
+  vfio_ccw, which is the driver for the I/O subchannel devices you want
+  to pass through.
+  vfio_ccw implements the following vfio ioctls:
+    VFIO_DEVICE_GET_INFO
+    VFIO_DEVICE_GET_IRQ_INFO
+    VFIO_DEVICE_GET_REGION_INFO
+    VFIO_DEVICE_RESET
+    VFIO_DEVICE_SET_IRQS
+  This provides an I/O region, so that the user space program can pass a
+  channel program to the kernel, to do further CCW translation before
+  issuing them to a real device.
+  This also provides the SET_IRQ ioctl to setup an event notifier to
+  notify the user space program the I/O completion in an asynchronous
+  way.
+
+The use of vfio-ccw is not limited to QEMU, while QEMU is definitely a
+good example to get understand how these patches work. Here is a little
+bit more detail how an I/O request triggered by the QEMU guest will be
+handled (without error handling).
+
+Explanation:
+Q1-Q7: QEMU side process.
+K1-K5: Kernel side process.
+
+Q1. Get I/O region info during initialization.
+Q2. Setup event notifier and handler to handle I/O completion.
+
+... ...
+
+Q3. Intercept a ssch instruction.
+Q4. Write the guest channel program and ORB to the I/O region.
+    K1. Copy from guest to kernel.
+    K2. Translate the guest channel program to a host kernel space
+        channel program, which becomes runnable for a real device.
+    K3. With the necessary information contained in the orb passed in
+        by QEMU, issue the ccwchain to the device.
+    K4. Return the ssch CC code.
+Q5. Return the CC code to the guest.
+
+... ...
+
+    K5. Interrupt handler gets the I/O result and write the result to
+        the I/O region.
+    K6. Signal QEMU to retrieve the result.
+Q6. Get the signal and event handler reads out the result from the I/O
+    region.
+Q7. Update the irb for the guest.
+
+Limitations
+-----------
+
+The current vfio-ccw implementation focuses on supporting basic commands
+needed to implement block device functionality (read/write) of DASD/ECKD
+device only. Some commands may need special handling in the future, for
+example, anything related to path grouping.
+
+DASD is a kind of storage device. While ECKD is a data recording format.
+More information for DASD and ECKD could be found here:
+https://en.wikipedia.org/wiki/Direct-access_storage_device
+https://en.wikipedia.org/wiki/Count_key_data
+
+Together with the corresponding work in QEMU, we can bring the passed
+through DASD/ECKD device online in a guest now and use it as a block
+device.
+
+While the current code allows the guest to start channel programs via
+START SUBCHANNEL, support for HALT SUBCHANNEL or CLEAR SUBCHANNEL is
+not yet implemented.
+
+vfio-ccw supports classic (command mode) channel I/O only. Transport
+mode (HPF) is not supported.
+
+QDIO subchannels are currently not supported. Classic devices other than
+DASD/ECKD might work, but have not been tested.
+
+Reference
+---------
+1. ESA/s390 Principles of Operation manual (IBM Form. No. SA22-7832)
+2. ESA/390 Common I/O Device Commands manual (IBM Form. No. SA22-7204)
+3. https://en.wikipedia.org/wiki/Channel_I/O
+4. Documentation/s390/cds.txt
+5. Documentation/vfio.txt
+6. Documentation/vfio-mediated-device.txt
diff --git a/Documentation/s390/zfcpdump.txt b/Documentation/s390/zfcpdump.txt
new file mode 100644
index 000000000..b064aa597
--- /dev/null
+++ b/Documentation/s390/zfcpdump.txt
@@ -0,0 +1,48 @@
+The s390 SCSI dump tool (zfcpdump)
+
+System z machines (z900 or higher) provide hardware support for creating system
+dumps on SCSI disks. The dump process is initiated by booting a dump tool, which
+has to create a dump of the current (probably crashed) Linux image. In order to
+not overwrite memory of the crashed Linux with data of the dump tool, the
+hardware saves some memory plus the register sets of the boot CPU before the
+dump tool is loaded. There exists an SCLP hardware interface to obtain the saved
+memory afterwards. Currently 32 MB are saved.
+
+This zfcpdump implementation consists of a Linux dump kernel together with
+a user space dump tool, which are loaded together into the saved memory region
+below 32 MB. zfcpdump is installed on a SCSI disk using zipl (as contained in
+the s390-tools package) to make the device bootable. The operator of a Linux
+system can then trigger a SCSI dump by booting the SCSI disk, where zfcpdump
+resides on.
+
+The user space dump tool accesses the memory of the crashed system by means
+of the /proc/vmcore interface. This interface exports the crashed system's
+memory and registers in ELF core dump format. To access the memory which has
+been saved by the hardware SCLP requests will be created at the time the data
+is needed by /proc/vmcore. The tail part of the crashed systems memory which
+has not been stashed by hardware can just be copied from real memory.
+
+To build a dump enabled kernel the kernel config option CONFIG_CRASH_DUMP
+has to be set.
+
+To get a valid zfcpdump kernel configuration use "make zfcpdump_defconfig".
+
+The s390 zipl tool looks for the zfcpdump kernel and optional initrd/initramfs
+under the following locations:
+
+* kernel:  <zfcpdump directory>/zfcpdump.image
+* ramdisk: <zfcpdump directory>/zfcpdump.rd
+
+The zfcpdump directory is defined in the s390-tools package.
+
+The user space application of zfcpdump can reside in an intitramfs or an
+initrd. It can also be included in a built-in kernel initramfs. The application
+reads from /proc/vmcore or zcore/mem and writes the system dump to a SCSI disk.
+
+The s390-tools package version 1.24.0 and above builds an external zfcpdump
+initramfs with a user space application that writes the dump to a SCSI
+partition.
+
+For more information on how to use zfcpdump refer to the s390 'Using the Dump
+Tools book', which is available from
+http://www.ibm.com/developerworks/linux/linux390.
author	Daniel Baumann <daniel.baumann@progress-linux.org>	2024-05-06 01:02:30 +0000
committer	Daniel Baumann <daniel.baumann@progress-linux.org>	2024-05-06 01:02:30 +0000
commit	76cb841cb886eef6b3bee341a2266c76578724ad (patch)
tree	f5892e5ba6cc11949952a6ce4ecbe6d516d6ce58 /Documentation/s390
parent	Initial commit. (diff)
download	linux-76cb841cb886eef6b3bee341a2266c76578724ad.tar.xz linux-76cb841cb886eef6b3bee341a2266c76578724ad.zip