diff options
Diffstat (limited to '')
-rw-r--r-- | usr/dash/TOUR | 343 |
1 files changed, 343 insertions, 0 deletions
diff --git a/usr/dash/TOUR b/usr/dash/TOUR new file mode 100644 index 0000000..056e79b --- /dev/null +++ b/usr/dash/TOUR @@ -0,0 +1,343 @@ +# @(#)TOUR 8.1 (Berkeley) 5/31/93 + +NOTE -- This is the original TOUR paper distributed with ash and +does not represent the current state of the shell. It is provided anyway +since it provides helpful information for how the shell is structured, +but be warned that things have changed -- the current shell is +still under development. + +================================================================ + + A Tour through Ash + + Copyright 1989 by Kenneth Almquist. + + +DIRECTORIES: The subdirectory bltin contains commands which can +be compiled stand-alone. The rest of the source is in the main +ash directory. + +SOURCE CODE GENERATORS: Files whose names begin with "mk" are +programs that generate source code. A complete list of these +programs is: + + program intput files generates + ------- ------------ --------- + mkbuiltins builtins builtins.h builtins.c + mkinit *.c init.c + mknodes nodetypes nodes.h nodes.c + mksignames - signames.h signames.c + mksyntax - syntax.h syntax.c + mktokens - token.h + bltin/mkexpr unary_op binary_op operators.h operators.c + +There are undoubtedly too many of these. Mkinit searches all the +C source files for entries looking like: + + INIT { + x = 1; /* executed during initialization */ + } + + RESET { + x = 2; /* executed when the shell does a longjmp + back to the main command loop */ + } + +It pulls this code out into routines which are when particular +events occur. The intent is to improve modularity by isolating +the information about which modules need to be explicitly +initialized/reset within the modules themselves. + +Mkinit recognizes several constructs for placing declarations in +the init.c file. + INCLUDE "file.h" +includes a file. The storage class MKINIT makes a declaration +available in the init.c file, for example: + MKINIT int funcnest; /* depth of function calls */ +MKINIT alone on a line introduces a structure or union declara- +tion: + MKINIT + struct redirtab { + short renamed[10]; + }; +Preprocessor #define statements are copied to init.c without any +special action to request this. + +INDENTATION: The ash source is indented in multiples of six +spaces. The only study that I have heard of on the subject con- +cluded that the optimal amount to indent is in the range of four +to six spaces. I use six spaces since it is not too big a jump +from the widely used eight spaces. If you really hate six space +indentation, use the adjind (source included) program to change +it to something else. + +EXCEPTIONS: Code for dealing with exceptions appears in +exceptions.c. The C language doesn't include exception handling, +so I implement it using setjmp and longjmp. The global variable +exception contains the type of exception. EXERROR is raised by +calling error. EXINT is an interrupt. + +INTERRUPTS: In an interactive shell, an interrupt will cause an +EXINT exception to return to the main command loop. (Exception: +EXINT is not raised if the user traps interrupts using the trap +command.) The INTOFF and INTON macros (defined in exception.h) +provide uninterruptable critical sections. Between the execution +of INTOFF and the execution of INTON, interrupt signals will be +held for later delivery. INTOFF and INTON can be nested. + +MEMALLOC.C: Memalloc.c defines versions of malloc and realloc +which call error when there is no memory left. It also defines a +stack oriented memory allocation scheme. Allocating off a stack +is probably more efficient than allocation using malloc, but the +big advantage is that when an exception occurs all we have to do +to free up the memory in use at the time of the exception is to +restore the stack pointer. The stack is implemented using a +linked list of blocks. + +STPUTC: If the stack were contiguous, it would be easy to store +strings on the stack without knowing in advance how long the +string was going to be: + p = stackptr; + *p++ = c; /* repeated as many times as needed */ + stackptr = p; +The folloing three macros (defined in memalloc.h) perform these +operations, but grow the stack if you run off the end: + STARTSTACKSTR(p); + STPUTC(c, p); /* repeated as many times as needed */ + grabstackstr(p); + +We now start a top-down look at the code: + +MAIN.C: The main routine performs some initialization, executes +the user's profile if necessary, and calls cmdloop. Cmdloop is +repeatedly parses and executes commands. + +OPTIONS.C: This file contains the option processing code. It is +called from main to parse the shell arguments when the shell is +invoked, and it also contains the set builtin. The -i and -j op- +tions (the latter turns on job control) require changes in signal +handling. The routines setjobctl (in jobs.c) and setinteractive +(in trap.c) are called to handle changes to these options. + +PARSING: The parser code is all in parser.c. A recursive des- +cent parser is used. Syntax tables (generated by mksyntax) are +used to classify characters during lexical analysis. There are +three tables: one for normal use, one for use when inside single +quotes, and one for use when inside double quotes. The tables +are machine dependent because they are indexed by character vari- +ables and the range of a char varies from machine to machine. + +PARSE OUTPUT: The output of the parser consists of a tree of +nodes. The various types of nodes are defined in the file node- +types. + +Nodes of type NARG are used to represent both words and the con- +tents of here documents. An early version of ash kept the con- +tents of here documents in temporary files, but keeping here do- +cuments in memory typically results in significantly better per- +formance. It would have been nice to make it an option to use +temporary files for here documents, for the benefit of small +machines, but the code to keep track of when to delete the tem- +porary files was complex and I never fixed all the bugs in it. +(AT&T has been maintaining the Bourne shell for more than ten +years, and to the best of my knowledge they still haven't gotten +it to handle temporary files correctly in obscure cases.) + +The text field of a NARG structure points to the text of the +word. The text consists of ordinary characters and a number of +special codes defined in parser.h. The special codes are: + + CTLVAR Variable substitution + CTLENDVAR End of variable substitution + CTLBACKQ Command substitution + CTLESC Escape next character + +A variable substitution contains the following elements: + + CTLVAR type name '=' [ alternative-text CTLENDVAR ] + +The type field is a single character specifying the type of sub- +stitution. The possible types are: + + VSNORMAL $var + VSMINUS ${var-text} + VSMINUS|VSNUL ${var:-text} + VSPLUS ${var+text} + VSPLUS|VSNUL ${var:+text} + VSQUESTION ${var?text} + VSQUESTION|VSNUL ${var:?text} + VSASSIGN ${var=text} + VSASSIGN|VSNUL ${var=text} + +The name of the variable comes next, terminated by an equals +sign. If the type is not VSNORMAL, then the text field in the +substitution follows, terminated by a CTLENDVAR byte. + +Commands in back quotes are parsed and stored in a linked list. +The locations of these commands in the string are indicated by +the CTLBACKQ character. + +The character CTLESC escapes the next character, so that in case +any of the CTL characters mentioned above appear in the input, +they can be passed through transparently. CTLESC is also used to +escape '*', '?', '[', and '!' characters which were quoted by the +user and thus should not be used for file name generation. + +CTLESC characters have proved to be particularly tricky to get +right. In the case of here documents which are not subject to +variable and command substitution, the parser doesn't insert any +CTLESC characters to begin with (so the contents of the text +field can be written without any processing). Other here docu- +ments, and words which are not subject to splitting and file name +generation, have the CTLESC characters removed during the vari- +able and command substitution phase. Words which are subject +splitting and file name generation have the CTLESC characters re- +moved as part of the file name phase. + +EXECUTION: Command execution is handled by the following files: + eval.c The top level routines. + redir.c Code to handle redirection of input and output. + jobs.c Code to handle forking, waiting, and job control. + exec.c Code to to path searches and the actual exec sys call. + expand.c Code to evaluate arguments. + var.c Maintains the variable symbol table. Called from expand.c. + +EVAL.C: Evaltree recursively executes a parse tree. The exit +status is returned in the global variable exitstatus. The alter- +native entry evalbackcmd is called to evaluate commands in back +quotes. It saves the result in memory if the command is a buil- +tin; otherwise it forks off a child to execute the command and +connects the standard output of the child to a pipe. + +JOBS.C: To create a process, you call makejob to return a job +structure, and then call forkshell (passing the job structure as +an argument) to create the process. Waitforjob waits for a job +to complete. These routines take care of process groups if job +control is defined. + +REDIR.C: Ash allows file descriptors to be redirected and then +restored without forking off a child process. This is accom- +plished by duplicating the original file descriptors. The redir- +tab structure records where the file descriptors have be dupli- +cated to. + +EXEC.C: The routine find_command locates a command, and enters +the command in the hash table if it is not already there. The +third argument specifies whether it is to print an error message +if the command is not found. (When a pipeline is set up, +find_command is called for all the commands in the pipeline be- +fore any forking is done, so to get the commands into the hash +table of the parent process. But to make command hashing as +transparent as possible, we silently ignore errors at that point +and only print error messages if the command cannot be found +later.) + +The routine shellexec is the interface to the exec system call. + +EXPAND.C: Arguments are processed in three passes. The first +(performed by the routine argstr) performs variable and command +substitution. The second (ifsbreakup) performs word splitting +and the third (expandmeta) performs file name generation. If the +"/u" directory is simulated, then when "/u/username" is replaced +by the user's home directory, the flag "didudir" is set. This +tells the cd command that it should print out the directory name, +just as it would if the "/u" directory were implemented using +symbolic links. + +VAR.C: Variables are stored in a hash table. Probably we should +switch to extensible hashing. The variable name is stored in the +same string as the value (using the format "name=value") so that +no string copying is needed to create the environment of a com- +mand. Variables which the shell references internally are preal- +located so that the shell can reference the values of these vari- +ables without doing a lookup. + +When a program is run, the code in eval.c sticks any environment +variables which precede the command (as in "PATH=xxx command") in +the variable table as the simplest way to strip duplicates, and +then calls "environment" to get the value of the environment. +There are two consequences of this. First, if an assignment to +PATH precedes the command, the value of PATH before the assign- +ment must be remembered and passed to shellexec. Second, if the +program turns out to be a shell procedure, the strings from the +environment variables which preceded the command must be pulled +out of the table and replaced with strings obtained from malloc, +since the former will automatically be freed when the stack (see +the entry on memalloc.c) is emptied. + +BUILTIN COMMANDS: The procedures for handling these are scat- +tered throughout the code, depending on which location appears +most appropriate. They can be recognized because their names al- +ways end in "cmd". The mapping from names to procedures is +specified in the file builtins, which is processed by the mkbuil- +tins command. + +A builtin command is invoked with argc and argv set up like a +normal program. A builtin command is allowed to overwrite its +arguments. Builtin routines can call nextopt to do option pars- +ing. This is kind of like getopt, but you don't pass argc and +argv to it. Builtin routines can also call error. This routine +normally terminates the shell (or returns to the main command +loop if the shell is interactive), but when called from a builtin +command it causes the builtin command to terminate with an exit +status of 2. + +The directory bltins contains commands which can be compiled in- +dependently but can also be built into the shell for efficiency +reasons. The makefile in this directory compiles these programs +in the normal fashion (so that they can be run regardless of +whether the invoker is ash), but also creates a library named +bltinlib.a which can be linked with ash. The header file bltin.h +takes care of most of the differences between the ash and the +stand-alone environment. The user should call the main routine +"main", and #define main to be the name of the routine to use +when the program is linked into ash. This #define should appear +before bltin.h is included; bltin.h will #undef main if the pro- +gram is to be compiled stand-alone. + +CD.C: This file defines the cd and pwd builtins. The pwd com- +mand runs /bin/pwd the first time it is invoked (unless the user +has already done a cd to an absolute pathname), but then +remembers the current directory and updates it when the cd com- +mand is run, so subsequent pwd commands run very fast. The main +complication in the cd command is in the docd command, which +resolves symbolic links into actual names and informs the user +where the user ended up if he crossed a symbolic link. + +SIGNALS: Trap.c implements the trap command. The routine set- +signal figures out what action should be taken when a signal is +received and invokes the signal system call to set the signal ac- +tion appropriately. When a signal that a user has set a trap for +is caught, the routine "onsig" sets a flag. The routine dotrap +is called at appropriate points to actually handle the signal. +When an interrupt is caught and no trap has been set for that +signal, the routine "onint" in error.c is called. + +OUTPUT: Ash uses it's own output routines. There are three out- +put structures allocated. "Output" represents the standard out- +put, "errout" the standard error, and "memout" contains output +which is to be stored in memory. This last is used when a buil- +tin command appears in backquotes, to allow its output to be col- +lected without doing any I/O through the UNIX operating system. +The variables out1 and out2 normally point to output and errout, +respectively, but they are set to point to memout when appropri- +ate inside backquotes. + +INPUT: The basic input routine is pgetc, which reads from the +current input file. There is a stack of input files; the current +input file is the top file on this stack. The code allows the +input to come from a string rather than a file. (This is for the +-c option and the "." and eval builtin commands.) The global +variable plinno is saved and restored when files are pushed and +popped from the stack. The parser routines store the number of +the current line in this variable. + +DEBUGGING: If DEBUG is defined in shell.h, then the shell will +write debugging information to the file $HOME/trace. Most of +this is done using the TRACE macro, which takes a set of printf +arguments inside two sets of parenthesis. Example: +"TRACE(("n=%d0, n))". The double parenthesis are necessary be- +cause the preprocessor can't handle functions with a variable +number of arguments. Defining DEBUG also causes the shell to +generate a core dump if it is sent a quit signal. The tracing +code is in show.c. |