diff options
author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-15 19:43:11 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-15 19:43:11 +0000 |
commit | fc22b3d6507c6745911b9dfcc68f1e665ae13dbc (patch) | |
tree | ce1e3bce06471410239a6f41282e328770aa404a /upstream/archlinux/man1p/yacc.1p | |
parent | Initial commit. (diff) | |
download | manpages-l10n-fc22b3d6507c6745911b9dfcc68f1e665ae13dbc.tar.xz manpages-l10n-fc22b3d6507c6745911b9dfcc68f1e665ae13dbc.zip |
Adding upstream version 4.22.0.upstream/4.22.0
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'upstream/archlinux/man1p/yacc.1p')
-rw-r--r-- | upstream/archlinux/man1p/yacc.1p | 1593 |
1 files changed, 1593 insertions, 0 deletions
diff --git a/upstream/archlinux/man1p/yacc.1p b/upstream/archlinux/man1p/yacc.1p new file mode 100644 index 00000000..0a06390d --- /dev/null +++ b/upstream/archlinux/man1p/yacc.1p @@ -0,0 +1,1593 @@ +'\" et +.TH YACC "1P" 2017 "IEEE/The Open Group" "POSIX Programmer's Manual" +.\" +.SH PROLOG +This manual page is part of the POSIX Programmer's Manual. +The Linux implementation of this interface may differ (consult +the corresponding Linux manual page for details of Linux behavior), +or the interface may not be implemented on Linux. +.\" +.SH NAME +yacc +\(em yet another compiler compiler (\fBDEVELOPMENT\fP) +.SH SYNOPSIS +.LP +.nf +yacc \fB[\fR-dltv\fB] [\fR-b \fIfile_prefix\fB] [\fR-p \fIsym_prefix\fB]\fI grammar\fR +.fi +.SH DESCRIPTION +The +.IR yacc +utility shall read a description of a context-free grammar in +.IR grammar +and write C source code, conforming to the ISO\ C standard, to a code file, +and optionally header information into a header file, in the current +directory. The generated source code shall not depend on any undefined, +unspecified, or implementation-defined behavior, except in cases where +it is copied directly from the supplied grammar, or in cases that are +documented by the implementation. The C code shall define a function +and related routines and macros for an automaton that executes a parsing +algorithm meeting the requirements in +.IR "Algorithms". +.P +The form and meaning of the grammar are described in the EXTENDED +DESCRIPTION section. +.P +The C source code and header file shall be produced in a form suitable +as input for the C compiler (see +.IR "\fIc99\fR\^"). +.SH OPTIONS +The +.IR yacc +utility shall conform to the Base Definitions volume of POSIX.1\(hy2017, +.IR "Section 12.2" ", " "Utility Syntax Guidelines", +except for Guideline 9. +.P +The following options shall be supported: +.IP "\fB\-b\ \fIfile_prefix\fR" 10 +Use +.IR file_prefix +instead of +.BR y +as the prefix for all output filenames. The code file +.BR y.tab.c , +the header file +.BR y.tab.h +(created when +.BR \-d +is specified), and the description file +.BR y.output +(created when +.BR \-v +is specified), shall be changed to +.IR file_prefix \c +.BR .tab.c , +.IR file_prefix \c +.BR .tab.h , +and +.IR file_prefix \c +.BR .output , +respectively. +.IP "\fB\-d\fP" 10 +Write the header file; by default only the code file is written. See +the OUTPUT FILES section. +.IP "\fB\-l\fP" 10 +Produce a code file that does not contain any +.BR #line +constructs. If this option is not present, it is unspecified whether +the code file or header file contains +.BR #line +directives. This should only be used after the grammar and the +associated actions are fully debugged. +.IP "\fB\-p\ \fIsym_prefix\fR" 10 +.br +Use +.IR sym_prefix +instead of +.BR yy +as the prefix for all external names produced by +.IR yacc . +The names affected shall include the functions +\fIyyparse\fR(), +\fIyylex\fR(), +and +\fIyyerror\fR(), +and the variables +.IR yylval , +.IR yychar , +and +.IR yydebug . +(In the remainder of this section, the six symbols cited are referenced +using their default names only as a notational convenience.) Local +names may also be affected by the +.BR \-p +option; however, the +.BR \-p +option shall not affect +.BR #define +symbols generated by +.IR yacc . +.IP "\fB\-t\fP" 10 +Modify conditional compilation directives to permit compilation of +debugging code in the code file. Runtime debugging statements shall +always be contained in the code file, but by default conditional +compilation directives prevent their compilation. +.IP "\fB\-v\fP" 10 +Write a file containing a description of the parser and a report of +conflicts generated by ambiguities in the grammar. +.br +.SH OPERANDS +The following operand is required: +.IP "\fIgrammar\fR" 10 +A pathname of a file containing instructions, hereafter called +.IR grammar , +for which a parser is to be created. The format for the grammar is +described in the EXTENDED DESCRIPTION section. +.SH STDIN +Not used. +.SH "INPUT FILES" +The file +.IR grammar +shall be a text file formatted as specified in the EXTENDED DESCRIPTION +section. +.SH "ENVIRONMENT VARIABLES" +The following environment variables shall affect the execution of +.IR yacc : +.IP "\fILANG\fP" 10 +Provide a default value for the internationalization variables that are +unset or null. (See the Base Definitions volume of POSIX.1\(hy2017, +.IR "Section 8.2" ", " "Internationalization Variables" +for the precedence of internationalization variables used to determine +the values of locale categories.) +.IP "\fILC_ALL\fP" 10 +If set to a non-empty string value, override the values of all the +other internationalization variables. +.IP "\fILC_CTYPE\fP" 10 +Determine the locale for the interpretation of sequences of bytes of +text data as characters (for example, single-byte as opposed to +multi-byte characters in arguments and input files). +.IP "\fILC_MESSAGES\fP" 10 +.br +Determine the locale that should be used to affect the format and +contents of diagnostic messages written to standard error. +.IP "\fINLSPATH\fP" 10 +Determine the location of message catalogs for the processing of +.IR LC_MESSAGES . +.P +The +.IR LANG +and +.IR LC_* +variables affect the execution of the +.IR yacc +utility as stated. The +\fImain\fR() +function defined in +.IR "Yacc Library" +shall call: +.sp +.RS 4 +.nf + +setlocale(LC_ALL, "") +.fi +.P +.RE +.P +and thus the program generated by +.IR yacc +shall also be affected by the contents of these variables at runtime. +.SH "ASYNCHRONOUS EVENTS" +Default. +.SH STDOUT +Not used. +.SH STDERR +If shift/reduce or reduce/reduce conflicts are detected in +.IR grammar , +.IR yacc +shall write a report of those conflicts to the standard error in an +unspecified format. +.P +Standard error shall also be used for diagnostic messages. +.SH "OUTPUT FILES" +The code file, the header file, and the description file shall be text +files. All are described in the following sections. +.SS "Code File" +.P +This file shall contain the C source code for the +\fIyyparse\fR() +function. It shall contain code for the various semantic actions with +macro substitution performed on them as described in the EXTENDED +DESCRIPTION section. It also shall contain a copy of the +.BR #define +statements in the header file. If a +.BR %union +declaration is used, the declaration for YYSTYPE shall also be included +in this file. +.SS "Header File" +.P +The header file shall contain +.BR #define +statements that associate the token numbers with the token names. This +allows source files other than the code file to access the token +codes. If a +.BR %union +declaration is used, the declaration for YYSTYPE and an +.IR "extern YYSTYPE yylval" +declaration shall also be included in this file. +.SS "Description File" +.P +The description file shall be a text file containing a description of +the state machine corresponding to the parser, using an unspecified +format. Limits for internal tables (see +.IR "Limits") +shall also be reported, in an implementation-defined manner. (Some +implementations may use dynamic allocation techniques and have no +specific limit values to report.) +.SH "EXTENDED DESCRIPTION" +The +.IR yacc +command accepts a language that is used to define a grammar for a +target language to be parsed by the tables and code generated by +.IR yacc . +The language accepted by +.IR yacc +as a grammar for the target language is described below using the +.IR yacc +input language itself. +.P +The input +.IR grammar +includes rules describing the input structure of the target language +and code to be invoked when these rules are recognized to provide the +associated semantic action. The code to be executed shall appear as bodies +of text that are intended to be C-language code. These bodies of text +shall not contain C-language trigraphs. The C-language inclusions are +presumed to form a correct function when processed by +.IR yacc +into its output files. The code included in this way shall be executed +during the recognition of the target language. +.P +Given a grammar, the +.IR yacc +utility generates the files described in the OUTPUT FILES section. The +code file can be compiled and linked using +.IR c99 . +If the declaration and programs sections of the grammar file did not +include definitions of +\fImain\fR(), +\fIyylex\fR(), +and +\fIyyerror\fR(), +the compiled output requires linking with externally supplied versions +of those functions. Default versions of +\fImain\fR() +and +\fIyyerror\fR() +are supplied in the +.IR yacc +library and can be linked in by using the +.BR "\-l\ y" +operand to +.IR c99 . +The +.IR yacc +library interfaces need not support interfaces with other than the +default +.BR yy +symbol prefix. The application provides the lexical analyzer function, +\fIyylex\fR(); +the +.IR lex +utility is specifically designed to generate such a routine. +.SS "Input Language" +.P +The application shall ensure that every specification file consists of +three sections in order: +.IR declarations , +.IR "grammar rules" , +and +.IR programs , +separated by double +<percent-sign> +characters (\c +.BR \(dq%%\(dq ). +The declarations and programs sections can be empty. If the latter is +empty, the preceding +.BR \(dq%%\(dq +mark separating it from the rules section can be omitted. +.P +The input is free form text following the structure of the grammar +defined below. +.SS "Lexical Structure of the Grammar" +.P +The +<blank>, +<newline>, +and +<form-feed> +character shall be ignored, except that the application shall ensure that +they do not appear in names or multi-character reserved symbols. Comments +shall be enclosed in +.BR \(dq/*\ ...\ */\(dq , +and can appear wherever a name is valid. +.P +Names are of arbitrary length, made up of letters, periods (\c +.BR '.' ), +underscores (\c +.BR '_' ), +and non-initial digits. Uppercase and lowercase letters are distinct. +Conforming applications shall not use names beginning in +.BR yy +or +.BR YY +since the +.IR yacc +parser uses such names. Many of the names appear in the final output +of +.IR yacc , +and thus they should be chosen to conform with any additional rules +created by the C compiler to be used. In particular they appear in +.BR #define +statements. +.P +A literal shall consist of a single character enclosed in single-quote +characters. All of the escape sequences supported for character constants +by the ISO\ C standard shall be supported by +.IR yacc . +.P +The relationship with the lexical analyzer is discussed in detail below. +.P +The application shall ensure that the NUL character is not used in +grammar rules or literals. +.SS "Declarations Section" +.P +The declarations section is used to define the symbols used to define +the target language and their relationship with each other. In +particular, much of the additional information required to resolve +ambiguities in the context-free grammar for the target language is +provided here. +.P +Usually +.IR yacc +assigns the relationship between the symbolic names it generates and +their underlying numeric value. The declarations section makes it +possible to control the assignment of these values. +.P +It is also possible to keep semantic information associated with the +tokens currently on the parse stack in a user-defined C-language +.BR union , +if the members of the union are associated with the various names in +the grammar. The declarations section provides for this as well. +.P +The first group of declarators below all take a list of names as +arguments. That list can optionally be preceded by the name of a C +union member (called a +.IR tag +below) appearing within +.BR '<' +and +.BR '>' . +(As an exception to the typographical conventions of the rest of this volume of POSIX.1\(hy2017, +in this case <\fItag\fP> does not represent a metavariable, but the +literal angle bracket characters surrounding a symbol.) The use of +.IR tag +specifies that the tokens named on this line shall be of the same C +type as the union member referenced by +.IR tag . +This is discussed in more detail below. +.P +For lists used to define tokens, the first appearance of a given token +can be followed by a positive integer (as a string of decimal digits). +If this is done, the underlying value assigned to it for lexical +purposes shall be taken to be that number. +.P +The following declares +.IR name +to be a token: +.sp +.RS 4 +.nf + +%token \fB[\fR<\fItag\fR>\fB] \fIname \fB[\fInumber\fB] [\fIname \fB[\fInumber\fB]]\fR... +.fi +.P +.RE +.P +If +.IR tag +is present, the C type for all tokens on this line shall be declared to +be the type referenced by +.IR tag . +If a positive integer, +.IR number , +follows a +.IR name , +that value shall be assigned to the token. +.P +The following declares +.IR name +to be a token, and assigns precedence to it: +.sp +.RS 4 +.nf + +%left \fB[\fR<\fItag\fR>\fB] \fIname \fB[\fInumber\fB] [\fIname \fB[\fInumber\fB]]\fR... +%right \fB[\fR<\fItag\fR>\fB] \fIname \fB[\fInumber\fB] [\fIname \fB[\fInumber\fB]]\fR... +.fi +.P +.RE +.P +One or more lines, each beginning with one of these symbols, can appear +in this section. All tokens on the same line have the same precedence +level and associativity; the lines are in order of increasing +precedence or binding strength. +.BR %left +denotes that the operators on that line are left associative, and +.BR %right +similarly denotes right associative operators. If +.IR tag +is present, it shall declare a C type for +.IR name s +as described for +.BR %token . +.P +The following declares +.IR name +to be a token, and indicates that this cannot be used associatively: +.sp +.RS 4 +.nf + +%nonassoc \fB[\fR<\fItag\fR>\fB] \fIname \fB[\fInumber\fB] [\fIname \fB[\fInumber\fB]]\fR... +.fi +.P +.RE +.P +If the parser encounters associative use of this token it reports an +error. If +.IR tag +is present, it shall declare a C type for +.IR name s +as described for +.BR %token . +.P +The following declares that union member +.IR name s +are non-terminals, and thus it is required to have a +.IR tag +field at its beginning: +.sp +.RS 4 +.nf + +%type <\fItag\fR> \fIname\fR... +.fi +.P +.RE +.P +Because it deals with non-terminals only, assigning a token number or +using a literal is also prohibited. If this construct is present, +.IR yacc +shall perform type checking; if this construct is not present, the +parse stack shall hold only the +.BR int +type. +.P +Every name used in +.IR grammar +not defined by a +.BR %token , +.BR %left , +.BR %right , +or +.BR %nonassoc +declaration is assumed to represent a non-terminal symbol. The +.IR yacc +utility shall report an error for any non-terminal symbol that does not +appear on the left side of at least one grammar rule. +.P +Once the type, precedence, or token number of a name is specified, it +shall not be changed. If the first declaration of a token does not +assign a token number, +.IR yacc +shall assign a token number. Once this assignment is made, the token +number shall not be changed by explicit assignment. +.P +The following declarators do not follow the previous pattern. +.P +The following declares the non-terminal +.IR name +to be the +.IR "start symbol" , +which represents the largest, most general structure described by the +grammar rules: +.sp +.RS 4 +.nf + +%start \fIname\fR +.fi +.P +.RE +.P +By default, it is the left-hand side of the first grammar rule; this +default can be overridden with this declaration. +.P +The following declares the +.IR yacc +value stack to be a union of the various types of values desired. +.sp +.RS 4 +.nf + +%union { \fIbody of union\fR (\fIin C\fR) } +.fi +.P +.RE +.P +The body of the union shall not contain unbalanced curly brace +preprocessing tokens. +.P +By default, the values returned by actions (see below) and the lexical +analyzer shall be of type +.BR int . +The +.IR yacc +utility keeps track of types, and it shall insert corresponding union +member names in order to perform strict type checking of the resulting +parser. +.P +Alternatively, given that at least one <\fItag\fP> construct is used, +the union can be declared in a header file (which shall be included in +the declarations section by using a +.BR #include +construct within +.BR %{ +and +.BR %} ), +and a +.BR typedef +used to define the symbol YYSTYPE to represent this union. The effect +of +.BR %union +is to provide the declaration of YYSTYPE directly from the +.IR yacc +input. +.P +C-language declarations and definitions can appear in the declarations +section, enclosed by the following marks: +.sp +.RS 4 +.nf + +%{ ... %} +.fi +.P +.RE +.P +These statements shall be copied into the code file, and have global +scope within it so that they can be used in the rules and program +sections. The statements shall not contain +.BR \(dq%}\(dq +outside a comment, string literal, or multi-character constant. +.P +The application shall ensure that the declarations section is +terminated by the token +.BR %% . +.SS "Grammar Rules in yacc" +.P +The rules section defines the context-free grammar to be accepted by +the function +.IR yacc +generates, and associates with those rules C-language actions and +additional precedence information. The grammar is described below, and +a formal definition follows. +.P +The rules section is comprised of one or more grammar rules. A grammar +rule has the form: +.sp +.RS 4 +.nf + +A : BODY ; +.fi +.P +.RE +.P +The symbol +.BR A +represents a non-terminal name, and +.BR BODY +represents a sequence of zero or more +.IR name s, +.IR literal s, +and +.IR "semantic action" s +that can then be followed by optional +.IR "precedence rule" s. +Only the names and literals participate in the formation of the +grammar; the semantic actions and precedence rules are used in other +ways. The +<colon> +and the +<semicolon> +are +.IR yacc +punctuation. If there are several successive grammar rules with the +same left-hand side, the +<vertical-line> +(\c +.BR '|' ) +can be used to avoid rewriting the left-hand side; in this case the +<semicolon> +appears only after the last rule. The BODY part can be empty +(or empty of names and literals) to indicate that the non-terminal +symbol matches the empty string. +.P +The +.IR yacc +utility assigns a unique number to each rule. Rules using the vertical +bar notation are distinct rules. The number assigned to the rule +appears in the description file. +.P +The elements comprising a BODY are: +.IP "\fIname\fR,\ \fIliteral\fR" 10 +These form the rules of the grammar: +.IR name +is either a +.IR token +or a +.IR non-terminal ; +.IR literal +stands for itself (less the lexically required quotation marks). +.IP "\fIsemantic action\fR" 10 +.br +With each grammar rule, the user can associate actions to be performed +each time the rule is recognized in the input process. (Note that the +word ``action'' can also refer to the actions of the parser\(emshift, +reduce, and so on.) +.RS 10 +.P +These actions can return values and can obtain the values returned by +previous actions. These values are kept in objects of type YYSTYPE +(see +.BR %union ). +The result value of the action shall be kept on the parse stack with +the left-hand side of the rule, to be accessed by other reductions as +part of their right-hand side. By using the <\fItag\fP> information +provided in the declarations section, the code generated by +.IR yacc +can be strictly type checked and contain arbitrary information. In +addition, the lexical analyzer can provide the same kinds of values for +tokens, if desired. +.P +An action is an arbitrary C statement and as such can do input or +output, call subprograms, and alter external variables. An action is +one or more C statements enclosed in curly braces +.BR '{' +and +.BR '}' . +The statements shall not contain unbalanced curly brace preprocessing +tokens. +.P +Certain pseudo-variables can be used in the action. These are macros +for access to data structures known internally to +.IR yacc . +.IP $$ 10 +The value of the action can be set by assigning it to $$. If type +checking is enabled and the type of the value to be assigned cannot be +determined, a diagnostic message may be generated. +.IP "$\fInumber\fR" 10 +This refers to the value returned by the component specified by the +token +.IR number +in the right side of a rule, reading from left to right; +.IR number +can be zero or negative. If +.IR number +is zero or negative, it refers to the data associated with the name on +the parser's stack preceding the leftmost symbol of the current rule. +(That is, +.BR \(dq$0\(dq +refers to the name immediately preceding the leftmost name in the +current rule to be found on the parser's stack and +.BR \(dq$-1\(dq +refers to the symbol to +.IR its +left.) If +.IR number +refers to an element past the current point in the rule, or beyond the +bottom of the stack, the result is undefined. If type checking is +enabled and the type of the value to be assigned cannot be determined, +a diagnostic message may be generated. +.IP "$<\fItag\fR>\fInumber\fR" 10 +.br +These correspond exactly to the corresponding symbols without the +.IR tag +inclusion, but allow for strict type checking (and preclude unwanted +type conversions). The effect is that the macro is expanded to use +.IR tag +to select an element from the YYSTYPE union (using +.IR dataname.tag ). +This is particularly useful if +.IR number +is not positive. +.IP "$<\fItag\fR>$" 10 +This imposes on the reference the type of the union member referenced +by +.IR tag . +This construction is applicable when a reference to a left context +value occurs in the grammar, and provides +.IR yacc +with a means for selecting a type. +.P +Actions can occur anywhere in a rule (not just at the end); an +action can access values returned by actions to its left, and in turn +the value it returns can be accessed by actions to its right. An +action appearing in the middle of a rule shall be equivalent to +replacing the action with a new non-terminal symbol and adding an empty +rule with that non-terminal symbol on the left-hand side. The semantic +action associated with the new rule shall be equivalent to the original +action. The use of actions within rules might introduce conflicts that +would not otherwise exist. +.P +By default, the value of a rule shall be the value of the first element +in it. If the first element does not have a type (particularly in the +case of a literal) and type checking is turned on by +.BR %type , +an error message shall result. +.RE +.IP "\fIprecedence\fR" 10 +The keyword +.BR %prec +can be used to change the precedence level associated with a particular +grammar rule. Examples of this are in cases where a unary and binary +operator have the same symbolic representation, but need to be given +different precedences, or where the handling of an ambiguous if-else +construction is necessary. The reserved symbol +.BR %prec +can appear immediately after the body of the grammar rule and can be +followed by a token name or a literal. It shall cause the precedence +of the grammar rule to become that of the following token name or +literal. The action for the rule as a whole can follow +.BR %prec . +.P +If a program section follows, the application shall ensure that the +grammar rules are terminated by +.BR %% . +.SS "Programs Section" +.P +The +.IR programs +section can include the definition of the lexical analyzer +\fIyylex\fR(), +and any other functions; for example, those used in the actions +specified in the grammar rules. It is unspecified whether the programs +section precedes or follows the semantic actions in the output file; +therefore, if the application contains any macro definitions and +declarations intended to apply to the code in the semantic actions, it +shall place them within +.BR \(dq%{\ ...\ %}\(dq +in the declarations section. +.SS "Input Grammar" +.P +The following input to +.IR yacc +yields a parser for the input to +.IR yacc . +This formal syntax takes precedence over the preceding text syntax +description. +.P +The lexical structure is defined less precisely; +.IR "Lexical Structure of the Grammar" +defines most terms. The correspondence between the previous terms and +the tokens below is as follows. +.IP "\fBIDENTIFIER\fR" 12 +This corresponds to the concept of +.IR name , +given previously. It also includes literals as defined previously. +.IP "\fBC_IDENTIFIER\fR" 12 +This is a name, and additionally it is known to be followed by a +<colon>. +A literal cannot yield this token. +.IP "\fBNUMBER\fR" 12 +A string of digits (a non-negative decimal integer). +.IP "\fBTYPE\fR,\ \fBLEFT\fR,\ \fBMARK\fR,\ \fBLCURL\fR,\ \fBRCURL\fR" 12 +.br +These correspond directly to +.BR %type , +.BR %left , +.BR %% , +.BR %{ , +and +.BR %} . +.IP "\fB{\ .\|.\|.\ }\fR" 12 +This indicates C-language source code, with the possible inclusion of +.BR '$' +macros as discussed previously. +.sp +.RS 4 +.nf + +/* Grammar for the input to yacc. */ +/* Basic entries. */ +/* The following are recognized by the lexical analyzer. */ +.P +%token IDENTIFIER /* Includes identifiers and literals */ +%token C_IDENTIFIER /* identifier (but not literal) + followed by a :. */ +%token NUMBER /* [0-9][0-9]* */ +.P +/* Reserved words : %type=>TYPE %left=>LEFT, and so on */ +.P +%token LEFT RIGHT NONASSOC TOKEN PREC TYPE START UNION +.P +%token MARK /* The %% mark. */ +%token LCURL /* The %{ mark. */ +%token RCURL /* The %} mark. */ +.P +/* 8-bit character literals stand for themselves; */ +/* tokens have to be defined for multi-byte characters. */ +.P +%start spec +.P +%% +.P +spec : defs MARK rules tail + ; +tail : MARK + { + /* In this action, set up the rest of the file. */ + } + | /* Empty; the second MARK is optional. */ + ; +defs : /* Empty. */ + | defs def + ; +def : START IDENTIFIER + | UNION + { + /* Copy union definition to output. */ + } + | LCURL + { + /* Copy C code to output file. */ + } + RCURL + | rword tag nlist + ; +rword : TOKEN + | LEFT + | RIGHT + | NONASSOC + | TYPE + ; +tag : /* Empty: union tag ID optional. */ + | \(aq<\(aq IDENTIFIER \(aq>\(aq + ; +nlist : nmno + | nlist nmno + ; +nmno : IDENTIFIER /* Note: literal invalid with % type. */ + | IDENTIFIER NUMBER /* Note: invalid with % type. */ + ; +.P +/* Rule section */ +.P +rules : C_IDENTIFIER rbody prec + | rules rule + ; +rule : C_IDENTIFIER rbody prec + | \(aq|\(aq rbody prec + ; +rbody : /* empty */ + | rbody IDENTIFIER + | rbody act + ; +act : \(aq{\(aq + { + /* Copy action, translate $$, and so on. */ + } + \(aq}\(aq + ; +prec : /* Empty */ + | PREC IDENTIFIER + | PREC IDENTIFIER act + | prec \(aq;\(aq + ; +.fi +.P +.RE +.sp +.SS "Conflicts" +.P +The parser produced for an input grammar may contain states in which +conflicts occur. The conflicts occur because the grammar is not +LALR(1). An ambiguous grammar always contains at least one LALR(1) +conflict. The +.IR yacc +utility shall resolve all conflicts, using either default rules or +user-specified precedence rules. +.P +Conflicts are either shift/reduce conflicts or reduce/reduce +conflicts. A shift/reduce conflict is where, for a given state and +lookahead symbol, both a shift action and a reduce action are +possible. A reduce/reduce conflict is where, for a given state and +lookahead symbol, reductions by two different rules are possible. +.P +The rules below describe how to specify what actions to take when a +conflict occurs. Not all shift/reduce conflicts can be successfully +resolved this way because the conflict may be due to something other +than ambiguity, so incautious use of these facilities can cause the +language accepted by the parser to be much different from that which +was intended. The description file shall contain sufficient +information to understand the cause of the conflict. Where ambiguity +is the reason either the default or explicit rules should be adequate +to produce a working parser. +.P +The declared precedences and associativities (see +.IR "Declarations Section") +are used to resolve parsing conflicts as follows: +.IP " 1." 4 +A precedence and associativity is associated with each grammar rule; it +is the precedence and associativity of the last token or literal in the +body of the rule. If the +.BR %prec +keyword is used, it overrides this default. Some grammar rules might +not have both precedence and associativity. +.IP " 2." 4 +If there is a shift/reduce conflict, and both the grammar rule and the +input symbol have precedence and associativity associated with them, +then the conflict is resolved in favor of the action (shift or reduce) +associated with the higher precedence. If the precedences are the +same, then the associativity is used; left associative implies reduce, +right associative implies shift, and non-associative implies an error +in the string being parsed. +.IP " 3." 4 +When there is a shift/reduce conflict that cannot be resolved by rule +2, the shift is done. Conflicts resolved this way are counted in the +diagnostic output described in +.IR "Error Handling". +.IP " 4." 4 +When there is a reduce/reduce conflict, a reduction is done by the +grammar rule that occurs earlier in the input sequence. Conflicts +resolved this way are counted in the diagnostic output described in +.IR "Error Handling". +.P +Conflicts resolved by precedence or associativity shall not be counted +in the shift/reduce and reduce/reduce conflicts reported by +.IR yacc +on either standard error or in the description file. +.SS "Error Handling" +.P +The token +.BR error +shall be reserved for error handling. The name +.BR error +can be used in grammar rules. It indicates places where the parser can +recover from a syntax error. The default value of +.BR error +shall be 256. Its value can be changed using a +.BR %token +declaration. The lexical analyzer should not return the value of +.BR error . +.P +The parser shall detect a syntax error when it is in a state where the +action associated with the lookahead symbol is +.BR error . +A semantic action can cause the parser to initiate error handling by +executing the macro YYERROR. When YYERROR is executed, the semantic +action passes control back to the parser. YYERROR cannot be used +outside of semantic actions. +.P +When the parser detects a syntax error, it normally calls +\fIyyerror\fR() +with the character string +.BR \(dqsyntax\ error\(dq +as its argument. The call shall not be made if the parser is still +recovering from a previous error when the error is detected. The +parser is considered to be recovering from a previous error until the +parser has shifted over at least three normal input symbols since the +last error was detected or a semantic action has executed the macro +.IR yyerrok . +The parser shall not call +\fIyyerror\fR() +when YYERROR is executed. +.P +The macro function YYRECOVERING shall return 1 if a syntax error +has been detected and the parser has not yet fully recovered from it. +Otherwise, zero shall be returned. +.P +When a syntax error is detected by the parser, the parser shall check +if a previous syntax error has been detected. If a previous error was +detected, and if no normal input symbols have been shifted since the +preceding error was detected, the parser checks if the lookahead symbol +is an endmarker (see +.IR "Interface to the Lexical Analyzer"). +If it is, the parser shall return with a non-zero value. Otherwise, +the lookahead symbol shall be discarded and normal parsing shall +resume. +.P +When YYERROR is executed or when the parser detects a syntax error and +no previous error has been detected, or at least one normal input +symbol has been shifted since the previous error was detected, the +parser shall pop back one state at a time until the parse stack is +empty or the current state allows a shift over +.BR error . +If the parser empties the parse stack, it shall return with a non-zero +value. Otherwise, it shall shift over +.BR error +and then resume normal parsing. If the parser reads a lookahead symbol +before the error was detected, that symbol shall still be the lookahead +symbol when parsing is resumed. +.P +The macro +.IR yyerrok +in a semantic action shall cause the parser to act as if it has fully +recovered from any previous errors. The macro +.IR yyclearin +shall cause the parser to discard the current lookahead token. If the +current lookahead token has not yet been read, +.IR yyclearin +shall have no effect. +.P +The macro YYACCEPT shall cause the parser to return with the value +zero. The macro YYABORT shall cause the parser to return with a +non-zero value. +.SS "Interface to the Lexical Analyzer" +.P +The +\fIyylex\fR() +function is an integer-valued function that returns a +.IR "token number" +representing the kind of token read. If there is a value associated +with the token returned by +\fIyylex\fR() +(see the discussion of +.IR tag +above), it shall be assigned to the external variable +.IR yylval . +.P +If the parser and +\fIyylex\fR() +do not agree on these token numbers, reliable communication between +them cannot occur. For (single-byte character) literals, the token is +simply the numeric value of the character in the current character set. +The numbers for other tokens can either be chosen by +.IR yacc , +or chosen by the user. In either case, the +.BR #define +construct of C is used to allow +\fIyylex\fR() +to return these numbers symbolically. The +.BR #define +statements are put into the code file, and the header file if that file +is requested. The set of characters permitted by +.IR yacc +in an identifier is larger than that permitted by C. Token names found +to contain such characters shall not be included in the +.BR #define +declarations. +.P +If the token numbers are chosen by +.IR yacc , +the tokens other than literals shall be assigned numbers greater than +256, although no order is implied. A token can be explicitly assigned a +number by following its first appearance in the declarations section +with a number. Names and literals not defined this way retain their +default definition. All token numbers assigned by +.IR yacc +shall be unique and distinct from the token numbers used for literals +and user-assigned tokens. If duplicate token numbers cause conflicts in +parser generation, +.IR yacc +shall report an error; otherwise, it is unspecified whether the token +assignment is accepted or an error is reported. +.P +The end of the input is marked by a special token called the +.IR endmarker , +which has a token number that is zero or negative. (These values are +invalid for any other token.) All lexical analyzers shall return zero +or negative as a token number upon reaching the end of their input. If +the tokens up to, but excluding, the endmarker form a structure that +matches the start symbol, the parser shall accept the input. If the +endmarker is seen in any other context, it shall be considered an +error. +.SS "Completing the Program" +.P +In addition to +\fIyyparse\fR() +and +\fIyylex\fR(), +the functions +\fIyyerror\fR() +and +\fImain\fR() +are required to make a complete program. The application can supply +\fImain\fR() +and +\fIyyerror\fR(), +or those routines can be obtained from the +.IR yacc +library. +.SS "Yacc Library" +.P +The following functions shall appear only in the +.IR yacc +library accessible through the +.BR "\-l\ y" +operand to +.IR c99 ; +they can therefore be redefined by a conforming application: +.IP "\fBint\ \fImain\fR(\fBvoid\fR)" 6 +.br +This function shall call +\fIyyparse\fR() +and exit with an unspecified value. Other actions within this function +are unspecified. +.IP "\fBint\ \fIyyerror\fR(\fBconst\ char\fR\ *\fIs\fR)" 6 +.br +This function shall write the NUL-terminated argument to standard +error, followed by a +<newline>. +.P +The order of the +.BR "\-l\ y" +and +.BR "\-l\ l" +operands given to +.IR c99 +is significant; the application shall either provide its own +\fImain\fR() +function or ensure that +.BR "\-l\ y" +precedes +.BR "\-l\ l" . +.SS "Debugging the Parser" +.P +The parser generated by +.IR yacc +shall have diagnostic facilities in it that can be optionally enabled +at either compile time or at runtime (if enabled at compile time). +The compilation of the runtime debugging code is under the control of +YYDEBUG, a preprocessor symbol. If YYDEBUG has a non-zero value, the +debugging code shall be included. If its value is zero, the code shall +not be included. +.P +In parsers where the debugging code has been included, the external +.BR int +.IR yydebug +can be used to turn debugging on (with a non-zero value) and off (zero +value) at runtime. The initial value of +.IR yydebug +shall be zero. +.P +When +.BR \-t +is specified, the code file shall be built such that, if YYDEBUG is not +already defined at compilation time (using the +.IR c99 +.BR \-D +YYDEBUG option, for example), YYDEBUG shall be set explicitly to 1. +When +.BR \-t +is not specified, the code file shall be built such that, if YYDEBUG is +not already defined, it shall be set explicitly to zero. +.P +The format of the debugging output is unspecified but includes at least +enough information to determine the shift and reduce actions, and the +input symbols. It also provides information about error recovery. +.SS "Algorithms" +.P +The parser constructed by +.IR yacc +implements an LALR(1) parsing algorithm as documented in the +literature. It is unspecified whether the parser is table-driven or +direct-coded. +.P +A parser generated by +.IR yacc +shall never request an input symbol from +\fIyylex\fR() +while in a state where the only actions other than the error action are +reductions by a single rule. +.P +The literature of parsing theory defines these concepts. +.SS "Limits" +.P +The +.IR yacc +utility may have several internal tables. The minimum maximums for +these tables are shown in the following table. The exact meaning of +these values is implementation-defined. The implementation shall +define the relationship between these values and between them and any +error messages that the implementation may generate should it run out +of space for any internal structure. An implementation may combine +groups of these resources into a single pool as long as the total +available to the user does not fall below the sum of the sizes +specified by this section. +.br +.sp +.ce 1 +\fBTable: Internal Limits in \fIyacc\fP\fR +.ad l +.TS +center box tab(@); +cB | cB | cB +cB | cB | cB +l | n | lw(3i). +@Minimum +Limit@Maximum@Description +_ +{NTERMS}@126@Number of tokens. +{NNONTERM}@200@Number of non-terminals. +{NPROD}@300@Number of rules. +{NSTATES}@600@Number of states. +{MEMSIZE}@5\|200@T{ +Length of rules. The total length, in names (tokens and +non-terminals), of all the rules of the grammar. The left-hand side is +counted for each rule, even if it is not explicitly repeated, as +specified in +.IR "Grammar Rules in yacc". +T} +{ACTSIZE}@4\|000@T{ +Number of actions. ``Actions'' here (and in the description file) +refer to parser actions (shift, reduce, and so on) not to semantic +actions defined in +.IR "Grammar Rules in yacc". +T} +.TE +.ad b +.SH "EXIT STATUS" +The following exit values shall be returned: +.IP "\00" 6 +Successful completion. +.IP >0 6 +An error occurred. +.SH "CONSEQUENCES OF ERRORS" +If any errors are encountered, the run is aborted and +.IR yacc +exits with a non-zero status. Partial code files and header files +may be produced. The summary information in the description file +shall always be produced if the +.BR \-v +flag is present. +.LP +.IR "The following sections are informative." +.SH "APPLICATION USAGE" +Historical implementations experience name conflicts on the names +.BR yacc.tmp , +.BR yacc.acts , +.BR yacc.debug , +.BR y.tab.c , +.BR y.tab.h , +and +.BR y.output +if more than one copy of +.IR yacc +is running in a single directory at one time. The +.BR \-b +option was added to overcome this problem. The related problem of +allowing multiple +.IR yacc +parsers to be placed in the same file was addressed by adding a +.BR \-p +option to override the previously hard-coded +.BR yy +variable prefix. +.P +The description of the +.BR \-p +option specifies the minimal set of function and variable names that +cause conflict when multiple parsers are linked together. YYSTYPE does +not need to be changed. Instead, the programmer can use +.BR \-b +to give the header files for different parsers different names, and +then the file with the +\fIyylex\fR() +for a given parser can include the header for that parser. Names such +as +.IR yyclearerr +do not need to be changed because they are used only in the actions; +they do not have linkage. It is possible that an implementation has +other names, either internal ones for implementing things such as +.IR yyclearerr , +or providing non-standard features that it wants to change with +.BR \-p . +.P +Unary operators that are the same token as a binary operator in general +need their precedence adjusted. This is handled by the +.BR %prec +advisory symbol associated with the particular grammar rule defining +that unary operator. (See +.IR "Grammar Rules in yacc".) +Applications are not required to use this operator for unary operators, +but the grammars that do not require it are rare. +.SH EXAMPLES +Access to the +.IR yacc +library is obtained with library search operands to +.IR c99 . +To use the +.IR yacc +library +\fImain\fR(): +.sp +.RS 4 +.nf + +c99 y.tab.c -l y +.fi +.P +.RE +.P +Both the +.IR lex +library and the +.IR yacc +library contain +\fImain\fR(). +To access the +.IR yacc +\fImain\fR(): +.sp +.RS 4 +.nf + +c99 y.tab.c lex.yy.c -l y -l l +.fi +.P +.RE +.P +This ensures that the +.IR yacc +library is searched first, so that its +\fImain\fR() +is used. +.P +The historical +.IR yacc +libraries have contained two simple functions that are normally coded +by the application programmer. These functions are similar to the +following code: +.sp +.RS 4 +.nf + +#include <locale.h> +int main(void) +{ + extern int yyparse(); +.P + setlocale(LC_ALL, ""); +.P + /* If the following parser is one created by lex, the + application must be careful to ensure that LC_CTYPE + and LC_COLLATE are set to the POSIX locale. */ + (void) yyparse(); + return (0); +} +.P +#include <stdio.h> +.P +int yyerror(const char *msg) +{ + (void) fprintf(stderr, "%s\en", msg); + return (0); +} +.fi +.P +.RE +.SH RATIONALE +The references in +.BR "Referenced Documents" +may be helpful in constructing the parser generator. The referenced DeRemer and Pennello article (along +with the works it references) describes a technique to generate parsers +that conform to this volume of POSIX.1\(hy2017. Work in this area continues to be done, so +implementors should consult current literature before doing any new +implementations. The original Knuth article is the theoretical basis for this +kind of parser, but the tables it generates are impractically large for +reasonable grammars and should not be used. The ``equivalent to'' +wording is intentional to assure that the best tables that are LALR(1) +can be generated. +.P +There has been confusion between the class of grammars, the algorithms +needed to generate parsers, and the algorithms needed to parse the +languages. They are all reasonably orthogonal. In particular, a parser +generator that accepts the full range of LR(1) grammars need not +generate a table any more complex than one that accepts SLR(1) (a +relatively weak class of LR grammars) for a grammar that happens to be +SLR(1). Such an implementation need not recognize the case, either; +table compression can yield the SLR(1) table (or one even smaller than +that) without recognizing that the grammar is SLR(1). +The speed of an LR(1) parser for any class is dependent more upon the +table representation and compression (or the code generation if a +direct parser is generated) than upon the class of grammar that the +table generator handles. +.P +The speed of the parser generator is somewhat dependent upon the class +of grammar it handles. However, the original Knuth article algorithms for +constructing LR parsers were judged by its author to be impractically +slow at that time. Although full LR is more complex than LALR(1), as +computer speeds and algorithms improve, the difference (in terms of +acceptable wall-clock execution time) is becoming less significant. +.P +Potential authors are cautioned that the referenced DeRemer and Pennello article previously cited +identifies a bug (an over-simplification of the computation of LALR(1) +lookahead sets) in some of the LALR(1) algorithm statements that +preceded it to publication. They should take the time to seek out that +paper, as well as current relevant work, particularly Aho's. +.P +The +.BR \-b +option was added to provide a portable method for permitting +.IR yacc +to work on multiple separate parsers in the same directory. If a +directory contains more than one +.IR yacc +grammar, and both grammars are constructed at the same time (by, for +example, a parallel +.IR make +program), conflict results. While the solution is not historical +practice, it corrects a known deficiency in historical implementations. +Corresponding changes were made to all sections that referenced the +filenames +.BR y.tab.c +(now ``the code file''), +.BR y.tab.h +(now ``the header file''), and +.BR y.output +(now ``the description file''). +.P +The grammar for +.IR yacc +input is based on System V documentation. The textual description shows +there that the +.BR ';' +is required at the end of the rule. The grammar and the implementation +do not require this. (The use of +.BR C_IDENTIFIER +causes a reduce to occur in the right place.) +.P +Also, in that implementation, the constructs such as +.BR %token +can be terminated by a +<semicolon>, +but this is not permitted by the grammar. The keywords such as +.BR %token +can also appear in uppercase, which is again not discussed. In most +places where +.BR '%' +is used, +<backslash> +can be substituted, and there are alternate spellings for some of the +symbols (for example, +.BR %LEFT +can be +.BR \(dq%<\(dq +or even +.BR \(dq\e<\(dq ). +.P +Historically, <\fItag\fP> can contain any characters except +.BR '>' , +including white space, in the implementation. However, since the +.IR tag +must reference an ISO\ C standard union member, in practice conforming +implementations need to support only the set of characters for ISO\ C standard +identifiers in this context. +.P +Some historical implementations are known to accept actions that are +terminated by a period. Historical implementations often allow +.BR '$' +in names. A conforming implementation does not need to support either +of these behaviors. +.P +Deciding when to use +.BR %prec +illustrates the difficulty in specifying the behavior of +.IR yacc . +There may be situations in which the +.IR grammar +is not, strictly speaking, in error, and yet +.IR yacc +cannot interpret it unambiguously. The resolution of ambiguities in the +grammar can in many instances be resolved by providing additional +information, such as using +.BR %type +or +.BR %union +declarations. It is often easier and it usually yields a smaller parser +to take this alternative when it is appropriate. +.P +The size and execution time of a program produced without the runtime +debugging code is usually smaller and slightly faster in historical +implementations. +.P +Statistics messages from several historical implementations include the +following types of information: +.sp +.RS 4 +.nf + +\fIn\fR/512 terminals, \fIn\fR/300 non-terminals +\fIn\fR/600 grammar rules, \fIn\fR/1\|500 states +\fIn\fR shift/reduce, \fIn\fR reduce/reduce conflicts reported +\fIn\fR/350 working sets used +Memory: states, etc. \fIn\fR/15\|000, parser \fIn\fR/15\|000 +\fIn\fR/600 distinct lookahead sets +\fIn\fR extra closures +\fIn\fR shift entries, \fIn\fR exceptions +\fIn\fR goto entries +\fIn\fR entries saved by goto default +Optimizer space used: input \fIn\fR/15\|000, output \fIn\fR/15\|000 +\fIn\fR table entries, \fIn\fR zero +Maximum spread: \fIn\fR, Maximum offset: \fIn\fR +.fi +.P +.RE +.P +The report of internal tables in the description file is left +implementation-defined because all aspects of these limits are also +implementation-defined. Some implementations may use dynamic +allocation techniques and have no specific limit values to report. +.P +The format of the +.BR y.output +file is not given because specification of the format was not seen to +enhance applications portability. The listing is primarily intended to +help human users understand and debug the parser; use of +.BR y.output +by a conforming application script would be unusual. Furthermore, +implementations have not produced consistent output and no popular +format was apparent. The format selected by the implementation should +be human-readable, in addition to the requirement that it be a text +file. +.P +Standard error reports are not specifically described because they are +seldom of use to conforming applications and there was no reason to +restrict implementations. +.P +Some implementations recognize +.BR \(dq={\(dq +as equivalent to +.BR '{' +because it appears in historical documentation. This construction was +recognized and documented as obsolete as long ago as 1978, in the +referenced \fIYacc: Yet Another Compiler-Compiler\fP. This volume of POSIX.1\(hy2017 chose to leave it as obsolete and omit it. +.P +Multi-byte characters should be recognized by the lexical analyzer and +returned as tokens. They should not be returned as multi-byte +character literals. The token +.BR error +that is used for error recovery is normally assigned the value 256 in +the historical implementation. Thus, the token value 256, which is used +in many multi-byte character sets, is not available for use as the +value of a user-defined token. +.SH "FUTURE DIRECTIONS" +None. +.SH "SEE ALSO" +.IR "\fIc99\fR\^", +.IR "\fIlex\fR\^" +.P +The Base Definitions volume of POSIX.1\(hy2017, +.IR "Chapter 8" ", " "Environment Variables", +.IR "Section 12.2" ", " "Utility Syntax Guidelines" +.\" +.SH COPYRIGHT +Portions of this text are reprinted and reproduced in electronic form +from IEEE Std 1003.1-2017, Standard for Information Technology +-- Portable Operating System Interface (POSIX), The Open Group Base +Specifications Issue 7, 2018 Edition, +Copyright (C) 2018 by the Institute of +Electrical and Electronics Engineers, Inc and The Open Group. +In the event of any discrepancy between this version and the original IEEE and +The Open Group Standard, the original IEEE and The Open Group Standard +is the referee document. The original Standard can be obtained online at +http://www.opengroup.org/unix/online.html . +.PP +Any typographical or formatting errors that appear +in this page are most likely +to have been introduced during the conversion of the source files to +man page format. To report such errors, see +https://www.kernel.org/doc/man-pages/reporting_bugs.html . |