Diffstat (limited to 'src/VBox/Devices/EFI/Firmware/BaseTools/Source/C/VfrCompile/Pccts/CHANGES_FROM_133_BEFORE_MR13.txt')
-rw-r--r-- | src/VBox/Devices/EFI/Firmware/BaseTools/Source/C/VfrCompile/Pccts/CHANGES_FROM_133_BEFORE_MR13.txt | 3666 |
1 files changed, 3666 insertions, 0 deletions
diff --git a/src/VBox/Devices/EFI/Firmware/BaseTools/Source/C/VfrCompile/Pccts/CHANGES_FROM_133_BEFORE_MR13.txt b/src/VBox/Devices/EFI/Firmware/BaseTools/Source/C/VfrCompile/Pccts/CHANGES_FROM_133_BEFORE_MR13.txt new file mode 100644 index 00000000..56bced8c --- /dev/null +++ b/src/VBox/Devices/EFI/Firmware/BaseTools/Source/C/VfrCompile/Pccts/CHANGES_FROM_133_BEFORE_MR13.txt @@ -0,0 +1,3666 @@ + + ------------------------------------------------------------ + This is the second part of a two part file. + This is a list of changes to pccts 1.33 prior to MR13 + For more recent information see CHANGES_FROM_133.txt + ------------------------------------------------------------ + + DISCLAIMER + + The software and these notes are provided "as is". They may include + typographical or technical errors and their authors disclaim all + liability of any kind or nature for damages due to error, fault, + defect, or deficiency regardless of cause. All warranties of any + kind, either express or implied, including, but not limited to, the + implied warranties of merchantability and fitness for a particular + purpose are disclaimed. + + +#153. (Changed in MR12b) Bug in computation of -mrhoist suppression set + + Consider the following grammar with k=1 and "-mrhoist on": + + r1 : (A)? => <<p>>? x /* l1 */ + | r2 /* l2 */ + ; + r2 : A /* l4 */ + | (B)? => <<q>>? y /* l5 */ + ; + + In earlier versions the mrhoist routine would see that both l1 and + l2 contained predicates and would assume that this prevented either + from acting to suppress the other predicate. In the example above + it didn't realize the A at line l4 is capable of suppressing the + predicate at l1 even though alt l2 contains (indirectly) a predicate. + + This is fixed in MR12b. + + Reported by Reinier van den Born (reinier@vnet.ibm.com) + +#153. 
(Changed in MR12a) Bug in computation of -mrhoist suppression set + + An oversight similar to that described in Item #152 appeared in + the computation of the set that "covered" a predicate. If a + predicate expression included a term such as p=AND(q,r) the context + of p was taken to be context(q) & context(r), when it should have + been context(q) | context(r). This is fixed in MR12a. + +#152. (Changed in MR12) Bug in generation of predicate expressions + + The primary purpose for MR12 is to make quite clear that MR11 is + obsolete and to fix the bug related to predicate expressions. + + In MR10 code was added to optimize the code generated for + predicate expression tests. Unfortunately, there was a + significant oversight in the code which resulted in a bug in + the generation of code for predicate expression tests which + contained predicates combined using AND: + + r0 : (r1)* "@" ; + r1 : (AAA)? => <<p LATEXT(1)>>? r2 ; + r2 : (BBB)? => <<q LATEXT(1)>>? Q + | (BBB)? => <<r LATEXT(1)>>? Q + ; + + In MR11 (and MR10 when using "-mrhoist on") the code generated + for r0 to predict r1 would be equivalent to: + + if ( LA(1)==Q && + (LA(1)==AAA && LA(1)==BBB) && + ( p && ( q || r )) ) { + + This is incorrect because it expresses the idea that LA(1) + *must* be AAA in order to attempt r1, and *must* be BBB to + attempt r2. The result was that r1 became unreachable since + both conditions cannot be simultaneously true. + + The general philosophy of code generation for predicates + can be summarized as follows: + + a. If the context is true don't enter an alt + for which the corresponding predicate is false. + + If the context is false then it is okay to enter + the alt without evaluating the predicate at all. + + b. A predicate created by ORing of predicates has + context which is the OR of their individual contexts. + + c. A predicate created by ANDing of predicates has + (surprise) context which is the OR of their individual + contexts. + + d. 
Apply these rules recursively. + + e. Remember rule (a) + + The correct code should express the idea that *if* LA(1) is + AAA then p must be true to attempt r1, but if LA(1) is *not* + AAA then it is okay to attempt r1, provided that *if* LA(1) is + BBB then one of q or r must be true. + + if ( LA(1)==Q && + ( !(LA(1)==AAA || LA(1)==BBB) || + ( !(LA(1)==AAA) || p) && + ( !(LA(1)==BBB) || q || r ) ) ) { + + I believe this is fixed in MR12. + + Reported by Reinier van den Born (reinier@vnet.ibm.com) + +#151a. (Changed in MR12) ANTLRParser::getLexer() + + As a result of several requests, I have added public methods to + get a pointer to the lexer belonging to a parser. + + ANTLRTokenStream *ANTLRParser::getLexer() const + + Returns a pointer to the lexer being used by the + parser. ANTLRTokenStream is the base class of + DLGLexer. + + ANTLRTokenStream *ANTLRTokenBuffer::getLexer() const + + Returns a pointer to the lexer being used by the + ANTLRTokenBuffer. ANTLRTokenStream is the base + class of DLGLexer. + + You must manually cast the ANTLRTokenStream to your program's + lexer class because the name of the lexer's class is not fixed. + For the same reason it is impossible to incorporate it into the + DLGLexerBase class. + +#151b. (Changed in MR12) ParserBlackBox member getLexer() + + The template class ParserBlackBox now has a member getLexer() + which returns a pointer to the lexer. + +#150. (Changed in MR12) syntaxErrCount and lexErrCount now public + + See Item #127 for more information. + +#149. (Changed in MR12) antlr option -info o (letter o for orphan) + + If there is more than one rule which is not referenced by any + other rule then all such rules are listed. This is useful for + alerting one to rules which are not used, but which can still + contribute to ambiguity. 
For example: + + start : a Z ; + unused: a A ; + a : (A)+ ; + + will cause an ambiguity report for rule "a" which will be + difficult to understand if the user forgets about rule "unused" + simply because it is not used in the grammar. + +#148. (Changed in MR11) #token names appearing in zztokens,token_tbl + + In a #token statement like the following: + + #token Plus "\+" + + the string "Plus" appears in the zztokens array (C mode) and + token_tbl (C++ mode). This string is used in most error + messages. In MR11 one has the option of using some other string, + (e.g. "+") in those tables. + + In MR11 one can write: + + #token Plus ("+") "\+" + #token RP ("(") "\(" + #token COM ("comment begin") "/\*" + + A #token statement is allowed to appear in more than one #lexclass + with different regular expressions. However, the token name appears + only once in the zztokens/token_tbl array. This means that only + one substitute can be specified for a given #token name. The second + attempt to define a substitute name (different from the first) will + result in an error message. + +#147. (Changed in MR11) Bug in follow set computation + + There is a bug in 1.33 vanilla and all maintenance releases + prior to MR11 in the computation of the follow set. The bug is + different than that described in Item #82 and probably more + common. It was discovered in the ansi.g grammar while testing + the "ambiguity aid" (Item #119). The search for a bug started + when the ambiguity aid was unable to discover the actual source + of an ambiguity reported by antlr. + + The problem appears when an optimization of the follow set + computation is used inappropriately. The result is that the + follow set used is the "worst case". In other words, the error + can lead to false reports of ambiguity. The good news is that + if you have a grammar in which you have addressed all reported + ambiguities you are ok. 
The bad news is that you may have spent + time fixing ambiguities that were not real, or used k=2 when + ck=2 might have been sufficient, and so on. + + The following grammar demonstrates the problem: + + ------------------------------------------------------------ + expr : ID ; + + start : stmt SEMI ; + + stmt : CASE expr COLON + | expr SEMI + | plain_stmt + ; + + plain_stmt : ID COLON ; + ------------------------------------------------------------ + + When compiled with k=1 and ck=2 it will report: + + warning: alts 2 and 3 of the rule itself ambiguous upon + { IDENTIFIER }, { COLON } + + When antlr analyzes "stmt" it computes the first[1] set of all + alternatives. It finds an ambiguity between alts 2 and 3 for ID. + It then computes the first[2] set for alternatives 2 and 3 to resolve + the ambiguity. In computing the first[2] set of "expr" (which is + only one token long) it needs to determine what could follow "expr". + Under a certain combination of circumstances antlr forgets that it + is trying to analyze "stmt" which can only be followed by SEMI and + adds to the first[2] set of "expr" the "global" follow set (including + "COLON") which could follow "expr" (under other conditions) in the + phrase "CASE expr COLON". + +#146. (Changed in MR11) Option -treport for locating "difficult" alts + + It can be difficult to determine which alternatives are causing + pccts to work hard to resolve an ambiguity. In some cases the + ambiguity is successfully resolved after much CPU time so there + is no message at all. + + A rough measure of the amount of work being performed which is + independent of the CPU speed and system load is the number of + tnodes created. Using "-info t" gives information about the + total number of tnodes created and the peak number of tnodes. + + Tree Nodes: peak 1300k created 1416k lost 0 + + It also puts in the generated C or C++ file the number of tnodes + created for a rule (at the end of the rule). 
However this + information is not sufficient to locate the alternatives within + a rule which are causing the creation of tnodes. + + Using: + + antlr -treport 100000 .... + + causes antlr to list on stdout any alternatives which require the + creation of more than 100,000 tnodes, along with the lookahead sets + for those alternatives. + + The following is a trivial case from the ansi.g grammar which shows + the format of the report. This report might be of more interest + in cases where 1,000,000 tuples were created to resolve the ambiguity. + + ------------------------------------------------------------------------- + There were 0 tuples whose ambiguity could not be resolved + by full lookahead + There were 157 tnodes created to resolve ambiguity between: + + Choice 1: statement/2 line 475 file ansi.g + Choice 2: statement/3 line 476 file ansi.g + + Intersection of lookahead[1] sets: + + IDENTIFIER + + Intersection of lookahead[2] sets: + + LPARENTHESIS COLON AMPERSAND MINUS + STAR PLUSPLUS MINUSMINUS ONESCOMPLEMENT + NOT SIZEOF OCTALINT DECIMALINT + HEXADECIMALINT FLOATONE FLOATTWO IDENTIFIER + STRING CHARACTER + ------------------------------------------------------------------------- + +#145. (Documentation) Generation of Expression Trees + + Item #99 was misleading because it implied that the optimization + for tree expressions was available only for trees created by + predicate expressions and neglected to mention that it required + the use of "-mrhoist on". The optimization applies to tree + expressions created for grammars with k>1 and for predicates with + lookahead depth >1. + + In MR11 the optimized version is always used so the -mrhoist on + option need not be specified. + +#144. (Changed in MR11) Incorrect test for exception group + + In testing for a rule's exception group the label a pointer + is compared against '\0'. The intention is "*pointer". + + Reported by Jeffrey C. Fried (Jeff@Fried.net). + +#143. 
(Changed in MR11) Optional ";" at end of #token statement + + Fixes problem of: + + #token X "x" + + << + parser action + >> + + Being confused with: + + #token X "x" <<lexical action>> + +#142. (Changed in MR11) class BufFileInput subclass of DLGInputStream + + Alexey Demakov (demakov@kazbek.ispras.ru) has supplied class + BufFileInput derived from DLGInputStream which provides a + function lookahead(char *string) to test characters in the + input stream more than one character ahead. + + The default amount of lookahead is specified by the constructor + and defaults to 8 characters. This does *not* include the one + character of lookahead maintained internally by DLG in member "ch" + and which is not available for testing via BufFileInput::lookahead(). + + This is a useful class for overcoming the one-character-lookahead + limitation of DLG without resorting to a lexer capable of + backtracking (like flex) which is not integrated with antlr as is + DLG. + + There are no restrictions on copying or using BufFileInput.* except + that the authorship and related information must be retained in the + source code. + + The class is located in pccts/h/BufFileInput.* of the kit. + +#141. (Changed in MR11) ZZDEBUG_CONSUME for ANTLRParser::consume() + + A debug aid has been added to ANTLRParser::consume() in + file AParser.cpp: + + #ifdef ZZDEBUG_CONSUME_ACTION + zzdebug_consume_action(); + #endif + + Suggested by Sramji Ramanathan (ps@kumaran.com). + +#140. (Changed in MR11) #pred to define predicates + + +---------------------------------------------------+ + | Note: Assume "-prc on" for this entire discussion | + +---------------------------------------------------+ + + A problem with predicates is that each one is regarded as + unique and capable of disambiguating cases where two + alternatives have identical lookahead. For example: + + rule : <<pred(LATEXT(1))>>? A + | <<pred(LATEXT(1))>>? 
A + ; + + will not cause any error messages or warnings to be issued + by earlier versions of pccts. To compare the text of the + predicates is an incomplete solution. + + In 1.33MR11 I am introducing the #pred statement in order to + solve some problems with predicates. The #pred statement allows + one to give a symbolic name to a "predicate literal" or a + "predicate expression" in order to refer to it in other predicate + expressions or in the rules of the grammar. + + The predicate literal associated with a predicate symbol is C + or C++ code which can be used to test the condition. A + predicate expression defines a predicate symbol in terms of other + predicate symbols using "!", "&&", and "||". A predicate symbol + can be defined in terms of a predicate literal, a predicate + expression, or *both*. + + When a predicate symbol is defined with both a predicate literal + and a predicate expression, the predicate literal is used to generate + code, but the predicate expression is used to check for two + alternatives with identical predicates in both alternatives. + + Here are some examples of #pred statements: + + #pred IsLabel <<isLabel(LATEXT(1))>>? + #pred IsLocalVar <<isLocalVar(LATEXT(1))>>? + #pred IsGlobalVar <<isGlobalVar(LATEXT(1))>>? + #pred IsVar <<isVar(LATEXT(1))>>? IsLocalVar || IsGlobalVar + #pred IsScoped <<isScoped(LATEXT(1))>>? IsLabel || IsLocalVar + + I hope that the use of EBNF notation to describe the syntax of the + #pred statement will not cause problems for my readers (joke). + + predStatement : "#pred" + CapitalizedName + ( + "<<predicate_literal>>?" + | "<<predicate_literal>>?" predOrExpr + | predOrExpr + ) + ; + + predOrExpr : predAndExpr ( "||" predAndExpr ) * ; + + predAndExpr : predPrimary ( "&&" predPrimary ) * ; + + predPrimary : CapitalizedName + | "!" predPrimary + | "(" predOrExpr ")" + ; + + What is the purpose of this nonsense ? 
+ + To understand how predicate symbols help, you need to realize that + predicate symbols are used in two different ways with two different + goals. + + a. Allow simplification of predicates which have been combined + during predicate hoisting. + + b. Allow recognition of identical predicates which can't disambiguate + alternatives with common lookahead. + + First we will discuss goal (a). Consider the following rule: + + rule0: rule1 + | ID + | ... + ; + + rule1: rule2 + | rule3 + ; + + rule2: <<isX(LATEXT(1))>>? ID ; + rule3: <<!isX(LATEXT(1))>>? ID ; + + When the predicates in rule2 and rule3 are combined by hoisting + to create a prediction expression for rule1 the result is: + + if ( LA(1)==ID + && ( isX(LATEXT(1)) || !isX(LATEXT(1)) ) ) { rule1(); ... + + This is inefficient, but more importantly, can lead to false + assumptions that the predicate expression distinguishes the rule1 + alternative from some other alternative with lookahead ID. In + MR11 one can write: + + #pred IsX <<isX(LATEXT(1))>>? + + ... + + rule2: <<IsX>>? ID ; + rule3: <<!IsX>>? ID ; + + During hoisting MR11 recognizes this as a special case and + eliminates the predicates. The result is a prediction + expression like the following: + + if ( LA(1)==ID ) { rule1(); ... + + Please note that the following cases which appear to be equivalent + *cannot* be simplified by MR11 during hoisting because the hoisting + logic only checks for a "!" in the predicate action, not in the + predicate expression for a predicate symbol. + + *Not* equivalent and is not simplified during hoisting: + + #pred IsX <<isX(LATEXT(1))>>? + #pred NotX <<!isX(LATEXT(1))>>? + ... + rule2: <<IsX>>? ID ; + rule3: <<NotX>>? ID ; + + *Not* equivalent and is not simplified during hoisting: + + #pred IsX <<isX(LATEXT(1))>>? + #pred NotX !IsX + ... + rule2: <<IsX>>? ID ; + rule3: <<NotX>>? ID ; + + Now we will discuss goal (b). 
+ + When antlr discovers that there is a lookahead ambiguity between + two alternatives it attempts to resolve the ambiguity by searching + for predicates in both alternatives. In the past any predicate + would do, even if the same one appeared in both alternatives: + + rule: <<p(LATEXT(1))>>? X + | <<p(LATEXT(1))>>? X + ; + + The #pred statement is a start towards solving this problem. + During ambiguity resolution (*not* predicate hoisting) the + predicates for the two alternatives are expanded and compared. + Consider the following example: + + #pred Upper <<isUpper(LATEXT(1))>>? + #pred Lower <<isLower(LATEXT(1))>>? + #pred Alpha <<isAlpha(LATEXT(1))>>? Upper || Lower + + rule0: rule1 + | <<Alpha>>? ID + ; + + rule1: + | rule2 + | rule3 + ... + ; + + rule2: <<Upper>>? ID; + rule3: <<Lower>>? ID; + + The definition of #pred Alpha expresses: + + a. to test the predicate use the C code "isAlpha(LATEXT(1))" + + b. to analyze the predicate use the information that + Alpha is equivalent to the union of Upper and Lower, + + During ambiguity resolution the definition of Alpha is expanded + into "Upper || Lower" and compared with the predicate in the other + alternative, which is also "Upper || Lower". Because they are + identical MR11 will report a problem. + + ------------------------------------------------------------------------- + t10.g, line 5: warning: the predicates used to disambiguate rule rule0 + (file t10.g alt 1 line 5 and alt 2 line 6) + are identical when compared without context and may have no + resolving power for some lookahead sequences. + ------------------------------------------------------------------------- + + If you use the "-info p" option the output file will contain: + + +----------------------------------------------------------------------+ + |#if 0 | + | | + |The following predicates are identical when compared without | + | lookahead context information. 
For some ambiguous lookahead | + | sequences they may not have any power to resolve the ambiguity. | + | | + |Choice 1: rule0/1 alt 1 line 5 file t10.g | + | | + | The original predicate for choice 1 with available context | + | information: | + | | + | OR expr | + | | + | pred << Upper>>? | + | depth=k=1 rule rule2 line 14 t10.g | + | set context: | + | ID | + | | + | pred << Lower>>? | + | depth=k=1 rule rule3 line 15 t10.g | + | set context: | + | ID | + | | + | The predicate for choice 1 after expansion (but without context | + | information): | + | | + | OR expr | + | | + | pred << isUpper(LATEXT(1))>>? | + | depth=k=1 rule line 1 t10.g | + | | + | pred << isLower(LATEXT(1))>>? | + | depth=k=1 rule line 2 t10.g | + | | + | | + |Choice 2: rule0/2 alt 2 line 6 file t10.g | + | | + | The original predicate for choice 2 with available context | + | information: | + | | + | pred << Alpha>>? | + | depth=k=1 rule rule0 line 6 t10.g | + | set context: | + | ID | + | | + | The predicate for choice 2 after expansion (but without context | + | information): | + | | + | OR expr | + | | + | pred << isUpper(LATEXT(1))>>? | + | depth=k=1 rule line 1 t10.g | + | | + | pred << isLower(LATEXT(1))>>? | + | depth=k=1 rule line 2 t10.g | + | | + | | + |#endif | + +----------------------------------------------------------------------+ + + The comparison of the predicates for the two alternatives takes + place without context information, which means that in some cases + the predicates will be considered identical even though they operate + on disjoint lookahead sets. Consider: + + #pred Alpha + + rule1: <<Alpha>>? ID + | <<Alpha>>? Label + ; + + Because the comparison of predicates takes place without context + these will be considered identical. The reason for comparing + without context is that otherwise it would be necessary to re-evaluate + the entire predicate expression for each possible lookahead sequence. 
+ This would require more code to be written and more CPU time during + grammar analysis, and it is not yet clear whether anyone will even make + use of the new #pred facility. + + A temporary workaround might be to use different #pred statements + for predicates you know have different context. This would avoid + extraneous warnings. + + The above example might be termed a "false positive". Comparison + without context will also lead to "false negatives". Consider the + following example: + + #pred Alpha + #pred Beta + + rule1: <<Alpha>>? A + | rule2 + ; + + rule2: <<Alpha>>? A + | <<Beta>>? B + ; + + The predicate used for alt 2 of rule1 is (Alpha || Beta). This + appears to be different than the predicate Alpha used for alt1. + However, the context of Beta is B. Thus when the lookahead is A + Beta will have no resolving power and Alpha will be used for both + alternatives. Using the same predicate for both alternatives isn't + very helpful, but this will not be detected with 1.33MR11. + + To properly handle this the predicate expression would have to be + evaluated for each distinct lookahead context. + + To determine whether two predicate expressions are identical is + difficult. The routine may fail to identify identical predicates. + + The #pred feature also compares predicates to see whether a choice + between alternatives is resolved by a predicate which makes the second + choice unreachable. Consider the following example: + + #pred A <<A(LATEXT(1))>>? + #pred B <<B(LATEXT(1))>>? + #pred A_or_B A || B + + r : s + | t + ; + s : <<A_or_B>>? ID + ; + t : <<A>>? ID + ; + + ---------------------------------------------------------------------------- + t11.g, line 5: warning: the predicate used to disambiguate the + first choice of rule r + (file t11.g alt 1 line 5 and alt 2 line 6) + appears to "cover" the second predicate when compared without context. + The second predicate may have no resolving power for some lookahead + sequences. 
+ ---------------------------------------------------------------------------- + +#139. (Changed in MR11) Problem with -gp in C++ mode + + The -gp option to add a prefix to rule names did not work in + C++ mode. This has been fixed. + + Reported by Alexey Demakov (demakov@kazbek.ispras.ru). + +#138. (Changed in MR11) Additional makefiles for non-MSVC++ MS systems + + Sramji Ramanathan (ps@kumaran.com) has supplied makefiles for + building antlr and dlg with Win95/NT development tools that + are not based on MSVC5. They are pccts/antlr/AntlrMS.mak and + pccts/dlg/DlgMS.mak. + + The first line of each makefile requires a definition of PCCTS_HOME. + + These are in addition to the AntlrMSVC50.* and DlgMSVC50.* + supplied by Jeff Vincent (JVincent@novell.com). + +#137. (Changed in MR11) Token getType(), getText(), getLine() const members + + -------------------------------------------------------------------- + If you use ANTLRCommonToken this change probably does not affect you. + -------------------------------------------------------------------- + + For a long time it has bothered me that these accessor functions + in ANTLRAbstractToken were not const member functions. I have + refrained from changing them because it requires users to modify + existing token class definitions which are derived directly + from ANTLRAbstractToken. I think it is now time. + + For those who are not used to C++, a "const member function" is a + member function which does not modify its own object - the thing + to which "this" points. This is quite different from a function + which does not modify its arguments. + + Most token definitions based on ANTLRAbstractToken have something like + the following in order to create concrete definitions of the pure + virtual methods in ANTLRAbstractToken: + + class MyToken : public ANTLRAbstractToken { + ... + ANTLRTokenType getType() {return _type; } + int getLine() {return _line; } + ANTLRChar * getText() {return _text; } + ... 
+ } + + The required change is simply to put "const" following the function + prototype in the header (.h file) and the definition file (.cpp if + it is not inline): + + class MyToken : public ANTLRAbstractToken { + ... + ANTLRTokenType getType() const {return _type; } + int getLine() const {return _line; } + ANTLRChar * getText() const {return _text; } + ... + } + + This was originally proposed a long time ago by Bruce + Guenter (bruceg@qcc.sk.ca). + +#136. (Changed in MR11) Added getLength() to ANTLRCommonToken + + Classes ANTLRCommonToken and ANTLRCommonTokenNoRefCountToken + now have a member function: + + int getLength() const { return strlen(getText()); } + + Suggested by Sramji Ramanathan (ps@kumaran.com). + +#135. (Changed in MR11) Raised antlr's own default ZZLEXBUFSIZE to 8k + +#134a. (ansi_mr10.zip) T.J. Parr's ANSI C grammar made 1.33MR11 compatible + + There is a typographical error in the definition of BITWISEOREQ: + + #token BITWISEOREQ "!=" should be "\|=" + + When this change is combined with the bugfix to the follow set cache + problem (Item #147) and a minor rearrangement of the grammar + (Item #134b) it becomes a k=1 ck=2 grammar. + +#134b. (ansi_mr10.zip) T.J. Parr's ANSI C grammar made 1.33MR11 compatible + + The following changes were made in the ansi.g grammar (along with + using -mrhoist on): + + ansi.g + ====== + void tracein(char *) ====> void tracein(const char *) + void traceout(char *) ====> void traceout(const char *) + + <<LT(1)->getType()==IDENTIFIER ? isTypeName(LT(1)->getText()) : 1>>? + ====> <<isTypeName(LT(1)->getText())>>? + + <<(LT(1)->getType()==LPARENTHESIS && LT(2)->getType()==IDENTIFIER) ? \ + isTypeName(LT(2)->getText()) : 1>>? + ====> (LPARENTHESIS IDENTIFIER)? => <<isTypeName(LT(2)->getText())>>? + + <<(LT(1)->getType()==LPARENTHESIS && LT(2)->getType()==IDENTIFIER) ? \ + isTypeName(LT(2)->getText()) : 1>>? + ====> (LPARENTHESIS IDENTIFIER)? => <<isTypeName(LT(2)->getText())>>? 
+ + added to init(): traceOptionValueDefault=0; + added to init(): traceOption(-1); + + change rule "statement": + + statement + : plain_label_statement + | case_label_statement + | <<;>> expression SEMICOLON + | compound_statement + | selection_statement + | iteration_statement + | jump_statement + | SEMICOLON + ; + + plain_label_statement + : IDENTIFIER COLON statement + ; + + case_label_statement + : CASE constant_expression COLON statement + | DEFAULT COLON statement + ; + + support.cpp + =========== + void tracein(char *) ====> void tracein(const char *) + void traceout(char *) ====> void traceout(const char *) + + added to tracein(): ANTLRParser::tracein(r); // call superclass method + added to traceout(): ANTLRParser::traceout(r); // call superclass method + + Makefile + ======== + added to AFLAGS: -mrhoist on -prc on + +#133. (Changed in 1.33MR11) Make trace options public in ANTLRParser + + In checking T.J. Parr's ANSI C grammar for compatibility with + 1.33MR11 I discovered that it was inconvenient to have the + trace facilities with protected access. + +#132. (Changed in 1.33MR11) Recognition of identical predicates in alts + + Prior to 1.33MR11, there would be no ambiguity warning when the + very same predicate was used to disambiguate both alternatives: + + test: ref B + | ref C + ; + + ref : <<pred(LATEXT(1))>>? A + + In 1.33MR11 this will cause the warning: + + warning: the predicates used to disambiguate rule test + (file v98.g alt 1 line 1 and alt 2 line 2) + are identical and have no resolving power + + ----------------- Note ----------------- + + This is different than the following case + + test: <<pred(LATEXT(1))>>? A B + | <<pred(LATEXT(1))>>? A C + ; + + In this case there are two distinct predicates + which have exactly the same text. In the first + example there are two references to the same + predicate. The problem represented by this + grammar will be addressed later. + +#131. 
(Changed in 1.33MR11) Case insensitive command line options + + Command line switches like "-CC" and keywords like "on", "off", + and "stdin" are no longer case sensitive in antlr, dlg, and sorcerer. + +#130. (Changed in 1.33MR11) Changed ANTLR_VERSION to int from string + + The ANTLR_VERSION was not an integer, making it difficult to + perform conditional compilation based on the antlr version. + + Henceforth, ANTLR_VERSION will be: + + (base_version * 10000) + release number + + thus 1.33MR11 will be: 1.33*10000+11 = 133*100+11 = 13311 + + Suggested by Rainer Janssen (Rainer.Janssen@Informatik.Uni-Oldenburg.DE). + +#129. (Changed in 1.33MR11) Addition of ANTLR_VERSION to <parserName>.h + + The following code is now inserted into <parserName>.h and + stdpccts.h: + + #ifndef ANTLR_VERSION + #define ANTLR_VERSION 13311 + #endif + + Suggested by Rainer Janssen (Rainer.Janssen@Informatik.Uni-Oldenburg.DE) + +#128. (Changed in 1.33MR11) Redundant predicate code in (<<pred>>? ...)+ + + Prior to 1.33MR11, the following grammar would generate + redundant tests for the "while" condition. + + rule2 : (<<pred>>? X)+ X + | B + ; + + The code would resemble: + + if (LA(1)==X) { + if (pred) { + do { + if (!pred) {zzfailed_pred(" pred");} + zzmatch(X); zzCONSUME; + } while (LA(1)==X && pred && pred); + } else {... + + With 1.33MR11 the redundant predicate test is omitted. + +#127. (Changed in 1.33MR11) + + Count Syntax Errors Count DLG Errors + ------------------- ---------------- + + C++ mode ANTLRParser:: DLGLexerBase:: + syntaxErrCount lexErrCount + C mode zzSyntaxErrCount zzLexErrCount + + The C mode variables are global and initialized to 0. + They are *not* reset to 0 automatically when antlr is + restarted. + + The C++ mode variables are public. They are initialized + to 0 by the constructors. They are *not* reset to 0 by the + ANTLRParser::init() method. + + Suggested by Reinier van den Born (reinier@vnet.ibm.com). + +#126. 
(Changed in 1.33MR11) Addition of #first <<...>> + + The #first <<...>> inserts the specified text in the output + files before any other #include statements required by pccts. + The only things before the #first text are comments and + a #define ANTLR_VERSION. + + Requested by Esa Pulkkinen (esap@cs.tut.fi) and Alexin + Zoltan (alexin@inf.u-szeged.hu). + +#125. (Changed in 1.33MR11) Lookahead for (guard)? && <<p>>? predicates + + When implementing the new style of guard predicate (Item #113) + in 1.33MR10 I decided to temporarily ignore the problem of + computing the "narrowest" lookahead context. + + Consider the following k=1 grammar: + + start : a + | b + ; + + a : (A)? && <<pred1(LATEXT(1))>>? ab ; + b : (B)? && <<pred2(LATEXT(1))>>? ab ; + + ab : A | B ; + + In MR10 the context for both "a" and "b" was {A B} because this is + the first set of rule "ab". Normally, this is not a problem because + the predicate which follows the guard inhibits any ambiguity report + by antlr. + + In MR11 the first set for rule "a" is {A} and for rule "b" it is {B}. + +#124. A Note on the New "&&" Style Guarded Predicates + + I've been asked several times, "What is the difference between + the old "=>" style guard predicates and the new style "&&" guard + predicates, and how do you choose one over the other" ? + + The main difference is that the "=>" does not apply the + predicate if the context guard doesn't match, whereas + the && form always does. What is the significance ? + + If you have a predicate which is not on the "leading edge" + it cannot be hoisted. Suppose you need a predicate that + looks at LA(2). You must introduce it manually. The + classic example is: + + castExpr : + LP typeName RP + | .... + ; + + typeName : <<isTypeName(LATEXT(1))>>? ID + | STRUCT ID + ; + + The problem is that typeName isn't on the leading edge + of castExpr, so the predicate isTypeName won't be hoisted into + castExpr to help make a decision on which production to choose. 
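As a rough sketch (not taken from pccts itself), the difference between the two guard styles described at the start of this item can be pictured as two boolean tests. The function names are hypothetical, and the "&&" form is assumed to reduce to "guard and predicate" in the way Item #113 describes; the real generated code also tests the lookahead of the alternative.

```cpp
// Hypothetical helpers for illustration only; pccts generates inline tests
// rather than calling functions like these.

// Old style:  (guard)? => <<pred>>?
// The predicate only matters when the guard context matches. When the guard
// does not match, the alternative may be entered without consulting pred.
bool arrowGuardAllows(bool guardMatches, bool pred) {
    return !guardMatches || pred;
}

// New style:  (guard)? && <<pred>>?
// Assumed here to behave like <<guard && pred>>?, so the alternative is
// entered only when the guard matches and the predicate is true.
bool andGuardAllows(bool guardMatches, bool pred) {
    return guardMatches && pred;
}
```

With the guard not matching, arrowGuardAllows(false, false) is true while andGuardAllows(false, true) is false, which is the practical difference between the two forms.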
+ + The *first* attempt to fix it is this: + + castExpr : + <<isTypeName(LATEXT(2))>>? + LP typeName RP + | .... + ; + + Unfortunately, this won't work because it ignores + the problem of STRUCT. The solution is to apply + isTypeName() in castExpr if LA(2) is an ID and + don't apply it when LA(2) is STRUCT: + + castExpr : + (LP ID)? => <<isTypeName(LATEXT(2))>>? + LP typeName RP + | .... + ; + + In conclusion, the "=>" style guarded predicate is + useful when: + + a. the tokens required for the predicate + are not on the leading edge + b. there are alternatives in the expression + selected by the predicate for which the + predicate is inappropriate + + If (b) were false, then one could use a simple + predicate (assuming "-prc on"): + + castExpr : + <<isTypeName(LATEXT(2))>>? + LP typeName RP + | .... + ; + + typeName : <<isTypeName(LATEXT(1))>>? ID + ; + + So, when do you use the "&&" style guarded predicate ? + + The new-style "&&" predicate should always be used with + predicate context. The context guard is in ADDITION to + the automatically computed context. Thus it useful for + predicates which depend on the token type for reasons + other than context. + + The following example is contributed by Reinier van den Born + (reinier@vnet.ibm.com). + + +-------------------------------------------------------------------------+ + | This grammar has two ways to call functions: | + | | + | - a "standard" call syntax with parens and comma separated args | + | - a shell command like syntax (no parens and spacing separated args) | + | | + | The former also allows a variable to hold the name of the function, | + | the latter can also be used to call external commands. 
| + | | + | The grammar (simplified) looks like this: | + | | + | fun_call : ID "(" { expr ("," expr)* } ")" | + | /* ID is function name */ | + | | "@" ID "(" { expr ("," expr)* } ")" | + | /* ID is var containing fun name */ | + | ; | + | | + | command : ID expr* /* ID is function name */ | + | | path expr* /* path is external command name */ | + | ; | + | | + | path : ID /* left out slashes and such */ | + | | "@" ID /* ID is environment var */ | + | ; | + | | + | expr : .... | + | | "(" expr ")"; | + | | + | call : fun_call | + | | command | + | ; | + | | + | Obviously the call is wildly ambiguous. This is more or less how this | + | is to be resolved: | + | | + | A call begins with an ID or an @ followed by an ID. | + | | + | If it is an ID and if it is an ext. command name -> command | + | if followed by a paren -> fun_call | + | otherwise -> command | + | | + | If it is an @ and if the ID is a var name -> fun_call | + | otherwise -> command | + | | + | One can implement these rules quite neatly using && predicates: | + | | + | call : ("@" ID)? && <<isVarName(LT(2))>>? fun_call | + | | (ID)? && <<isExtCmdName>>? command | + | | (ID "(")? fun_call | + | | command | + | ; | + | | + | This can be done better, so it is not an ideal example, but it | + | conveys the principle. | + +-------------------------------------------------------------------------+ + +#123. (Changed in 1.33MR11) Correct definition of operators in ATokPtr.h + + The return value of operators in ANTLRTokenPtr: + + changed: unsigned ... operator !=(...) + to: int ... operator != (...) + changed: unsigned ... operator ==(...) + to: int ... operator == (...) + + Suggested by R.A. Nelson (cowboy@VNET.IBM.COM) + +#122. (Changed in 1.33MR11) Member functions to reset DLG in C++ mode + + void DLGFileReset(FILE *f) { input = f; found_eof = 0; } + void DLGStringReset(DLGChar *s) { input = s; p = &input[0]; } + + Supplied by R.A. Nelson (cowboy@VNET.IBM.COM) + +#121. 
(Changed in 1.33MR11) Another attempt to fix -o (output dir) option + + Another attempt is made to improve the -o option of antlr, dlg, + and sorcerer. This one by JVincent (JVincent@novell.com). + + The current rule: + + a. If -o is not specified then any explicit directory + names are retained. + + b. If -o is specified then the -o directory name overrides any + explicit directory names. + + c. The directory name of the grammar file is *not* stripped + to create the main output file. However it is still subject + to override by the -o directory name. + +#120. (Changed in 1.33MR11) "-info f" output to stdout rather than stderr + + Added option 0 (e.g. "-info 0") which is a noop. + +#119. (Changed in 1.33MR11) Ambiguity aid for grammars + + The user can ask for additional information on ambiguities reported + by antlr to stdout. At the moment, only one ambiguity report can + be created in an antlr run. + + This feature is enabled using the "-aa" (Ambiguity Aid) option. + + The following options control the reporting of ambiguities: + + -aa ruleName Selects reporting by name of rule + -aa lineNumber Selects reporting by line number + (file name not compared) + + -aam Selects "multiple" reporting for a token + in the intersection set of the + alternatives. + + For instance, the token ID may appear dozens + of times in various paths as the program + explores the rules which are reachable from + the point of an ambiguity. With option -aam + every possible path the search program + encounters is reported. + + Without -aam only the first encounter is + reported. This may result in incomplete + information, but the information may be + sufficient and much shorter. + + -aad depth Selects the depth of the search. + The default value is 1. + + The number of paths to be searched, and the + size of the report can grow geometrically + with the -ck value if a full search for all + contributions to the source of the ambiguity + is performed.
+ + The depth represents the number of tokens + in the lookahead set which are matched against + the set of ambiguous tokens. A depth of 1 + means that the search stops when a lookahead + sequence of just one token is matched. + + A k=1 ck=6 grammar might generate 5,000 items + in a report if a full depth 6 search is made + with the Ambiguity Aid. The source of the + problem may be in the first token and obscured + by the volume of data - I hesitate to call + it information. + + When the user selects a depth > 1, the search + is first performed at depth=1 for both + alternatives, then depth=2 for both alternatives, + etc. + + Sample output for rule grammar in antlr.g itself: + + +---------------------------------------------------------------------+ + | Ambiguity Aid | + | | + | Choice 1: grammar/70 line 632 file a.g | + | Choice 2: grammar/82 line 644 file a.g | + | | + | Intersection of lookahead[1] sets: | + | | + | "\}" "class" "#errclass" "#tokclass" | + | | + | Choice:1 Depth:1 Group:1 ("#errclass") | + | 1 in (...)* block grammar/70 line 632 a.g | + | 2 to error grammar/73 line 635 a.g | + | 3 error error/1 line 894 a.g | + | 4 #token "#errclass" error/2 line 895 a.g | + | | + | Choice:1 Depth:1 Group:2 ("#tokclass") | + | 2 to tclass grammar/74 line 636 a.g | + | 3 tclass tclass/1 line 937 a.g | + | 4 #token "#tokclass" tclass/2 line 938 a.g | + | | + | Choice:1 Depth:1 Group:3 ("class") | + | 2 to class_def grammar/75 line 637 a.g | + | 3 class_def class_def/1 line 669 a.g | + | 4 #token "class" class_def/3 line 671 a.g | + | | + | Choice:1 Depth:1 Group:4 ("\}") | + | 2 #token "\}" grammar/76 line 638 a.g | + | | + | Choice:2 Depth:1 Group:5 ("#errclass") | + | 1 in (...)* block grammar/83 line 645 a.g | + | 2 to error grammar/93 line 655 a.g | + | 3 error error/1 line 894 a.g | + | 4 #token "#errclass" error/2 line 895 a.g | + | | + | Choice:2 Depth:1 Group:6 ("#tokclass") | + | 2 to tclass grammar/94 line 656 a.g | + | 3 tclass tclass/1 line 937 a.g | + 
| 4 #token "#tokclass" tclass/2 line 938 a.g | + | | + | Choice:2 Depth:1 Group:7 ("class") | + | 2 to class_def grammar/95 line 657 a.g | + | 3 class_def class_def/1 line 669 a.g | + | 4 #token "class" class_def/3 line 671 a.g | + | | + | Choice:2 Depth:1 Group:8 ("\}") | + | 2 #token "\}" grammar/96 line 658 a.g | + +---------------------------------------------------------------------+ + + For a linear lookahead set ambiguity (where k=1 or for k>1 but + when all lookahead sets [i] with i<k all have degree one) the + reports appear in the following order: + + for (depth=1 ; depth <= "-aad depth" ; depth++) { + for (alternative=1; alternative <=2 ; alternative++) { + while (matches-are-found) { + group++; + print-report + }; + }; + }; + + For reporting a k-tuple ambiguity, the reports appear in the + following order: + + for (depth=1 ; depth <= "-aad depth" ; depth++) { + while (matches-are-found) { + for (alternative=1; alternative <=2 ; alternative++) { + group++; + print-report + }; + }; + }; + + This is because matches are generated in different ways for + linear lookahead and k-tuples. + +#118. (Changed in 1.33MR11) DEC VMS makefile and VMS related changes + + Revised makefiles for DEC/VMS operating system for antlr, dlg, + and sorcerer. + + Reduced names of routines with external linkage to less than 32 + characters to conform to DEC/VMS linker limitations. + + Jean-Francois Pieronne discovered problems with dlg and antlr + due to the VMS linker not being case sensitive for names with + external linkage. In dlg the problem was with "className" and + "ClassName". In antlr the problem was with "GenExprSets" and + "genExprSets". + + Added genmms, a version of genmk for the DEC/VMS version of make. + The source is in directory pccts/support/DECmms. + + All VMS contributions by Jean-Francois Pieronne (jfp@iname.com). + +#117. 
(Changed in 1.33MR10) new EXPERIMENTAL predicate hoisting code + + The hoisting of predicates into rules to create prediction + expressions is a problem in antlr. Consider the following + example (k=1 with -prc on): + + start : (a)* "@" ; + a : b | c ; + b : <<isUpper(LATEXT(1))>>? A ; + c : A ; + + Prior to 1.33MR10 the code generated for "start" would resemble: + + while { + if (LA(1)==A && + (!LA(1)==A || isUpper())) { + a(); + } + }; + + This code is wrong because it makes rule "c" unreachable from + "start". The essence of the problem is that antlr fails to + recognize that there can be a valid alternative within "a" even + when the predicate <<isUpper(LATEXT(1))>>? is false. + + In 1.33MR10 with -mrhoist the hoisting of the predicate into + "start" is suppressed because it recognizes that "c" can + cover all the cases where the predicate is false: + + while { + if (LA(1)==A) { + a(); + } + }; + + With the antlr "-info p" switch the user will receive information + about the predicate suppression in the generated file: + + -------------------------------------------------------------- + #if 0 + + Hoisting of predicate suppressed by alternative without predicate. + The alt without the predicate includes all cases where + the predicate is false. + + WITH predicate: line 7 v1.g + WITHOUT predicate: line 7 v1.g + + The context set for the predicate: + + A + + The lookahead set for the alt WITHOUT the semantic predicate: + + A + + The predicate: + + pred << isUpper(LATEXT(1))>>? + depth=k=1 rule b line 9 v1.g + set context: + A + tree context: null + + Chain of referenced rules: + + #0 in rule start (line 5 v1.g) to rule a + #1 in rule a (line 7 v1.g) + + #endif + -------------------------------------------------------------- + + A predicate can be suppressed by a combination of alternatives + which, taken together, cover a predicate: + + start : (a)* "@" ; + + a : b | ca | cb | cc ; + + b : <<isUpper(LATEXT(1))>>? 
( A | B | C ) ; + + ca : A ; + cb : B ; + cc : C ; + + Consider a more complex example in which "c" covers only part of + a predicate: + + start : (a)* "@" ; + + a : b + | c + ; + + b : <<isUpper(LATEXT(1))>>? + ( A + | X + ); + + c : A + ; + + Prior to 1.33MR10 the code generated for "start" would resemble: + + while { + if ( (LA(1)==A || LA(1)==X) && + (! (LA(1)==A || LA(1)==X) || isUpper()) { + a(); + } + }; + + With 1.33MR10 and -mrhoist the predicate context is restricted to + the non-covered lookahead. The code resembles: + + while { + if ( (LA(1)==A || LA(1)==X) && + (! (LA(1)==X) || isUpper()) { + a(); + } + }; + + With the antlr "-info p" switch the user will receive information + about the predicate restriction in the generated file: + + -------------------------------------------------------------- + #if 0 + + Restricting the context of a predicate because of overlap + in the lookahead set between the alternative with the + semantic predicate and one without + Without this restriction the alternative without the predicate + could not be reached when input matched the context of the + predicate and the predicate was false. + + WITH predicate: line 11 v4.g + WITHOUT predicate: line 12 v4.g + + The original context set for the predicate: + + A X + + The lookahead set for the alt WITHOUT the semantic predicate: + + A + + The intersection of the two sets + + A + + The original predicate: + + pred << isUpper(LATEXT(1))>>? + depth=k=1 rule b line 15 v4.g + set context: + A X + tree context: null + + The new (modified) form of the predicate: + + pred << isUpper(LATEXT(1))>>? + depth=k=1 rule b line 15 v4.g + set context: + X + tree context: null + + #endif + -------------------------------------------------------------- + + The bad news about -mrhoist: + + (a) -mrhoist does not analyze predicates with lookahead + depth > 1. + + (b) -mrhoist does not look past a guarded predicate to + find context which might cover other predicates. 
+ + For these cases you might want to use syntactic predicates. + When a semantic predicate fails during guess mode the guess + fails and the next alternative is tried. + + Limitation (a) is illustrated by the following example: + + start : (stmt)* EOF ; + + stmt : cast + | expr + ; + cast : <<isTypename(LATEXT(2))>>? LP ID RP ; + + expr : LP ID RP ; + + This is not much different from the first example, except that + it requires two tokens of lookahead context to determine what + to do. This predicate is NOT suppressed because the current version + is unable to handle predicates with depth > 1. + + A predicate can be combined with other predicates during hoisting. + In those cases the depth=1 predicates are still handled. Thus, + in the following example the isUpper() predicate will be suppressed + by line #4 when hoisted from "bizarre" into "start", but will still + be present in "bizarre" in order to predict "stmt". + + start : (bizarre)* EOF ; // #1 + // #2 + bizarre : stmt // #3 + | A // #4 + ; + + stmt : cast + | expr + ; + + cast : <<isTypename(LATEXT(2))>>? LP ID RP ; + + expr : LP ID RP + | <<isUpper(LATEXT(1))>>? A + ; + + Limitation (b) is illustrated by the following example of a + context guarded predicate: + + rule : (A)? => <<p>>? // #1 + (A // #2 + |B // #3 + ) // #4 + | <<q>>? B // #5 + ; + + Recall that this means that when the lookahead is NOT A then + the predicate "p" is ignored and it attempts to match "A|B". + Ideally, the "B" at line #3 should suppress predicate "q". + However, the current version does not attempt to look past + the guard predicate to find context which might suppress other + predicates. + + In some cases -mrhoist will lead to the reporting of ambiguities + which were not visible before: + + start : (a)* "@"; + a : bc | d; + bc : b | c ; + + b : <<isUpper(LATEXT(1))>>? A; + c : A ; + + d : A ; + + In this case there is a true ambiguity in "a" between "bc" and "d" + which can both match "A". 
Without -mrhoist the predicate in "b" + is hoisted into "a" and there is no ambiguity reported. However, + with -mrhoist, the predicate in "b" is suppressed by "c" (as it + should be) making the ambiguity in "a" apparent. + + The motivations for these changes were hoisting problems reported + by Reinier van den Born (reinier@vnet.ibm.com) and several others. + +#116. (Changed in 1.33MR10) C++ mode: tracein/traceout rule name is (const char *) + + The prototype for C++ mode routine tracein (and traceout) has changed from + "char *" to "const char *". + +#115. (Changed in 1.33MR10) Using guess mode with exception handlers in C mode + + The definition of the C mode macros zzmatch_wsig and zzsetmatch_wsig + neglected to consider guess mode. When control passed to the rule's + parse exception handler the routine would exit without ever closing the + guess block. This would lead to unpredictable behavior. + + In 1.33MR10 the behavior of exceptions in C mode and C++ mode should be + identical. + +#114. (Changed in 1.33MR10) difference in [zz]resynch() between C and C++ modes + + There was a slight difference in the way C and C++ mode resynchronized + following a parsing error. The C routine would sometimes skip an extra + token before attempting to resynchronize. + + The C routine was changed to match the C++ routine. + +#113. (Changed in 1.33MR10) new context guarded pred: (g)? && <<p>>? expr + + The existing context guarded predicate: + + rule : (guard)? => <<p>>? expr + | next_alternative + ; + + generates code which resembles: + + if (lookahead(expr) && (!guard || pred)) { + expr() + } else .... + + This is not suitable for some applications because it allows + expr() to be invoked when the predicate is false. This is + intentional because it is meant to mimic automatically computed + predicate context. + + The new context guarded predicate uses the guard information + differently because it has a different goal. Consider: + + rule : (guard)? && <<p>>? 
expr + | next_alternative + ; + + The new style of context guarded predicate is equivalent to: + + rule : <<guard==true && pred>>? expr + | next_alternative + ; + + It generates code which resembles: + + if (lookahead(expr) && guard && pred) { + expr(); + } else ... + + Both forms of guarded predicates severely restrict the form of + the context guard: it can contain no rule references, no + (...)*, no (...)+, and no {...}. It may contain token and + token class references, and alternation ("|"). + + Addition for 1.33MR11: in the token expression all tokens must + be at the same height of the token tree: + + (A ( B | C))? && ... is ok (all height 2) + (A ( B | ))? && ... is not ok (some 1, some 2) + (A B C D | E F G H)? && ... is ok (all height 4) + (A B C D | E )? && ... is not ok (some 4, some 1) + + This restriction is required in order to properly compute the lookahead + set for expressions like: + + rule1 : (A B C)? && <<pred>>? rule2 ; + rule2 : (A|X) (B|Y) (C|Z); + + This addition was suggested by Reinier van den Born (reinier@vnet.ibm.com) + +#112. (Changed in 1.33MR10) failed validation predicate in C guess mode + + John Lilley (jlilley@empathy.com) suggested that failed validation + predicates abort a guess rather than reporting a failed error. + This was installed in C++ mode (Item #4). Only now was it noticed + that the fix was never installed for C mode. + +#111. (Changed in 1.33MR10) moved zzTRACEIN to before init action + + When the antlr -gd switch is present antlr generates calls to + zzTRACEIN at the start of a rule and zzTRACEOUT at the exit + from a rule. Prior to 1.33MR10 the call to zzTRACEIN was + after the init-action, which could cause confusion because the + init-actions were reported with the name of the enclosing rule, + rather than the active rule. + +#110. (Changed in 1.33MR10) antlr command line copied to generated file + + The antlr command line is now copied to the generated file near + the start. + +#109. 
(Changed in 1.33MR10) improved trace information + + The quality of the trace information provided by the "-gd" + switch has been improved significantly. Here is an example + of the output from a test program. It shows the rule name, + the first token of lookahead, the call depth, and the guess + status: + + exit rule gusxx {"?"} depth 2 + enter rule gusxx {"?"} depth 2 + enter rule gus1 {"o"} depth 3 guessing + guess done - returning to rule gus1 {"o"} at depth 3 + (guess mode continues - an enclosing guess is still active) + guess done - returning to rule gus1 {"Z"} at depth 3 + (guess mode continues - an enclosing guess is still active) + exit rule gus1 {"Z"} depth 3 guessing + guess done - returning to rule gusxx {"o"} at depth 2 (guess mode ends) + enter rule gus1 {"o"} depth 3 + guess done - returning to rule gus1 {"o"} at depth 3 (guess mode ends) + guess done - returning to rule gus1 {"Z"} at depth 3 (guess mode ends) + exit rule gus1 {"Z"} depth 3 + line 1: syntax error at "Z" missing SC + ... + + Rule trace reporting is controlled by the value of the integer + [zz]traceOptionValue: when it is positive tracing is enabled, + otherwise it is disabled. Tracing during guess mode is controlled + by the value of the integer [zz]traceGuessOptionValue. When + it is positive AND [zz]traceOptionValue is positive rule trace + is reported in guess mode. + + The values of [zz]traceOptionValue and [zz]traceGuessOptionValue + can be adjusted by subroutine calls listed below. + + Depending on the presence or absence of the antlr -gd switch + the variable [zz]traceOptionValueDefault is set to 0 or 1. When + the parser is initialized or [zz]traceReset() is called the + value of [zz]traceOptionValueDefault is copied to [zz]traceOptionValue. + The value of [zz]traceGuessOptionValue is always initialized to 1, + but, as noted earlier, nothing will be reported unless + [zz]traceOptionValue is also positive. 
+ + When the parser state is saved/restored the value of the trace + variables are also saved/restored. If a restore causes a change in + reporting behavior from on to off or vice versa this will be reported. + + When the -gd option is selected, the macro "#define zzTRACE_RULES" + is added to appropriate output files. + + C++ mode + -------- + int traceOption(int delta) + int traceGuessOption(int delta) + void traceReset() + int traceOptionValueDefault + + C mode + -------- + int zzTraceOption(int delta) + int zzTraceGuessOption(int delta) + void zzTraceReset() + int zzTraceOptionValueDefault + + The argument "delta" is added to the traceOptionValue. To + turn on trace when inside a particular rule one: + + rule : <<traceOption(+1);>> + ( + rest-of-rule + ) + <<traceOption(-1);>> + ; /* fail clause */ <<traceOption(-1);>> + + One can use the same idea to turn *off* tracing within a + rule by using a delta of (-1). + + An improvement in the rule trace was suggested by Sramji + Ramanathan (ps@kumaran.com). + +#108. A Note on Deallocation of Variables Allocated in Guess Mode + + NOTE + ------------------------------------------------------ + This mechanism only works for heap allocated variables + ------------------------------------------------------ + + The rewrite of the trace provides the machinery necessary + to properly free variables or undo actions following a + failed guess. + + The macro zzUSER_GUESS_HOOK(guessSeq,zzrv) is expanded + as part of the zzGUESS macro. When a guess is opened + the value of zzrv is 0. When a longjmp() is executed to + undo the guess, the value of zzrv will be 1. + + The macro zzUSER_GUESS_DONE_HOOK(guessSeq) is expanded + as part of the zzGUESS_DONE macro. This is executed + whether the guess succeeds or fails as part of closing + the guess. + + The guessSeq is a sequence number which is assigned to each + guess and is incremented by 1 for each guess which becomes + active. 
It is needed by the user to associate the start of + a guess with the failure and/or completion (closing) of a + guess. + + Guesses are nested. They must be closed in the reverse + of the order that they are opened. + + In order to free memory used by a variable during a guess + a user must write a routine which can be called to + register the variable along with the current guess sequence + number provided by the zzUSER_GUESS_HOOK macro. If the guess + fails, all variables tagged with the corresponding guess + sequence number should be released. This is ugly, but + it would require a major rewrite of antlr 1.33 to use + some mechanism other than setjmp()/longjmp(). + + The order of calls for a *successful* guess would be: + + zzUSER_GUESS_HOOK(guessSeq,0); + zzUSER_GUESS_DONE_HOOK(guessSeq); + + The order of calls for a *failed* guess would be: + + zzUSER_GUESS_HOOK(guessSeq,0); + zzUSER_GUESS_HOOK(guessSeq,1); + zzUSER_GUESS_DONE_HOOK(guessSeq); + + The default definitions of these macros are empty strings. + + Here is an example in C++ mode. The zzUSER_GUESS_HOOK and + zzUSER_GUESS_DONE_HOOK macros and myGuessHook() routine + can be used without change in both C and C++ versions. 
+ + ---------------------------------------------------------------------- + << + + #include "AToken.h" + + typedef ANTLRCommonToken ANTLRToken; + + #include "DLGLexer.h" + + int main() { + + { + DLGFileInput in(stdin); + DLGLexer lexer(&in,2000); + ANTLRTokenBuffer pipe(&lexer,1); + ANTLRCommonToken aToken; + P parser(&pipe); + + lexer.setToken(&aToken); + parser.init(); + parser.start(); + }; + + fclose(stdin); + fclose(stdout); + return 0; + } + + >> + + << + char *s=NULL; + + #undef zzUSER_GUESS_HOOK + #define zzUSER_GUESS_HOOK(guessSeq,zzrv) myGuessHook(guessSeq,zzrv); + #undef zzUSER_GUESS_DONE_HOOK + #define zzUSER_GUESS_DONE_HOOK(guessSeq) myGuessHook(guessSeq,2); + + void myGuessHook(int guessSeq,int zzrv) { + if (zzrv == 0) { + fprintf(stderr,"User hook: starting guess #%d\n",guessSeq); + } else if (zzrv == 1) { + free (s); + s=NULL; + fprintf(stderr,"User hook: failed guess #%d\n",guessSeq); + } else if (zzrv == 2) { + free (s); + s=NULL; + fprintf(stderr,"User hook: ending guess #%d\n",guessSeq); + }; + } + + >> + + #token A "a" + #token "[\t \ \n]" <<skip();>> + + class P { + + start : (top)+ + ; + + top : (which) ? <<fprintf(stderr,"%s is a which\n",s); free(s); s=NULL; >> + | other <<fprintf(stderr,"%s is an other\n",s); free(s); s=NULL; >> + ; <<if (s != NULL) free(s); s=NULL; >> + + which : which2 + ; + + which2 : which3 + ; + which3 + : (label)? <<fprintf(stderr,"%s is a label\n",s);>> + | (global)? <<fprintf(stderr,"%s is a global\n",s);>> + | (exclamation)? <<fprintf(stderr,"%s is an exclamation\n",s);>> + ; + + label : <<s=strdup(LT(1)->getText());>> A ":" ; + + global : <<s=strdup(LT(1)->getText());>> A "::" ; + + exclamation : <<s=strdup(LT(1)->getText());>> A "!" ; + + other : <<s=strdup(LT(1)->getText());>> "other" ; + + } + ---------------------------------------------------------------------- + + This is a silly example, but illustrates the idea. 
For the input + "a ::" with tracing enabled the output begins: + + ---------------------------------------------------------------------- + enter rule "start" depth 1 + enter rule "top" depth 2 + User hook: starting guess #1 + enter rule "which" depth 3 guessing + enter rule "which2" depth 4 guessing + enter rule "which3" depth 5 guessing + User hook: starting guess #2 + enter rule "label" depth 6 guessing + guess failed + User hook: failed guess #2 + guess done - returning to rule "which3" at depth 5 (guess mode continues + - an enclosing guess is still active) + User hook: ending guess #2 + User hook: starting guess #3 + enter rule "global" depth 6 guessing + exit rule "global" depth 6 guessing + guess done - returning to rule "which3" at depth 5 (guess mode continues + - an enclosing guess is still active) + User hook: ending guess #3 + enter rule "global" depth 6 guessing + exit rule "global" depth 6 guessing + exit rule "which3" depth 5 guessing + exit rule "which2" depth 4 guessing + exit rule "which" depth 3 guessing + guess done - returning to rule "top" at depth 2 (guess mode ends) + User hook: ending guess #1 + enter rule "which" depth 3 + ..... + ---------------------------------------------------------------------- + + Remember: + + (a) Only init-actions are executed during guess mode. + (b) A rule can be invoked multiple times during guess mode. + (c) If the guess succeeds the rule will be called once more + without guess mode so that normal actions will be executed. + This means that the init-action might need to distinguish + between guess mode and non-guess mode using the variable + [zz]guessing. + +#107. (Changed in 1.33MR10) construction of ASTs in guess mode + + Prior to 1.33MR10, when using automatic AST construction in C++ + mode for a rule, an AST would be constructed for elements of the + rule even while in guess mode. In MR10 this no longer occurs. + +#106. 
(Changed in 1.33MR10) guess variable confusion + + In C++ mode a guess which failed always restored the parser state + using zzGUESS_DONE as part of zzGUESS_FAIL. Prior to 1.33MR10, + C mode required an explicit call to zzGUESS_DONE after the + call to zzGUESS_FAIL. + + Consider: + + rule : (alpha)? beta + | ... + ; + + The generated code resembles: + + zzGUESS + if (!zzrv && LA(1)==ID) { <==== line #1 + alpha + zzGUESS_DONE + beta + } else { + if (! zzrv) zzGUESS_DONE <==== line #2a + .... + + However, in some cases line #2a was instead rendered as: + + if (guessing) zzGUESS_DONE <==== line #2b + + This would work for simple test cases, but would fail in + some cases where there was a guess while another guess was active. + One kind of failure would be to match up the zzGUESS_DONE at line + #2b with the "outer" guess which was still active. The outer + guess would "succeed" when only the inner guess should have + succeeded. + + In 1.33MR10 the behavior of zzGUESS and zzGUESS_FAIL in C and + C++ mode should be identical. + + The same problem appears in 1.33 vanilla in some places. For + example: + + start : { (sub)? } ; + + or: + + start : ( + B + | ( sub )? + | C + )+ + ; + + generates incorrect code. + + The general principle is: + + (a) use [zz]guessing only when deciding between a call to zzFAIL + or zzGUESS_FAIL + + (b) use zzrv in all other cases + + This problem was discovered while testing changes to item #105. + I believe this is now fixed. My apologies. + +#105. (Changed in 1.33MR10) guess block as single alt of (...)+ + + Prior to 1.33MR10 the following constructs: + + rule_plus : ( + (sub)? + )+ + ; + + rule_star : ( + (sub)? + )* + ; + + generated incorrect code for the guess block (which could result + in runtime errors) because of an incorrect optimization of a + block with only a single alternative.
+ + The fix caused some changes to the fix described in Item #49 + because there are now three code generation sequences for (...)+ + blocks containing a guess block: + + a. single alternative which is a guess block + b. multiple alternatives in which the last is a guess block + c. all other cases + + Forms like "rule_star" can have unexpected behavior when there + is a syntax error: if the subrule "sub" is not matched *exactly* + then "rule_star" will consume no tokens. + + Reported by Esa Pulkkinen (esap@cs.tut.fi). + +#104. (Changed in 1.33MR10) -o option for dlg + + There was a problem with the code added by item #74 to handle the + -o option of dlg. This change should fix it. + +#103. (Changed in 1.33MR10) ANDed semantic predicates + + Rescinded. + + The optimization was a mistake. + The resulting problem is described in Item #150. + +#102. (Changed in 1.33MR10) allow "class parser : .... {" + + The syntax of the class statement ("class parser-name {") + has been extended to allow for the specification of base + classes. An arbitrary number of tokens may now appear + between the class name and the "{". They are output + again when the class declaration is generated. For + example: + + class Parser : public MyBaseClassANTLRparser { + + This was suggested by a user, but I don't have a record + of who it was. + +#101. (Changed in 1.33MR10) antlr -info command line switch + + -info + + p - extra predicate information in generated file + + t - information about tnode use: + at the end of each rule in generated file + summary on stderr at end of program + + m - monitor progress + prints name of each rule as it is started + flushes output at start of each rule + + f - first/follow set information to stdout + + 0 - no operation (added in 1.33MR11) + + The options may be combined and may appear in any order. + For example: + + antlr -info ptm -CC -gt -mrhoist on mygrammar.g + +#100a. 
(Changed in 1.33MR10) Predicate tree simplification + + When the same predicates can be referenced in more than one + alternative of a block large predicate trees can be formed. + + The difference that these optimizations make is so dramatic + that I have decided to use them even when -mrhoist is not selected. + + Consider the following grammar: + + start : ( all )* ; + + all : a + | d + | e + | f + ; + + a : c A B + | c A C + ; + + c : <<AAA(LATEXT(2))>>? + ; + + d : <<BBB(LATEXT(2))>>? B C + ; + + e : <<CCC(LATEXT(2))>>? B C + ; + + f : e X Y + ; + + In rule "a" there is a reference to rule "c" in both alternatives. + The length of the predicate AAA is k=2 and it can be followed in + alternative 1 only by (A B) while in alternative 2 it can be + followed only by (A C). Thus they do not have identical context. + + In rule "all" the alternatives which refer to rules "e" and "f" allow + elimination of the duplicate reference to predicate CCC. + + The table below summarizes the kind of simplification performed by + 1.33MR10. In the table, X, Y, and Z stand for single predicates + (not trees). + + (OR X (OR Y (OR Z))) => (OR X Y Z) + (AND X (AND Y (AND Z))) => (AND X Y Z) + + (OR X (... (OR X Y) ... )) => (OR X (... Y ... )) + (AND X (... (AND X Y) ... )) => (AND X (... Y ... )) + (OR X (... (AND X Y) ... )) => (OR X (... ... )) + (AND X (... (OR X Y) ... )) => (AND X (... ... )) + + (AND X) => X + (OR X) => X + + In a test with a complex grammar for a real application, a predicate + tree with six OR nodes and 12 leaves was reduced to "(OR X Y Z)". + + In 1.33MR10 there is a greater effort to release memory used + by predicates once they are no longer in use. + +#100b. (Changed in 1.33MR10) Suppression of extra predicate tests + + The following optimizations require that -mrhoist be selected. + + It is relatively easy to optimize the code generated for predicate + gates when they are of the form: + + (AND X Y Z ...) + or (OR X Y Z ...) + + where X, Y, Z, and "..." 
represent individual predicates (leaves) not + predicate trees. + + If the predicate is an AND the contexts of the X, Y, Z, etc. are + ANDed together to create a single Tree context for the group and + context tests for the individual predicates are suppressed: + + -------------------------------------------------- + Note: This was incorrect. The contexts should be + ORed together. This has been fixed. A more + complete description is available in item #152. + --------------------------------------------------- + + Optimization 1: (AND X Y Z ...) + + Suppose the context for Xtest is LA(1)==LP and the context for + Ytest is LA(1)==LP && LA(2)==ID. + + Without the optimization the code would resemble: + + if (lookaheadContext && + !(LA(1)==LP && LA(1)==LP && LA(2)==ID) || + ( (! LA(1)==LP || Xtest) && + (! (LA(1)==LP || LA(2)==ID) || Xtest) + )) {... + + With the -mrhoist optimization the code would resemble: + + if (lookaheadContext && + ! (LA(1)==LP && LA(2)==ID) || (Xtest && Ytest) {... + + Optimization 2: (OR X Y Z ...) with identical contexts + + Suppose the context for Xtest is LA(1)==ID and for Ytest + the context is also LA(1)==ID. + + Without the optimization the code would resemble: + + if (lookaheadContext && + ! (LA(1)==ID || LA(1)==ID) || + (LA(1)==ID && Xtest) || + (LA(1)==ID && Ytest) {... + + With the -mrhoist optimization the code would resemble: + + if (lookaheadContext && + (! LA(1)==ID) || (Xtest || Ytest) {... + + Optimization 3: (OR X Y Z ...) with distinct contexts + + Suppose the context for Xtest is LA(1)==ID and for Ytest + the context is LA(1)==LP. + + Without the optimization the code would resemble: + + if (lookaheadContext && + ! (LA(1)==ID || LA(1)==LP) || + (LA(1)==ID && Xtest) || + (LA(1)==LP && Ytest) {... 
+ + With the -mrhoist optimization the code would resemble: + + if (lookaheadContext && + (zzpf=0, + (LA(1)==ID && (zzpf=1) && Xtest) || + (LA(1)==LP && (zzpf=1) && Ytest) || + !zzpf) { + + These may appear to be of similar complexity at first, + but the non-optimized version contains two tests of each + context while the optimized version contains only one + such test, as well as eliminating some of the inverted + logic (" !(...) || "). + + Optimization 4: Computation of predicate gate trees + + When generating code for the gates of predicate expressions + antlr 1.33 vanilla uses a recursive procedure to generate + "&&" and "||" expressions for testing the lookahead. As each + layer of the predicate tree is exposed a new set of "&&" and + "||" expressions on the lookahead are generated. In many + cases the lookahead being tested has already been tested. + + With -mrhoist a lookahead tree is computed for the entire + lookahead expression. This means that predicates with identical + context or context which is a subset of another predicate's + context disappear. + + This is especially important for predicates formed by rules + like the following: + + upperCaseVowel : <<isUpperCase(LATEXT(1))>>? vowel; + vowel : <<isVowel(LATEXT(1))>>? LETTERS; + + These predicates are combined using AND since both must be + satisfied for rule upperCaseVowel. They have identical + context which makes this optimization very effective. + + The effect of Items #100a and #100b together can be dramatic. In + a very large (but real world) grammar one particular predicate + expression was reduced from an (unreadable) 50 predicate leaves, + 195 LA(1) terms, and 5500 characters to an (easily comprehensible) + 3 predicate leaves (all different) and a *single* LA(1) term. + +#99. (Changed in 1.33MR10) Code generation for expression trees + + Expression trees are used for k>1 grammars and predicates with + lookahead depth >1. This optimization must be enabled using + "-mrhoist on". 
(Clarification added for 1.33MR11). + + In the processing of expression trees, antlr can generate long chains + of token comparisons. Prior to 1.33MR10 there were many redundant + parentheses which caused problems for compilers which could handle + expressions of only limited complexity. For example, to test an + expression tree (root R A B C D), antlr would generate something + resembling: + + (LA(1)==R && (LA(2)==A || (LA(2)==B || (LA(2)==C || LA(2)==D))))) + + If there were twenty tokens to test then there would be twenty + parentheses at the end of the expression. + + In 1.33MR10 the generated code for tree expressions resembles: + + (LA(1)==R && (LA(2)==A || LA(2)==B || LA(2)==C || LA(2)==D)) + + For "complex" expressions the output is indented to reflect the LA + number being tested: + + (LA(1)==R + && (LA(2)==A || LA(2)==B || LA(2)==C || LA(2)==D + || LA(2)==E || LA(2)==F) + || LA(1)==S + && (LA(2)==G || LA(2)==H)) + + + Suggested by S. Bochnak (S.Bochnak@microTool.com.pl). + +#98. (Changed in 1.33MR10) Option "-info p" + + When the user selects option "-info p" the program will generate + detailed information about predicates. If the user selects + "-mrhoist on" additional detail will be provided explaining + the promotion and suppression of predicates. The output is part + of the generated file and sandwiched between #if 0/#endif statements. + + Consider the following k=1 grammar: + + start : ( all ) * ; + + all : ( a + | b + ) + ; + + a : c B + ; + + c : <<LATEXT(1)>>? + | B + ; + + b : <<LATEXT(1)>>? X + ; + + Below is an excerpt of the output for rule "start" for the three + predicate options (off, on, and maintenance release style hoisting). + + For those who do not wish to use the "-mrhoist on" option for code + generation the option can be used in a "diagnostic" mode to provide + valuable information: + + a. where one should insert null actions to inhibit hoisting + b. 
a chain of rule references which shows where predicates are + being hoisted + + ====================================================================== + Example of "-info p" with "-mrhoist on" + ====================================================================== + #if 0 + + Hoisting of predicate suppressed by alternative without predicate. + The alt without the predicate includes all cases where the + predicate is false. + + WITH predicate: line 11 v36.g + WITHOUT predicate: line 12 v36.g + + The context set for the predicate: + + B + + The lookahead set for alt WITHOUT the semantic predicate: + + B + + The predicate: + + pred << LATEXT(1)>>? depth=k=1 rule c line 11 v36.g + + set context: + B + tree context: null + + Chain of referenced rules: + + #0 in rule start (line 1 v36.g) to rule all + #1 in rule all (line 3 v36.g) to rule a + #2 in rule a (line 8 v36.g) to rule c + #3 in rule c (line 11 v36.g) + + #endif + && + #if 0 + + pred << LATEXT(1)>>? depth=k=1 rule b line 15 v36.g + + set context: + X + tree context: null + + #endif + ====================================================================== + Example of "-info p" with the default -prc setting ( "-prc off") + ====================================================================== + #if 0 + + OR + pred << LATEXT(1)>>? depth=k=1 rule c line 11 v36.g + + set context: + nil + tree context: null + + pred << LATEXT(1)>>? depth=k=1 rule b line 15 v36.g + + set context: + nil + tree context: null + + #endif + ====================================================================== + Example of "-info p" with "-prc on" and "-mrhoist off" + ====================================================================== + #if 0 + + OR + pred << LATEXT(1)>>? depth=k=1 rule c line 11 v36.g + + set context: + B + tree context: null + + pred << LATEXT(1)>>? depth=k=1 rule b line 15 v36.g + + set context: + X + tree context: null + + #endif + ====================================================================== + +#97. 
(Fixed in 1.33MR10) "Predicate applied for more than one ... " + + In 1.33 vanilla, the grammar listed below produced this message for + the first alternative (only) of rule "b": + + warning: predicate applied for >1 lookahead 1-sequences + [you may only want one lookahead 1-sequence to apply. + Try using a context guard '(...)? =>' + + In 1.33MR10 the message is issued for both alternatives. + + top : (a)*; + a : b | c ; + + b : <<PPP(LATEXT(1))>>? ( AAA | BBB ) + | <<QQQ(LATEXT(1))>>? ( XXX | YYY ) + ; + + c : AAA | XXX; + +#96. (Fixed in 1.33MR10) Guard predicates ignored when -prc off + + Prior to 1.33MR10, guard predicate code was not generated unless + "-prc on" was selected. + + This was incorrect, since "-prc off" (the default) is supposed to + disable only AUTOMATIC computation of predicate context, not the + programmer specified context supplied by guard predicates. + +#95. (Fixed in 1.33MR10) Predicate guard context length was k, not max(k,ck) + + Prior to 1.33MR10, predicate guards were computed to k tokens rather + than max(k,ck). Consider the following grammar: + + a : ( A B C)? => <<AAA(LATEXT(1))>>? (A|X) (B|Y) (C|Z) ; + + The code generated by 1.33 vanilla with "-k 1 -ck 3 -prc on" + for the predicate in "a" resembles: + + if ( (! LA(1)==A) || AAA(LATEXT(1))) {... + + With 1.33MR10 and the same options the code resembles: + + if ( (! (LA(1)==A && LA(2)==B && LA(3)==C)) || AAA(LATEXT(1))) {... + +#94. (Fixed in 1.33MR10) Predicates followed by rule references + + Prior to 1.33MR10, a semantic predicate which referenced a token + which was off the end of the rule caused an incomplete context + to be computed (with "-prc on") for the predicate under some + circumstances. In some cases this manifested itself as illegal C code + (e.g. "LA(2)==[Ep](1)") in the k=2 examples below: + + all : ( a ) *; + + a : <<AAA(LATEXT(2))>>? ID X + | <<BBB(LATEXT(2))>>? 
Y + | Z + ; + + This might also occur when the semantic predicate was followed + by a rule reference which was shorter than the length of the + semantic predicate: + + all : ( a ) *; + + a : <<AAA(LATEXT(2))>>? ID X + | <<BBB(LATEXT(2))>>? y + | Z + ; + + y : Y ; + + Depending on circumstance, the resulting context might be too + generous because it was too short, or too restrictive because + of missing alternatives. + +#93. (Changed in 1.33MR10) Definition of Purify macro + + Ofer Ben-Ami (gremlin@cs.huji.ac.il) has supplied a definition + for the Purify macro: + + #define PURIFY(r, s) memset((char *) &(r), '\0', (s)); + + Note: This may not be the right thing to do for C++ objects that + have constructors. Reported by Bonny Rais (bonny@werple.net.au). + + For those cases one should #define PURIFY to an empty macro in the + #header or #first actions. + +#92. (Fixed in 1.33MR10) Guarded predicates and hoisting + + When a guarded predicate participates in hoisting it is linked into + a predicate expression tree. Prior to 1.33MR10 this link was never + cleared and the next time the guard was used to construct a new + tree the link could contain a spurious reference to another element + which had previously been joined to it in the semantic predicate tree. + + For example: + + start : ( all ) *; + all : ( a | b ) ; + + start2 : ( all2 ) *; + all2 : ( a ) ; + + a : (A)? => <<AAA(LATEXT(1))>>? A ; + b : (B)? => <<BBB(LATEXT(1))>>? B ; + + Prior to 1.33MR10 the code for "start2" would include a spurious + reference to the BBB predicate which was left from constructing + the predicate tree for rule "start" (i.e. or(AAA,BBB) ). + + In 1.33MR10 this problem is avoided by cloning the original guard + each time it is linked into a predicate tree. + +#91. (Changed in 1.33MR10) Extensive changes to semantic pred hoisting + + ============================================ + This has been rendered obsolete by Item #117 + ============================================ + +#90. 
(Fixed in 1.33MR10) Semantic pred with LT(i) and i>max(k,ck) + + There is a bug in antlr 1.33 vanilla and all maintenance releases + prior to 1.33MR10 which allows semantic predicates to reference + an LT(i) or LATEXT(i) where i is larger than max(k,ck). When + this occurs antlr will attempt to mark the ith element of an array + in which there are only max(k,ck) elements. The result cannot + be predicted. + + Using LT(i) or LATEXT(i) for i>max(k,ck) is reported as an error + in 1.33MR10. + +#89. Rescinded + +#88. (Fixed in 1.33MR10) Tokens used in semantic predicates in guess mode + + Consider the behavior of a semantic predicate during guess mode: + + rule : a:A ( + <<test($a)>>? b:B + | c:C + ); + + Prior to MR10 the assignment of the token or attribute to + $a did not occur during guess mode, which would cause the + semantic predicate to misbehave because $a would be null. + + In 1.33MR10 a semantic predicate with a reference to an + element label (such as $a) forces the assignment to take + place even in guess mode. + + In order to work, this fix REQUIRES use of the $label format + for token pointers and attributes referenced in semantic + predicates. + + The fix does not apply to semantic predicates using the + numeric form to refer to attributes (e.g. <<test($1)>>?). + The user will receive a warning for this case. + + Reported by Rob Trout (trout@mcs.cs.kent.edu). + +#87. (Fixed in 1.33MR10) Malformed guard predicates + + Context guard predicates may contain only references to + tokens. They may not contain references to (...)+ and + (...)* blocks. This is now checked. This replaces the + fatal error message in item #78 with an appropriate + (non-fatal) error message. + + In theory, context guards should be allowed to reference + rules. However, I have not had time to fix this. + Evaluation of the guard takes place before all rules have + been read, making it difficult to resolve a forward reference + to rule "zzz" - it hasn't been read yet ! 
To postpone evaluation + of the guard until all rules have been read is too much + for the moment. + +#86. (Fixed in 1.33MR10) Unequal set size in set_sub + + Routine set_sub() in pccts/support/set/set.h did not work + correctly when the sets were of unequal sizes. Rewrote + set_equ to make it simpler and remove unnecessary and + expensive calls to set_deg(). This routine was not used + in 1.33 vanilla. + +#85. (Changed in 1.33MR10) Allow redefinition of MaxNumFiles + + Raised the maximum number of input files to 99 from 20. + Put a #ifndef/#endif around the "#define MaxNumFiles 99". + +#84. (Fixed in 1.33MR10) Initialize zzBadTok in macro zzRULE + + Initialize zzBadTok to NULL in zzRULE macro of AParser.h + in order to get rid of warning messages. + +#83. (Fixed in 1.33MR10) False warnings with -w2 for #tokclass + + When -w2 is selected antlr gives inappropriate warnings about + #tokclass names not having any associated regular expressions. + Since a #tokclass is not a "real" token it will never have an + associated regular expression and there should be no warning. + + Reported by Derek Pappas (derek.pappas@eng.sun.com) + +#82. (Fixed in 1.33MR10) Computation of follow sets with multiple cycles + + Reinier van den Born (reinier@vnet.ibm.com) reported a problem + in the computation of follow sets by antlr. The problem (bug) + exists in 1.33 vanilla and all maintenance releases prior to 1.33MR10. + + The problem involves the computation of follow sets when there are + cycles - rules which have mutual references. I believe the problem + is restricted to cases where there is more than one cycle AND + elements of those cycles have rules in common. Even when this + occurs it may not affect the code generated - but it might. It + might also lead to undetected ambiguities. + + There were no changes in antlr or dlg output from the revised version. 
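To illustrate why cycles with rules in common need care, here is a minimal sketch of the textbook approach: propagate follow sets along "inherits from" edges until nothing changes. This is NOT the algorithm used inside antlr. The function name propagate_follow, the bitmask representation, and the MAX_RULES limit are all assumptions made for the example.

```c
#include <assert.h>

#define MAX_RULES 8   /* hypothetical limit, chosen for the sketch only */

/*
 * Hypothetical model (not antlr's data structures):
 *   follow[r]   - bitmask of token ids that may follow rule r
 *   inherits[r] - bitmask of rules whose follow sets flow into rule r
 *                 (r appears at the end of an alternative of those rules)
 *
 * The loop repeats until a full pass adds nothing.  A single pass is
 * not enough when two cycles share a rule, because a token may have to
 * travel around one cycle before it can enter the other.
 */
void propagate_follow(unsigned follow[], const unsigned inherits[], int nrules)
{
    int changed = 1;

    assert(nrules >= 0 && nrules <= MAX_RULES);

    while (changed) {
        changed = 0;
        for (int r = 0; r < nrules; r++) {
            for (int s = 0; s < nrules; s++) {
                if ((inherits[r] >> s) & 1u) {
                    unsigned merged = follow[r] | follow[s];
                    if (merged != follow[r]) {
                        follow[r] = merged;
                        changed = 1;
                    }
                }
            }
        }
    }
}
```

With three rules where rule 1 takes part in two cycles (0 <-> 1 and 1 <-> 2), every rule ends up with the union of the starting sets. The loop terminates because the sets only grow and are bounded by the number of tokens.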
+ + The following fragment demonstrates the problem by giving different + follow sets (option -pa) for var_access when built with k=1 and ck=2 on + 1.33 vanilla and 1.33MR10: + + echo_statement : ECHO ( echo_expr )* + ; + + echo_expr : ( command )? + | expression + ; + + command : IDENTIFIER + { concat } + ; + + expression : operand ( OPERATOR operand )* + ; + + operand : value + | START command END + ; + + value : concat + | TYPE operand + ; + + concat : var_access { CONCAT value } + ; + + var_access : IDENTIFIER { INDEX } + + ; +#81. (Changed in 1.33MR10) C mode use of attributes and ASTs + + Reported by Isaac Clark (irclark@mindspring.com). + + C mode code ignores attributes returned by rules which are + referenced using element labels when ASTs are enabled (-gt option). + + 1. start : r:rule t:Token <<$start=$r;>> + + The $r reference will not work when combined with + the -gt option. + + 2. start : t:Token <<$start=$t;>> + + The $t reference works in all cases. + + 3. start : rule <<$0=$1;>> + + Numeric labels work in all cases. + + With MR10 the user will receive an error message for case 1 when + the -gt option is used. + +#80. (Fixed in 1.33MR10) (...)? as last alternative of block + + A construct like the following: + + rule : a + | (b)? + ; + + does not make sense because there is no alternative when + the guess block fails. This is now reported as a warning + to the user. + + Previously, there was a code generation error for this case: + the guess block was not "closed" when the guess failed. + This could cause an infinite loop or other problems. This + is now fixed. + + Example problem: + + #header<< + #include <stdio.h> + #include "charptr.h" + >> + + << + #include "charptr.c" + main () + { + ANTLR(start(),stdin); + } + >> + + #token "[\ \t]+" << zzskip(); >> + #token "[\n]" << zzline++; zzskip(); >> + + #token Word "[a-z]+" + #token Number "[0-9]+" + + + start : (test1)? + | (test2)? + ; + test1 : (Word Word Word Word)? + | (Word Word Word Number)? 
+ ; + test2 : (Word Word Number Word)? + | (Word Word Number Number)? + ; + + Test data which caused infinite loop: + + a 1 a a + +#79. (Changed in 1.33MR10) Use of -fh with multiple parsers + + Previously, antlr always used the pre-processor symbol + STDPCCTS_H as a gate for the file stdpccts.h. This + caused problems when there were multiple parsers defined + because they used the same gate symbol. + + In 1.33MR10, the -fh filename is used to generate the + gate file for stdpccts.h. For instance: + + antlr -fh std_parser1.h + + generates the pre-processor symbol "STDPCCTS_std_parser1_H". + + Reported by Ramanathan Santhanam (ps@kumaran.com). + +#78. (Changed in 1.33MR9) Guard predicates that refer to rules + + ------------------------ + Please refer to Item #87 + ------------------------ + + Guard predicates are processed during an early phase + of antlr (during parsing) before all data structures + are completed. + + There is an apparent bug in earlier versions of 1.33 + which caused guard predicates which contained references + to rules (rather than tokens) to reference a structure + which hadn't yet been initialized. + + In some cases (perhaps all cases) references to rules + in guard predicates resulted in the use of "garbage". + +#79. (Changed in 1.33MR9) Jeff Vincent (JVincent@novell.com) + + Previously, the maximum length file name was set + arbitrarily to 300 characters in antlr, dlg, and sorcerer. + + The config.h file now attempts to define the maximum length + filename using _MAX_PATH from stdlib.h before falling back + to using the value 300. + +#78. (Changed in 1.33MR9) Jeff Vincent (JVincent@novell.com) + + Put #ifndef/#endif around definition of ZZLEXBUFSIZE in + antlr. + +#77. (Changed in 1.33MR9) Arithmetic overflow for very large grammars + + In routine HandleAmbiguities() antlr attempts to compute the + number of possible elements in a set that is order of + number-of-tokens raised to the number-of-lookahead-tokens power. 
+ For large grammars or large lookahead (e.g. -ck 7) this can + cause arithmetic overflow. + + With 1.33MR9, arithmetic overflow in this computation is reported + the first time it happens. The program continues to run and + the program branches based on the assumption that the computed + value is larger than any number computed by counting actual cases + because 2**31 is larger than the number of bits in most computers. + + Before 1.33MR9 overflow was not reported. The behavior following + overflow is not predictable by anyone but the original author. + + NOTE + + In 1.33MR10 the warning message is suppressed. + The code which detects the overflow allows the + computation to continue without an error. The + error message itself made users worry. + +#76. (Changed in 1.33MR9) Jeff Vincent (JVincent@novell.com) + + Jeff Vincent has convinced me to make ANTLRCommonToken and + ANTLRCommonNoRefCountToken use variable length strings + allocated from the heap rather than fixed length strings. + By suitable definition of setText(), the copy constructor, + and operator =() it is possible to maintain "copy" semantics. + By "copy" semantics I mean that when a token is copied from + an existing token it receives its own, distinct, copy of the + text allocated from the heap rather than simply a pointer + to the original token's text. + + ============================================================ + W * A * R * N * I * N * G + ============================================================ + + It is possible that this may cause problems for some users. + For those users I have included the old version of AToken.h as + pccts/h/AToken_traditional.h. + +#75. (Changed in 1.33MR9) Bruce Guenter (bruceg@qcc.sk.ca) + + Make DLGStringInput const correct. Since this is infrequently + subclassed, it should affect few users, I hope. + +#74. 
(Changed in 1.33MR9) -o (output directory) option + + Antlr does not properly handle the -o output directory option + when the filename of the grammar contains a directory part. For + example: + + antlr -o outdir pccts_src/myfile.g + + causes antlr to create a file called "outdir/pccts_src/myfile.cpp". + It SHOULD create outdir/myfile.cpp. + + The suggested code fix has been installed in antlr, dlg, and + Sorcerer. + +#73. (Changed in 1.33MR9) Hoisting of semantic predicates and -mrhoist + + ============================================ + This has been rendered obsolete by Item #117 + ============================================ + +#72. (Changed in 1.33MR9) virtual saveState()/restoreState()/guess_XXX + + The following methods in ANTLRParser were made virtual at + the request of S. Bochnak (S.Bochnak@microTool.com.pl): + + saveState() and restoreState() + guess(), guess_fail(), and guess_done() + +#71. (Changed in 1.33MR9) Access to omitted command line argument + + If a switch requiring arguments is the last thing on the + command line, and the argument is omitted, antlr would core. + + antlr test.g -prc + + instead of + + antlr test.g -prc off + +#70. (Changed in 1.33MR9) Addition of MSVC .dsp and .mak build files + + The following MSVC .dsp and .mak files for pccts and sorcerer + were contributed by Stanislaw Bochnak (S.Bochnak@microTool.com.pl) + and Jeff Vincent (JVincent@novell.com) + + PCCTS Distribution Kit + ---------------------- + pccts/PCCTSMSVC50.dsw + + pccts/antlr/AntlrMSVC50.dsp + pccts/antlr/AntlrMSVC50.mak + + pccts/dlg/DlgMSVC50.dsp + pccts/dlg/DlgMSVC50.mak + + pccts/support/msvc.dsp + + Sorcerer Distribution Kit + ------------------------- + pccts/sorcerer/SorcererMSVC50.dsp + pccts/sorcerer/SorcererMSVC50.mak + + pccts/sorcerer/lib/msvc.dsp + +#69. 
(Changed in 1.33MR9) Change "unsigned int" to plain "int" + + Declaration of max_token_num in misc.c as "unsigned int" + caused comparisons between signed and unsigned ints, giving + a warning message without any special benefit. + +#68. (Changed in 1.33MR9) Add void return for dlg internal_error() + + Get rid of "no return value" message in internal_error() + in file dlg/support.c and dlg/dlg.h. + +#67. (Changed in Sor) sor.g: lisp() has no return value + + Added a "void" for the return type. + +#66. (Added to Sor) sor.g: ZZLEXBUFSIZE enclosed in #ifndef/#endif + + A user needed to be able to change the ZZLEXBUFSIZE for + sor. Put the definition of ZZLEXBUFSIZE inside #ifndef/#endif + +#65. (Changed in 1.33MR9) PCCTSAST::deepCopy() and ast_dup() bug + + Jeff Vincent (JVincent@novell.com) found that deepCopy() + made new copies of only the direct descendants. No new + copies were made of sibling nodes. Sibling pointers are + set to zero by shallowCopy(). + + PCCTS_AST::deepCopy() has been changed to make a + deep copy in the traditional sense. + + The deepCopy() routine depends on the behavior of + shallowCopy(). In all sor examples I've found, + shallowCopy() zeroes the right and down pointers. + + Original Tree Original deepCopy() Revised deepCopy() + ------------- ------------------- ---------------- + a->b->c A A + | | | + d->e->f D D->E->F + | | | + g->h->i G G->H->I + | | + j->k J->K + + While comparing deepCopy() for C++ mode with ast_dup for + C mode I found a problem with ast_dup(). + + Routine ast_dup() has been changed to make a deep copy + in the traditional sense. + + Original Tree Original ast_dup() Revised ast_dup() + ------------- ------------------- ---------------- + a->b->c A->B->C A + | | | + d->e->f D->E->F D->E->F + | | | + g->h->i G->H->I G->H->I + | | | + j->k J->K J->K + + + I believe this affects transform mode sorcerer programs only. + +#64. (Changed in 1.33MR9) antlr/hash.h prototype for killHashTable() + +#63. 
(Changed in 1.33MR8) h/charptr.h does not zero pointer after free + + The charptr.h routine now zeroes the pointer after free(). + + Reported by Jens Tingleff (jensting@imaginet.fr) + +#62. (Changed in 1.33MR8) ANTLRParser::resynch had static variable + + The static variable "consumed" in ANTLRParser::resynch was + changed into an instance variable of the class with the + name "resynchConsumed". + + Reported by S.Bochnak@microTool.com.pl + +#61. (Changed in 1.33MR8) Using rule>[i,j] when rule has no return values + + Previously, the following code would cause antlr to core when + it tried to generate code for rule1 because rule2 had no return + values ("upward inheritance"): + + rule1 : <<int i; int j>> + rule2 > [i,j] + ; + + rule2 : Anything ; + + Reported by S.Bochnak@microTool.com.pl + + Verified correct operation of antlr MR8 when missing or extra + inheritance arguments for all combinations. When there are + missing or extra arguments code will still be generated even + though this might cause the invocation of a subroutine with + the wrong number of arguments. + +#60. (Changed in 1.33MR7) Major changes to exception handling + + There were significant problems in the handling of exceptions + in 1.33 vanilla. The general problem is that it can only + process one level of exception handler. For example, a named + exception handler, an exception handler for an alternative, or + an exception for a subrule always went to the rule's exception + handler if there was no "catch" which matched the exception. + + In 1.33MR7 the exception handlers properly "nest". If an + exception handler does not have a matching "catch" then the + nextmost outer exception handler is checked for an appropriate + "catch" clause, and so on until an exception handler with an + appropriate "catch" is found. 
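The nesting rule described for 1.33MR7 can be pictured with a small model: each handler knows which signals it catches and which handler encloses it, and the search walks outward until a matching "catch" is found. This is only a sketch of the lookup order. The Handler struct, the SIG_* constants and their values, and find_handler() are hypothetical names for this example and do not correspond to the code antlr generates.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative signal values; the names echo the catch clauses used in
   these notes but the numbering is an assumption of this sketch. */
enum {
    SIG_NO_SIGNAL = 0,
    SIG_MISMATCHED_TOKEN,
    SIG_NO_VIABLE_ALT,
    SIG_NO_SEM_VIABLE_ALT
};

/* Hypothetical model of one exception handler. */
typedef struct Handler {
    const char *name;              /* for diagnostics only */
    const int *catches;            /* signals that have a "catch" clause */
    size_t ncatches;
    const struct Handler *outer;   /* nextmost outer handler, NULL at the rule */
} Handler;

static int handler_catches(const Handler *h, int signal)
{
    for (size_t i = 0; i < h->ncatches; i++) {
        if (h->catches[i] == signal) {
            return 1;
        }
    }
    return 0;
}

/* Walk from the innermost handler outward and return the first handler
   with a matching "catch", or NULL if no handler catches the signal. */
const Handler *find_handler(const Handler *innermost, int signal)
{
    assert(signal != SIG_NO_SIGNAL);

    for (const Handler *h = innermost; h != NULL; h = h->outer) {
        if (handler_catches(h, signal)) {
            return h;
        }
    }
    return NULL;
}
```

In this model the 1.33 vanilla behavior would correspond to skipping straight to the rule's handler whenever the innermost handler had no matching "catch", instead of checking each enclosing handler in turn.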
+ + There are still undesirable features in the way exception + handlers are implemented, but I do not have time to fix them + at the moment: + + The exception handlers for alternatives are outside the + block containing the alternative. This makes it impossible + to access variables declared in a block or to resume the + parse by "falling through". The parse can still be easily + resumed in other ways, but not in the most natural fashion. + + This results in an inconsistency between named exception + handlers and exception handlers for alternatives. When + an exception handler for an alternative "falls through" + it goes to the nextmost outer handler - not the "normal + action". + + A major difference between 1.33MR7 and 1.33 vanilla is + the default action after an exception is caught: + + 1.33 Vanilla + ------------ + In 1.33 vanilla the signal value is set to zero ("NoSignal") + and the code drops through to the code following the exception. + For named exception handlers this is the "normal action". + For alternative exception handlers this is the rule's handler. + + 1.33MR7 + ------- + In 1.33MR7 the signal value is NOT automatically set to zero. + + There are two cases: + + For named exception handlers: if the signal value has been + set to zero the code drops through to the "normal action". + + For all other cases the code branches to the nextmost outer + exception handler until it reaches the handler for the rule. + + The following macros have been defined for convenience: + + C/C++ Mode Name + -------------------- + (zz)suppressSignal + set signal & return signal arg to 0 ("NoSignal") + (zz)setSignal(intValue) + set signal & return signal arg to some value + (zz)exportSignal + copy the signal value to the return signal arg + + I'm not sure why PCCTS makes a distinction between the local + signal value and the return signal argument, but I'm loath + to change the code. 
The burden of copying the local signal + value to the return signal argument can be given to the + default signal handler, I suppose. + +#59. (Changed in 1.33MR7) Prototypes for some functions + + Added prototypes for the following functions to antlr.h + + zzconsumeUntil() + zzconsumeUntilToken() + +#58. (Changed in 1.33MR7) Added definition of zzbufsize to dlgauto.h + +#57. (Changed in 1.33MR7) Format of #line directive + + Previously, the -gl directive for line 1234 would + resemble: "# 1234 filename.g". This caused problems + for some compilers/pre-processors. In MR7 it generates + "#line 1234 filename.g". + +#56. (Added in 1.33MR7) Jan Mikkelsen <janm@zeta.org.au> + + Move PURIFY macro invocation to after rule's init action. + +#55. (Fixed in 1.33MR7) Uninitialized variables in ANTLRParser + + Member variables inf_labase and inf_last were not initialized. + (See item #50.) + +#54. (Fixed in 1.33MR6) Brad Schick (schick@interaccess.com) + + Previously, the following constructs generated the same + code: + + rule1 : (A B C)? + | something-else + ; + + rule2 : (A B C)? () + | something-else + ; + + In all versions of pccts rule1 guesses (A B C) and then + consumes all three tokens if the guess succeeds. In MR6 + rule2 guesses (A B C) but consumes NONE of the tokens + when the guess succeeds because "()" matches epsilon. + +#53. (Explanation for 1.33MR6) What happens after an exception is caught ? + + The Book is silent about what happens after an exception + is caught. + + The following code fragment prints "Error Action" followed + by "Normal Action". + + test : Word ex:Number <<printf("Normal Action\n");>> + exception[ex] + catch NoViableAlt: + <<printf("Error Action\n");>> + ; + + The reason for "Normal Action" is that the normal flow of the + program after a user-written exception handler is to "drop through". + In the case of an exception handler for a rule this results in + the execution of a "return" statement. 
In the case of an + exception handler attached to an alternative, rule, or token + this is the code that would have executed had there been no + exception. + + The user can achieve the desired result by using a "return" + statement. + + test : Word ex:Number <<printf("Normal Action\n");>> + exception[ex] + catch NoViableAlt: + <<printf("Error Action\n"); return;>> + ; + + The most powerful mechanism for recovery from parse errors + in pccts is syntactic predicates because they provide + backtracking. Exceptions allow "return", "break", + "consumeUntil(...)", "goto _handler", "goto _fail", and + changing the _signal value. + +#52. (Fixed in 1.33MR6) Exceptions without syntactic predicates + + The following generates bad code in 1.33 if no syntactic + predicates are present in the grammar. + + test : Word ex:Number <<printf("Normal Action\n");>> + exception[ex] + catch NoViableAlt: + <<printf("Error Action\n");>> + + There is a reference to a guess variable. In C mode + this causes a compiler error. In C++ mode it generates + an extraneous check on member "guessing". + + In MR6 correct code is generated for both C and C++ mode. + +#51. (Added to 1.33MR6) Exception operator "@" used without exceptions + + In MR6 added a warning when the exception operator "@" is + used and no exception group is defined. This is probably + a case where "\@" or "@" is meant. + +#50. (Fixed in 1.33MR6) Gunnar Rønning (gunnar@candleweb.no) + http://www.candleweb.no/~gunnar/ + + Routines zzsave_antlr_state and zzrestore_antlr_state don't + save and restore all the data needed when switching states. + + Suggested patch applied to antlr.h and err.h for MR6. + +#49. (Fixed in 1.33MR6) Sinan Karasu (sinan@boeing.com) + + Generated code failed to turn off guess mode when leaving a + (...)+ block which contained a guess block. The result was + an infinite loop. For example: + + rule : ( + (x)? + | y + )+ + + Suggested code fix implemented in MR6. Replaced + + ... 
else if (zzcnt>1) break;
+
+        with:
+
+          C++ mode:
+              ... else if (zzcnt>1) {if (!zzrv) zzGUESS_DONE; break;};
+          C mode:
+              ... else if (zzcnt>1) {if (zzguessing) zzGUESS_DONE; break;};
+
+#48.  (Fixed in 1.33MR6) Invalid exception element causes core
+
+        A label attached to an invalid construct can cause
+        pccts to crash while processing the exception associated
+        with the label.  For example:
+
+            rule : t:(B C)
+                    exception[t] catch MismatchedToken: <<printf(...);>>
+
+        Version MR6 generates the message:
+
+           reference in exception handler to undefined label 't'
+
+#47.  (Fixed in 1.33MR6) Manuel Ornato
+
+        Under some circumstances involving a k >1 or ck >1
+        grammar and a loop block (i.e.  (...)* ) pccts will
+        fail to detect a syntax error and loop indefinitely.
+        The problem did not exist in 1.20, but has existed
+        from 1.23 to the present.
+
+        Fixed in MR6.
+
+        ---------------------------------------------------
+        Complete test program
+        ---------------------------------------------------
+        #header<<
+        #include <stdio.h>
+        #include "charptr.h"
+        >>
+
+        <<
+        #include "charptr.c"
+        main ()
+        {
+         ANTLR(global(),stdin);
+        }
+        >>
+
+        #token "[\ \t]+"        << zzskip(); >>
+        #token "[\n]"           << zzline++; zzskip(); >>
+
+        #token B        "b"
+        #token C        "c"
+        #token D        "d"
+        #token E        "e"
+        #token LP       "\("
+        #token RP       "\)"
+
+        #token ANTLREOF "@"
+
+        global : (
+                   (E liste)
+                 | liste
+                 | listed
+                 )  ANTLREOF
+        ;
+
+        listeb : LP ( B ( B | C )* ) RP ;
+        listec : LP ( C ( B | C )* ) RP ;
+        listed : LP ( D ( B | C )* ) RP ;
+        liste : ( listeb | listec )* ;
+
+        ---------------------------------------------------
+        Sample data causing infinite loop
+        ---------------------------------------------------
+        e (d c)
+        ---------------------------------------------------
+
+#46.  (Fixed in 1.33MR6) Robert Richter
+                (Robert.Richter@infotech.tu-chemnitz.de)
+
+        This item from the list of known problems was
+        fixed by item #18 (below).
+
+#45.
(Fixed in 1.33MR6) Brad Schick (schick@interaccess.com)
+
+        The dependency scanner in VC++ mistakenly sees a
+        reference to an MPW #include file even though it is properly
+        guarded by #ifdef/#endif in config.h.  The suggested workaround
+        has been implemented:
+
+                #ifdef MPW
+                .....
+                #define MPW_CursorCtl_Header <CursorCtl.h>
+                #include MPW_CursorCtl_Header
+                .....
+                #endif
+
+#44.  (Fixed in 1.33MR6) cast malloc() to (char *) in charptr.c
+
+        Added (char *) cast for systems where malloc returns "void *".
+
+#43.  (Added to 1.33MR6) Bruce Guenter (bruceg@qcc.sk.ca)
+
+        Add setLeft() and setUp() methods to ASTDoublyLinkedBase
+        for symmetry with setRight() and setDown() methods.
+
+#42.  (Fixed in 1.33MR6) Jeff Katcher (jkatcher@nortel.ca)
+
+        C++ style comment in antlr.c corrected.
+
+#41.  (Added in 1.33MR6) antlr -stdout
+
+        Using "antlr -stdout ..." forces the text that would
+        normally go to the grammar.c or grammar.cpp file to
+        stdout.
+
+#40.  (Added in 1.33MR6) antlr -tab to change tab stops
+
+        Using "antlr -tab number ..." changes the tab stops
+        for the grammar.c or grammar.cpp file.  The number
+        must be between 0 and 8.  Using 0 gives tab characters,
+        values between 1 and 8 give the appropriate number of
+        space characters.
+
+#39.  (Fixed in 1.33MR5) Jan Mikkelsen <janm@zeta.org.au>
+
+        Commas in function prototype still not correct under
+        some circumstances.  Suggested code fix installed.
+
+#38.  (Fixed in 1.33MR5) ANTLRTokenBuffer constructor
+
+        Have ANTLRTokenBuffer ctor initialize member "parser" to null.
+
+#37.  (Fixed in 1.33MR4) Bruce Guenter (bruceg@qcc.sk.ca)
+
+        ANTLRParser::FAIL(int k,...) released the memory pointed to by
+        f[i] (as well as f itself).  It should free only f itself.
+
+#36.  (Fixed in 1.33MR3) Cortland D. Starrett (cort@shay.ecn.purdue.edu)
+
+        Neglected to properly declare isDLGmaxToken() when fixing problem
+        reported by Andreas Magnusson.
+
+        Undo "_retv=NULL;" change which caused problems for return values
+        from rules whose return values weren't pointers.
+
+        Failed to create bin directory if it didn't exist.
+
+#35.  (Fixed in 1.33MR2) Andreas Magnusson
+(Andreas.Magnusson@mailbox.swipnet.se)
+
+        Repair bug introduced by 1.33MR1 for #tokdefs.  The original fix
+        placed "DLGmaxToken=9999" and "DLGminToken=0" in the TokenType enum
+        in order to fix a problem with an aggressive compiler assigning an 8
+        bit enum which might be too narrow.  This caused #tokdefs to assume
+        that there were 9999 real tokens.  The repair to the fix causes antlr to
+        ignore TokenTypes "DLGmaxToken" and "DLGminToken" in a #tokdefs file.
+
+#34.  (Added to 1.33MR1) Add public DLGLexerBase::set_line(int newValue)
+
+        Previously there was no public function for changing the line
+        number maintained by the lexer.
+
+#33.  (Fixed in 1.33MR1) Franklin Chen (chen@adi.com)
+
+        Accidental use of EXIT_FAILURE rather than PCCTS_EXIT_FAILURE
+        in pccts/h/AParser.cpp.
+
+#32.  (Fixed in 1.33MR1) Franklin Chen (chen@adi.com)
+
+        In PCCTSAST.cpp lines 405 and 466:  Change
+
+                free (t)
+           to
+                free ( (char *)t );
+
+        to match prototype.
+
+#31.  (Added to 1.33MR1) Pointer to parser in ANTLRTokenBuffer
+                        Pointer to parser in DLGLexerBase
+
+        The ANTLRTokenBuffer class now contains a pointer to the
+        parser which is using it.  This is established by the
+        ANTLRParser constructor calling ANTLRTokenBuffer::
+        setParser(ANTLRParser *p).
+
+        When ANTLRTokenBuffer::setParser(ANTLRParser *p) is
+        called it saves the pointer to the parser and then
+        calls ANTLRTokenStream::setParser(ANTLRParser *p)
+        so that the lexer can also save a pointer to the
+        parser.
+
+        There is also a function getParser() in each class
+        with the obvious purpose.
+
+        It is possible that these functions will return NULL
+        under some circumstances (e.g. a non-DLG lexer is used).
+
+#30.
(Added to 1.33MR1) function tokenName(int token) standard
+
+        The generated parser class now includes the
+        function:
+
+          static const ANTLRChar * tokenName(int token)
+
+        which returns a pointer to the "name" corresponding
+        to the token.
+
+        The base class (ANTLRParser) always includes the
+        member function:
+
+          const ANTLRChar * parserTokenName(int token)
+
+        which can be accessed by objects which have a pointer
+        to an ANTLRParser, but do not know the name of the
+        parser class (e.g. ANTLRTokenBuffer and DLGLexerBase).
+
+#29.  (Added to 1.33MR1) Debugging DLG lexers
+
+        If the pre-processor symbol DEBUG_LEXER is defined
+        then DLexerBase will include code for printing out
+        key information about tokens which are recognized.
+
+        The debug feature of the lexer is controlled by:
+
+          int previousDebugValue=lexer.debugLexer(newValue);
+
+                        a value of 0 disables output
+                        a value of 1 enables output
+
+        Even if the lexer debug code is compiled into DLexerBase
+        it must be enabled before any output is generated.  For
+        example:
+
+           DLGFileInput         in(stdin);
+           MyDLG                lexer(&in,2000);
+
+           lexer.setToken(&aToken);
+
+           #ifdef DEBUG_LEXER
+             lexer.debugLexer(1);     // enable debug information
+           #endif
+
+#28.  (Added to 1.33MR1) More control over DLG header
+
+        Version 1.33MR1 adds the following directives to PCCTS
+        for C++ mode:
+
+          #lexprefix  <<source code>>
+
+                Adds source code to the DLGLexer.h file
+                after the #include "DLexerBase.h" but
+                before the start of the class definition.
+
+          #lexmember  <<source code>>
+
+                Adds source code to the DLGLexer.h file
+                as part of the DLGLexer class body.  It
+                appears immediately after the start of
+                the class and a "public:" statement.
+
+#27.  (Fixed in 1.33MR1) Comments in DLG actions
+
+        Previously, DLG would not recognize comments as a special case.
+        Thus, ">>" in the comments would cause errors.  This is fixed.
+
+#26.
(Fixed in 1.33MR1) Removed static variables from error routines
+
+        Previously, the existence of statically allocated variables
+        in some of the parser's member functions posed a danger when
+        there was more than one parser active.
+
+        Replaced with dynamically allocated/freed variables in 1.33MR1.
+
+#25.  (Fixed in 1.33MR1) Use of string literals in semantic predicates
+
+        Previously, it was not possible to place a string literal in
+        a semantic predicate because it was not properly "stringized"
+        for the report of a failed predicate.
+
+#24.  (Fixed in 1.33MR1) Continuation lines for semantic predicates
+
+        Previously, it was not possible to continue semantic
+        predicates across a line because it was not properly
+        "stringized" for the report of a failed predicate.
+
+                rule : <<ifXYZ()>>?[ a very
+                                        long statement ]
+
+#23.  (Fixed in 1.33MR1) {...} envelope for failed semantic predicates
+
+        Previously, there was a code generation error for failed
+        semantic predicates:
+
+          rule : <<xyz()>>?[ stmt1; stmt2; ]
+
+        which generated code that resembled:
+
+          if (! xyz()) stmt1; stmt2;
+
+        It now puts the statements in a {...} envelope:
+
+          if (! xyz()) { stmt1; stmt2; };
+
+#22.  (Fixed in 1.33MR1) Continuation of #token across lines using "\"
+
+        Previously, it was not possible to continue a #token regular
+        expression across a line.  The trailing "\" and newline caused
+        a newline to be inserted into the regular expression by DLG.
+
+        Fixed in 1.33MR1.
+
+#21.  (Fixed in 1.33MR1) Use of ">>" (right shift operator) in DLG actions
+
+        It is now possible to use the C++ right shift operator ">>"
+        in DLG actions by using the normal escapes:
+
+                #token "shift-right"     << value=value \>\> 1;>>
+
+#20.  (Version 1.33/19-Jan-97) Karl Eccleson <karle@microrobotics.co.uk>
+                             P.A. Keller (P.A.Keller@bath.ac.uk)
+
+        There is a problem due to using exceptions with the -gh option.
+
+        Suggested fix now in 1.33MR1.
+
+#19.
(Fixed in 1.33MR1) Tom Piscotti and John Lilley
+
+        There were problems suppressing messages to stderr and stdout
+        when running in a window environment because some functions
+        which use fprintf were not virtual.
+
+        Suggested change now in 1.33MR1.
+
+        I believe all functions containing error messages (excluding those
+        indicating internal inconsistency) have been placed in functions
+        which are virtual.
+
+#18.  (Version 1.33/22-Nov-96)  John Bair (jbair@iftime.com)
+
+        Under some combination of options a required "return _retv" is
+        not generated.
+
+        Suggested fix now in 1.33MR1.
+
+#17.  (Version 1.33/3-Sep-96) Ron House  (house@helios.usq.edu.au)
+
+        The routine ASTBase::preorder_action omits two "tree->"
+        prefixes, which results in the preorder_action of the
+        wrong node being invoked.
+
+        Suggested fix now in 1.33MR1.
+
+#16.  (Version 1.33/7-Jun-96) Eli Sternheim  <eli@interhdl.com>
+
+        Routine consumeUntilToken() does not check for end-of-file
+        condition.
+
+        Suggested fix now in 1.33MR1.
+
+#15.  (Version 1.33/8 Apr 96) Asgeir Olafsson <olafsson@cstar.ac.com>
+
+        Problem with tree duplication of doubly linked ASTs in ASTBase.cpp.
+
+        Suggested fix now in 1.33MR1.
+
+#14.  (Version 1.33/28-Feb-96)   Andreas.Magnusson@mailbox.swipnet.se
+
+        Problem with definition of operator = (const ANTLRTokenPtr rhs).
+
+        Suggested fix now in 1.33MR1.
+
+#13.  (Version 1.33/13-Feb-96) Franklin Chen (chen@adi.com)
+
+        Sun C++ Compiler 3.0.1 can't compile testcpp/1 due to goto in
+        block with destructors.
+
+        Apparently fixed. Can't locate "goto".
+
+#12.  (Version 1.33/10-Nov-95)  Minor problems with 1.33 code
+
+        The following items have been fixed in 1.33MR1:
+
+          1.  pccts/antlr/main.c line 142
+
+                "void" appears in classic C code
+
+          2.  no makefile in support/genmk
+
+          3.  EXIT_FAILURE/_SUCCESS instead of PCCTS_EXIT_FAILURE/_SUCCESS
+
+                pccts/h/PCCTSAST.cpp
+                pccts/h/DLexerBase.cpp
+                pccts/testcpp/6/test.g
+
+          4.
use of "signed int" isn't accepted by AT&T cfront
+
+                pccts/h/PCCTSAST.h line 42
+
+          5.  in call to ANTLRParser::FAIL the var arg err_k is passed as
+              "int" but is declared "unsigned int".
+
+          6.  I believe that a failed validation predicate still does not
+              get put in a "{...}" envelope, despite the release notes.
+
+          7.  The #token ">>" appearing in the DLG grammar description
+              causes DLG to generate the string literal "\>\>" which
+              is non-conforming and will cause some compilers to
+              complain (scan.c function act10 line 143 of source code).
+
+#11.  (Version 1.32b6)  Dave Kuhlman     (dkuhlman@netcom.com)
+
+        Problem with file close in gen.c.  Already fixed in 1.33.
+
+#10.  (Version 1.32b6/29-Aug-95)
+
+        pccts/antlr/main.c contains C++ style comments on lines 149
+        and 176 which cause problems for most C compilers.
+
+        Already fixed in 1.33.
+
+#9.   (Version 1.32b4/14-Mar-95) dlgauto.h #include "config.h"
+
+        The file pccts/h/dlgauto.h should probably contain a #include
+        "config.h" as it uses the #define symbol __USE_PROTOS.
+
+        Added to 1.33MR1.
+
+#8.   (Version 1.32b4/6-Mar-95)  Michael T. Richter (mtr@igs.net)
+
+        In C++ output mode anonymous tokens from in-line regular expressions
+        can create enum values which are too wide for the datatype of the enum
+        assigned by the C++ compiler.
+
+        Fixed in 1.33MR1.
+
+#7.   (Version 1.32b4/6-Mar-95)  C++ does not imply __STDC__
+
+        In err.h the combination of # directives assumes that a C++
+        compiler has __STDC__ defined.  This is not necessarily true.
+
+        This problem also appears in the use of __USE_PROTOS which
+        is appropriate for both Standard C and C++ in antlr/gen.c
+        and antlr/lex.c
+
+        Fixed in 1.33MR1.
+
+#6.   (Version 1.32 ?/15-Feb-95) Name conflict for "TokenType"
+
+        Already fixed in 1.33.
+
+#5.   (23-Jan-95)        Douglas_Cuthbertson.JTIDS@jtids_qmail.hanscom.af.mil
+
+        The fail action following a semantic predicate is not enclosed in
+        "{...}".
This can lead to problems when the fail action contains
+        more than one statement.
+
+        Fixed in 1.33MR1.
+
+#4.   (Version 1.33/31-Mar-96)   jlilley@empathy.com (John Lilley)
+
+        Put briefly, a semantic predicate ought to abort a guess if it fails.
+
+        Correction suggested by J. Lilley has been added to 1.33MR1.
+
+#3.   (Version 1.33)             P.A.Keller@bath.ac.uk
+
+        Extra commas are placed in the K&R style argument list for rules
+        when using both exceptions and ASTs.
+
+        Fixed in 1.33MR1.
+
+#2.   (Version 1.32b6/2-Oct-95)  Brad Schick <schick@interaccess.com>
+
+        Construct #[] generates zzastnew() in C++ mode.
+
+        Already fixed in 1.33.
+
+#1.   (Version 1.33)     Bob Bailey (robert@oakhill.sps.mot.com)
+
+        Previously, config.h assumed that all PC systems required
+        "short" file names.  The user can now override that
+        assumption with "#define LONGFILENAMES".
+
+        Added to 1.33MR1.
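Items #23 and #5 both concern the missing {...} envelope around a fail action with more than one statement. The plain C sketch below shows why the envelope matters. It is only an illustration under stated assumptions: stmt1, stmt2, run_unbraced and run_braced are hypothetical stand-ins and are not identifiers generated by pccts.

```c
/* Hypothetical stand-ins for the two statements of a fail action. */
static int stmt1_runs;
static int stmt2_runs;

static void stmt1(void) { stmt1_runs++; }
static void stmt2(void) { stmt2_runs++; }

/*
 * Shape of the code before the fix: "if (!pred) stmt1; stmt2;".
 * Only stmt1 is governed by the if, so stmt2 always runs.
 * Returns the number of fail-action statements that executed.
 */
static int run_unbraced(int predicate_ok)
{
    stmt1_runs = stmt2_runs = 0;
    if (!predicate_ok)
        stmt1();
    stmt2();            /* runs even when the predicate succeeded */
    return stmt1_runs + stmt2_runs;
}

/*
 * Shape of the code after the fix: "if (!pred) { stmt1; stmt2; };".
 * Both statements run only when the predicate fails.
 */
static int run_braced(int predicate_ok)
{
    stmt1_runs = stmt2_runs = 0;
    if (!predicate_ok) { stmt1(); stmt2(); };
    return stmt1_runs + stmt2_runs;
}
```

With a successful predicate, run_unbraced(1) still executes one fail-action statement, while run_braced(1) executes none. With a failed predicate both variants execute two.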