Diffstat (limited to 'src/3rdparty/libcroco/docs/design/parser-architecture.txt')
-rw-r--r-- | src/3rdparty/libcroco/docs/design/parser-architecture.txt | 146 |
1 files changed, 146 insertions, 0 deletions
diff --git a/src/3rdparty/libcroco/docs/design/parser-architecture.txt b/src/3rdparty/libcroco/docs/design/parser-architecture.txt
new file mode 100644
index 0000000..67b7713
--- /dev/null
+++ b/src/3rdparty/libcroco/docs/design/parser-architecture.txt
@@ -0,0 +1,146 @@
+Libcroco parser architecture
+-----------------------------
+
+Author: Dodji Seketeli <dodji@seketeli.org>
+
+$Id$
+
+I) Forethoughts.
+===================
+
+Libcroco's parser is a simple recursive descent parser.
+The major design focus has been simplicity, reliability and
+conformance.
+
+Simplicity
+-----------
+We want the code to be maintainable by anyone who knows the CSS spec
+and who knows how to code in C. Therefore, we avoid overusing
+C preprocessor magic and all the tricks that tend to turn C into
+a maintenance nightmare.
+
+We also try to adhere to the GNOME coding guidelines specified
+at http://developer.gnome.org/doc/guides/programming-guidelines.
+
+
+Reliability
+-----------
+No function of the libcroco library should ever crash,
+whatever arguments it is given.
+As a consequence, we tend to be paranoid about, for example,
+checking pointer values before dereferencing them.
+
+Conformance
+-----------
+We try to stick to the CSS spec. We know this is almost impossible to achieve
+given the resources we have, but we think it is a sane target to chase.
+
+II) Overall architecture
+=========================
+The parser is organized around several main classes:
+
+1/ CRInput
+2/ CRTknzr (Tokenizer or lexer)
+3/ CRParser
+4/ CROMParser
+
+II.1 The CRInput class
+-----------------------
+The CRInput class provides the abstraction of
+a UTF-8 encoded character stream.
+
+Ideally, it should abstract local data sources
+(local files and in-memory buffers)
+and remote data sources (sockets, URL-identified resources), but for the
+moment, it can only abstract local data sources.
+
+Adding a new type of data source should be transparent for the
+classes that already use CRInput. After all, this is what abstraction is about :)
+
+
+II.2 The CRTknzr class
+----------------------
+The main job of the tokenizer (or lexer) is to
+provide a get_next_token() method.
+This method returns the next CSS token found in the input stream.
+(Note that the input stream here is an instance of CRInput).
+
+This lets the parser work on tokens rather than on raw characters.
+
+II.3 The CRParser class
+-------------------------
+The core of the parser.
+
+The main job of this class is to provide a cr_parser_parse_stylesheet()
+method. During parsing (the execution of cr_parser_parse_stylesheet()),
+the parser sends events to notify the application when it encounters
+notable CSS constructs. This is the SAC (Simple API for CSS) API model.
+
+To achieve that task, almost every production of the CSS grammar
+has a matching parsing function (or method) in this class.
+
+For example, the following production named "ruleset" (specified in the
+CSS2 spec in appendix D.1):
+
+ruleset : selector [ ',' S* selector ]*
+          '{' S* declaration [ ';' S* declaration ]* '}' S*
+
+is "implemented" by the cr_parser_parse_ruleset() method.
+
+The same thing applies to the "selector" production:
+
+selector : simple_selector [ combinator simple_selector ]*
+
+which is implemented by the cr_parser_parse_selector() method... and so on
+and so forth.
+
+II.3.1 Structure of a parsing method.
+-------------------------------------
+A parsing method (e.g. cr_parser_parse_ruleset()) is there
+to:
+
+ * try to recognize a substring of the incoming character string
+   as something that matches a given CSS grammar production.
+
+   e.g.: the job of cr_parser_parse_ruleset() is to try
+   to recognize whether what comes next in the input stream
+   is a CSS2 "ruleset".
+
+ * build a basic abstract data structure to
+   store the information encountered
+   during the parsing of the current character string.
+
+   e.g.: cr_parser_parse_declaration() has the following prototype:
+
+   enum CRStatus
+   cr_parser_parse_declaration (CRParser *a_this, GString **a_property,
+                                CRTerm **a_value) ;
+
+   In case of successful parsing, this method returns
+   (via its parameters) the property _and_ the
+   value of the CSS2 declaration.
+   Note that a CSS2 declaration is specified as follows:
+
+   declaration : property ':' S* expr prio?
+               | /* empty */
+
+ * on completion, report whether the parsing succeeded.
+
+   e.g.: cr_parser_parse_declaration() returns CR_OK if the
+   parsing has succeeded, and an error code otherwise. Obviously,
+   the out parameters "a_property" and "a_value" are valid if and only
+   if the return value is CR_OK.
+
+ * whenever the function is parsing a construct that must
+   be reported to the user as part of the SAC API spec, notify
+   the user by calling the right SAC callback.
+
+ * if the parsing failed, leave the position in the stream unchanged.
+   That is, the position in the character stream should be as if
+   the parsing function had not been called at all.
+
+
+II.4 The selection Engine.
+--------------------------
+
+Hmmh, I should kick my ass to write this down ...
\ No newline at end of file
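
The contract that section II.3.1 sets for a parsing method (check the
arguments, fill the out parameters only on success, return a status code,
and restore the stream position on failure) can be illustrated with a small
standalone sketch. Everything in it is hypothetical and is not the libcroco
API: struct Input stands in for CRInput, enum Status for enum CRStatus, and
the grammar is reduced to "ident ':' S* ident" rather than the real CSS2
"declaration" production.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; none of this is the libcroco API. */
enum Status { ST_OK, ST_BAD_PARAM, ST_PARSE_ERROR };

/* Stand-in for an input stream: a NUL-terminated buffer plus a cursor. */
struct Input {
    const char *buf;
    size_t pos;
};

/* Out structure filled by parse_declaration() on success only. */
struct Decl {
    char property[32];
    char value[32];
};

/* Advance the cursor past spaces and tabs (the "S*" of the grammar). */
static void skip_spaces(struct Input *in)
{
    while (in->buf[in->pos] == ' ' || in->buf[in->pos] == '\t')
        in->pos++;
}

/* Read one identifier made of letters and '-' into out.
 * On failure the cursor is put back where it was. */
static enum Status parse_ident(struct Input *in, char *out, size_t out_len)
{
    size_t start = in->pos;
    size_t n;

    while (isalpha((unsigned char) in->buf[in->pos])
           || in->buf[in->pos] == '-')
        in->pos++;

    n = in->pos - start;
    if (n == 0 || n >= out_len) {
        in->pos = start;
        return ST_PARSE_ERROR;
    }
    memcpy(out, in->buf + start, n);
    out[n] = '\0';
    return ST_OK;
}

/* declaration : ident ':' S* ident   (simplified stand-in production) */
enum Status parse_declaration(struct Input *in, struct Decl *out)
{
    struct Decl tmp;   /* parse into a temporary so *out is untouched on error */
    size_t saved;

    /* Paranoid pointer checks: never dereference an unchecked argument. */
    if (in == NULL || in->buf == NULL || out == NULL)
        return ST_BAD_PARAM;

    saved = in->pos;   /* remember where we started, for backtracking */

    if (parse_ident(in, tmp.property, sizeof tmp.property) != ST_OK)
        goto error;
    if (in->buf[in->pos] != ':')
        goto error;
    in->pos++;
    skip_spaces(in);
    if (parse_ident(in, tmp.value, sizeof tmp.value) != ST_OK)
        goto error;

    *out = tmp;        /* out parameter becomes valid only on success */
    return ST_OK;

error:
    in->pos = saved;   /* as if this function had never been called */
    return ST_PARSE_ERROR;
}
```

A caller would check the return value before touching the out structure:
given the input "color: red;", parse_declaration() returns ST_OK and leaves
the cursor on the ';', while "color red" yields ST_PARSE_ERROR with the
cursor still at its starting position. Restoring the cursor is what lets a
recursive descent parser try one production and fall back to another.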