Libcroco parser architecture
-----------------------------
Author: Dodji Seketeli <dodji@seketeli.org>
$Id$
I) Forethoughts.
===================
Libcroco's parser is a simple recursive descent parser.
The major design focus has been simplicity, reliability and
conformance.
Simplicity
-----------
We want the code to be maintainable by anyone who knows the CSS spec
and knows how to code in C. Therefore, we avoid overusing
C preprocessor magic and the other tricks that tend to turn C into
a maintenance nightmare.
We also try to adhere to the Gnome coding guidelines specified
at http://developer.gnome.org/doc/guides/programming-guidelines.
Reliability
-----------
No function of the libcroco library should ever crash,
whatever arguments it is given.
As a consequence, we tend to be paranoid about things such as
checking pointer values before dereferencing them.
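As an illustration of this defensive style, here is a minimal sketch in which every pointer argument is checked before use, so that bad arguments produce an error code instead of a crash. The Thing type, thing_get_name() and the STATUS_* codes are hypothetical. GLib-based code often expresses such checks with g_return_val_if_fail(); the sketch uses a plain if so that it compiles without GLib.

```c
#include <stddef.h>

/* Hypothetical status codes, for illustration only. */
enum Status {
        STATUS_OK = 0,
        STATUS_BAD_PARAM
};

/* Hypothetical object type. */
typedef struct {
        const char *name;
} Thing;

/* Stores the name of a_thing in *a_out. Every pointer is checked before
 * it is dereferenced, so bad arguments yield an error code rather than
 * a crash. *a_out is only written on success. */
enum Status
thing_get_name (const Thing *a_thing, const char **a_out)
{
        if (a_thing == NULL || a_thing->name == NULL || a_out == NULL)
                return STATUS_BAD_PARAM;
        *a_out = a_thing->name;
        return STATUS_OK;
}
```

Callers then test the returned status instead of relying on the arguments having been valid.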
Conformance
-----------
We try to stick to the CSS spec. We know full conformance is almost impossible to achieve
with the resources we have, but we think it is a sane target to chase.
II) Overall architecture
=========================
The parser is organized around several main classes:
1/ CRInput
2/ CRTknzr (Tokenizer or lexer)
3/ CRParser
4/ CROMParser
II.1 The CRInput class
-----------------------
The CRInput class provides the abstraction of
a UTF-8-encoded character stream.
Ideally, it should abstract both local data sources
(local files and in-memory buffers)
and remote data sources (sockets, URL-identified resources), but for the
moment it can only abstract local data sources.
Adding a new type of data source should be transparent to the
classes that already use CRInput. After all, this is what abstraction is about :)
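To make the idea concrete, here is a minimal sketch of an in-memory UTF-8 character stream exposing a single read function. The Input type, input_read_char() and the INPUT_* codes are hypothetical and do not reproduce CRInput's actual interface; a file- or socket-backed source could offer the same function without its callers noticing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory character source. */
typedef struct {
        const unsigned char *buf;
        size_t len;
        size_t pos;     /* byte offset of the next unread character */
} Input;

enum InputStatus { INPUT_OK, INPUT_END_OF_STREAM, INPUT_ENCODING_ERROR };

/* Decodes the next UTF-8 character into *a_char and advances the stream.
 * On error the position is left unchanged. Overlong forms and surrogates
 * are not rejected, to keep the sketch short. */
enum InputStatus
input_read_char (Input *a_in, uint32_t *a_char)
{
        size_t n, i;
        uint32_t c;
        unsigned char b;

        if (a_in->pos >= a_in->len)
                return INPUT_END_OF_STREAM;
        b = a_in->buf[a_in->pos];
        if (b < 0x80) {                 /* 0xxxxxxx: 1 byte */
                c = b;
                n = 1;
        } else if ((b & 0xE0) == 0xC0) { /* 110xxxxx: 2 bytes */
                c = b & 0x1F;
                n = 2;
        } else if ((b & 0xF0) == 0xE0) { /* 1110xxxx: 3 bytes */
                c = b & 0x0F;
                n = 3;
        } else if ((b & 0xF8) == 0xF0) { /* 11110xxx: 4 bytes */
                c = b & 0x07;
                n = 4;
        } else {
                return INPUT_ENCODING_ERROR;
        }
        if (a_in->pos + n > a_in->len)
                return INPUT_ENCODING_ERROR;    /* truncated sequence */
        for (i = 1; i < n; i++) {
                b = a_in->buf[a_in->pos + i];
                if ((b & 0xC0) != 0x80)         /* not 10xxxxxx */
                        return INPUT_ENCODING_ERROR;
                c = (c << 6) | (b & 0x3F);
        }
        a_in->pos += n;
        *a_char = c;
        return INPUT_OK;
}
```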
II.2 The CRTknzr class
----------------------
The main job of the tokenizer (or lexer) is to
provide a get_next_token() method.
This method returns the next CSS token found in the input stream
(the input stream here being an instance of CRInput).
The parser can thus work on CSS tokens rather than on raw characters.
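The sketch below shows the shape of such a get_next_token() function on a plain C string, assuming a drastically reduced token set (whitespace, identifiers and single-character delimiters). The Tokenizer and Token types and the TK_* names are hypothetical, and the identifier rule only approximates the CSS2 one.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical, heavily reduced token set; a real CSS tokenizer also
 * knows strings, numbers, hashes, comments, and so on. */
enum TokenType { TK_EOF, TK_S, TK_IDENT, TK_DELIM };

typedef struct {
        enum TokenType type;
        const char *start;      /* first character of the token */
        size_t len;             /* token length in bytes */
} Token;

typedef struct {
        const char *cur;        /* next unread character */
} Tokenizer;

/* Approximates CSS "nmchar": letters, digits, '-' and '_'. */
static int
is_name_char (char a_c)
{
        return isalnum ((unsigned char) a_c) || a_c == '-' || a_c == '_';
}

/* Returns the next token and advances past it. A run of whitespace
 * becomes a single S token; at the end of the string TK_EOF is
 * returned and the tokenizer does not move. */
Token
get_next_token (Tokenizer *a_tk)
{
        const char *p = a_tk->cur;
        Token t = { TK_EOF, p, 0 };

        if (*p == '\0')
                return t;
        if (isspace ((unsigned char) *p)) {
                while (isspace ((unsigned char) *p))
                        p++;
                t.type = TK_S;
        } else if (isalpha ((unsigned char) *p) || *p == '_') {
                while (is_name_char (*p))
                        p++;
                t.type = TK_IDENT;
        } else {
                p++;            /* any other single character: '{', ':', ... */
                t.type = TK_DELIM;
        }
        t.len = (size_t) (p - a_tk->cur);
        a_tk->cur = p;
        return t;
}
```

A parser built on top of it would call get_next_token() repeatedly until it returns TK_EOF, never looking at individual characters itself.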
II.3 The CRParser class
-------------------------
This class is the core of the parser.
Its main job is to provide a cr_parser_parse_stylesheet()
method. During parsing (that is, during the execution of cr_parser_parse_stylesheet()),
the parser sends events to notify the application whenever it encounters
notable CSS constructs. This is the event model of SAC (Simple API for CSS).
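A minimal sketch of this event model, assuming a hypothetical DocHandler struct of function pointers (the callback names and signatures are illustrative, not libcroco's): the application fills in only the callbacks it cares about, and the parser side, reduced here to a hypothetical notify_ruleset() that replays the events of one already-recognized ruleset, calls each non-NULL callback.

```c
#include <stddef.h>

/* Hypothetical SAC-style document handler: a set of callbacks the
 * application may fill in. Unused callbacks are left NULL. */
typedef struct DocHandler DocHandler;
struct DocHandler {
        void (*start_selector) (DocHandler *a_h, const char *a_selector);
        void (*property) (DocHandler *a_h, const char *a_name,
                          const char *a_value);
        void (*end_selector) (DocHandler *a_h, const char *a_selector);
        void *app_data;         /* owned by the application */
};

/* Parser side: once a ruleset has been recognized, send one event for
 * its start, one per declaration and one for its end. */
void
notify_ruleset (DocHandler *a_h, const char *a_selector,
                const char *a_decls[][2], size_t a_n)
{
        size_t i;

        if (a_h->start_selector)
                a_h->start_selector (a_h, a_selector);
        for (i = 0; i < a_n; i++)
                if (a_h->property)
                        a_h->property (a_h, a_decls[i][0], a_decls[i][1]);
        if (a_h->end_selector)
                a_h->end_selector (a_h, a_selector);
}

/* Application side: count the declarations reported by the parser. */
void
count_property (DocHandler *a_h, const char *a_name, const char *a_value)
{
        (void) a_name;
        (void) a_value;
        (*(int *) a_h->app_data)++;
}
```

For example, an application that only counts declarations sets property to count_property, leaves the other callbacks NULL and points app_data at an int counter. The parser never builds a tree for it; the application decides what to keep.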
Internally, almost every production of the CSS grammar
has a matching parsing function (or method) in this class.
For example, the following production, named "ruleset" (specified in
appendix D.1 of the CSS2 spec):
    ruleset : selector [ ',' S* selector ]*
              '{' S* declaration [ ';' S* declaration ]* '}' S*
is "implemented" by the cr_parser_parse_ruleset() method.
The same thing applies to the "selector" production:
    selector : simple_selector [ combinator simple_selector ]*
which is implemented by the cr_parser_parse_selector() method, and so on
and so forth.
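The correspondence between productions and functions can be sketched as a tiny recursive descent parser working directly on a C string. The Parser type and the parse_* functions are hypothetical; selectors and expressions are reduced to single identifiers, and since the CSS2 declaration production also has an empty alternative, a failed declaration attempt inside a ruleset is accepted as that empty match.

```c
#include <ctype.h>

/* Hypothetical parser state: a cursor into a NUL-terminated string. */
typedef struct {
        const char *cur;
} Parser;

/* S* : skip any whitespace. */
static void
skip_s (Parser *a_p)
{
        while (isspace ((unsigned char) *a_p->cur))
                a_p->cur++;
}

/* Simplified identifier: one or more of [A-Za-z0-9_-]. */
static int
parse_ident (Parser *a_p)
{
        const char *start = a_p->cur;

        while (isalnum ((unsigned char) *a_p->cur)
               || *a_p->cur == '-' || *a_p->cur == '_')
                a_p->cur++;
        return a_p->cur != start;
}

/* selector : simple_selector [ combinator simple_selector ]*
 * reduced here to one identifier followed by S*. */
static int
parse_selector (Parser *a_p)
{
        if (!parse_ident (a_p))
                return 0;
        skip_s (a_p);
        return 1;
}

/* declaration : property ':' S* expr, with property and expr reduced
 * to identifiers. Restores the position and returns 0 on failure. */
static int
parse_declaration (Parser *a_p)
{
        const char *saved = a_p->cur;

        if (parse_ident (a_p)) {
                skip_s (a_p);
                if (*a_p->cur == ':') {
                        a_p->cur++;
                        skip_s (a_p);
                        if (parse_ident (a_p)) {
                                skip_s (a_p);
                                return 1;
                        }
                }
        }
        a_p->cur = saved;
        return 0;
}

/* ruleset : selector [ ',' S* selector ]*
 *           '{' S* declaration [ ';' S* declaration ]* '}' S*
 * Returns 1 on success; on failure restores the position and returns 0. */
int
parse_ruleset (Parser *a_p)
{
        const char *saved = a_p->cur;

        if (!parse_selector (a_p))
                goto error;
        while (*a_p->cur == ',') {
                a_p->cur++;
                skip_s (a_p);
                if (!parse_selector (a_p))
                        goto error;
        }
        if (*a_p->cur != '{')
                goto error;
        a_p->cur++;
        skip_s (a_p);
        /* A declaration may be empty, so a failed attempt is not an error. */
        parse_declaration (a_p);
        while (*a_p->cur == ';') {
                a_p->cur++;
                skip_s (a_p);
                parse_declaration (a_p);
        }
        if (*a_p->cur != '}')
                goto error;
        a_p->cur++;
        skip_s (a_p);
        return 1;
error:
        a_p->cur = saved;
        return 0;
}
```

Each function reads like the production it implements, which is what keeps this style of parser easy to check against the spec.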
II.3.1 Structure of a parsing method.
-------------------------------------
A parsing method (e.g. cr_parser_parse_ruleset()) is there to:
* try to recognize a substring of the incoming character stream
  as something that matches a given CSS grammar production.
  E.g. the job of cr_parser_parse_ruleset() is to try
  to recognize whether what comes next in the input stream
  is a CSS2 "ruleset".
* build a basic abstract data structure to
  store the information encountered
  while parsing the current character string.
  E.g. cr_parser_parse_declaration() has the following prototype:
      enum CRStatus
      cr_parser_parse_declaration (CRParser *a_this, GString **a_property,
                                   CRTerm **a_value) ;
  In case of successful parsing, this method returns
  (via its parameters) the property _and_ the
  value of the CSS2 declaration.
  Note that a CSS2 declaration is specified as follows:
      declaration : property ':' S* expr prio?
                  | /* empty */
* after completion, report whether the parsing has succeeded or not.
  E.g. cr_parser_parse_declaration() returns CR_OK if the
  parsing has succeeded, and an error code otherwise. Obviously,
  the out parameters "a_property" and "a_value" are valid if and only
  if the return value is CR_OK.
* whenever the function parses a construct that must
  be reported to the user as part of the SAC API spec, notify
  the user by calling the appropriate SAC callback.
* if the parsing fails, leave the position in the stream unchanged.
  That is, the position in the character stream should be as if
  the parsing function had not been called at all.
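The rules above can be sketched in a single function under simplifying assumptions: property and value are plain identifiers, results are returned as (pointer, length) spans, and the Status codes, Span, Parser and on_property callback are hypothetical stand-ins for libcroco's own types. The function validates its arguments, fills its out parameters only on success, notifies an optional SAC-style callback, and restores the stream position on failure.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical status codes; the text above only names CR_OK. */
enum Status { STATUS_OK, STATUS_PARSING_ERROR, STATUS_BAD_PARAM_ERROR };

/* A piece of the input, given as a pointer and a length. */
typedef struct {
        const char *start;
        size_t len;
} Span;

/* Hypothetical parser state with an optional SAC-style callback. */
typedef struct {
        const char *cur;
        void (*on_property) (Span a_name, Span a_value, void *a_data);
        void *data;
} Parser;

static void
skip_s (Parser *a_p)
{
        while (isspace ((unsigned char) *a_p->cur))
                a_p->cur++;
}

/* Reads a simplified identifier ([A-Za-z0-9_-]*); len is 0 if none. */
static Span
read_ident (Parser *a_p)
{
        Span s = { a_p->cur, 0 };

        while (isalnum ((unsigned char) *a_p->cur)
               || *a_p->cur == '-' || *a_p->cur == '_') {
                a_p->cur++;
                s.len++;
        }
        return s;
}

/* declaration : property ':' S* expr, with property and expr reduced
 * to identifiers. The empty alternative is left to the caller.
 * On success, fills *a_property and *a_value, notifies the callback
 * and returns STATUS_OK. On failure, the out parameters are untouched
 * and the position is restored as if the function had not been called. */
enum Status
parse_declaration (Parser *a_p, Span *a_property, Span *a_value)
{
        const char *saved;
        Span name, value;

        if (a_p == NULL || a_p->cur == NULL
            || a_property == NULL || a_value == NULL)
                return STATUS_BAD_PARAM_ERROR;
        saved = a_p->cur;

        name = read_ident (a_p);
        if (name.len == 0)
                goto error;
        skip_s (a_p);
        if (*a_p->cur != ':')
                goto error;
        a_p->cur++;
        skip_s (a_p);
        value = read_ident (a_p);
        if (value.len == 0)
                goto error;
        skip_s (a_p);

        *a_property = name;
        *a_value = value;
        if (a_p->on_property != NULL)
                a_p->on_property (name, value, a_p->data);
        return STATUS_OK;
error:
        a_p->cur = saved;
        return STATUS_PARSING_ERROR;
}
```

Leaving the empty alternative to the caller keeps the contract simple: STATUS_OK always means a non-empty declaration was read and both out parameters are valid.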
II.4 The selection Engine.
--------------------------
Hmmh, I should kick my ass to write this down ...