diff options
author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-15 05:54:39 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-15 05:54:39 +0000 |
commit | 267c6f2ac71f92999e969232431ba04678e7437e (patch) | |
tree | 358c9467650e1d0a1d7227a21dac2e3d08b622b2 /sw/README.md | |
parent | Initial commit. (diff) | |
download | libreoffice-267c6f2ac71f92999e969232431ba04678e7437e.tar.xz libreoffice-267c6f2ac71f92999e969232431ba04678e7437e.zip |
Adding upstream version 4:24.2.0.upstream/4%24.2.0
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'sw/README.md')
-rw-r--r-- | sw/README.md | 211 |
1 files changed, 211 insertions, 0 deletions
diff --git a/sw/README.md b/sw/README.md new file mode 100644 index 0000000000..58a6cc9e1b --- /dev/null +++ b/sw/README.md @@ -0,0 +1,211 @@ +# Writer Application Code + +Exact history was lost before Sept. 18th, 2000, but old source code +comments show that Writer core dates back until at least November +1990. + +## Module Contents + * `inc`: headers available to all source files inside the module + * `qa`: unit, slow and subsequent tests + * `sdi` + * `source`: see below + * `uiconfig`: user interface configuration + * `util`: UNO passive registration config + +## Source Contents + * `core`: Writer core (document model, layout, UNO API implementation) + * `filter`: Writer internal filters + * `ascii`: plain text filter + * `basflt` + * `docx`: wrapper for the UNO DOCX import filter (in writerfilter) for autotext purposes + * `html`: HTML filter + * `inc`: include files for filters + * `rtf`: thin copy&paste helper around the UNO RTF import filter (in writerfilter) + * `writer` + * `ww8`: DOC import, DOC/DOCX/RTF export + * `xml`: ODF import/export, subclassed from xmloff (where most of the work is done) + * `uibase`: user interface (those parts that are linked into `sw` & always loaded) + * `ui`: user interface (optional parts that are loaded on demand (`swui`)) + +## Core + +There is a good overview documentation of basic architecture of Writer core +in the OOo wiki: + +- <https://wiki.openoffice.org/wiki/Writer/Core_And_Layout> +- <https://wiki.openoffice.org/wiki/Writer/Text_Formatting> + +Writer specific WhichIds are defined in `sw/inc/hintids.hxx`. + +The details below are mainly about details missing from the wiki pages. + +### SwDoc + +The central class for a document is `SwDoc`, which represents a document. + +A lot of the functionality is split out into separate Manager classes, +each of which implements some `IDocument*` interface; there are +`SwDoc::getIDocument*()` methods to retrieve the managers. + +However there are still too many members and methods in this class, +many of which could be moved to some Manager or other... + +### SwNodes + +Basically a (fancy) array of `SwNode` pointers. There are special subclasses of +`SwNode` (`SwStartNode` and `SwEndNode`) which are used to encode a nested tree +structure into the flat array; the range of nodes from `SwStartNode` to its +corresponding `SwEndNode` is sometimes called a "section" (but is not necessarily +what the high-level document model calls a "Section"; that is just one of the +possibilities). + +The `SwNodes` contains the following top-level sections: + +1. Empty +2. Footnote content +3. Frame / Header / Footer content +4. Deleted Change Tracking content +5. Body content + +### Undo + +The Undo/Redo information is stored in a `sw::UndoManager` member of `SwDoc`, +which implements the `IDocumentUndoRedo` interface. +Its members include a `SwNodes` array containing the document content that +is currently not in the actual document but required for Undo/Redo, and +a stack of `SwUndo` actions, each of which represents one user-visible +Undo/Redo step. + +There are also `ListActions` which internally contain several individual `SwUndo` +actions; these are created by the StartUndo/EndUndo wrapper methods. + +### Text Attributes + +The sub-structure of paragraphs is stored in the `SwpHintsArray` member +`SwTextNode::m_pSwpHints`. There is a base class `SwTextAttr` with numerous +subclasses; the `SwTextAttr` has a start and end index and a `SfxPoolItem` +to store the actual formatting attribute. + +There are several sub-categories of `SwTextAttr`: + +- formatting attributes: Character Styles (`SwTextCharFormat`, `RES_TXTATR_CHARFMT`) + and Automatic Styles (no special class, `RES_TXTATR_AUTOFMT`): + these are handled by `SwpHintsArray::BuildPortions` and MergePortions, + which create non-overlapping portions of formatting attributes. + +- nesting attributes: Hyperlinks (`SwTextINetFormat`, `RES_TXTATR_INETFMT`), + Ruby (`SwTextRuby`, `RES_TXTATR_CJK_RUBY`) and Meta/MetaField (`SwTextMeta`, + `RES_TXTATR_META/RES_TXTATR_METAFIELD`): + these maintain a properly nested tree structure. + The Meta/Metafield are "special" because they have both start/end + and a dummy character at the start. + +- misc. attributes: Reference Marks, ToX Marks + +- attributes without end: Fields, Footnotes, Flys (`AS_CHAR`) + These all have a corresponding dummy character in the paragraph text, which + is a placeholder for the "expansion" of the attribute, e.g. field content. + +### Fields + +There are multiple model classes involved for fields: + +- `enum SwFieldIds` enumerates the different types of fields. +- `SwFieldType` contains some shared stuff for all fields of a type. + There are many subclasses of `SwFieldType`, one for each different type + of field. + For most types of fields there is one shared instance of this per type, + which is created in `DocumentFieldsManager::InitFieldTypes()` + but for some there are more than one, and they are dynamically created, see + `DocumentFieldsManager::InsertFieldType()`. An example for the latter are + variable fields (`SwFieldIds::GetExp/SwFieldIds::SetExp`), with one `SwFieldType` per + variable. +- `SwXFieldMaster` is the UNO wrapper of a field type. + It is a `SwClient` registered at the `SwFieldType`. + Its life-cycle is determined by UNO clients outside of `sw`; it will get + disposed when the `SwFieldType` dies. +- `SwFormatField` is the `SfxPoolItem` of a field. + The `SwFormatField` is a `SwClient` registered at its `SwFieldType`. + The `SwFormatField` owns the `SwField` of the field. +- `SwField` contains the core logic of a field. + The `SwField` is owned by the `SwFormatField` of the field. + There are many subclasses of `SwField`, one for each different type of field. + Note that there are not many places that can Expand the field to its + correct value, since for example page number fields require a View + with an up to date layout; therefore the correct expansion is cached. +- `SwTextField` is the text attribute of a field. + It owns the `SwFormatField` of the field (like all text attributes). +- `SwXTextField` is the UNO wrapper object of a field. + It is a `SwClient` registered at the `SwFormatField`. + Its life-cycle is determined by UNO clients outside of `sw`; it will get + disposed when the `SwFormatField` dies. + +### Lists + +- `SwNumFormat` (subclass of `SvxNumFormat`) determines the formatting of a single + numbering level. + +- `SwNumRule` (NOT a subclass of `SvxNumRule`) is a *list style*, containing one + `SwNumFormat` per list level. + `SwNumRule::maTextNodeList` is the list of `SwTextNode` that have this list style + applied. + +- `SwNumberTreeNode` is a base class that represents an abstract node in a + hierarchical tree of numbered nodes. + +- `SwNodeNum` is the subclass of `SwNumberTreeNode` that connects it with an + actual `SwTextNode` and also with a `SwNumRule`; + `SwTextNode::mpNodeNum` points back in the other direction + +- `SwList` represents a list, which is (mostly) a vector of `SwNodeNum` trees, + one per `SwNodes` top-level section (why that?). + +- `IDocumentListsAccess`, `sw::DocumentListsManager` owns all `SwList` instances, + and maintains mappings: + + from list-id to `SwList` + + from list style name to `SwList` (the "default" `SwList` for that list style) + +- `IDocumentListItems`, `sw::DocumentListItemsManager` contains a set of all + `SwNodeNum` instances, ordered by `SwNode` index + +- the special Outline numbering rule: `SwDoc::mpOutlineRule` + +- `IDocumentOutlineNodes`, `sw::DocumentOutlineNodesManager` maintain + a list (which is actually stored in `SwNodes::m_pOutlineNodes`) of `SwTextNodes` + that either have the Outline numrule applied, + or have the `RES_PARATR_OUTLINELEVEL` item set (note that in the latter case, + the `SwTextNode` does not have a `SwNodeNum` and is not associated with the + `SwDoc::mpOutlineRule`). + +- `SwTextNodes` and paragraph styles have items/properties: + + `RES_PARATR_OUTLINELEVEL/"OutlineLevel"` to specify an outline level without + necessarily having the outline `SwNumRule` assigned + + `RES_PARATR_NUMRULE/"NumberingStyleName"` the list style to apply; may be + empty `""` which means no list style (to override inherited value) + Only `SwTextNode` has these items: + + `RES_PARATR_LIST_ID/"ListId"` + determines the `SwList` to which the node is added + + `RES_PARATR_LIST_LEVEL/"NumberingLevel"` + the level at which the `SwTextNode` will appear in the list + + `RES_PARATR_LIST_ISRESTART/"ParaIsNumberingRestart"` + restart numbering sequence at this `SwTextNode` + + `RES_PARATR_LIST_RESTARTVALUE/"NumberingStartValue"` + restart numbering sequence at this `SwTextNode` with this value + + `RES_PARATR_LIST_ISCOUNTED/"NumberingIsNumber"` + determines if the node is actually counted in the numbering sequence; + these are different from `"phantoms"` because there's still a `SwTextNode`. + +Note that there is no UNO service to represent a list. + +### Layout + +The layout is a tree of `SwFrame` subclasses, the following relationships are +possible between frames: + +- You can visit the tree by following the upper, lower, next and previous pointers. +- The functionality of flowing of a frame across multiple parents (e.g. pages) + is implemented in `SwFlowFrame`, which is not an `SwFrame` subclass. The logical + chain of such frames can be visited using the follow and precede pointers. + ("Leaf" is a term that refers to such a relationship.) +- In case a frame is split into multiple parts, then the first one is called + master, while the others are called follows. |