5 files changed, 1206 insertions, 0 deletions
diff --git a/doc/overview/doc-orcus.rst b/doc/overview/doc-orcus.rst
new file mode 100644
index 0000000..0577d4e
--- /dev/null
+++ b/doc/overview/doc-orcus.rst
@@ -0,0 +1,176 @@
+
+.. highlight:: cpp
+
+Use orcus's spreadsheet document class
+======================================
+
+If you want to use orcus' :cpp:class:`~orcus::spreadsheet::document` as your
+document store, you can use the :cpp:class:`~orcus::spreadsheet::import_factory`
+class that orcus provides which already implements all necessary interfaces.
+The example code shown below illustrates how to do this:
+
+.. literalinclude:: ../../doc_example/spreadsheet_doc_1.cpp
+   :language: C++
+
+This example code loads a file saved in the Open Document Spreadsheet format
+stored in a directory whose path is to be defined in the environment variable
+named ``INPUTDIR``.  In this example, we don't check for the validity of ``INPUTDIR``
+for bravity's sake.
+
+The input file consists of the following content on its first sheet.
+
+.. figure:: /_static/images/overview/doc-content.png
+
+While it is not clear from this screenshot, cell C2 contains the formula
+**CONCATENATE(A2, " ", B2)** to concatenate the content of A2 and B2 with a
+space between them.  Cells C3 through C7 also contain similar formula
+expressions.
+
+Let's walk through this code step by step.  First, we need to instantiate the
+document store.  Here we are using the concrete :cpp:class:`~orcus::spreadsheet::document`
+class available in orcus.  Then immediately pass this document to the
+:cpp:class:`~orcus::spreadsheet::import_factory` instance also from orcus:
+
+.. literalinclude:: ../../doc_example/spreadsheet_doc_1.cpp
+   :language: C++
+   :start-after: //!code-start: instantiate
+   :end-before: //!code-end: instantiate
+   :dedent: 4
+
+The next step is to create the loader instance and pass the factory to it:
+
+.. literalinclude:: ../../doc_example/spreadsheet_doc_1.cpp
+   :language: C++
+   :start-after: //!code-start: loader
+   :end-before: //!code-end: loader
+   :dedent: 4
+
+In this example we are using the :cpp:class:`~orcus::orcus_ods` filter class
+because the document we are loading is of Open Document Spreadsheet type, but
+the process is the same for other document types, the only difference being
+the name of the class.  Once the filter object is constructed, we'll simply
+load the file by calling its :cpp:func:`~orcus::orcus_ods::read_file` method
+and passing the path to the file as its argument:
+
+.. literalinclude:: ../../doc_example/spreadsheet_doc_1.cpp
+   :language: C++
+   :start-after: //!code-start: read-file
+   :end-before: //!code-end: read-file
+   :dedent: 4
+
+Once this call returns, the document has been fully populated.  What the rest
+of the code does is to access the content of the first row of the first sheet of
+the document.  First, you need to get a reference to the internal cell value
+store that we call *model context*:
+
+.. literalinclude:: ../../doc_example/spreadsheet_doc_1.cpp
+   :language: C++
+   :start-after: //!code-start: model-context
+   :end-before: //!code-end: model-context
+   :dedent: 4
+
+Since the content of cell A1 is a string, to get the value you need to first
+get the ID of the string:
+
+.. literalinclude:: ../../doc_example/spreadsheet_doc_1.cpp
+   :language: C++
+   :start-after: //!code-start: string-id
+   :end-before: //!code-end: string-id
+   :dedent: 4
+
+Once you have the ID of the string, you can pass that to the model to get the
+actual string value and print it to the standard output:
+
+.. literalinclude:: ../../doc_example/spreadsheet_doc_1.cpp
+   :language: C++
+   :start-after: //!code-start: print-string
+   :end-before: //!code-end: print-string
+   :dedent: 4
+
+Here we assume that the string value exists for the given ID.  In case you
+pass a string ID value to the :cpp:func:`get_string` method and there isn't a string
+value associated with it, you'll get a null pointer returned from the call.
+
+The reason you need to take this 2-step process to get a string value is
+because all the string values stored in the cells are pooled at the document
+model level, and the cells themselves only store the ID values as integers.
+
+You may also have noticed that the types surrounding the :cpp:class:`ixion::model_context`
+class are all in the :cpp:any:`ixion` namespace.  It is because orcus' own
+:cpp:class:`~orcus::spreadsheet::document` class uses the formula engine and the
+document model from the `ixion library <https://gitlab.com/ixion/ixion>`_ to handle
+calculation of the formula cells stored in the document, and the formula engine
+requires all cell values to be stored in the :cpp:class:`ixion::model_context`
+instance.
+
+.. note:: The :cpp:class:`~orcus::spreadsheet::document` class in orcus uses
+   the formula engine from the `ixion library <https://gitlab.com/ixion/ixion>`_
+   to calculate the results of the formula cells stored in the document.
+
+The rest of the code basically repeats the same process for cells B1 and C1:
+
+.. literalinclude:: ../../doc_example/spreadsheet_doc_1.cpp
+   :language: C++
+   :start-after: //!code-start: rest
+   :end-before: //!code-end: rest
+   :dedent: 4
+
+and generate the following output:
+
+.. code-block:: text
+
+    A1: Number
+    B1: String
+    C1: Formula
+
+Accessing the numeric cell values are a bit simpler since the values are
+stored directly with the cells.  Using the document from the above code example
+code, the following code block:
+
+.. literalinclude:: ../../doc_example/spreadsheet_doc_1_num_and_formula.cpp
+   :language: C++
+   :start-after: //!code-start: print-numeric-cells
+   :end-before: //!code-end: print-numeric-cells
+   :dedent: 4
+
+will access the cells from A2 through A7 and print out their numeric values.
+You should see the following output generated from this code block:
+
+.. code-block:: text
+
+    A2: 1
+    A3: 2
+    A4: 3
+    A5: 4
+    A6: 5
+    A7: 6
+
+It's a bit more complex to handle formula cells.  Since each formula cell
+contains two things: 1) the formula expression which is stored as tokens
+internally, and 2) the cached result of the formula.  The following code
+illustrates how to retrieve the cached formula results of cells C2 through
+C7:
+
+.. literalinclude:: ../../doc_example/spreadsheet_doc_1_num_and_formula.cpp
+   :language: C++
+   :start-after: //!code-start: print-formula-cells
+   :end-before: //!code-end: print-formula-cells
+   :dedent: 4
+
+For each cell, this code first accesses the stored formula cell instance, get
+a reference to its cached result, then obtain its string result value to print
+it out to the standard output.  Running this block of code will produce the
+following output:
+
+.. code-block:: text
+
+    C2: 1 Andy
+    C3: 2 Bruce
+    C4: 3 Charlie
+    C5: 4 David
+    C6: 5 Edward
+    C7: 6 Frank
+
+.. warning:: In production code, you should probabaly check the formula cell
+             pointer which may be null in case the cell at the specified
+             position is not a formula cell.
diff --git a/doc/overview/doc-user.rst b/doc/overview/doc-user.rst
new file mode 100644
index 0000000..a1292e5
--- /dev/null
+++ b/doc/overview/doc-user.rst
@@ -0,0 +1,574 @@
+
+.. highlight:: cpp
+
+Use a user-defined custom document class
+========================================
+
+In this section we will demonstrate how you can use orcus to populate your own
+custom document model by implementing your own set of interface classes and
+passing it to the orcus import filter.  The first example code shown below is
+the *absolute* minimum that you need to implement in order for the orcus
+filter to function properly:
+
+.. literalinclude:: ../../doc_example/spreadsheet_doc_2.cpp
+   :language: C++
+
+Just like the example we used in the previous section, we are also loading a
+document saved in the Open Document Spreadsheet format via
+:cpp:class:`~orcus::orcus_ods`.  The document being loaded is named
+multi-sheets.ods, and contains three sheets which are are named **'1st
+Sheet'**, **'2nd Sheet'**, and **'3rd Sheet'** in this exact order.  When you
+compile and execute the above code, you should get the following output:
+
+.. code-block:: text
+
+    append_sheet: sheet index: 0; sheet name: 1st Sheet
+    append_sheet: sheet index: 1; sheet name: 2nd Sheet
+    append_sheet: sheet index: 2; sheet name: 3rd Sheet
+
+One primary role the import factory plays is to provide the orcus import
+filter with the ability to create and insert a new sheet to the document.  As
+illustrated in the above code, it also provides access to existing sheets by
+its name or its position.  Every import factory implementation must be a
+derived class of the :cpp:class:`orcus::spreadsheet::iface::import_factory`
+interface base class.  At a minimum, it must implement
+
+* the :cpp:func:`~orcus::spreadsheet::iface::import_factory::append_sheet`
+  method which inserts a new sheet and return access to it,
+
+* two variants of the :cpp:func:`~orcus::spreadsheet::iface::import_factory::get_sheet`
+  method which returns access to an existing sheet, and
+
+* the :cpp:func:`~orcus::spreadsheet::iface::import_factory::finalize` method
+  which gets called exactly once at the very end of the import, to give the
+  implementation a chance to perform post-import tasks.
+
+in order for the code to be buildable.  Now, since all of the sheet accessor
+methods return null pointers in this code, the import filter has no way of
+populating the sheet data.  To actually receive the sheet data from the import
+filter, you must have these methods return valid pointers to sheet accessors.
+The next example shows how that can be done.
+
+
+Implement sheet interface
+-------------------------
+
+In this section we will expand on the code in the previous section to
+implement the sheet accessor interface, in order to receive cell values
+in each individual sheet.  In this example, we will define a structure
+to hold a cell value, and store them in a 2-dimensional array for each
+sheet.  First, let's define the cell value structure:
+
+.. literalinclude:: ../../doc_example/spreadsheet_doc_2_sheets_no_string_pool.cpp
+   :language: C++
+   :start-after: //!code-start: cell_value
+   :end-before: //!code-end: cell_value
+
+As we will be handling only three cell types i.e. empty, numeric, or string
+cell type, this structure will work just fine.  We will also define a namespace
+alias called ``ss`` for convenience.  This will be used in later code.
+
+Next, we'll define a sheet class called ``my_sheet`` that stores the cell values
+in a 2-dimensional array, and implements all required interfaces as a child class
+of :cpp:class:`~orcus::spreadsheet::iface::import_sheet`.
+
+At a minimum, the sheet accessor class must implement the following virtual
+methods to satisfy the interface requirements of
+:cpp:class:`~orcus::spreadsheet::iface::import_sheet`.
+
+* :cpp:func:`~orcus::spreadsheet::iface::import_sheet::set_auto` - This is a
+  setter method for a cell whose type is undetermined.  The implementor must
+  determine the value type of this cell, from the raw string value of the
+  cell.  This method is used when loading a CSV document, for instance.
+
+* :cpp:func:`~orcus::spreadsheet::iface::import_sheet::set_string` - This is a
+  setter method for a cell that stores a string value.  All cell string values
+  are expectd to be pooled for the entire document, and this method only
+  receives a string index into a centrally-managed string table.  The document
+  model is expected to implement a central string table that can translate an
+  index into its actual string value.
+
+* :cpp:func:`~orcus::spreadsheet::iface::import_sheet::set_value` - This is a
+  setter method for a cell that stores a numeric value.
+
+* :cpp:func:`~orcus::spreadsheet::iface::import_sheet::set_bool` - This is a
+  setter method for a cell that stores a boolean value.  Note that not all
+  format types use this method, as some formats store boolean values as
+  numeric values.
+
+* :cpp:func:`~orcus::spreadsheet::iface::import_sheet::set_date_time` - This
+  is a setter method for a cell that stores a date time value.  As with
+  boolean value type, some format types may not use this method as they store
+  date time values as numeric values, typically as days since epoch.
+
+* :cpp:func:`~orcus::spreadsheet::iface::import_sheet::set_format` - This is a
+  setter method for applying cell formats.  Just like the string values, cell
+  format properties are expected to be stored in a document-wide cell format
+  properties table, and this method only receives an index into the table.
+
+* :cpp:func:`~orcus::spreadsheet::iface::import_sheet::get_sheet_size` - This
+  method is expected to return the dimension of the sheet which the loader may
+  need in some operations.
+
+For now, we'll only implement
+:cpp:func:`~orcus::spreadsheet::iface::import_sheet::set_string`,
+:cpp:func:`~orcus::spreadsheet::iface::import_sheet::set_value`, and
+:cpp:func:`~orcus::spreadsheet::iface::import_sheet::get_sheet_size`, and
+leave the rest empty.
+
+Here is the actual code for class ``my_sheet``:
+
+.. literalinclude:: ../../doc_example/spreadsheet_doc_2_sheets_no_string_pool.cpp
+   :language: C++
+   :start-after: //!code-start: my_sheet
+   :end-before: //!code-end: my_sheet
+
+Note that this class receives its sheet index value from the caller upon
+instantiation.  A sheet index is a 0-based value and represents its position
+within the sheet collection.
+
+Finally, we will modify the ``my_import_factory`` class to store and manage a
+collection of ``my_sheet`` instances and to return the pointer value to a
+correct sheet accessor instance as needed.
+
+.. literalinclude:: ../../doc_example/spreadsheet_doc_2_sheets_no_string_pool.cpp
+   :language: C++
+   :start-after: //!code-start: my_import_factory
+   :end-before: //!code-end: my_import_factory
+
+Let's put it all together and run this code:
+
+.. literalinclude:: ../../doc_example/spreadsheet_doc_2_sheets_no_string_pool.cpp
+   :language: C++
+
+We'll be loading the same document we loaded in the previous example, but this
+time we will receive its cell values.  Let's go through each sheet one at a
+time.
+
+Data on the first sheet looks like this:
+
+.. figure:: /_static/images/overview/multi-sheets-sheet1.png
+
+It consists of 4 columns, with each column having a header row followed by
+exactly ten rows of data.  The first and forth columns contain numeric data,
+while the second and third columns contain string data.
+
+When you run the above code to load this sheet, you'll get the following output:
+
+.. code-block:: text
+
+    (sheet: 0; row: 0; col: 0): string index = 0
+    (sheet: 0; row: 0; col: 1): string index = 0
+    (sheet: 0; row: 0; col: 2): string index = 0
+    (sheet: 0; row: 0; col: 3): string index = 0
+    (sheet: 0; row: 1; col: 0): value = 1
+    (sheet: 0; row: 1; col: 1): string index = 0
+    (sheet: 0; row: 1; col: 2): string index = 0
+    (sheet: 0; row: 1; col: 3): value = 35
+    (sheet: 0; row: 2; col: 0): value = 2
+    (sheet: 0; row: 2; col: 1): string index = 0
+    (sheet: 0; row: 2; col: 2): string index = 0
+    (sheet: 0; row: 2; col: 3): value = 56
+    (sheet: 0; row: 3; col: 0): value = 3
+    (sheet: 0; row: 3; col: 1): string index = 0
+    (sheet: 0; row: 3; col: 2): string index = 0
+    (sheet: 0; row: 3; col: 3): value = 6
+    (sheet: 0; row: 4; col: 0): value = 4
+    (sheet: 0; row: 4; col: 1): string index = 0
+    (sheet: 0; row: 4; col: 2): string index = 0
+    (sheet: 0; row: 4; col: 3): value = 65
+    (sheet: 0; row: 5; col: 0): value = 5
+    (sheet: 0; row: 5; col: 1): string index = 0
+    (sheet: 0; row: 5; col: 2): string index = 0
+    (sheet: 0; row: 5; col: 3): value = 88
+    (sheet: 0; row: 6; col: 0): value = 6
+    (sheet: 0; row: 6; col: 1): string index = 0
+    (sheet: 0; row: 6; col: 2): string index = 0
+    (sheet: 0; row: 6; col: 3): value = 90
+    (sheet: 0; row: 7; col: 0): value = 7
+    (sheet: 0; row: 7; col: 1): string index = 0
+    (sheet: 0; row: 7; col: 2): string index = 0
+    (sheet: 0; row: 7; col: 3): value = 80
+    (sheet: 0; row: 8; col: 0): value = 8
+    (sheet: 0; row: 8; col: 1): string index = 0
+    (sheet: 0; row: 8; col: 2): string index = 0
+    (sheet: 0; row: 8; col: 3): value = 66
+    (sheet: 0; row: 9; col: 0): value = 9
+    (sheet: 0; row: 9; col: 1): string index = 0
+    (sheet: 0; row: 9; col: 2): string index = 0
+    (sheet: 0; row: 9; col: 3): value = 14
+    (sheet: 0; row: 10; col: 0): value = 10
+    (sheet: 0; row: 10; col: 1): string index = 0
+    (sheet: 0; row: 10; col: 2): string index = 0
+    (sheet: 0; row: 10; col: 3): value = 23
+
+There is a couple of things worth pointing out.  First, the cell data
+flows left to right first then top to bottom second.  Second, for this
+particular sheet and for this particular format, implementing just the
+two setter methods, namely
+:cpp:func:`~orcus::spreadsheet::iface::import_sheet::set_string` and
+:cpp:func:`~orcus::spreadsheet::iface::import_sheet::set_value` are
+enough to receive all cell values.  However, we are getting a string
+index value of 0 for all string cells.  This is because orcus expects
+the backend document model to implement the shared strings interface
+which is responsible for providing correct string indices to the import
+filter, and we have not yet implemented one.  Let's fix that.
+
+
+Implement shared strings interface
+----------------------------------
+
+The first thing to do is define some types:
+
+.. literalinclude:: ../../doc_example/spreadsheet_doc_2_sheets_with_string_pool.cpp
+   :language: C++
+   :start-after: //!code-start: types
+   :end-before: //!code-end: types
+
+Here, we define ``ss_type`` to be the authoritative store for the shared
+string values.  The string values will be stored as std::string type, and we
+use std::deque here to avoid re-allocation of internal buffers as the size
+of the container grows.
+
+Another type we define is ``ss_hash_type``, which will be the hash map type
+for storing string-to-index mapping entries.  Here, we are using std::string_view
+instead of std::string so that we can simply reference the string values stored in
+the first container.
+
+The shared string interface is designed to handle both unformatted and
+formatted string values.  The following two methods:
+
+* :cpp:func:`~orcus::spreadsheet::iface::import_shared_strings::add`
+* :cpp:func:`~orcus::spreadsheet::iface::import_shared_strings::append`
+
+are for unformatted string values.  The
+:cpp:func:`~orcus::spreadsheet::iface::import_shared_strings::add` method is
+used when passing a string value that may or may not already exist in the
+shared string pool.  The
+:cpp:func:`~orcus::spreadsheet::iface::import_shared_strings::append` method,
+on the other hand, is used only when the string value being passed is a
+brand-new string not yet stored in the string pool.  When implementing the
+:cpp:func:`~orcus::spreadsheet::iface::import_shared_strings::append` method,
+you may skip checking for the existance of the string value in the pool before
+inserting it.  Both of these methods are expected to return a positive integer
+value as the index of the string being passed.
+
+The following eight methods:
+
+* :cpp:func:`~orcus::spreadsheet::iface::import_shared_strings::set_segment_bold`
+* :cpp:func:`~orcus::spreadsheet::iface::import_shared_strings::set_segment_font`
+* :cpp:func:`~orcus::spreadsheet::iface::import_shared_strings::set_segment_font_color`
+* :cpp:func:`~orcus::spreadsheet::iface::import_shared_strings::set_segment_font_name`
+* :cpp:func:`~orcus::spreadsheet::iface::import_shared_strings::set_segment_font_size`
+* :cpp:func:`~orcus::spreadsheet::iface::import_shared_strings::set_segment_italic`
+* :cpp:func:`~orcus::spreadsheet::iface::import_shared_strings::append_segment`
+* :cpp:func:`~orcus::spreadsheet::iface::import_shared_strings::commit_segments`
+
+are for receiving formatted string values.  Conceptually, a formatted string
+consists of a series of multiple string segments, where each segment may have
+different formatting attributes applied to it.  These ``set_segment_*``
+methods are used to set the individual formatting attributes for the current
+string segment, and the string value for the current segment is passed through
+the
+:cpp:func:`~orcus::spreadsheet::iface::import_shared_strings::append_segment`
+call.  The order in which the ``set_segment_*`` methods are called is not
+specified, and not all of them may be called, but they are guaranteed to be
+called before the
+:cpp:func:`~orcus::spreadsheet::iface::import_shared_strings::append_segment`
+method gets called.  The implementation should keep a buffer to store the
+formatting attributes for the current segment and apply each attribute to the
+buffer as one of the ``set_segment_*`` methods gets called.  When the
+:cpp:func:`~orcus::spreadsheet::iface::import_shared_strings::append_segment`
+gets called, the implementation should apply the formatting attirbute set
+currently in the buffer to the current segment, and reset the buffer for the
+next segment.  When all of the string segments and their formatting attributes
+are passed,
+:cpp:func:`~orcus::spreadsheet::iface::import_shared_strings::commit_segments`
+gets called, signaling the implementation that now it's time to commit the
+string to the document model.
+
+As we are going to ignore the formatting attributes in our current example,
+the following code will do:
+
+.. literalinclude:: ../../doc_example/spreadsheet_doc_2_sheets_with_string_pool.cpp
+   :language: C++
+   :start-after: //!code-start: my_shared_strings
+   :end-before: //!code-end: my_shared_strings
+
+Note that some import filters may use the
+:cpp:func:`~orcus::spreadsheet::iface::import_shared_strings::append_segment`
+and
+:cpp:func:`~orcus::spreadsheet::iface::import_shared_strings::commit_segments`
+combination even for unformatted strings.  Because of this, you still need to
+implement these two methods even if raw string values are all you care about.
+
+Note also that the container storing the string values is a reference.  The
+source container will be owned by ``my_import_factory`` who will also be the
+owner of the ``my_shared_strings`` instance.  Shown below is the modified
+version of ``my_import_factory`` that provides the shared string interface:
+
+.. literalinclude:: ../../doc_example/spreadsheet_doc_2_sheets_with_string_pool.cpp
+   :language: C++
+   :start-after: //!code-start: my_import_factory
+   :end-before: //!code-end: my_import_factory
+
+The shared string store is also passed to each sheet instance, and we'll use
+that to fetch the string values from their respective string indices.
+
+Let's put this all together:
+
+.. literalinclude:: ../../doc_example/spreadsheet_doc_2_sheets_with_string_pool.cpp
+   :language: C++
+
+The sheet class is largely unchanged except for one thing; it now takes a
+reference to the string pool and print the actual string value alongside the
+string index associated with it.  When you execute this code, you'll see the
+following output when loading the same sheet:
+
+.. code-block:: text
+
+    (sheet: 0; row: 0; col: 0): string index = 0 (ID)
+    (sheet: 0; row: 0; col: 1): string index = 1 (First Name)
+    (sheet: 0; row: 0; col: 2): string index = 2 (Last Name)
+    (sheet: 0; row: 0; col: 3): string index = 3 (Age)
+    (sheet: 0; row: 1; col: 0): value = 1
+    (sheet: 0; row: 1; col: 1): string index = 5 (Thia)
+    (sheet: 0; row: 1; col: 2): string index = 6 (Beauly)
+    (sheet: 0; row: 1; col: 3): value = 35
+    (sheet: 0; row: 2; col: 0): value = 2
+    (sheet: 0; row: 2; col: 1): string index = 9 (Pepito)
+    (sheet: 0; row: 2; col: 2): string index = 10 (Resun)
+    (sheet: 0; row: 2; col: 3): value = 56
+    (sheet: 0; row: 3; col: 0): value = 3
+    (sheet: 0; row: 3; col: 1): string index = 13 (Emera)
+    (sheet: 0; row: 3; col: 2): string index = 14 (Gravey)
+    (sheet: 0; row: 3; col: 3): value = 6
+    (sheet: 0; row: 4; col: 0): value = 4
+    (sheet: 0; row: 4; col: 1): string index = 17 (Erinn)
+    (sheet: 0; row: 4; col: 2): string index = 18 (Flucks)
+    (sheet: 0; row: 4; col: 3): value = 65
+    (sheet: 0; row: 5; col: 0): value = 5
+    (sheet: 0; row: 5; col: 1): string index = 21 (Giusto)
+    (sheet: 0; row: 5; col: 2): string index = 22 (Bambury)
+    (sheet: 0; row: 5; col: 3): value = 88
+    (sheet: 0; row: 6; col: 0): value = 6
+    (sheet: 0; row: 6; col: 1): string index = 25 (Neall)
+    (sheet: 0; row: 6; col: 2): string index = 26 (Scorton)
+    (sheet: 0; row: 6; col: 3): value = 90
+    (sheet: 0; row: 7; col: 0): value = 7
+    (sheet: 0; row: 7; col: 1): string index = 29 (Ervin)
+    (sheet: 0; row: 7; col: 2): string index = 30 (Foreman)
+    (sheet: 0; row: 7; col: 3): value = 80
+    (sheet: 0; row: 8; col: 0): value = 8
+    (sheet: 0; row: 8; col: 1): string index = 33 (Shoshana)
+    (sheet: 0; row: 8; col: 2): string index = 34 (Bohea)
+    (sheet: 0; row: 8; col: 3): value = 66
+    (sheet: 0; row: 9; col: 0): value = 9
+    (sheet: 0; row: 9; col: 1): string index = 37 (Gladys)
+    (sheet: 0; row: 9; col: 2): string index = 38 (Somner)
+    (sheet: 0; row: 9; col: 3): value = 14
+    (sheet: 0; row: 10; col: 0): value = 10
+    (sheet: 0; row: 10; col: 1): string index = 41 (Ephraim)
+    (sheet: 0; row: 10; col: 2): string index = 42 (Russell)
+    (sheet: 0; row: 10; col: 3): value = 23
+
+The string indices now increment nicely, and their respective string values
+look correct.
+
+Now, let's turn our attention to the second sheet, which contains formulas.
+First, here is what the second sheet looks like:
+
+.. figure:: /_static/images/overview/multi-sheets-sheet2.png
+
+It contains a simple table extending from A1 to C9.  It consists of three
+columns and the first row is a header row.  Cells in the the first and second
+columns contain simple numbers and the third column contains formulas that
+simply add the two numbers to the left of the same row.  When loading this
+sheet using the last code we used above, you'll see the following output:
+
+.. code-block:: text
+
+    (sheet: 1; row: 0; col: 0): string index = 44 (X)
+    (sheet: 1; row: 0; col: 1): string index = 45 (Y)
+    (sheet: 1; row: 0; col: 2): string index = 46 (X + Y)
+    (sheet: 1; row: 1; col: 0): value = 18
+    (sheet: 1; row: 1; col: 1): value = 79
+    (sheet: 1; row: 2; col: 0): value = 48
+    (sheet: 1; row: 2; col: 1): value = 55
+    (sheet: 1; row: 3; col: 0): value = 99
+    (sheet: 1; row: 3; col: 1): value = 35
+    (sheet: 1; row: 4; col: 0): value = 41
+    (sheet: 1; row: 4; col: 1): value = 69
+    (sheet: 1; row: 5; col: 0): value = 5
+    (sheet: 1; row: 5; col: 1): value = 18
+    (sheet: 1; row: 6; col: 0): value = 46
+    (sheet: 1; row: 6; col: 1): value = 69
+    (sheet: 1; row: 7; col: 0): value = 36
+    (sheet: 1; row: 7; col: 1): value = 67
+    (sheet: 1; row: 8; col: 0): value = 78
+    (sheet: 1; row: 8; col: 1): value = 2
+
+Everything looks fine except that the formula cells in C2:C9 are not loaded at
+all.  This is because, in order to receive formula cell data, you must
+implement the required :cpp:class:`~orcus::spreadsheet::iface::import_formula`
+interface, which we will cover in the next section.
+
+
+Implement formula interface
+---------------------------
+
+In this section we will extend the code from the previous section in order to
+receive and process formula cell values from the sheet.  We will need to make
+quite a few changes.  Let's go over this one thing at a time.  First, we are
+adding a new cell value type ``formula``:
+
+.. literalinclude:: ../../doc_example/spreadsheet_doc_2_sheets_with_formula.cpp
+   :language: C++
+   :start-after: //!code-start: cell_value_type
+   :end-before: //!code-end: cell_value_type
+
+which should not come as a surprise.
+
+We are not making any change to the ``cell_value`` struct itself, but we are
+re-using its ``index`` member for a formula cell value such that, if the cell
+stores a formula, the index will refer to its actual formula data which will
+be stored in a separate data store, much like how strings are stored
+externally and referenced by their indices in the ``cell_value`` instances.
+
+We are also adding a brand-new class called ``cell_grid``, to add an extra
+layer over the raw cell value array:
+
+.. literalinclude:: ../../doc_example/spreadsheet_doc_2_sheets_with_formula.cpp
+   :language: C++
+   :start-after: //!code-start: cell_grid
+   :end-before: //!code-end: cell_grid
+
+Each sheet instance will own one instance of ``cell_grid``, and the formula
+interface class instance will hold a reference to it and use it to insert
+formula cell values into it.  The same sheet instance will also hold a formula
+value store, and pass its reference to the formula interface class.
+
+The formula interface class must implement the following methods:
+
+* :cpp:func:`~orcus::spreadsheet::iface::import_formula::set_position`
+* :cpp:func:`~orcus::spreadsheet::iface::import_formula::set_formula`
+* :cpp:func:`~orcus::spreadsheet::iface::import_formula::set_shared_formula_index`
+* :cpp:func:`~orcus::spreadsheet::iface::import_formula::set_result_string`
+* :cpp:func:`~orcus::spreadsheet::iface::import_formula::set_result_value`
+* :cpp:func:`~orcus::spreadsheet::iface::import_formula::set_result_empty`
+* :cpp:func:`~orcus::spreadsheet::iface::import_formula::set_result_bool`
+* :cpp:func:`~orcus::spreadsheet::iface::import_formula::commit`
+
+Depending on the type of a formula cell, and depending on the format of the
+document, some methods may not be called.  The
+:cpp:func:`~orcus::spreadsheet::iface::import_formula::set_position` method
+always gets called regardless of the formula cell type, to specify the
+position of the formula cell.  The
+:cpp:func:`~orcus::spreadsheet::iface::import_formula::set_formula` gets
+called for a formula cell that does not share its formula expression with any
+other formula cells, or a formula cell that shares its formula expression with
+a group of other formuls cells and is the primary cell of that group.  If it's
+the primary cell of a grouped formula cells, the
+:cpp:func:`~orcus::spreadsheet::iface::import_formula::set_shared_formula_index`
+method also gets called to receive the identifier value of that group.  All
+formula cells belonging to the same group receives the same identifier value
+via
+:cpp:func:`~orcus::spreadsheet::iface::import_formula::set_shared_formula_index`,
+but only the primary cell of a group receives the formula expression string
+via :cpp:func:`~orcus::spreadsheet::iface::import_formula::set_formula`.  The
+rest of the methods -
+:cpp:func:`~orcus::spreadsheet::iface::import_formula::set_result_string`,
+:cpp:func:`~orcus::spreadsheet::iface::import_formula::set_result_value`,
+:cpp:func:`~orcus::spreadsheet::iface::import_formula::set_result_empty` and
+:cpp:func:`~orcus::spreadsheet::iface::import_formula::set_result_bool` - are
+called to deliver the cached formula cell value when applicable.
+
+The :cpp:func:`~orcus::spreadsheet::iface::import_formula::commit` method gets
+called at the very end to let the implementation commit the formula cell data
+to the backend document store.
+
+Without further ado, here is the formula interface implementation that we will
+use:
+
+.. literalinclude:: ../../doc_example/spreadsheet_doc_2_sheets_with_formula.cpp
+   :language: C++
+   :start-after: //!code-start: my_formula
+   :end-before: //!code-end: my_formula
+
+and here is the defintion of the ``formula`` struct that stores a formula expression
+string as well as its grammer type:
+
+.. literalinclude:: ../../doc_example/spreadsheet_doc_2_sheets_with_formula.cpp
+   :language: C++
+   :start-after: //!code-start: formula
+   :end-before: //!code-end: formula
+
+Note that since we are loading a OpenDocument Spereadsheet file (.ods) which
+does not support shared formulas, we do not need to handle the
+:cpp:func:`~orcus::spreadsheet::iface::import_formula::set_shared_formula_index`
+method.  Likewise, we are leaving the ``set_result_*`` methods unhandled for
+now.
+
+This interface class also stores references to ``cell_grid`` and
+``std::vector<formula>`` instances, both of which are passed from the parent
+sheet instance.
+
+We also need to make a few changes to the sheet interface class to provide a formula interface
+and add a formula value store:
+
+.. literalinclude:: ../../doc_example/spreadsheet_doc_2_sheets_with_formula.cpp
+   :language: C++
+   :start-after: //!code-start: my_sheet
+   :end-before: //!code-end: my_sheet
+
+We've added the
+:cpp:func:`~orcus::spreadsheet::iface::import_sheet::get_formula` method which
+returns a pointer to the ``my_formula`` class instance defined above.  The
+rest of the code is unchanged.
+
+Now let's see what happens when loading the same sheet from the previous
+section:
+
+.. code-block:: text
+
+    (sheet: 1; row: 0; col: 0): string index = 44 (X)
+    (sheet: 1; row: 0; col: 1): string index = 45 (Y)
+    (sheet: 1; row: 0; col: 2): string index = 46 (X + Y)
+    (sheet: 1; row: 1; col: 0): value = 18
+    (sheet: 1; row: 1; col: 1): value = 79
+    (sheet: 1; row: 2; col: 0): value = 48
+    (sheet: 1; row: 2; col: 1): value = 55
+    (sheet: 1; row: 3; col: 0): value = 99
+    (sheet: 1; row: 3; col: 1): value = 35
+    (sheet: 1; row: 4; col: 0): value = 41
+    (sheet: 1; row: 4; col: 1): value = 69
+    (sheet: 1; row: 5; col: 0): value = 5
+    (sheet: 1; row: 5; col: 1): value = 18
+    (sheet: 1; row: 6; col: 0): value = 46
+    (sheet: 1; row: 6; col: 1): value = 69
+    (sheet: 1; row: 7; col: 0): value = 36
+    (sheet: 1; row: 7; col: 1): value = 67
+    (sheet: 1; row: 8; col: 0): value = 78
+    (sheet: 1; row: 8; col: 1): value = 2
+    (sheet: 1; row: 1; col: 2): formula = [.A2]+[.B2] (ods)
+    (sheet: 1; row: 2; col: 2): formula = [.A3]+[.B3] (ods)
+    (sheet: 1; row: 3; col: 2): formula = [.A4]+[.B4] (ods)
+    (sheet: 1; row: 4; col: 2): formula = [.A5]+[.B5] (ods)
+    (sheet: 1; row: 5; col: 2): formula = [.A6]+[.B6] (ods)
+    (sheet: 1; row: 6; col: 2): formula = [.A7]+[.B7] (ods)
+    (sheet: 1; row: 7; col: 2): formula = [.A8]+[.B8] (ods)
+    (sheet: 1; row: 8; col: 2): formula = [.A9]+[.B9] (ods)
+
+Looks like we are getting the formula cell values this time around.
+
+One thing to note is that the formula expression strings you see here follow
+the syntax defined in the OpenFormula specifications, which is the formula syntax
+used in the OpenDocument Spreadsheet format.
+
+
+Implement more interfaces
+-------------------------
+
+This section has covered only a part of the available spreadsheet interfaces
+you can implement in your code.  Refer to the :ref:`spreadsheet-interfaces`
+section to see the complete list of interfaces.
diff --git a/doc/overview/index.rst b/doc/overview/index.rst
new file mode 100644
index 0000000..0b95f8b
--- /dev/null
+++ b/doc/overview/index.rst
@@ -0,0 +1,95 @@
+
+.. highlight:: cpp
+
+Overview
+========
+
+Composition of the library
+--------------------------
+
+The primary goal of the orcus library is to provide a framework to import the
+contents of documents stored in various spreadsheet or spreadsheet-like
+formats.  The library also provides several low-level parsers that can be used
+independently of the spreadsheet-related features if so desired.  In addition,
+the library also provides support for some hierarchical documents, such as JSON
+and YAML, which were a later addition to the library.
+
+You can use this library either through its C++ API, Python API, or CLI.  However,
+not all three methods equally expose all features of the library, and the C++ API
+is more complete than the other two.
+
+The library is physically split into four parts:
+
+    1. the parser part that provides the aforementioned low-level parsers,
+    2. the filter part that providers higher level import filters for spreadsheet
+       and hierarchical documents that internally use the low-level parsers,
+    3. the spreadsheet document model part that includes the document model suitable
+       for storing spreadsheet document contents, and
+    4. CLI for loading and converting spreadsheet and hierarchical documents.
+
+If you need to just use the parser part of the library, you need to only link
+against the ``liborcus-parser`` library file.  If you need to use the import
+filter part, link againt both the ``liborcus-parser`` and the ``liborcus``
+libraries.  Likewise, if you need to use the spreadsheet document model part,
+link against the aforementioned two plus the ``liborcus-spreadsheet-model``
+library.
+
+Also note that the spreadsheet document model part has additional dependency on
+the `ixion library <https://gitlab.com/ixion/ixion>`_ for handling formula
+re-calculations on document load.
+
+
+Loading spreadsheet documents
+-----------------------------
+
+The orcus library's primary aim is to provide a framework to import the contents
+of documents stored in various spreadsheet, or spreadsheet-like formats.  It
+supports two primary use cases.  The first use case is where the client
+program does not have its own document model, but needs to import data from a
+spreadsheet-like document file and access its content without implementing its
+own document store from scratch.  In this particular use case, you can simply
+use the :cpp:class:`~orcus::spreadsheet::document` class to get it populated,
+and access its content through its API afterward.
+
+The second use case, which is a bit more advanced, is where the client program
+already has its own internal document model, and needs to use orcus
+to populate its document model.  In this particular use case, you can
+implement your own set of classes that support necessary interfaces, and pass
+that to the orcus import filter.
+
+For each document type that orcus supports, there is a top-level import filter
+class that serves as an entry point for loading the content of a document you
+wish to load.  You don't pass your document to this filter directly; instead,
+you wrap your document with what we call an **import factory**, then pass this
+factory instance to the loader.  This import factory is then required to
+implement necessary interfaces that the filter class uses in order for it
+to pass data to the document as the file is getting parsed.
+
+When using orcus's own document model, you can simply use orcus's own import
+factory implementation to wrap its document.  When using your own document
+model, on the other hand, you'll need to implement your own set of interface
+classes to wrap your document with.
+
+The following sections describe how to load a spreadsheet document by using 1)
+orcus's own spreadsheet document class, and 2) a user-defined custom docuemnt
+class.
+
+.. toctree::
+   :maxdepth: 1
+
+   doc-orcus.rst
+   doc-user.rst
+
+
+Loading hierarchical documents
+------------------------------
+
+The orcus library also includes support for hierarchical document types such
+as JSON and YAML.  The following sections delve more into the support for
+these types of documents.
+
+.. toctree::
+   :maxdepth: 1
+
+   json.rst
+   yaml.rst
diff --git a/doc/overview/json.rst b/doc/overview/json.rst
new file mode 100644
index 0000000..0e252f9
--- /dev/null
+++ b/doc/overview/json.rst
@@ -0,0 +1,353 @@
+
+.. highlight:: cpp
+
+JSON
+====
+
+The JSON part of orcus consists of a low-level parser class that handles
+parsing of JSON strings, and a high-level document class that stores parsed
+JSON structures as a node tree.
+
+There are two approaches to processing JSON strings using the orcus library.
+One approach is to utilize the :cpp:class:`~orcus::json::document_tree` class
+to load and populate the JSON structure tree via its
+:cpp:func:`~orcus::json::document_tree::load()` method and traverse the tree
+through its :cpp:func:`~orcus::json::document_tree::get_document_root()` method.
+This approach is ideal if you want a quick way to parse and access the content
+of a JSON document with minimal effort.
+
+Another approach is to use the low-level :cpp:class:`~orcus::json_parser`
+class directly by providing your own handler class to receive callbacks from
+the parser.  This method requires a bit more effort on your part to provide
+and populate your own data structure, but if you already have a data structure
+to store the content of JSON, then this approach is ideal.  The
+:cpp:class:`~orcus::json::document_tree` class internally uses
+:cpp:class:`~orcus::json_parser` to parse JSON contents.
+
+
+Populating a document tree from JSON string
+-------------------------------------------
+
+The following code snippet shows an example of how to populate an instance of
+:cpp:class:`~orcus::json::document_tree` from a JSON string, and navigate its
+content tree afterward.
+
+.. literalinclude:: ../../doc_example/json_doc_1.cpp
+   :language: C++
+
+You'll see the following output when executing this code:
+
+.. code-block:: text
+
+    name: John Doe
+    occupation: Software Engineer
+    score:
+      - 89
+      - 67
+      - 90
+
+
+Using the low-level parser
+--------------------------
+
+The following code snippet shows how to use the low-level :cpp:class:`~orcus::json_parser`
+class by providing an own handler class and passing it as a template argument:
+
+.. literalinclude:: ../../doc_example/json_parser_1.cpp
+   :language: C++
+
+The parser constructor expects the char array, its length, and the handler
+instance.  The base handler class :cpp:class:`~orcus::json_handler` implements
+all required handler methods.  By inheriting from it, you only need to
+implement the handler methods you need.  In this example, we are only
+implementing the :cpp:func:`~orcus::json_handler::object_key`,
+:cpp:func:`~orcus::json_handler::string`, and :cpp:func:`~orcus::json_handler::number`
+methods to process object key values, string values and numeric values,
+respectively.  Refer to the :cpp:class:`~orcus::json_handler` class definition
+for all available handler methods.
+
+Executing this code will generate the following output:
+
+.. code-block:: text
+
+    JSON string: {"key1": [1,2,3,4,5], "key2": 12.3}
+    object key: key1
+    number: 1
+    number: 2
+    number: 3
+    number: 4
+    number: 5
+    object key: key2
+    number: 12.3
+
+
+Building a document tree directly
+---------------------------------
+
+You can also create and populate a JSON document tree directly without needing
+to parse a JSON string.  This approach is ideal if you want to create a JSON
+tree from scratch and export it as a string.  The following series of code
+snippets demonstrate how to exactly build JSON document trees directly and
+export their contents as JSON strings.
+
+The first example shows how to initialize the tree with a simple array:
+
+.. literalinclude:: ../../doc_example/json_doc_2.cpp
+   :language: C++
+   :start-after: //!code-start: root list
+   :end-before: //!code-end: root list
+
+You can simply specify the content of the array via initialization list and
+assign it to the document.  The :cpp:func:`~orcus::json::document_tree::dump()`
+method then turns the content into a single string instance, which looks like
+the following:
+
+.. code-block:: text
+
+    [
+        1,
+        2,
+        "string value",
+        false,
+        null
+    ]
+
+If you need to build a array of arrays, do like the following:
+
+.. literalinclude:: ../../doc_example/json_doc_2.cpp
+   :language: C++
+   :start-after: //!code-start: list nested
+   :end-before: //!code-end: list nested
+
+This will create an array of two nested child arrays with three values each.
+Dumping the content of the tree as a JSON string will produce something like
+the following:
+
+.. code-block:: text
+
+    [
+        [
+            true,
+            false,
+            null
+        ],
+        [
+            1.1,
+            2.2,
+            "text"
+        ]
+    ]
+
+Creating an object can be done by nesting one of more key-value pairs, each of
+which is surrounded by a pair of curly braces, inside another pair of curly
+braces.  For example, the following code:
+
+.. literalinclude:: ../../doc_example/json_doc_2.cpp
+   :language: C++
+   :start-after: //!code-start: list object
+   :end-before: //!code-end: list object
+
+produces the following output:
+
+.. code-block:: text
+
+    {
+        "key1": 1.2,
+        "key2": "some text"
+    }
+
+indicating that the tree consists of a single object having two key-value
+pairs.
+
+You may notice that this syntax is identical to the syntax for
+creating an array of arrays as shown above.  In fact, in order for this to be
+an object, each of the inner sequences must have exactly two values, and its
+first value must be a string value.  Failing that, it will be interpreted as
+an array of arrays.
+
+As with arrays, nesting of objects is also supported.  The following code:
+
+.. literalinclude:: ../../doc_example/json_doc_2.cpp
+   :language: C++
+   :start-after: //!code-start: list object 2
+   :end-before: //!code-end: list object 2
+
+creates a root object having two key-value pairs one of which contains
+another object having three key-value pairs, as evident in the following output
+generated by this code:
+
+.. code-block:: text
+
+    {
+        "parent1": {
+            "child1": true,
+            "child2": false,
+            "child3": 123.4
+        },
+        "parent2": "not-nested"
+    }
+
+There is one caveat that you need to be aware of because of this special
+object creation syntax.  When you have a nested array that exactly contains
+two values and the first value is a string value, you must explicitly declare
+that as an array by using an :cpp:class:`~orcus::json::array` class instance.
+For instance, this code:
+
+.. literalinclude:: ../../doc_example/json_doc_2.cpp
+   :language: C++
+   :start-after: //!code-start: array ambiguous
+   :end-before: //!code-end: array ambiguous
+
+is intended to be an object containing an array.  However, because the supposed
+inner array contains exactly two values and the first value is a string
+value, which could be interpreted as a key-value pair for the outer object, it
+ends up being too ambiguous and a :cpp:class:`~orcus::json::key_value_error`
+exception gets thrown as a result.
+
+To work around this ambiguity, you need to declare the inner array to be
+explicit by using an :cpp:class:`~orcus::json::array` instance:
+
+.. literalinclude:: ../../doc_example/json_doc_2.cpp
+   :language: C++
+   :start-after: //!code-start: array explicit
+   :end-before: //!code-end: array explicit
+
+This code now correctly generates a root object containing one key-value pair
+whose value is an array:
+
+.. code-block:: text
+
+    {
+        "array": [
+            "one",
+            987
+        ]
+    }
+
+Similar ambiguity issue arises when you want to construct a tree consisting
+only of an empty root object.  You may be tempted to write something like
+this:
+
+.. literalinclude:: ../../doc_example/json_doc_2.cpp
+   :language: C++
+   :start-after: //!code-start: object ambiguous a
+   :end-before: //!code-end: object ambiguous a
+
+However, this will result in leaving the tree entirely unpopulated i.e. the
+tree will not even have a root node!  If you continue on and try to get a root
+node from this tree, you'll get a :cpp:class:`~orcus::json::document_error`
+thrown as a result.  If you inspect the error message stored in the exception:
+
+.. literalinclude:: ../../doc_example/json_doc_2.cpp
+   :language: C++
+   :start-after: //!code-start: object ambiguous b
+   :end-before: //!code-end: object ambiguous b
+
+you will get
+
+.. code-block:: text
+
+    json::document_error: document tree is empty
+
+giving you further proof that the tree is indeed empty!  The solution here is
+to directly assign an instance of :cpp:class:`~orcus::json::object` to the
+document tree, which will initialize the tree with an empty root object.  The
+following code:
+
+.. literalinclude:: ../../doc_example/json_doc_2.cpp
+   :language: C++
+   :start-after: //!code-start: object explicit 1
+   :end-before: //!code-end: object explicit 1
+
+will therefore generate
+
+.. code-block:: text
+
+    {
+    }
+
+You can also use the :cpp:class:`~orcus::json::object` class instances to
+indicate empty objects anythere in the tree.  For instance, this code:
+
+.. literalinclude:: ../../doc_example/json_doc_2.cpp
+   :language: C++
+   :start-after: //!code-start: object explicit 2
+   :end-before: //!code-end: object explicit 2
+
+is intended to create an array containing three empty objects as its elements,
+and that's exactly what it does:
+
+.. code-block:: text
+
+    [
+        {
+        },
+        {
+        },
+        {
+        }
+    ]
+
+So far all the examples have shown how to initialize the document tree as the
+tree itself is being constructed.  But our next example shows how to create
+new key-value pairs to existing objects after the document tree instance has
+been initialized.
+
+.. literalinclude:: ../../doc_example/json_doc_2.cpp
+   :language: C++
+   :start-after: //!code-start: root object add child
+   :end-before: //!code-end: root object add child
+
+This code first initializes the tree with an empty object, then retrieves the
+root empty object and assigns several key-value pairs to it.  When converting
+the tree content to a string and inspecting it you'll see something like the
+following:
+
+.. code-block:: text
+
+    {
+        "child array": [
+            1.1,
+            1.2,
+            true
+        ],
+        "child1": 1,
+        "child3": [
+            true,
+            false
+        ],
+        "child2": "string",
+        "child object": {
+            "key1": 100,
+            "key2": 200
+        }
+    }
+
+The next example shows how to append values to an existing array after the
+tree has been constructed.  Let's take a look at the code:
+
+.. literalinclude:: ../../doc_example/json_doc_2.cpp
+   :language: C++
+   :start-after: //!code-start: root array add child
+   :end-before: //!code-end: root array add child
+
+Like the previous example, this code first initializes the tree but this time
+with an empty array as its root, retrieves the root array, then appends
+several values to it via its :cpp:func:`~orcus::json::node::push_back` method.
+
+When you dump the content of this tree as a JSON string you'll get something
+like this:
+
+.. code-block:: text
+
+    [
+        -1.2,
+        "string",
+        true,
+        null,
+        {
+            "key1": 1.1,
+            "key2": 1.2
+        }
+    ]
+
diff --git a/doc/overview/yaml.rst b/doc/overview/yaml.rst
new file mode 100644
index 0000000..4109cf2
--- /dev/null
+++ b/doc/overview/yaml.rst
@@ -0,0 +1,8 @@
+
+.. highlight:: cpp
+
+YAML
+====
+
+TBD
+