# Low-level sync-1.5 helper component

This component contains utility code to be shared between different
data stores that want to sync against a Firefox Sync v1.5 sync server.
It handles things like encrypting/decrypting records, obtaining and
using storage node auth tokens, and so on.

There are two key concepts to understand here - the implementation itself, and
a Rust trait for a "syncable store" where component-specific logic lives - but
before we dive into them, some preamble might help put things into context.

## Nomenclature

* The term "store" is generally used as the interface to the database - ie, the
  thing that gets and saves items. It can also be seen as supplying the API
  used by most consumers of the component. Note that the "places" component
  is alone in using the term "api" for this object.

* The term "engine" (or ideally, "sync engine") is used for the thing that
  actually does the syncing for a store. Sync engines implement the `SyncEngine`
  trait - the trait is implemented either directly by a store, or by a new
  object that has a reference to a store.
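
The two ways of providing an engine can be illustrated with a minimal sketch. Everything here is hypothetical: the reduced `SyncEngine` trait below is a stand-in with invented methods, not the real trait, and `NotesStore` and `NotesEngine` are made-up names used only for illustration.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical, much-reduced stand-in for the real `SyncEngine` trait.
trait SyncEngine {
    /// Name of the server collection this engine syncs.
    fn collection_name(&self) -> &'static str;
    /// Save incoming records locally and return how many were saved.
    fn store_incoming(&self, incoming: Vec<String>) -> usize;
}

/// A "store": the thing that gets and saves items.
struct NotesStore {
    items: RefCell<Vec<String>>,
}

impl NotesStore {
    fn new() -> Self {
        NotesStore { items: RefCell::new(Vec::new()) }
    }
    fn add(&self, item: &str) {
        self.items.borrow_mut().push(item.to_string());
    }
    fn count(&self) -> usize {
        self.items.borrow().len()
    }
}

// Option 1: the store implements the trait directly.
impl SyncEngine for NotesStore {
    fn collection_name(&self) -> &'static str {
        "notes"
    }
    fn store_incoming(&self, incoming: Vec<String>) -> usize {
        let n = incoming.len();
        self.items.borrow_mut().extend(incoming);
        n
    }
}

// Option 2: a separate engine object holds a reference to a store.
struct NotesEngine {
    store: Rc<NotesStore>,
}

impl SyncEngine for NotesEngine {
    fn collection_name(&self) -> &'static str {
        "notes"
    }
    fn store_incoming(&self, incoming: Vec<String>) -> usize {
        let n = incoming.len();
        for item in incoming {
            self.store.add(&item);
        }
        n
    }
}
```

A real component would pick one of the two options; both appear here only so they can be compared side by side.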

## Introduction and History

For many years Sync has worked exclusively against a "sync v1.5 server". This
[is a REST API described here](https://mozilla-services.readthedocs.io/en/latest/storage/apis-1.5.html).
The important part is that the API is conceptually quite simple - there are
arbitrary "collections" containing "records" indexed by a GUID, and lacking
traditional database concepts like joins. Because each record is encrypted,
there's very little scope for the server to be much smarter. Thus it's
reasonably easy to create a fairly generic abstraction over the API that can be
easily reused.

Back in the deep past, we found ourselves with 2 different components that
needed to sync against a sync v1.5 server. The apps using these components
didn't have schedulers or any UI for choosing what to sync - so these
components just looked at the existing state of the engines on the server and
synced if they were enabled.

This was also pre-megazord - the idea was that apps could choose from a "menu"
of components to include - so we didn't really want to bind these components
together. Therefore, there was no concept of "sync all" - instead, each of the
components had to be synced individually. So this component started out as more
of a "library" than a "component" which individual components could reuse - and
each of these components was a "syncable store" (ie, a store which could supply
a "sync engine").

Fast forward to Fenix and we needed a UI for managing all the engines supported
there, and a single "sync now" experience etc - so we also have a sync_manager
component - [see its README for more](../components/sync_manager/README.md).
But even though it exists, there are still some parts of this component that
reflect these early days - for example, it's still possible to sync just a
single component using sync15 (ie, without going via the "sync manager"),
although this isn't used and should be removed - the "sync manager" allows you
to choose which engines to sync, so that should be used exclusively.

## Metadata

There's some metadata associated with a sync. Some of the metadata is "global"
to the app (eg, the enabled state of engines, information about what servers to
use, etc) and some is specific to an engine (eg, timestamp of the
server's collection for this engine, guids for the collections, etc).

We made the decision early on that no storage should be done by this
component:

* The "global" metadata should be stored by the application - but because it
  doesn't need to interpret the data, we do this with an opaque string (that
  is JSON, but the app should never assume or introspect that)

* Each engine should store its own metadata, so we don't end up in the
  situation where, say, a database is moved between profiles causing the
  metadata to refer to a completely different data set. Each engine therefore
  stores its metadata in the same database as the data itself, so if the
  database is moved or copied, the metadata comes with it.
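
As a rough illustration of this split, the sketch below keeps the "global" state as an opaque string on the application side and keeps engine metadata next to the engine's data. `AppPrefs`, `EngineDb`, the `last_sync` key and the millisecond timestamp are all assumptions invented for this example; none of them are part of the component's API.

```rust
use std::collections::HashMap;

/// Hypothetical app-side storage for the "global" sync state.
/// The app never parses the string; it only saves it and hands it back.
struct AppPrefs {
    persisted_sync_state: Option<String>,
}

impl AppPrefs {
    /// Save whatever the sync code returned, without inspecting it.
    fn save_state(&mut self, state: String) {
        self.persisted_sync_state = Some(state);
    }
    /// Return the saved state, unchanged, for the next sync.
    fn load_state(&self) -> Option<String> {
        self.persisted_sync_state.clone()
    }
}

/// Hypothetical engine database: data and sync metadata live together,
/// so copying or moving the database carries the metadata with it.
#[derive(Clone, Default)]
struct EngineDb {
    records: HashMap<String, String>, // guid -> payload
    meta: HashMap<String, String>,    // metadata key -> value
}

impl EngineDb {
    /// Record the server timestamp of the last sync (assumed to be in ms).
    fn set_last_sync(&mut self, server_timestamp_ms: i64) {
        self.meta
            .insert("last_sync".to_string(), server_timestamp_ms.to_string());
    }
    /// Read the timestamp back, if one has been stored.
    fn last_sync(&self) -> Option<i64> {
        self.meta.get("last_sync").and_then(|v| v.parse().ok())
    }
}
```

Cloning `EngineDb` stands in for copying the database file: the copy carries its own `last_sync`, so it can never point at metadata belonging to a different data set.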

## Sync Implementation

The core implementation does all of the interaction with things like the
tokenserver, the `meta/global` and `info/collections` collections, etc. It
does all network interaction (ie, individual engines don't need to interact with
the network at all), tracks things like whether the server is asking us to
"backoff" due to operational concerns, manages encryption keys and the
encryption itself, etc. The general flow of a sync - which interacts with the
`SyncEngine` trait - is:

* Does all pre-sync setup, such as checking `meta/global`, and whether the
  sync IDs on the server match the sync IDs we last saw (ie, to check whether
  something drastic has happened since we last synced)
* Asks the engine about how to formulate the URL query params to obtain the
  records the engine cares about. In most cases, this will simply be "records
  since the last modified timestamp of the last sync".
* Downloads and decrypts these records.
* Passes these records to the engine for processing, and obtains records that
  should be uploaded to the server.
* Encrypts these outgoing records and uploads them.
* Tells the engine about the result of the upload (ie, the last-modified
  timestamp of the POST so it can be saved as engine metadata)

As above, the sync15 component really only deals with a single engine at a time.
See the "sync manager" for how multiple engines are managed (but the tl;dr is
that the "sync manager" leans on this very heavily, but knows about multiple
engines and manages shared state).
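
The general flow described above can be sketched for a single engine as follows. This is a toy model under stated assumptions, not the real implementation: the `Engine` trait, `FakeServer`, `ToyEngine` and `toy_crypt` are all hypothetical, the "encryption" is just string reversal, and the pre-sync checks against `meta/global` are left out.

```rust
/// Hypothetical, simplified record: a GUID plus a serialized payload.
#[derive(Clone, Debug, PartialEq)]
struct Record {
    guid: String,
    payload: String,
}

/// Hypothetical, reduced engine interface used only for this sketch.
trait Engine {
    /// Timestamp for "records since the last sync" (0 means never synced).
    fn last_sync(&self) -> u64;
    /// Process incoming records and return the records to upload.
    fn apply_incoming(&mut self, incoming: Vec<Record>) -> Vec<Record>;
    /// Receive the server timestamp of the upload, to be saved as metadata.
    fn sync_finished(&mut self, new_timestamp: u64);
}

/// Hypothetical in-memory "server": (modified timestamp, encrypted record).
struct FakeServer {
    now: u64,
    rows: Vec<(u64, Record)>,
}

/// Toy reversible transform standing in for encryption (NOT real crypto).
fn toy_crypt(s: &str) -> String {
    s.chars().rev().collect()
}

/// Sync one engine against the fake server; returns the number uploaded.
fn sync_one_engine(server: &mut FakeServer, engine: &mut dyn Engine) -> usize {
    // Pre-sync setup (meta/global, sync ID checks) is omitted in this sketch.
    // Ask the engine which records it cares about.
    let since = engine.last_sync();
    // Download and decrypt the matching records.
    let incoming: Vec<Record> = server
        .rows
        .iter()
        .filter(|(modified, _)| *modified > since)
        .map(|(_, r)| Record { guid: r.guid.clone(), payload: toy_crypt(&r.payload) })
        .collect();
    // Let the engine process them and collect what should be uploaded.
    let outgoing = engine.apply_incoming(incoming);
    // Encrypt and upload the outgoing records.
    server.now += 1;
    let now = server.now;
    let uploaded = outgoing.len();
    for r in outgoing {
        server.rows.push((now, Record { guid: r.guid, payload: toy_crypt(&r.payload) }));
    }
    // Tell the engine the timestamp of the upload.
    engine.sync_finished(now);
    uploaded
}

/// Hypothetical engine that remembers what it saw and uploads pending items.
struct ToyEngine {
    last_sync: u64,
    seen: Vec<Record>,
    pending: Vec<Record>,
}

impl Engine for ToyEngine {
    fn last_sync(&self) -> u64 {
        self.last_sync
    }
    fn apply_incoming(&mut self, incoming: Vec<Record>) -> Vec<Record> {
        self.seen.extend(incoming);
        std::mem::take(&mut self.pending)
    }
    fn sync_finished(&mut self, new_timestamp: u64) {
        self.last_sync = new_timestamp;
    }
}
```

Note that the engine in this sketch never touches the "network" (`FakeServer`) itself; only `sync_one_engine` does, mirroring the division of labour described above.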

## The `SyncEngine` trait

The SyncEngine trait is where all logic specific to a collection lives. A "sync
engine" implements (or provides) this trait to implement actual syncing.

For \<handwave\> reasons, it actually lives in the
[sync-traits helper](https://github.com/mozilla/application-services/blob/main/components/support/sync15-traits/src/engine.rs)
but for the purposes of this document, you should consider it as owned by sync15.

This is actually quite a simple trait - at a high level, it's really just
concerned with:

* Get or set some metadata the sync15 component has decided should be saved or
  fetched.

* In a normal sync, take some "incoming" records, process them, and return
  the "outgoing" records we should send to the server.

* In some edge-cases, either "wipe" (ie, actually delete everything, which
  almost never happens) or "reset" (ie, pretend this engine has never before
  been synced)

And that's it!
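
To make these three responsibilities concrete, here is a minimal sketch of a trait with the same shape. It is an illustration only: `SketchEngine`, its method names and signatures, and `MemoryEngine` are assumptions made for this example, and the real `SyncEngine` trait differs in its names and signatures. In particular, what exactly "reset" and "wipe" clear here is a guess based on the descriptions above.

```rust
use std::collections::HashMap;

/// Hypothetical reduction of the trait's three responsibilities.
trait SketchEngine {
    // 1. Metadata that the sync code asks the engine to fetch or save.
    fn get_meta(&self, key: &str) -> Option<String>;
    fn set_meta(&mut self, key: &str, value: &str);

    // 2. A normal sync: take incoming records, return outgoing ones.
    fn apply_incoming(&mut self, incoming: Vec<String>) -> Vec<String>;

    // 3. Edge cases.
    /// Pretend this engine has never synced: forget metadata, keep data.
    fn reset(&mut self);
    /// Actually delete everything.
    fn wipe(&mut self);
}

/// Hypothetical in-memory engine used to exercise the trait.
#[derive(Default)]
struct MemoryEngine {
    items: Vec<String>,            // local data
    unsynced: Vec<String>,         // local changes not yet uploaded
    meta: HashMap<String, String>, // engine-specific sync metadata
}

impl SketchEngine for MemoryEngine {
    fn get_meta(&self, key: &str) -> Option<String> {
        self.meta.get(key).cloned()
    }

    fn set_meta(&mut self, key: &str, value: &str) {
        self.meta.insert(key.to_string(), value.to_string());
    }

    fn apply_incoming(&mut self, incoming: Vec<String>) -> Vec<String> {
        // Keep what the server sent, then hand back our pending changes.
        self.items.extend(incoming);
        std::mem::take(&mut self.unsynced)
    }

    fn reset(&mut self) {
        self.meta.clear();
    }

    fn wipe(&mut self) {
        self.items.clear();
        self.unsynced.clear();
        self.meta.clear();
    }
}
```

In this sketch `reset` leaves local data in place, so the next sync would behave like a first sync, while `wipe` removes the data as well.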