From 6bf0a5cb5034a7e684dcc3500e841785237ce2dd Mon Sep 17 00:00:00 2001 From: Daniel Baumann Date: Sun, 7 Apr 2024 19:32:43 +0200 Subject: Adding upstream version 1:115.7.0. Signed-off-by: Daniel Baumann --- services/sync/docs/engines.rst | 133 +++++++++++++++++++++++++ services/sync/docs/external.rst | 8 ++ services/sync/docs/index.rst | 17 ++++ services/sync/docs/overview.rst | 81 +++++++++++++++ services/sync/docs/payload-evolution.md | 168 ++++++++++++++++++++++++++++++++ services/sync/docs/rust-engines.rst | 37 +++++++ 6 files changed, 444 insertions(+) create mode 100644 services/sync/docs/engines.rst create mode 100644 services/sync/docs/external.rst create mode 100644 services/sync/docs/index.rst create mode 100644 services/sync/docs/overview.rst create mode 100644 services/sync/docs/payload-evolution.md create mode 100644 services/sync/docs/rust-engines.rst (limited to 'services/sync/docs') diff --git a/services/sync/docs/engines.rst b/services/sync/docs/engines.rst new file mode 100644 index 0000000000..7a4fa721af --- /dev/null +++ b/services/sync/docs/engines.rst @@ -0,0 +1,133 @@ +============================ +The Sync engines in the tree +============================ + +Unless otherwise specified, the engine implementations can be found +`here `_ + +Please read the :doc:`overview`. + +Clients +======= + +The ``clients`` engine is a special engine in that it's invisible to the +user and cannot be disabled - think of it as a "meta" engine. As such, it +doesn't really have a sensible concept of ``store`` or ``tracker``. + +The engine is mainly responsible for keeping its own record current in the +``clients`` collection. Some parts of Sync use this collection to know what +other clients exist and when they last synced (although a lot of this is moving +to using the Firefox Accounts devices). 
+ +Clients also has the ability to handle ``commands`` - in short, some other +client can write to this client's ``commands``, and when this client notices, +it will execute the command. Commands aren't arbitrary, so commands must be +understood by both sides for them to work. There are commands to "wipe" +collections etc. In practice, this is used only by ``bookmarks`` when a device +restores bookmarks - in that case, the restoring device will send a ``wipe`` +command to all other clients so that they take the new bookmarks instead of +merging them. + +If not for this somewhat limited ``commands`` functionality, this engine could +be considered deprecated and subsumed by FxA devices - but because we +can't just remove support for commands and also do not have a plan for +replacing them, the clients engine remains important. + +Bookmarks +========= + +The ``bookmarks`` engine has changed so that it's tightly integrated with the +``places`` database. Instead of an external ``tracker``, the tracking is +integrated into Places. Each bookmark has a `syncStatus` and a +`syncChangeCounter` and these are managed internally by places. Sync then just +queries for changed bookmarks by looking for these fields. + +Bookmarks is somewhat unique in that it needs to maintain a tree structure, +which makes merging a challenge. The `dogear `_ +component (written in Rust and also used by the +`application-services bookmarks component `_) +performs this merging. + +Bookmarks also pioneered the concept of a "mirror" - this is a database table +which tracks exactly what is on the server. Because each sync only fetches +changes from the server since the last sync, each sync does not supply every +record on the server. However, the merging code does need to know what's on +the server - so the mirror tracks this. + +History +======= + +History is similar to bookmarks described above - it's closely integrated with +places - but is less complex because there's no tree structure involved. 
+ +One unique characteristic of history is that the engine takes steps to *not* +upload everything - old profiles tend to have too much history to reasonably +store and upload, so typically uploads are limited to the last 5000 visits. + +Logins +====== + +Logins has also been upgraded to be closely integrated with `Services.logins` - +the logins component itself manages the metadata. + +Tabs +==== + +Tabs is a special engine in that there's no underlying storage at all - it +both saves the currently open tabs from this device (which are enumerated +every time it's updated) and also lets other parts of Firefox know which tabs +are open on other devices. There's no database - if we haven't synced yet we +don't know what other tabs are open, and when we do know, the list is just +stored in memory. + +The `SyncedTabs module `_ +is the main interface the browser uses to get the list of tabs from other +devices. + +Add-ons +======= + +Addons is still an "old school" engine, with a tracker and store which aren't +closely integrated with the addon manager. As a result it's fairly complex and +error prone - eg, it persists the "last known" state so it can know what to +sync, where a better model would be for the addon manager to track the changes +on Sync's behalf. + +It also attempts to sync themes etc. The future of this engine isn't clear given +it doesn't work on mobile platforms. + +Addresses / Credit-Cards +======================== + +Addresses and Credit-cards have Sync functionality tightly bound with the +store. Unlike other engines above, this engine has always been tightly bound, +because it was written after we realized this tight-binding was a feature and +not a bug. + +Technically these are 2 separate engines and collections. However, because the +underlying storage uses a shared implementation, the syncing also uses a +shared implementation - ie, the same logic is used for both - so we tend to +treat them as a single engine in practice. 
+ +As a result, only a shim is in the `services/sync/modules/engines/` directory, +while the actual logic is +`next to the storage implementation `_. + +This engine has a unique twist on the "mirror" concept described above - +whenever a change is made to a field, the original value of the field is +stored directly in the storage. This means that on the next sync, the value +of the record on the server can be deduced, meaning a "3-way" merge can be +done, so it can better tell the difference between local only, remote only, or +conflicting changes. + +WebExt-Storage +============== + +webext-storage is implemented in Rust and lives in +`application services `_ +and is vendored into the `addons code `_ - +note that this includes the storage *and* Sync code. The Sync engine itself +is a shim in the sync directory. + +See the :doc:`rust-engines` document for more about how Rust engines are +integrated. diff --git a/services/sync/docs/external.rst b/services/sync/docs/external.rst new file mode 100644 index 0000000000..f7cebde32d --- /dev/null +++ b/services/sync/docs/external.rst @@ -0,0 +1,8 @@ +============== +External Links +============== + +Some external links that might be of interest: + +* `Information about the server APIs `_ +* `Some external Sync Client docs `_ diff --git a/services/sync/docs/index.rst b/services/sync/docs/index.rst new file mode 100644 index 0000000000..37ce3c19a0 --- /dev/null +++ b/services/sync/docs/index.rst @@ -0,0 +1,17 @@ +==== +Sync +==== + +This documents the sync implementation inside mozilla-central. It assumes +a general understanding of what Sync is and how it works at a high level - you +can find `some external docs `_ +which can help with this. + +.. 
toctree:: + :maxdepth: 1 + + overview + engines + rust-engines + payload-evolution + external diff --git a/services/sync/docs/overview.rst b/services/sync/docs/overview.rst new file mode 100644 index 0000000000..e956090d70 --- /dev/null +++ b/services/sync/docs/overview.rst @@ -0,0 +1,81 @@ +==================== +Introduction to Sync +==================== + +This document is a brief introduction to how Sync is implemented in desktop Firefox. + +General, Historical, Anatomy of a Sync Engine +============================================= + +This section describes how Sync used to work - and indeed, how much of it still +does. While we discuss how this is slowly changing, this context is valuable. + +For any datatype which syncs, there tend to be 3 parts: + +Store +----- + +The sync ``store`` interfaces with the actual Firefox desktop store. For example, +in the ``passwords`` engine, the "store" is the layer that talks to +``Services.logins``. + +Tracker +------- + +The ``tracker`` is what knows that something should be synced. For example, +when the user creates or updates a password, it is the tracker that knows +we should sync now, and what particular password(s) should be updated. + +This is typically done via "observer" notifications - ``Services.logins``, +``places`` etc all send specific notifications when certain events happen +(and indeed, some of these were added for Sync's benefit). + +Engine +------ + +The ``engine`` ties it all together. It works with the ``store`` and +``tracker`` and tracks its own metadata (eg, the timestamp of the passwords on +the server), so it knows how to grab just changed records and how to pass them +off to the ``store`` so the actual underlying storage can be updated. + +All of the above parts were typically in the +`services/sync/modules/engines directory `_ +and decoupled from the data they were syncing. 
+ + +The Future of Desktop-Specific Sync Engines +=========================================== + +The system described above reflects the fact that Sync was "bolted on" to +Desktop Firefox relatively late - eg, the Sync ``store`` is decoupled from the +actual ``store``. This has caused a number of problems - particularly around +the ``tracker`` and the metadata used by the engine, and the fact that changes +to the backing store would often forget that Sync existed. + +Over the last few years, the Sync team has come to the conclusion that Sync +support must be integrated much more closely with the store itself. For example, +``Services.logins`` should track when something has changed that would cause +an item to be synced. It should also track the metadata for the store so that +if (say) a corrupt database is recovered by creating a new, empty one, the +metadata should also vanish so Sync knows something bad has happened and can +recover. + +However, this is a slow process - currently the ``bookmarks``, ``history`` and +``passwords`` legacy engines have been improved so more responsibility is taken +by the stores. In all cases, for implementation reasons, the Sync +implementation still has a ``store``, but it tends to be a thin wrapper around +the actual underlying store. + +The Future of Cross-Platform Sync Engines +========================================= + +There are a number of Sync engines implemented in Rust and which live in the +application-services repository. While these were often done for mobile +platforms, the longer term hope is that they can be reused on Desktop. +:doc:`engines` has more details on these. + +While no existing engines have been replaced with Rust implemented engines, +the webext-storage engine is implemented in Rust via application-services, so +doesn't tend to use any of the infrastructure described above. + +Hopefully over time we will find more Rust-implemented engines in Desktop. 
diff --git a/services/sync/docs/payload-evolution.md b/services/sync/docs/payload-evolution.md new file mode 100644 index 0000000000..e195ee545d --- /dev/null +++ b/services/sync/docs/payload-evolution.md @@ -0,0 +1,168 @@ +# Handling the evolution of Sync payloads + +(Note that this document has been written in the format of an [application-services ADR](https://github.com/mozilla/application-services/blob/main/docs/adr/0000-use-markdown-architectural-decision-records.md) +but the relevant teams decided that ultimately the best home for this doc is in mozilla-central) + +* Status: Accepted +* Deciders: sync team, credentials management team +* Date: 2023-03-15 + +Technical Story: +* https://github.com/mozilla/application-services/pull/5434 +* https://docs.google.com/document/d/1ToLOERA5HKzEzRVZNv6Ohv_2wZaujW69pVb1Kef2jNY + +## Context and Problem Statement + +Sync exists on all platforms (Desktop, Android, iOS), all channels (Nightly, Beta, Release, ESR) and is heavily used across all Firefox features. +Whenever there are feature changes or requests that potentially involve schema changes, there are not a lot of good options to ensure sync doesn’t break for any specific client. +Since sync data is synced from all channels, we need to make sure each client can handle the new data and that all channels can support the new schema. +Issues like [credit card failing on android and desktop release channels due to schema change on desktop Nightly](https://bugzilla.mozilla.org/show_bug.cgi?id=1812235) +are examples of such cases we can run into. +This document describes our decision on how we will support payload evolution over time. + +Note that even though this document exists in the application-services repository, it should +be considered to apply to all sync implementations, whether in this repository, in mozilla-central, +or anywhere else. 
+ +## Definitions + +* A "new" Firefox installation is a version of Firefox which has a change to a Sync payload which + is not yet understood by "recent" versions. The most common example would be a Nightly version + of Firefox with a new feature not yet on the release channel. + +* A "recent" Firefox installation is a version older than a "new" version, which does not understand + or have support for new features in "new" versions, but which we still want to support without + breakage and without the user perceiving data-loss. This is typically accepted to mean the + current ESR version or later, but taking into account the slow update when new ESRs are released. + +* An "old" version is any version before what we consider "recent". + + +## Decision Drivers + +* It must be possible to change what data is carried by Sync to meet future product requirements. +* Both desktop and mobile platforms must be considered. +* We must not break "recent" Firefox installations when a "new" Firefox installation syncs, and vice-versa. +* Round-tripping data from a "new" Firefox installation through a "recent" Firefox installation must not discard any of the new data, and vice-versa. +* Some degree of breakage for "old" Firefox installations when "new" or "recent" firefoxes sync + might be considered acceptable if absolutely necessary. +* However, breakage of "new" or "recent" Firefoxes when an "old" version syncs is *not* acceptable. +* Because such evolution should be rare, we do not want to set an up-front policy about locking out + "old" versions just because they might have a problem in the future. That is, we want to avoid + a policy that dictates versions more than (say) 2 years old will break when syncing "just in case" +* Any solution to this must be achievable in a relatively short timeframe as we know of product + asks coming down the line which require this capability. 
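The round-tripping requirement in the list above can be illustrated with a small sketch. This is only an illustration in Python, not how any Firefox engine is implemented: the field names, the `KNOWN_FIELDS` set, and the helpers `split_incoming` and `build_outgoing` are all hypothetical, and it assumes the unknown fields sit at the root of the payload.

```python
import json

# Hypothetical set of fields a "recent" client understands for a
# passwords-style record. These names are illustrative only.
KNOWN_FIELDS = {"id", "hostname", "username", "password"}


def split_incoming(payload):
    """Split an incoming payload into fields we understand and fields we do not."""
    known = {k: v for k, v in payload.items() if k in KNOWN_FIELDS}
    unknown = {k: v for k, v in payload.items() if k not in KNOWN_FIELDS}
    return known, unknown


def build_outgoing(known, unknown):
    """Rebuild a payload for upload, adding the persisted unknown fields back."""
    outgoing = dict(unknown)
    outgoing.update(known)  # fields this client understands take precedence
    return outgoing


# A record written by a "new" client, carrying a field this client does not know.
incoming = json.loads(
    '{"id": "abc", "hostname": "https://example.com", '
    '"username": "u", "password": "old", "newField": {"x": 1}}'
)

known, unknown = split_incoming(incoming)
known["password"] = "new"  # a local edit made on the "recent" client
outgoing = build_outgoing(known, unknown)

# The field this client never understood is still present in the upload.
print(outgoing["newField"])
```

In this sketch the `unknown` dictionary stands in for whatever the engine would persist alongside the local record between syncs. A real engine would also have to decide where in the payload unknown fields may appear.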
+ +## Considered Options + +* A backwards compatible schema policy, consisting of (a) having engines "round trip" data they + do not know about and (b) never changing the semantics of existing data. +* A policy which prevents "recent" clients from syncing, or editing data, or other restrictions. +* A formal schema-driven process. +* Consider the sync payloads frozen and never change them. +* Use separate collections for new data + +## Decision Outcome + +Chosen option: A backwards compatible schema policy because it is very flexible and the only option +meeting the decision drivers. + +## Pros and Cons of the Options + +### A backwards compatible schema policy + +A summary of this option is a policy by which: + +* Every sync engine must arrange to persist any fields from the payload which it + does not understand. The next time that engine needs to upload that record to the storage server, + it must arrange to add all such "unknown" fields back into the payload. + +* Different engines must identify different locations where this might happen. For example, the + `passwords` engine would identify the "root" of the payload, `addresses` and `creditcards` would + identify the `entry` sub-object in the payload, while the history engine would probably identify + *both* the root of the payload and the `visits` array. + +* Fields can not change type, nor be removed for a significant amount of time. This might mean + that "new" clients must support both new fields *and* fields which are considered deprecated + by these "new" clients because they are still used by "recent" versions. + +The pros and cons: + +* Good, because it meets the requirements. 
+ +* Good, because the initial set of work identified is relatively simple to implement (that work + specifically is to support the round-tripping of "unknown" fields, in the hope that by the + time actual schema changes are proposed, this round-trip capability will then be on all "recent" + versions) + +* Bad, because the inability to deprecate or change existing fields means that + some evolution tasks become complicated. For example, consider a hypothetical change where + we wanted to change from "street/city/state" fields into a free-form "address" field. New + Firefox versions would need to populate *both* new and old fields when writing to the server, + and handle the fact that only the old fields might be updated when they see an incoming + record written by a "recent" or "old" version of Firefox. However, this should be rare. + +* Bad, because it's not possible to prove a proposed change meets the requirements - the policy + is informal and requires good judgement as changes are proposed. + +### A policy which prevents "recent" clients from syncing, or editing data + +Proposals which fit into this category might have been implemented by (say) adding +a version number to the schema, and if clients did not fully understand the schema it would +either prevent syncing the record, or sync it but not allow editing it, or similar. + +This was rejected because: + +* The user would certainly perceive data-loss if we ignored the incoming data entirely. +* If we still wanted older versions to "partially" see the record (eg, but disallow editing) we'd + still need most of the chosen option anyway - specifically, we could still never + deprecate fields etc. +* The UI/UX of trying to explain to the user why they can't edit a record was deemed impossible + to do in a satisfactory way. +* This would effectively penalize users who chose to use Nightly Firefoxes in any way. Simply + allowing a Nightly to sync would effectively break Release/Mobile Firefox versions. 
+ +### A formal schema-driven process. + +Ideally we could formally describe schemas, but we can't come up with anything here which +works with the constraints of supporting older clients - we simply can't update older released +Firefoxes so they know how to work with the new schemas. We also couldn't come up with a solution +where a schema is downloaded dynamically which also allowed the *semantics* (as opposed to simply +validity) of new fields to be described. + +### Consider the sync payloads frozen and never change them. + +A process where payloads are frozen was rejected because: + +* The most naive approach here would not meet the needs of Firefox in the future. + +* A complicated system where we started creating new payloads and new collections + (ie, freezing "old" schemas but then creating "new" schemas only understood by + newer clients) could not be conceived in a way that still met the requirements, + particularly around data-loss for older clients. For example, adding a credit-card + on a Nightly version but having it be completely unavailable on a release firefox + isn't acceptable. + +### Use separate collections for new data + +We could store the new data in a separate collection. For example define a +bookmarks2 collection where each record has the same guid as one in bookmarks alongside any new fields. +Newer clients use both collections to sync. + +The pros and cons: + +* Good, because it allows newer clients to sync new data without affecting recent or older clients +* Bad, because sync writes would lose atomicity without server changes. + We can currently write to a single collection in an atomic way, but don't have a way to write to multiple collections. +* Bad because the number of collections grows each time we want to add fields. +* Bad because it potentially leaks extra information to an attacker that gets access to the encrypted server records. 
+ For example if we added a new collection for a single field, then the attacker could guess if that + field was set or not based on the size of the encrypted record. +* Bad because it's difficult to handle nested data with this approach, + for example adding a field to a history record visit. +* Bad because it has the same issue with dependent data as the chosen solution. + +## Links + +* This document was originally [brain-stormed in this google docs document](https://docs.google.com/document/d/1ToLOERA5HKzEzRVZNv6Ohv_2wZaujW69pVb1Kef2jNY), + which may be of interest for historical context, but should not be considered part of this ADR. diff --git a/services/sync/docs/rust-engines.rst b/services/sync/docs/rust-engines.rst new file mode 100644 index 0000000000..af00fd6619 --- /dev/null +++ b/services/sync/docs/rust-engines.rst @@ -0,0 +1,37 @@ +================================ +How Rust Engines are implemented +================================ + +There are 2 main components to engines implemented in Rust + +The bridged-engine +================== + +Because Rust engines still need to work with the existing Sync infrastructure, +there's the concept of a `bridged-engine `_. +In short, this is just a shim between the existing +`Sync Service `_ +and the Rust code. + +The bridge +========== + +`"Golden Gate" `_ +is a utility to help bridge any Rust implemented Sync engines with desktop. In +other words, it's a "rusty bridge" - get it? Get it? Yet another of Lina's puns +that live on! + +One of the key challenges with integrating a Rust Sync component with desktop +is the different threading models. The Rust code tends to be synchronous - +most functions block the calling thread to do the disk or network IO necessary +to work - it assumes that the consumer will delegate this to some other thread. 
+ +So golden_gate is this background thread delegation for a Rust Sync engine - +gecko calls golden-gate on the main thread, it marshalls the call to a worker +thread, and the result is marshalled back to the main thread. + +It's worth noting that golden_gate is just for the Sync engine part - other +parts of the component (ie, the part that provides the functionality that's not +sync related) will have its own mechanism for this. For example, the +`webext-storage bridge `_ +uses a similar technique `which has some in-depth documentation <../../toolkit/components/extensions/webextensions/webext-storage.html>`_. -- cgit v1.2.3