diff options
Diffstat (limited to 'toolkit/components/telemetry/docs/collection/origin.rst')
-rw-r--r-- | toolkit/components/telemetry/docs/collection/origin.rst | 166 |
1 files changed, 166 insertions, 0 deletions
diff --git a/toolkit/components/telemetry/docs/collection/origin.rst b/toolkit/components/telemetry/docs/collection/origin.rst new file mode 100644 index 0000000000..0d0b211c71 --- /dev/null +++ b/toolkit/components/telemetry/docs/collection/origin.rst @@ -0,0 +1,166 @@ +.. _origintelemetry: + +================ +Origin Telemetry +================ + +*Origin Telemetry* is an experimental Firefox Telemetry mechanism that allows us to privately report origin-specific information in aggregate. +In short, it allows us to get exact counts of how *many* Firefox clients do certain things on specific origins without us being able to know *which* clients were doing which things on which origins. + +As an example, Content Blocking would like to know which trackers Firefox blocked most frequently. +Origin Telemetry allows us to count how many times a given tracker is blocked without being able to find out which clients were visiting pages that had those trackers on them. + +.. important:: + + This mechanism is experimental and is a prototype. + Please do not try to use this without explicit permission from the Firefox Telemetry Team, as it's really only been designed to work for Content Blocking right now. + +Adding or removing Origins or Metrics is not supported in artifact builds and build faster workflows. A non-artifact Firefox build is necessary to change these lists. + +This mechanism is enabled on Firefox Nightly only at present. + +.. important:: + + Every new or changed data collection in Firefox needs a `data collection review <https://wiki.mozilla.org/Firefox/Data_Collection>`__ from a Data Steward. + +Privacy +======= + +To achieve the necessary goal of getting accurate counts without being able to learn which clients contributed to the counts we use a mechanism called `Prio (pdf) <https://www.usenix.org/system/files/conference/nsdi17/nsdi17-corrigan-gibbs.pdf>`_. + +Prio uses cryptographic techniques to encrypt information and a proof that the information is correct, only sending the encrypted information on to be aggregated. +Only after aggregation do we learn the information we want (aggregated counts), and at no point do we learn the information we don't want (which clients contributed to the counts). + +.. _origin.usage: + +Using Origin Telemetry +====================== + +To record that something happened on a given origin, three things must happen: + +1. The origin must be one of the fixed, known list of origins. ("Where" something happened) +2. The metric must be one of the fixed, known list of metrics. ("What" happened) +3. A call must be made to the Origin Telemetry API. (To let Origin Telemetry know "that" happened "there") + +At present the lists of origins and metrics are hardcoded in C++. +Please consult the Firefox Telemetry Team before changing these lists. + +Origins can be arbitrary byte sequences of any length. +Do not add duplicate origins to the list. + +If an attempt is made to record to an unknown origin, a meta-origin ``__UNKNOWN__`` captures that it happened. +Unlike other origins where multiple recordings are considered additive ``__UNKNOWN__`` only accumulates a single value. +This is to avoid inflating the ping size in case the caller submits a lot of unknown origins for a given unit (e.g. pageload). + +Metrics should be of the form ``categoryname.metric_name``. +Both ``categoryname`` and ``metric_name`` should not exceed 40 bytes (UTF-8 encoded) in length and should only contain alphanumeric character and infix underscores. + +.. _origin.API: + +API +=== + +Origin Telemetry supplies APIs for recording information into and snapshotting information out of storage. + +Recording +--------- + +``Telemetry::RecordOrigin(aOriginMetricID, aOrigin);`` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This C++ API records that a metric was true for a given origin. +For instance, maybe the user visited a page in which content from ``example.net`` was blocked. +That call might look like ``Telemetry::RecordOrigin(OriginMetricID::ContentBlocking_Blocked, "example.net"_ns)``. + +Snapshotting +------------ + +``let snapshot = await Telemetry.getEncodedOriginSnapshot(aClear);`` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This JS API provides a snapshot of the prio-encoded payload and is intended to only be used to assemble the :doc:`"prio" ping's <../data/prio-ping>` payload. +It returns a Promise which resolves to an object of the form: + +.. code-block:: js + + { + a: <base64-encoded, prio-encoded data>, + b: <base64-encoded, prio-encoded data>, + } + +``let snapshot = Telemetry.getOriginSnapshot(aClear);`` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This JS API provides a snapshot of the unencrypted storage of unsent Origin Telemetry, optionally clearing that storage. +It returns a structure of the form: + +.. code-block:: js + + { + "categoryname.metric_name": { + "origin1": count1, + "origin2": count2, + ... + }, + ... + } + +.. important:: + + This API is only intended to be used by ``about:telemetry`` and tests. + +.. _origin.example: + +Example +======= + +Firefox Content Blocking blocks web content from certain origins present on a list. +Users can exempt certain origins from being blocked. +To improve Content Blocking's effectiveness we need to know these two "what's" of information about that list of "wheres". + +This means we need two metrics ``contentblocking.blocked`` and ``contentblocking.exempt`` (the "what's"), and a list of origins (the "wheres"). + +Say "example.net" was blocked and "example.com" was exempted from blocking. +Content Blocking calls ``Telemetry::RecordOrigin(OriginMetricID::ContentBlocking_Blocked, "example.net"_ns))`` and ``Telemetry::RecordOrigin(OriginMetricID::ContentBlocking_Exempt, "example.com"_ns)``. + +At this time a call to ``Telemetry.getOriginSnapshot()`` would return: + +.. code-block:: js + + { + "contentblocking.blocked": {"example.net": 1}, + "contentblocking.exempt": {"example.com": 1}, + } + +Later, Origin Telemetry will get the encoded snapshot (clearing the storage) and assemble it with other information into a :doc:`"prio" ping <../data/prio-ping>` which will then be submitted. + +.. _origin.encoding: + +Encoding +======== + +.. note:: + + This section is provided to help you understand the client implementation's architecture. + If how we arranged our code doesn't matter to you, feel free to ignore. + +There are three levels of encoding in Origin Telemetry: App Encoding, Prio Encoding, and Base64 Encoding. + +*App Encoding* is the process by which we turn the Metrics and Origins into data structures that Prio can encrypt for us. +Prio, at time of writing, only supports counting up to 2046 "true/false" values at a time. +Thus, from the example, we need to turn "example.net was blocked" into "the boolean at index 11 of chunk 2 is true". +This encoding can be done any way we like so long as we don't change it without informing the aggregation servers (by sending it a new :ref:`encoding name <prio-ping.encoding>`). +This encoding provides no privacy benefit and is just a matter of transforming the data into a format Prio can process. + +*Prio Encoding* is the process by which those ordered true/false values that result from App Encoding are turned into an encrypted series of bytes. +You can `read the paper (pdf) <https://www.usenix.org/system/files/conference/nsdi17/nsdi17-corrigan-gibbs.pdf>`_ to learn more about that. +This encoding, together with the overall system architecture, is what provides the privacy quality to Origin Telemetry. + +*Base64 Encoding* is how we turn those encrypted bytes into a string of characters we can send over the network. +You can learn more about Base64 encoding `on wikipedia <https://wikipedia.org/wiki/Base64>`_. +This encoding provides no privacy benefit and is just used to make Data Engineers' lives a little easier. + +Version History +=============== + +- Firefox 68: Initial Origin Telemetry support (Nightly Only) (`bug 1536565 <https://bugzilla.mozilla.org/show_bug.cgi?id=1536565>`_). |