author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-07 09:22:09 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-07 09:22:09 +0000 |
commit | 43a97878ce14b72f0981164f87f2e35e14151312 (patch) | |
tree | 620249daf56c0258faa40cbdcf9cfba06de2a846 /toolkit/components/telemetry/docs/start | |
parent | Initial commit. (diff) | |
Adding upstream version 110.0.1. (refs: upstream/110.0.1, upstream)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'toolkit/components/telemetry/docs/start')
3 files changed, 439 insertions, 0 deletions
diff --git a/toolkit/components/telemetry/docs/start/adding-a-new-probe.rst b/toolkit/components/telemetry/docs/start/adding-a-new-probe.rst
new file mode 100644
index 0000000000..f1baeadf82
--- /dev/null
+++ b/toolkit/components/telemetry/docs/start/adding-a-new-probe.rst
@@ -0,0 +1,153 @@
+============================
+Adding a new Telemetry probe
+============================
+
+In Firefox, the Telemetry system collects various measures of Firefox performance, hardware, usage and customizations and submits them to Mozilla. This article provides an overview of what is needed to add any new Telemetry data collection.
+
+.. important::
+
+    Every new data collection in Firefox needs a `data collection review <https://wiki.mozilla.org/Firefox/Data_Collection#Requesting_Approval>`__ from a data collection peer. Just set the feedback? flag for one of the data peers. They try to reply within a business day.
+
+What is your goal?
+==================
+
+We have various :doc:`data collection tools <../collection/index>` available, each serving different needs. Before diving right into technical details, it is best to take a step back and consider what you need to achieve.
+
+Your goal could be to answer product questions like “how many people use feature X?” or “what is the error rate of service Y?”.
+You could also be focused more on answering engineering questions, say “which web features are most used?” or “how is the performance of engine Z?”.
+
+From there, questions you should ask are:
+
+- What is the minimum data that can answer your questions?
+- How many people do you need this data from?
+- Is data from the pre-release channels sufficient?
+
+This also informs the `data collection review <https://wiki.mozilla.org/Firefox/Data_Collection>`__, which requires a plan for how to use the data. Data collection review is required for all new data collection.
+
+Data collection levels
+======================
+
+Most of our data collection falls into one of two levels, *release* and *pre-release*.
+
+**Release data** is recorded by default on all channels; users need to explicitly opt out to disable it. This has `stricter constraints <https://wiki.mozilla.org/Firefox/Data_Collection#Requirements>`_ for what data we can collect. "Most" users submit this data.
+
+**Pre-release data** is not recorded on release, but is collected by default on our pre-release channels (Beta and Nightly), so it can be biased.
+
+These levels cover what is described in the `Firefox privacy notice <https://www.mozilla.org/en-US/privacy/firefox/>`_. For other needs, there might be custom mechanisms that clearly require user opt-in and show what data is collected.
+
+Rich data & aggregate data
+==========================
+
+For the recording and transmission of data, we have various data types available. We can divide these data types into two large groups.
+
+**Aggregate data** is aggregated on the client-side and cheap to send, process and analyze. This could e.g. be a simple count of tab opens or a histogram showing how long it takes to switch between tabs. This should be your default choice and is well supported in our analysis tools.
+
+**Rich data** is used when questions cannot be answered from aggregate data. When we send more detailed data we can e.g. see when a specific UI interaction happened and in which context.
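Reviewer note, not part of the patch: the difference between the two groups can be illustrated with a small Python sketch. It is only a conceptual model and does not use any Firefox Telemetry API; the bucket bounds, function names and field names are all hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical bucket lower bounds, in milliseconds, for a tab-switch histogram.
BUCKET_LOWER_BOUNDS_MS = [0, 10, 50, 100, 500]


def aggregate(durations_ms):
    """Aggregate data: keep only a count per bucket, not the raw values."""
    counts = Counter()
    for duration in durations_ms:
        # Index of the last bucket whose lower bound is <= duration,
        # clamped so that unexpected negative values land in the first bucket.
        index = max(bisect_right(BUCKET_LOWER_BOUNDS_MS, duration) - 1, 0)
        counts[BUCKET_LOWER_BOUNDS_MS[index]] += 1
    return dict(counts)


def record_rich(interactions):
    """Rich data: keep every interaction with its timestamp and context."""
    return [
        {"timestamp_ms": timestamp, "action": action, "context": context}
        for timestamp, action, context in interactions
    ]


aggregated = aggregate([3, 12, 48, 75, 620])
events = record_rich([(1000, "tab_switch", "keyboard"), (1750, "tab_switch", "mouse")])
```

The aggregated form stays the same size however many values are recorded, which is why it is cheap to send; the rich form grows with every interaction but keeps the "when" and "in which context".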
+
+As a general rule, you can inform the choice of data types from your goals like this:
+
++------------------------+-----------------+-----------------------+
+| Goals                  | Collection type | Implementation        |
++========================+=================+=======================+
+| On-going monitoring    | Aggregate data  | Histograms            |
+|                        |                 |                       |
+| Health tracking        |                 | Scalars               |
+|                        |                 |                       |
+| KPI impact             |                 | Environment data      |
++------------------------+-----------------+-----------------------+
+| Detailed user behavior | Rich data       | Event Telemetry       |
+|                        |                 |                       |
+| Funnel analysis        |                 | Detailed custom pings |
+|                        |                 |                       |
+| Diagnostics            |                 | Logs                  |
+|                        |                 |                       |
+|                        |                 | Crash data            |
++------------------------+-----------------+-----------------------+
+
+Aggregate data
+--------------
+
+Most of our data collection happens through :doc:`scalars <../collection/scalars>` and :doc:`histograms <../collection/histograms>`:
+
+- Scalars allow collection of simple values, like counts, booleans and strings.
+- Histograms allow collection of multiple different values, but aggregate them into a number of buckets. Each bucket has a value range and a count of how many values we recorded.
+
+Both scalars & histograms allow recording by keys. This allows for more flexible, two-level data collection.
+
+Other collections can build on top of scalars & histograms. An example is :doc:`use counters <../collection/use-counters>`, which submit web feature usage through histograms.
+
+We also collect :doc:`environment data <../data/environment>`. This consists of mostly scalar values that capture the “working environment” a Firefox session lives in, and includes e.g. data on hardware, OS, add-ons and some settings. Any data that is part of the "working environment", or needs to split :doc:`subsessions <../concepts/sessions>`, should go into it.
+
+Rich data
+---------
+
+Aggregate data can tell you that something happened, but is usually lacking details about what exactly. When more details are needed, we can collect them using other tools that submit less efficient data. This usually means that we can't enable the data collection for all users, for cost and performance concerns.
+
+There are multiple mechanisms to collect rich data:
+
+**Stack collection** helps with e.g. diagnosing hangs. Stack data is recorded into chrome hangs and threadhang stats. To diagnose where rarely used code is called from, you can use stack capturing.
+
+:doc:`Event Telemetry <../collection/events>` provides a way to record both when and what happened. This enables e.g. funnel analysis for usage.
+
+:doc:`Custom pings <../collection/custom-pings>` are used when other existing data collection does not cover your need. Submitting a custom ping enables you to submit your own JSON package that will be delivered to the Telemetry servers. However, this loses you access to existing tooling and makes it harder to join your data with other sources.
+
+Setup & building
+================
+
+Every build of Firefox has Telemetry enabled. Local developer builds with no custom build flags will record all Telemetry data, but not send it out.
+
+When adding any new scalar, histogram or event, Firefox needs to be built. Artifact builds are currently not supported, even if code changes are limited to JavaScript.
+
+Usually you don't need to send out data to add new Telemetry. In the rare event you do, you need the following in your *.mozconfig*::
+
+    MOZ_TELEMETRY_REPORTING=1
+    MOZILLA_OFFICIAL=1
+
+Testing
+=======
+
+Local confirmation
+------------------
+
+Your first step should always be to confirm your new data collection locally.
+
+The *about:telemetry* page lets you view any data you submitted to Telemetry in the last 60 days, whether it is in existing pings or in new custom pings. You can choose which pings to display on the top-left.
+
+If you need to confirm when - or if - pings are getting sent, you can run an instance of the `gzipServer <https://github.com/mozilla/gzipServer>`_ locally. It emulates roughly how the official Telemetry servers respond, and saves all received pings to disk for inspection.
+
+Test coverage
+-------------
+
+Any data collection that you need to base decisions on needs to have test coverage. Using JS, you can access the recorded values for your data collection. You can use the following functions:
+
+- for scalars, `getSnapshotForScalars() <https://searchfox.org/mozilla-central/rev/f997bd6bbfc4773e774fdb6cd010142370d186f9/toolkit/components/telemetry/core/nsITelemetry.idl#90-102>`_
+  or `getSnapshotForKeyedScalars() <https://searchfox.org/mozilla-central/rev/f997bd6bbfc4773e774fdb6cd010142370d186f9/toolkit/components/telemetry/core/nsITelemetry.idl#104-116>`_
+- for histograms, `getSnapshotForHistograms() <https://searchfox.org/mozilla-central/rev/f997bd6bbfc4773e774fdb6cd010142370d186f9/toolkit/components/telemetry/core/nsITelemetry.idl#54-74>`_
+  or `getSnapshotForKeyedHistograms() <https://searchfox.org/mozilla-central/rev/f997bd6bbfc4773e774fdb6cd010142370d186f9/toolkit/components/telemetry/core/nsITelemetry.idl#76-88>`_
+
+  * Optionally, histogram objects have a `snapshot() <https://searchfox.org/mozilla-central/rev/f997bd6bbfc4773e774fdb6cd010142370d186f9/toolkit/components/telemetry/core/nsITelemetry.idl#285-287,313-315>`_ method.
+
+- for events, `snapshotEvents() <https://searchfox.org/mozilla-central/rev/f997bd6bbfc4773e774fdb6cd010142370d186f9/toolkit/components/telemetry/core/nsITelemetry.idl#542-558>`_
+
+If you need to test that pings were correctly passed to Telemetry, you can use `TelemetryArchiveTesting <https://searchfox.org/mozilla-central/search?q=TelemetryArchiveTesting&redirect=false>`_.
+
+Validation
+----------
+
+While it's important to confirm that the data collection works on your machine, the Firefox user population is very diverse. Before basing decisions on any new data, it should be validated. This could take various forms.
+
+For *new data collection* using existing Telemetry data types, the transport mechanism is already tested. It is sufficient to validate the incoming values. This could happen through `Redash <https://docs.telemetry.mozilla.org/tools/stmo.html>`_ or through `custom analysis <https://docs.telemetry.mozilla.org/tools/spark.html>`_.
+
+For *new custom pings*, you'll want to check schema validation results, as well as that the contents look valid.
+
+Getting help
+============
+
+You can find all important Telemetry resources listed on `telemetry.mozilla.org <https://telemetry.mozilla.org/>`_.
+
+The Telemetry team is there to help with any problems. You can reach us via:
+
+- Matrix in `#telemetry:mozilla.org <https://chat.mozilla.org/#/room/#telemetry:mozilla.org>`_
+- Slack in `#data-help <https://mozilla.slack.com/messages/data-help/>`_
+- the `fx-data-dev mailing list <https://mail.mozilla.org/listinfo/fx-data-dev>`_
+- flags for `one of the peers <https://wiki.mozilla.org/Modules/Toolkit#Telemetry>`_ on Bugzilla or send us an e-mail
diff --git a/toolkit/components/telemetry/docs/start/index.rst b/toolkit/components/telemetry/docs/start/index.rst
new file mode 100644
index 0000000000..2b536b28a7
--- /dev/null
+++ b/toolkit/components/telemetry/docs/start/index.rst
@@ -0,0 +1,28 @@
+===============
+Getting started
+===============
+
+If you are interested in extending data collection by adding new probes, have a look at
+
+.. toctree::
+   :maxdepth: 2
+   :titlesonly:
+   :glob:
+
+   adding-a-new-probe
+   report-gecko-telemetry-in-glean
+
+If you want to work with the telemetry code itself, for example to fix a bug, it is often helpful to start with these steps:
+
+1. Have a look at about:telemetry to see which data is being collected and sent. Note that Origin Telemetry is missing here.
+2. Increase the log level in about:config by setting ``toolkit.telemetry.log.level`` to Debug or Trace. This will show telemetry information in the browser console. To enable the browser console follow `these instructions <../../../../devtools-user/browser_console/index.html>`__.
+3. Run a local telemetry receiver, e.g. `this one <https://github.com/mozilla/gzipServer>`__ and set ``toolkit.telemetry.server`` to “localhost”. (Like the next preference, this needs a restart.)
+4. Set ``toolkit.telemetry.send.overrideOfficialCheck = true``, otherwise local debug builds will not send telemetry data. (Requires restart.)
+
+More information about the internals can be found `here <../internals/index.html>`__.
+
+Further Reading
+###############
+
+* `Telemetry Portal <https://telemetry.mozilla.org/>`_ - Discover all important resources for working with data
+* `Telemetry Data Documentation <https://docs.telemetry.mozilla.org/>`_ - Find what data is available & how to use it
diff --git a/toolkit/components/telemetry/docs/start/report-gecko-telemetry-in-glean.rst b/toolkit/components/telemetry/docs/start/report-gecko-telemetry-in-glean.rst
new file mode 100644
index 0000000000..e483d07548
--- /dev/null
+++ b/toolkit/components/telemetry/docs/start/report-gecko-telemetry-in-glean.rst
@@ -0,0 +1,258 @@
+=======================================================
+How to report Gecko Telemetry in engine-gecko via Glean
+=======================================================
+
+In Gecko, the `Telemetry <../index.html>`__ system collects various measures of Gecko performance, hardware, usage and customizations.
+When the Gecko engine is embedded in Android products through any of the `engine-gecko-* <https://github.com/mozilla-mobile/android-components/tree/master/components/browser>`__ components of `Android Components <https://mozac.org/>`__ (there is one component for each Gecko channel),
+and the product is also using the `Glean SDK <https://docs.telemetry.mozilla.org/concepts/glean/glean.html>`__ for data collection, then Gecko metrics can be reported in `Glean pings <https://mozilla.github.io/glean/book/user/pings/index.html>`__.
+This article provides an overview of what is needed to report any existing or new Telemetry data collection in Gecko to Glean.
+
+.. important::
+
+    Every new or changed data collection in Firefox needs a `data collection review <https://wiki.mozilla.org/Firefox/Data_Collection>`__ from a Data Steward.
+
+Overview
+========
+Histograms are reported out of Gecko with a mechanism called `streaming Telemetry <../internals/geckoview-streaming.html>`__.
+This mechanism intercepts Gecko calls to tagged histograms, batches them and bubbles them up through `the GeckoView RuntimeTelemetry delegate <https://mozilla.github.io/geckoview/javadoc/mozilla-central/index.html>`__.
+The ``engine-gecko-*`` components provide implementations of the delegate which dispatch Gecko metrics to the Glean SDK.
+
+Reporting an existing histogram
+===============================
+Exfiltrating existing histograms is a relatively straightforward process made up of a few small steps.
+
+Tag histograms in ``Histograms.json``
+-------------------------------------
+Accumulations to non-tagged histograms are ignored if streaming Telemetry is enabled.
+To tag a histogram you must add the `geckoview_streaming` product to the :ref:`products list <histogram-products>` in the `Histograms.json file <https://hg.mozilla.org/mozilla-central/file/tip/toolkit/components/telemetry/Histograms.json>`__.
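Reviewer note, not part of the patch: as a rough sketch of the tagging rule above, a standalone Python script could list which histogram definitions carry the ``geckoview_streaming`` product. This is not part of the Firefox build tooling; the helper name is hypothetical and the sample is a trimmed-down stand-in for real ``Histograms.json`` entries.

```python
import json

STREAMING_PRODUCT = "geckoview_streaming"

# Trimmed-down sample modelled on entries in Histograms.json (illustrative only).
HISTOGRAMS_JSON = """
{
  "CHECKERBOARD_DURATION": {
    "products": ["firefox", "geckoview_streaming", "thunderbird"],
    "kind": "exponential"
  },
  "IPC_MESSAGE_SIZE2": {
    "products": ["firefox", "thunderbird"],
    "kind": "exponential"
  }
}
"""


def streaming_histograms(definitions):
    """Return the sorted names of histograms tagged for streaming Telemetry."""
    return sorted(
        name
        for name, definition in definitions.items()
        if STREAMING_PRODUCT in definition.get("products", [])
    )


tagged = streaming_histograms(json.loads(HISTOGRAMS_JSON))
print(tagged)
```

In this sample only ``CHECKERBOARD_DURATION`` is tagged, so accumulations to ``IPC_MESSAGE_SIZE2`` would be ignored while streaming Telemetry is enabled.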
+
+Add Glean metrics to ``metrics.yaml``
+-------------------------------------
+The Glean SDK provides a number of `higher level metric types <https://mozilla.github.io/glean/book/user/metrics/index.html>`__ to map Gecko histogram metrics to.
+However, Gecko histograms lack the metadata to infer the Glean SDK destination type automatically.
+For this reason, engineers must pick the most appropriate Glean SDK type themselves.
+
+Read more about how to add Glean SDK metrics to the `metrics.yaml file <https://hg.mozilla.org/mozilla-central/file/tip/toolkit/components/telemetry/geckoview/streaming/metrics.yaml>`__ in the `Glean SDK documentation <https://mozilla.github.io/glean/book/user/adding-new-metrics.html>`__.
+
+.. important::
+
+    Every new or changed data collection in Firefox needs a `data collection review <https://wiki.mozilla.org/Firefox/Data_Collection>`__ from a Data Steward.
+
+Example: reporting ``CHECKERBOARD_DURATION``
+--------------------------------------------
+The first step is to add the relevant tag (i.e. ``geckoview_streaming``) to the histogram's ``products`` key in the ``Histograms.json`` file.
+
+.. code-block:: json
+
+  {
+    "CHECKERBOARD_DURATION": {
+      "record_in_processes": ["main", "content", "gpu"],
+      "products": ["firefox", "geckoview_streaming", "thunderbird"],
+      "alert_emails": ["gfx-telemetry-alerts@mozilla.com", "somebody@mozilla.com"],
+      "bug_numbers": [1238040, 1539309],
+      "releaseChannelCollection": "opt-out",
+      "expires_in_version": "73",
+      "kind": "exponential",
+      "high": 100000,
+      "n_buckets": 50,
+      "description": "Duration of a checkerboard event in milliseconds"
+    }
+  }
+
+.. note::
+
+    Histograms with ``"releaseChannelCollection": "opt-in"``, or without a ``releaseChannelCollection`` specified in their definition, are only collected on Gecko built for the ``"nightly"`` and ``"beta"`` channels.
+
+Since this is a timing distribution, with a milliseconds time unit, it can be added as follows to the ``metrics.yaml`` file:
+
+.. code-block:: yaml
+
+  gfx.content.checkerboard:
+    duration:
+      type: timing_distribution
+      time_unit: millisecond
+      gecko_datapoint: CHECKERBOARD_DURATION
+      description: |
+        Duration of a checkerboard event.
+      bugs:
+        - 1238040
+        - 1539309
+      data_reviews:
+        - https://example.com/data-review-url-example
+      notification_emails:
+        - gfx-telemetry-alerts@mozilla.com
+        - somebody@mozilla.com
+      expires: 2019-12-09  # Gecko 73
+
+Please note that the ``gecko_datapoint`` property will need to point to the name of the histogram exactly as written in the ``Histograms.json`` file. It is also important to note that ``time_unit`` needs to match the unit of the values that are recorded.
+
+Example: recording without losing process information
+-----------------------------------------------------
+If a histogram is being recorded in multiple processes, care must be taken to guarantee that data always comes from the same process throughout the lifetime of a Gecko instance,
+otherwise all the data will be added to the same Glean SDK metric.
+If process exclusivity cannot be guaranteed, then a histogram (and the respective Glean SDK metric) must be created for each relevant process.
+Consider the ``IPC_MESSAGE_SIZE2`` histogram:
+
+.. code-block:: json
+
+  {
+    "IPC_MESSAGE_SIZE2": {
+      "record_in_processes": ["main", "content", "gpu"],
+      "products": ["firefox", "thunderbird"],
+      "alert_emails": ["hchang@mozilla.com"],
+      "bug_numbers": [1353159],
+      "expires_in_version": "60",
+      "kind": "exponential",
+      "high": 8000000,
+      "n_buckets": 50,
+      "keyed": false,
+      "description": "Measures the size of all IPC messages sent that are >= 4096 bytes."
+    }
+  }
+
+Data for this histogram could come, at the same time, from the ``"main"``, ``"content"`` and ``"gpu"`` processes, since it is measuring IPC itself.
+By adding the ``geckoview_streaming`` product, data coming from all the processes would flow into the same Glean SDK metric and would lose the information about the process it came from.
+This problem can be solved by creating three histograms, one for each originating process.
+Here is, for example, the histogram for the GPU process:
+
+.. code-block:: json
+
+  {
+    "IPC_MESSAGE_SIZE2_GPU": {
+      "record_in_processes": ["gpu"],
+      "products": ["geckoview_streaming"],
+      "alert_emails": ["hchang@mozilla.com"],
+      "bug_numbers": [1353159],
+      "expires_in_version": "60",
+      "kind": "exponential",
+      "high": 8000000,
+      "n_buckets": 50,
+      "description": "Measures the size of all IPC messages sent that are >= 4096 bytes."
+    }
+  }
+
+And the related Glean SDK metric:
+
+
+.. code-block:: yaml
+
+  ipc.message:
+    gpu_size:
+      type: memory_distribution
+      memory_unit: byte
+      gecko_datapoint: IPC_MESSAGE_SIZE2_GPU
+      description: |
+        Measures the size of the IPC messages from/to the GPU process that are >= 4096 bytes.
+      bugs:
+        - 1353159
+      data_reviews:
+        - https://example.com/data-review-url-example
+      notification_emails:
+        - hchang@mozilla.com
+      expires: 2019-12-09  # Gecko 73
+
+The ``ipc.message.gpu_size`` metric in the Glean SDK will now contain all the data coming exclusively from the GPU process.
+Similar definitions can be used for the other processes.
+
+Reporting a scalar
+==================
+Exfiltrating existing boolean, string or uint scalars, or adding new ones, is a relatively straightforward process made up of a few small steps.
+
+Tag scalars in ``Scalars.yaml``
+----------------------------------
+Accumulations to non-tagged scalars are ignored if streaming Telemetry is enabled.
+To tag a scalar you must add the `geckoview_streaming` product to the :ref:`products list <scalars-required-fields>` in the `Scalars.yaml file <https://hg.mozilla.org/mozilla-central/file/tip/toolkit/components/telemetry/Scalars.yaml>`__.
+
+Add Glean metrics to ``metrics.yaml``
+-------------------------------------
+The Glean SDK provides the `Quantity <https://mozilla.github.io/glean/book/user/metrics/quantity.html>`__, `Boolean <https://mozilla.github.io/glean/book/user/metrics/boolean.html>`__ and `String <https://mozilla.github.io/glean/book/user/metrics/string.html>`__ metric types to map Gecko scalars to.
+However, Gecko scalars lack the metadata to infer the Glean SDK destination type automatically.
+For this reason, engineers must pick the most appropriate Glean SDK type themselves.
+
+Read more about how to add Glean SDK metrics to the `metrics.yaml file <https://hg.mozilla.org/mozilla-central/file/tip/toolkit/components/telemetry/geckoview/streaming/metrics.yaml>`__ in the `Glean SDK documentation <https://mozilla.github.io/glean/book/user/adding-new-metrics.html>`__.
+
+.. important::
+
+    Every new or changed data collection in Firefox needs a `data collection review <https://wiki.mozilla.org/Firefox/Data_Collection>`__ from a Data Steward.
+
+Example: reporting the display width from Gecko
+-----------------------------------------------
+The first step is to add the relevant Gecko scalar with its streaming telemetry tag (i.e. ``geckoview_streaming``) in the ``Scalars.yaml`` file.
+
+.. code-block:: yaml
+
+  gfx.info:
+    display_width:
+      bug_numbers:
+        - 1514840
+      description: >
+        The width of the main display as detected by Gecko.
+      kind: uint
+      expires: never
+      notification_emails:
+        - gfx-telemetry-alerts@mozilla.com
+        - rhunt@mozilla.com
+      products:
+        - 'firefox'
+        - 'geckoview_streaming'
+        - 'thunderbird'
+      record_in_processes:
+        - 'main'
+
+.. note::
+
+    Scalars with ``"release_channel_collection": "opt-in"``, or without a ``release_channel_collection`` specified in their definition, are only collected on Gecko built for the ``"nightly"`` and ``"beta"`` channels.
+
+Since this is a uint scalar, it can be added as follows to the ``metrics.yaml`` file:
+
+.. code-block:: yaml
+
+  gfx.display:
+    width:
+      type: quantity
+      description: The width of the display, in pixels.
+      unit: pixels
+      gecko_datapoint: gfx.info.display_width
+      bugs:
+        - 1514840
+      data_reviews:
+        - https://example.com/data-review-url-example
+      notification_emails:
+        - gfx-telemetry-alerts@mozilla.com
+        - rhunt@mozilla.com
+      expires: never
+
+Please note that the ``gecko_datapoint`` property will need to point to the name of the scalar exactly as written in the ``Scalars.yaml`` file.
+
+How to access the data?
+=======================
+Once a new build of Gecko is provided through `Maven <https://maven.mozilla.org/?prefix=maven2/org/mozilla/geckoview>`__, the Android Components team will automatically pick it up.
+Because the Gecko train model has three channels, there are three ``engine-gecko-*`` components, one per Gecko channel: `"engine-gecko-nightly" <https://github.com/mozilla-mobile/android-components/tree/master/components/browser/engine-gecko-nightly>`__, `"engine-gecko-beta" <https://github.com/mozilla-mobile/android-components/tree/master/components/browser/engine-gecko-beta>`__ and `engine-gecko <https://github.com/mozilla-mobile/android-components/tree/master/components/browser/engine-gecko>`__.
+
+The availability of the metric in the specific product's dataset depends on which channel the application is using.
+For example, if Fenix Release depends on the ``engine-gecko (release)`` channel, then the registry file additions need to be available on the Release channel for Gecko in order for them to be exposed in Fenix.
+
+Unless `Glean custom pings <https://mozilla.github.io/glean/book/user/pings/custom.html>`__ are used, all the metrics are reported through the `Glean metrics ping <https://mozilla.github.io/glean/book/user/pings/metrics.html>`__.
+
+Testing your metrics
+====================
+At this time, the procedure for testing that metrics are correctly exfiltrated from GeckoView to Glean SDK-enabled products is a bit involved.
+
+1. After adding your metric as described in the previous section, substitute the locally built GeckoView in your local copy of `Android Components <https://github.com/mozilla-mobile/android-components/>`__ as described in the `GeckoView docs <https://mozilla.github.io/geckoview/contributor/geckoview-quick-start#dependency-substiting-your-local-geckoview-into-a-mozilla-project>`__.
+2. In Android Components, follow the `instructions to enable upload <https://github.com/mozilla-mobile/android-components/tree/master/samples/browser#glean-sdk-support>`__ in the `samples-browser` application.
+3. Build Android Components and the `samples-browser` application.
+4. Use the Glean SDK `debugging features <https://mozilla.github.io/glean/book/user/debugging/index.html>`__ to either dump the `metrics` ping or send it to the `Glean Debug View <https://docs.telemetry.mozilla.org/concepts/glean/debug_ping_view.html>`__.
+
+.. note::
+
+    It is important to substitute GeckoView in Android Components, even if it's possible to substitute it directly in the final product. This is because the bulk of the processing happens in Android Components, in the `engine-gecko-*` components wrapping GeckoView.
+
+Unsupported features
+====================
+This is the list of the currently unsupported features:
+
+* :ref:`keyed scalars <scalars-keyed-scalars>` are not supported and there are no future plans for supporting them.
+* uint scalar operations other than :ref:`set <scalars-c++-API>` are not supported and there are no future plans for supporting them.
+* :ref:`events <eventtelemetry>` are not supported and there are no future plans for supporting them.