Bug 1959435 - Document crash ping end-to-end lifecycle r=gsvelto
Differential Revision: https://phabricator.services.mozilla.com/D246756
Committed by: afranchuk@mozilla.com
Parent: 1a39c1465a
Commit: 2944efa928
toolkit/components/crashes/docs/crash-ping-lifecycle.rst
@@ -0,0 +1,104 @@
====================
Crash Ping Lifecycle
====================

Crash pings and derived data go through a number of separate programs and
services. To get a better idea of how these components interact, a breakdown of
the lifecycle is presented here.

This description applies to Glean crash ping data.


Origin
======
When a crash occurs, Glean metrics are populated and a Glean crash ping is sent with the data. The
ping is ingested and made available in BigQuery through the usual Glean infrastructure.

Ping Definitions
----------------
* `Desktop crash ping <https://dictionary.telemetry.mozilla.org/apps/firefox_desktop/pings/crash>`_

  * `metrics definition
    <https://searchfox.org/mozilla-central/source/toolkit/components/crashes/metrics.yaml>`__
  * `ping definition
    <https://searchfox.org/mozilla-central/source/toolkit/components/crashes/pings.yaml>`__

* `Fenix crash ping <https://dictionary.telemetry.mozilla.org/apps/fenix/pings/crash>`_

  * `metrics definition
    <https://searchfox.org/mozilla-central/source/mobile/android/android-components/components/lib/crash/metrics.yaml>`__
  * `ping definition
    <https://searchfox.org/mozilla-central/source/mobile/android/android-components/components/lib/crash/pings.yaml>`__

BigQuery Tables
---------------
* Desktop view: ``firefox_desktop.crash``.
* Crashreporter client view: ``firefox_crashreporter.crash``. This uses the same metrics/ping definitions
  as desktop.
* Combined desktop/crashreporter client view: ``firefox_desktop.desktop_crashes``.
* Fenix view: ``fenix.crash``. This ping has a few different metrics, but is overall very similar to
  the desktop ping. Combining Fenix and desktop pings in one query is therefore a little verbose,
  though most metrics exist in both with the same name.

**NOTE**: When querying the source data, you should always use the ``crash.app_channel``,
``crash.app_display_version``, and ``crash.app_build`` metrics rather than the similarly named fields of
the Glean ``client_info`` struct. The ``crash.*`` metrics record the application information *at the
time of the crash*; moreover, the crash reporter client can't fully populate ``client_info``.
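As a minimal sketch of following the note above, the helper below builds a query that reads the
application information from the crash metrics and never touches ``client_info``. The column paths
are an assumption: they follow the usual Glean convention of exposing a string metric as
``metrics.string.<category>_<name>``, so check them against the view's schema before relying on
them. The function name and aliases are hypothetical.

```python
from datetime import date

# Assumption: Glean string metrics are exposed as `metrics.string.<category>_<name>`,
# so `crash.app_channel` is read from `metrics.string.crash_app_channel`.
CRASH_APP_FIELDS = {
    "channel": "metrics.string.crash_app_channel",
    "display_version": "metrics.string.crash_app_display_version",
    "build": "metrics.string.crash_app_build",
}


def app_info_query(view: str, day: date) -> str:
    """Build a query that takes application info from the crash metrics."""
    columns = ",\n  ".join(
        f"{path} AS {alias}" for alias, path in CRASH_APP_FIELDS.items()
    )
    return (
        f"SELECT\n  document_id,\n  {columns}\n"
        f"FROM `{view}`\n"
        # Filtering on the submission date keeps the scan to a single day.
        f"WHERE DATE(submission_timestamp) = '{day.isoformat()}'"
    )


print(app_info_query("firefox_desktop.crash", date(2025, 1, 15)))
```

The same helper can be pointed at ``firefox_crashreporter.crash`` or ``fenix.crash``, since the
text above says most metrics share names across the pings.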

Source
------
All crash ping metrics are set in bulk, and typically come directly from `crash annotations <https://searchfox.org/mozilla-central/source/toolkit/crashreporter/CrashAnnotations.yaml>`_.
The code that sets them:

* `Desktop <https://searchfox.org/mozilla-central/rev/b598575345077063c55b618e43ccaa6249505d02/toolkit/components/crashes/CrashManager.in.sys.mjs#787>`_
* `Crashreporter client <https://searchfox.org/mozilla-central/rev/b598575345077063c55b618e43ccaa6249505d02/toolkit/crashreporter/client/app/src/net/ping/glean.rs#11>`_
* `Fenix <https://searchfox.org/mozilla-central/rev/b598575345077063c55b618e43ccaa6249505d02/mobile/android/android-components/components/lib/crash/src/main/java/mozilla/components/lib/crash/service/GleanCrashReporterService.kt#312>`_


Post-Processing
===============
The `crash-ping-ingest <https://github.com/mozilla/crash-ping-ingest>`_ repository runs a daily
ingestion job, scheduled with Taskcluster. Each run retrieves crash pings submitted as recently as
the prior UTC day and, by default, ensures that indexed results are available for the past week (in
case of outages, hiccups, etc.). The job starts at 2:00 UTC and takes 1-2 hours, so you can expect
data for the prior UTC day to be available around 4:00 UTC. The repository also supplies a
Taskcluster action to manually generate data for a given date, if necessary.
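The timing above can be turned into a small calculation. This is only a sketch under the stated
schedule (start at 2:00 UTC, up to two hours of processing); the helper names are hypothetical and
are not part of crash-ping-ingest.

```python
from datetime import date, datetime, time, timedelta, timezone

# Values taken from the schedule described above.
RUN_START_UTC = time(2, 0)
MAX_RUN_DURATION = timedelta(hours=2)


def expected_availability(ping_day: date) -> datetime:
    """Return roughly when ingested data for `ping_day` (a UTC date) should exist."""
    # The run that covers `ping_day` starts on the following UTC day.
    run_start = datetime.combine(
        ping_day + timedelta(days=1), RUN_START_UTC, tzinfo=timezone.utc
    )
    return run_start + MAX_RUN_DURATION


def latest_available_day(now: datetime) -> date:
    """Return the most recent UTC day whose data should already be ingested.

    `now` must be a timezone-aware datetime.
    """
    now = now.astimezone(timezone.utc)
    candidate = now.date() - timedelta(days=1)
    if now < expected_availability(candidate):
        # Today's run has not finished yet, so fall back one more day.
        candidate -= timedelta(days=1)
    return candidate
```

For example, at 03:00 UTC on a given day the helper reports the day before yesterday, because the
run covering yesterday may still be in progress.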

Data Availability
-----------------
Data was backfilled to 2024-09-01, so you can expect ping data to be available for any date from
then on. All nightly and beta pings are processed, while release pings are randomly sampled down to
about 5000 pings per OS/process-type combination.
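To illustrate what sampling per OS/process-type combination means, here is a sketch of stratified
random sampling over in-memory records. It is not the crash-ping-ingest implementation, which may
well sample inside the BigQuery query; the ``os`` and ``process_type`` keys and the function name
are hypothetical.

```python
import random
from collections import defaultdict

SAMPLE_PER_GROUP = 5000  # approximate per-group target mentioned above


def sample_release_pings(pings, per_group=SAMPLE_PER_GROUP, seed=None):
    """Randomly keep at most `per_group` pings for each (os, process_type) pair."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for ping in pings:
        groups[(ping["os"], ping["process_type"])].append(ping)

    sampled = []
    for members in groups.values():
        # Groups at or below the target are kept whole; larger ones are sampled
        # without replacement.
        if len(members) > per_group:
            members = rng.sample(members, per_group)
        sampled.extend(members)
    return sampled
```

Sampling per group rather than over the whole day keeps rare OS/process-type combinations from
being crowded out by the most common ones.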

BigQuery
--------
The ingested output (including symbolicated stacks and crash signatures) is loaded into BigQuery in
the ``moz-fx-data-shared-prod.crash_ping_ingest_external.ingest_output`` table. It is partitioned on
``submission_timestamp`` to match the Glean views/tables, and it can be joined on ``document_id``
(and optionally ``submission_timestamp``) with the fenix/desktop views.
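The join described above could look like the following sketch, which builds the SQL text for a
single submission day. Only the table name, the view names, and the two join columns come from
this document; the helper name and aliases are hypothetical, and the query avoids naming any other
``ingest_output`` column because the table's schema is not described here.

```python
from datetime import date

INGEST_TABLE = "moz-fx-data-shared-prod.crash_ping_ingest_external.ingest_output"


def joined_query(view: str, day: date) -> str:
    """Build a query joining a Glean crash view with the ingested output."""
    day_filter = f"'{day.isoformat()}'"
    return f"""
SELECT
  ping.document_id,
  ping.submission_timestamp,
  ingest.* EXCEPT (document_id, submission_timestamp)
FROM `{view}` AS ping
JOIN `{INGEST_TABLE}` AS ingest
  ON ingest.document_id = ping.document_id
  AND ingest.submission_timestamp = ping.submission_timestamp
WHERE DATE(ping.submission_timestamp) = {day_filter}
  AND DATE(ingest.submission_timestamp) = {day_filter}
""".strip()


print(joined_query("firefox_desktop.crash", date(2025, 1, 15)))
```

Filtering both sides on the submission date is a design choice to let BigQuery prune partitions in
each table; ``EXCEPT`` drops the join columns from the ingest side so the result has no duplicate
column names.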

What if post-processing has a bug?
----------------------------------
If the post-processed output is wrong, fix the post-processing bug and regenerate the data by
rerunning the ingestion for the affected day(s). The `upload script
<https://github.com/mozilla/crash-ping-ingest/blob/main/upload.py>`_ in crash-ping-ingest
automatically *replaces* the data for the date being uploaded. To run the ingestion, navigate to
the Taskcluster **task group** for the commit containing the fix (easily found from the Taskcluster
CI page for the commit on GitHub) and run the "Process Pings (Manual)" action task, which lets you
choose the dates to process.

Once the data in BigQuery has been fixed, you must also clear the Netlify ``ping-data`` blobs
corresponding to the affected dates. This can be done with ``netlify-cli`` after authenticating
with Netlify.


Presentation
============
The `crash-pings <https://github.com/mozilla/crash-pings>`_ repository contains the code for
https://crash-pings.mozilla.org, a website hosted on Netlify. The site queries BigQuery and caches
results, condensing data for efficient loading in the browser. See the repository's README for
details about how it is built and what technologies it uses.


Adding data to crash pings
==========================
#. Add crash annotations to the `definition file
   <https://searchfox.org/mozilla-central/source/toolkit/crashreporter/CrashAnnotations.yaml>`_ and
   populate the annotations with the generated APIs.
#. Define corresponding Glean metrics in the files listed in `Ping Definitions`_.
#. Update the metric-populating code linked in `Source`_.
@@ -36,6 +36,10 @@ implementation is robust. The Glean `crash` ping can be found

See `bug 1784069 <https://bugzilla.mozilla.org/show_bug.cgi?id=1784069>`_ for details.

Lifecycle and Post-Processing
-----------------------------
The lifecycle of a crash ping can be viewed at :ref:`Crash Ping Lifecycle`.


Other Documents
===============
@@ -44,3 +48,4 @@ Other Documents
   :maxdepth: 1

   crash-events
   crash-ping-lifecycle