Functions for normalizing data.
Accepts a pipeline metadata struct as input and returns a modified struct that includes a few parsed or normalized variants of the input metadata fields.
Accepts a glean client_info struct as input and returns a modified struct that includes a few parsed or normalized variants of the input fields.
Returns a normalized
for a product based on
Thus, this function serves as a bridge to get from legacy application
identifiers to the consistent identifiers we are using for reporting in 2021.
As of 2021, most Mozilla products are sending telemetry via the Glean SDK,
with Glean telemetry in active development for desktop Firefox as well. The
probeinfo API is the single source of truth for metadata about applications
sending Glean telemetry; the values for
canonical_app_name returned here correspond to the "end-to-end identifier"
values documented in
the v2 Glean app listings endpoint
For non-Glean telemetry, we provide values in the same style to provide
continuity as we continue the migration to Glean.
For legacy telemetry pings like
main ping for desktop and
core ping for
mobile products, the
legacy_app_name given as input to this function
should come from the submission URI (stored as
metadata.uri.app_name in BigQuery ping tables).
For Glean pings, we have invented
product values that can be passed
in to this function as the
app_name values are intended to be
readable and unambiguous, but short and easy to type.
They are suitable for use as a key in derived tables.
product is a deprecated field that was similar in intent.
canonical_app_name is more verbose and is suited for displaying
canonical_name is a synonym that we provide for
historical compatibility with previous versions of this function.
The returned struct also contains boolean
contributes_to_2021_kpi as the
canonical reference for whether the given application is included in KPI
reporting. Additional fields may be added for future years.
normalized_os value that's passed in should be the top-level
normalized_os value present in any ping table or you may want
to wrap a raw value in
This function also tolerates passing in a
product value as
legacy_app_name so that this function is still useful for derived tables
which have thrown away the raw
app_name value from legacy pings.
The mappings are as follows:
|Firefox||*||firefox_desktop||Firefox||Firefox for Desktop||true||true||true|
|Fenix||Android||fenix||Fenix||Firefox for Android (Fenix)||true||true||true|
|Fennec||Android||fennec||Fennec||Firefox for Android (Fennec)||true||true||true|
|Firefox Preview||Android||firefox_preview||Firefox Preview||Firefox Preview for Android||true||true||true|
|Fennec||iOS||firefox_ios||Firefox iOS||Firefox for iOS||true||true||true|
|FirefoxForFireTV||Android||firefox_fire_tv||Firefox Fire TV||Firefox for Fire TV||false||false||false|
|FirefoxConnect||Android||firefox_connect||Firefox Echo||Firefox for Echo Show||true||true||false|
|Zerda||Android||firefox_lite||Firefox Lite||Firefox Lite||true||true||false|
|Zerda_cn||Android||firefox_lite_cn||Firefox Lite CN||Firefox Lite (China)||false||false||false|
|Focus||Android||focus_android||Focus Android||Firefox Focus for Android||true||true||true|
|Focus||iOS||focus_ios||Focus iOS||Firefox Focus for iOS||true||true||true|
|Klar||Android||klar_android||Klar Android||Firefox Klar for Android||false||false||false|
|Klar||iOS||klar_ios||Klar iOS||Firefox Klar for iOS||false||false||false|
|Lockbox||Android||lockwise_android||Lockwise Android||Lockwise for Android||true||true||false|
|Lockbox||iOS||lockwise_ios||Lockwise iOS||Lockwise for iOS||true||true||false|
|FirefoxReality*||Android||firefox_reality||Firefox Reality||Firefox Reality||false||false||false|
Truncates a version string like
either the major or minor version. The return value is
which means that you can sort the results without fear (e.g. 100
will be categorized as greater than 80, which isn't the case when
For example, "5.1.0" would be translated to
5.1 if the parameter is
5 if the parameter is major.
If the version is only a major and/or minor version, then it will be left
unchanged (for example "10" would stay as
10 when run through this
function, no matter what the arguments).
This is useful for grouping Linux and Mac operating system versions inside aggregate datasets or queries where there may be many different patch releases in the field.
Accepts a glean ping_info struct as input and returns a modified struct that includes a few parsed or normalized variants of the input fields.
Returns canonical, human-understandable identification info for Fenix sources.
The Glean telemetry library for Android by design routes pings based on the Play Store appId value of the published application. As of August 2020, there have been 5 separate Play Store appId values associated with different builds of Fenix, each corresponding to different datasets in BigQuery, and the mapping of appId to logical app names (Firefox vs. Firefox Preview) and channel names (nightly, beta, or release) has changed over time; see the spreadsheet of naming history for Mozilla's mobile browsers.>
This function is intended as the source of truth for how to map a specific ping in BigQuery to a logical app names and channel. It should be expected that the output of this function may evolve over time. If we rename a product or channel, we may choose to update the values here so that analyses consistently get the new name.
The first argument (
app_id) can be fairly fuzzy; it is tolerant
of actual Google Play Store appId values like 'org.mozilla.firefox_beta'
(mix of periods and underscores) as well as BigQuery dataset names
with suffixes like 'org_mozilla_firefox_beta_stable'.
The second argument (
app_build_id) should be the value
The function returns a
STRUCT that contains the logical
channel as well as the Play Store
app_id in the canonical
form which would appear in Play Store URLs.
Note that the naming of Fenix applications changed on 2020-07-03,
so to get a continuous view of the pings associated with a logical
app channel, you may need to union together tables from multiple
BigQuery datasets. To see data for all Fenix channels together, it
is necessary to union together tables from all 5 datasets.
For basic usage information, consider using
telemetry.fenix_clients_last_seen which already handles the
union. Otherwise, see the example below as a template for how
construct a custom union.
Mapping of channels to datasets:
-- Example of a query over all Fenix builds advertised as "Firefox Beta" CREATE TEMP FUNCTION extract_fields(app_id STRING, m ANY TYPE) AS ( ( SELECT AS STRUCT m.submission_timestamp, m.metrics.string.geckoview_version, mozfun.norm.fenix_app_info(app_id, m.client_info.app_build).* ) ); WITH base AS ( SELECT extract_fields('org_mozilla_firefox_beta', m).* FROM org_mozilla_firefox_beta.metrics AS m UNION ALL SELECT extract_fields('org_mozilla_fenix', m).* FROM org_mozilla_fenix.metrics AS m ) SELECT DATE(submission_timestamp) AS submission_date, geckoview_version, COUNT(*) FROM base WHERE app_name = 'Fenix' -- excludes 'Firefox Preview' AND channel = 'beta' AND DATE(submission_timestamp) = '2020-08-01' GROUP BY submission_date, geckoview_version
Normalize an operating system string to one of the three major desktop platforms, one of the two major mobile platforms, or "Other".
This is a reimplementation of
logic used in the data pipeline>