Skip to content

norm

Functions for normalizing data.

fenix_app_info (UDF)

Returns canonical, human-understandable identification info for Fenix sources.

The Glean telemetry library for Android by design routes pings based on the Play Store appId value of the published application. As of August 2020, there have been 5 separate Play Store appId values associated with different builds of Fenix, each corresponding to different datasets in BigQuery, and the mapping of appId to logical app names (Firefox vs. Firefox Preview) and channel names (nightly, beta, or release) has changed over time; see the spreadsheet of naming history for Mozilla's mobile browsers.>

This function is intended as the source of truth for how to map a specific ping in BigQuery to a logical app names and channel. It should be expected that the output of this function may evolve over time. If we rename a product or channel, we may choose to update the values here so that analyses consistently get the new name.

The first argument (app_id) can be fairly fuzzy; it is tolerant of actual Google Play Store appId values like 'org.mozilla.firefox_beta' (mix of periods and underscores) as well as BigQuery dataset names with suffixes like 'org_mozilla_firefox_beta_stable'.

The second argument (app_build_id) should be the value in client_info.app_build.

The function returns a STRUCT that contains the logical app_name and channel as well as the Play Store app_id in the canonical form which would appear in Play Store URLs.

Note that the naming of Fenix applications changed on 2020-07-03, so to get a continuous view of the pings associated with a logical app channel, you may need to union together tables from multiple BigQuery datasets. To see data for all Fenix channels together, it is necessary to union together tables from all 5 datasets. For basic usage information, consider using telemetry.fenix_clients_last_seen which already handles the union. Otherwise, see the example below as a template for how construct a custom union.

Mapping of channels to datasets:

  • release: org_mozilla_firefox
  • beta: org_mozilla_firefox_beta (current) and org_mozilla_fenix
  • nightly: org_mozilla_fenix (current), org_mozilla_fennec_aurora, and org_mozilla_fenix_nightly
-- Example of a query over all Fenix builds advertised as "Firefox Beta"
CREATE TEMP FUNCTION extract_fields(app_id STRING, m ANY TYPE) AS (
  (
    SELECT AS STRUCT
      m.submission_timestamp,
      m.metrics.string.geckoview_version,
      mozfun.norm.fenix_app_info(app_id, m.client_info.app_build).*
  )
);

WITH base AS (
  SELECT
    extract_fields('org_mozilla_firefox_beta', m).*
  FROM
    org_mozilla_firefox_beta.metrics AS m
  UNION ALL
  SELECT
    extract_fields('org_mozilla_fenix', m).*
  FROM
    org_mozilla_fenix.metrics AS m
)
SELECT
  DATE(submission_timestamp) AS submission_date,
  geckoview_version,
  COUNT(*)
FROM
  base
WHERE
  app_name = 'Fenix'  -- excludes 'Firefox Preview'
  AND channel = 'beta'
  AND DATE(submission_timestamp) = '2020-08-01'
GROUP BY
  submission_date,
  geckoview_version

Source | Edit

glean_baseline_client_info (UDF)

Accepts a glean client_info struct as input and returns a modified struct that includes a few parsed or normalized variants of the input fields.

Source | Edit

truncate_version (UDF)

Truncates a version string like <major>.<minor>.<patch> to either the major or minor version. The return value is NUMERIC, which means that you can sort the results without fear (e.g. 100 will be categorized as greater than 80, which isn't the case when sorting lexigraphically).

For example, "5.1.0" would be translated to 5.1 if the parameter is "minor" or 5 if the parameter is major.

If the version is only a major and/or minor version, then it will be left unchanged (for example "10" would stay as 10 when run through this function, no matter what the arguments).

This is useful for grouping Linux and Mac operating system versions inside aggregate datasets or queries where there may be many different patch releases in the field.

Source | Edit

metadata (UDF)

Accepts a pipeline metadata struct as input and returns a modified struct that includes a few parsed or normalized variants of the input metadata fields.

Source | Edit

product_info (UDF)

Returns a normalized app_name and canonical_app_name for a product based on legacy_app_name and normalized_os values. Thus, this function serves as a bridge to get from legacy application identifiers to the consistent identifiers we are using for reporting in 2021.

As of 2021, most Mozilla products are sending telemetry via the Glean SDK, with Glean telemetry in active development for desktop Firefox as well. The probeinfo API is the single source of truth for metadata about applications sending Glean telemetry; the values for app_name and canonical_app_name returned here correspond to the "end-to-end identifier" values documented in the v2 Glean app listings endpoint . For non-Glean telemetry, we provide values in the same style to provide continuity as we continue the migration to Glean.

For legacy telemetry pings like main ping for desktop and core ping for mobile products, the legacy_app_name given as input to this function should come from the submission URI (stored as metadata.uri.app_name in BigQuery ping tables). For Glean pings, we have invented product values that can be passed in to this function as the legacy_app_name parameter.

The returned app_name values are intended to be readable and unambiguous, but short and easy to type. They are suitable for use as a key in derived tables. product is a deprecated field that was similar in intent.

The returned canonical_app_name is more verbose and is suited for displaying in visualizations. canonical_name is a synonym that we provide for historical compatibility with previous versions of this function.

The returned struct also contains boolean contributes_to_2021_kpi as the canonical reference for whether the given application is included in KPI reporting. Additional fields may be added for future years.

The normalized_os value that's passed in should be the top-level normalized_os value present in any ping table or you may want to wrap a raw value in mozfun.norm.os like mozfun.norm.product_info(app_name, mozfun.norm.os(os)).

This function also tolerates passing in a product value as legacy_app_name so that this function is still useful for derived tables which have thrown away the raw app_name value from legacy pings.

The mappings are as follows:

legacy_app_name normalized_os app_name product canonical_app_name 2019 2020 2021
Firefox * firefox_desktop Firefox Firefox for Desktop true true true
Fenix Android fenix Fenix Firefox for Android (Fenix) true true true
Fennec Android fennec Fennec Firefox for Android (Fennec) true true true
Firefox Preview Android firefox_preview Firefox Preview Firefox Preview for Android true true true
Fennec iOS firefox_ios Firefox iOS Firefox for iOS true true true
FirefoxForFireTV Android firefox_fire_tv Firefox Fire TV Firefox for Fire TV false false false
FirefoxConnect Android firefox_connect Firefox Echo Firefox for Echo Show true true false
Zerda Android firefox_lite Firefox Lite Firefox Lite true true false
Zerda_cn Android firefox_lite_cn Firefox Lite CN Firefox Lite (China) false false false
Focus Android focus_android Focus Android Firefox Focus for Android true true true
Focus iOS focus_ios Focus iOS Firefox Focus for iOS true true true
Klar Android klar_android Klar Android Firefox Klar for Android false false false
Klar iOS klar_ios Klar iOS Firefox Klar for iOS false false false
Lockbox Android lockwise_android Lockwise Android Lockwise for Android true true false
Lockbox iOS lockwise_ios Lockwise iOS Lockwise for iOS true true false
FirefoxReality* Android firefox_reality Firefox Reality Firefox Reality false false false
Source Edit

os (UDF)

Normalize an operating system string to one of the three major desktop platforms, one of the two major mobile platforms, or "Other".

This is a reimplementation of logic used in the data pipeline> to populate normalized_os.

Source | Edit

glean_ping_info (UDF)

Accepts a glean ping_info struct as input and returns a modified struct that includes a few parsed or normalized variants of the input fields.

Source | Edit