Functions for normalizing data.
Extracts numeric version data from a version string like
Note: Non-zero minor and patch versions will be floating point
SELECT mozfun.norm.extract_version(version_string, 'major') as major_version, mozfun.norm.extract_version(version_string, 'minor') as minor_version, mozfun.norm.extract_version(version_string, 'patch') as patch_version
SELECT mozfun.norm.extract_version('96.05.01', 'major') as major_version, -- 96 mozfun.norm.extract_version('96.05.01', 'minor') as minor_version, -- 5 mozfun.norm.extract_version('96.05.01', 'patch') as patch_version -- 1 ```[Source](https://github.com/mozilla/bigquery-etl/blob/generated-sql/sql/mozfun/norm/extract_version) | [Edit](https://github.com/mozilla/bigquery-etl/edit/generated-sql/sql/mozfun/norm/extract_version/metadata.yaml) ## fenix_app_info (UDF) Returns canonical, human-understandable identification info for Fenix sources. The Glean telemetry library for Android by design routes pings based on the Play Store appId value of the published application. As of August 2020, there have been 5 separate Play Store appId values associated with different builds of Fenix, each corresponding to different datasets in BigQuery, and the mapping of appId to logical app names (Firefox vs. Firefox Preview) and channel names (nightly, beta, or release) has changed over time; see the [spreadsheet of naming history for Mozilla's mobile browsers]( <https://docs.google.com/spreadsheets/d/18PzkzZxdpFl23__-CIO735NumYDqu7jHpqllo0sBbPA/edit#gid=0).> This function is intended as the source of truth for how to map a specific ping in BigQuery to a logical app names and channel. It should be expected that the output of this function may evolve over time. If we rename a product or channel, we may choose to update the values here so that analyses consistently get the new name. The first argument (`app_id`) can be fairly fuzzy; it is tolerant of actual Google Play Store appId values like 'org.mozilla.firefox_beta' (mix of periods and underscores) as well as BigQuery dataset names with suffixes like 'org_mozilla_firefox_beta_stable'. The second argument (`app_build_id`) should be the value in client_info.app_build. The function returns a `STRUCT` that contains the logical `app_name` and `channel` as well as the Play Store `app_id` in the canonical form which would appear in Play Store URLs. Note that the naming of Fenix applications changed on 2020-07-03, so to get a continuous view of the pings associated with a logical app channel, you may need to union together tables from multiple BigQuery datasets. To see data for all Fenix channels together, it is necessary to union together tables from all 5 datasets. For basic usage information, consider using `telemetry.fenix_clients_last_seen` which already handles the union. Otherwise, see the example below as a template for how construct a custom union. Mapping of channels to datasets: - release: `org_mozilla_firefox` - beta: `org_mozilla_firefox_beta` (current) and `org_mozilla_fenix` - nightly: `org_mozilla_fenix` (current), `org_mozilla_fennec_aurora`, and `org_mozilla_fenix_nightly` ```sql -- Example of a query over all Fenix builds advertised as "Firefox Beta" CREATE TEMP FUNCTION extract_fields(app_id STRING, m ANY TYPE) AS ( ( SELECT AS STRUCT m.submission_timestamp, m.metrics.string.geckoview_version, mozfun.norm.fenix_app_info(app_id, m.client_info.app_build).* ) ); WITH base AS ( SELECT extract_fields('org_mozilla_firefox_beta', m).* FROM org_mozilla_firefox_beta.metrics AS m UNION ALL SELECT extract_fields('org_mozilla_fenix', m).* FROM org_mozilla_fenix.metrics AS m ) SELECT DATE(submission_timestamp) AS submission_date, geckoview_version, COUNT(*) FROM base WHERE app_name = 'Fenix' -- excludes 'Firefox Preview' AND channel = 'beta' AND DATE(submission_timestamp) = '2020-08-01' GROUP BY submission_date, geckoview_version
Convert the Fenix client_info.app_build-format string to a DATETIME. May return NULL on failure.
Fenix originally used an 8-digit app_build format>
In short it is
- y is years since 2018
- DDD is day of year, 0-padded, 001-366
- HH is hour of day, 00-23
- mm is minute of hour, 00-59
The last date seen with an 8-digit build ID is 2020-08-10.
Newer builds use a 10-digit format> where the integer represents a pattern consisting of 32 bits. The 17 bits starting 13 bits from the left represent a number of hours since UTC midnight beginning 2014-12-28.
This function tolerates both formats.
After using this you may wish to
DATETIME_TRUNC(result, DAY) for
grouping by build date.
Accepts a glean ping_info struct as input and returns a modified struct that includes a few parsed or normalized variants of the input fields.
Accepts a pipeline metadata struct as input and returns a modified struct that includes a few parsed or normalized variants of the input metadata fields.
Normalize an operating system string to one of the three major desktop platforms, one of the two major mobile platforms, or "Other".
This is a reimplementation of
logic used in the data pipeline>
Accepts a glean client_info struct as input and returns a modified struct that includes a few parsed or normalized variants of the input fields.
Truncates a version string like
either the major or minor version. The return value is
which means that you can sort the results without fear (e.g. 100
will be categorized as greater than 80, which isn't the case when
For example, "5.1.0" would be translated to
5.1 if the parameter is
5 if the parameter is major.
If the version is only a major and/or minor version, then it will be left
unchanged (for example "10" would stay as
10 when run through this
function, no matter what the arguments).
This is useful for grouping Linux and Mac operating system versions inside aggregate datasets or queries where there may be many different patch releases in the field.
Accepts vpn attribution fields as input and returns a struct of normalized fields.
Returns a normalized
for a product based on
Thus, this function serves as a bridge to get from legacy application
identifiers to the consistent identifiers we are using for reporting in 2021.
As of 2021, most Mozilla products are sending telemetry via the Glean SDK,
with Glean telemetry in active development for desktop Firefox as well. The
probeinfo API is the single source of truth for metadata about applications
sending Glean telemetry; the values for
canonical_app_name returned here correspond to the "end-to-end identifier"
values documented in
the v2 Glean app listings endpoint
For non-Glean telemetry, we provide values in the same style to provide
continuity as we continue the migration to Glean.
For legacy telemetry pings like
main ping for desktop and
core ping for
mobile products, the
legacy_app_name given as input to this function
should come from the submission URI (stored as
metadata.uri.app_name in BigQuery ping tables).
For Glean pings, we have invented
product values that can be passed
in to this function as the
app_name values are intended to be
readable and unambiguous, but short and easy to type.
They are suitable for use as a key in derived tables.
product is a deprecated field that was similar in intent.
canonical_app_name is more verbose and is suited for displaying
canonical_name is a synonym that we provide for
historical compatibility with previous versions of this function.
The returned struct also contains boolean
contributes_to_2021_kpi as the
canonical reference for whether the given application is included in KPI
reporting. Additional fields may be added for future years.
normalized_os value that's passed in should be the top-level
normalized_os value present in any ping table or you may want
to wrap a raw value in
This function also tolerates passing in a
product value as
legacy_app_name so that this function is still useful for derived tables
which have thrown away the raw
app_name value from legacy pings.
The mappings are as follows:
|Firefox||*||firefox_desktop||Firefox||Firefox for Desktop||true||true||true|
|Fenix||Android||fenix||Fenix||Firefox for Android (Fenix)||true||true||true|
|Fennec||Android||fennec||Fennec||Firefox for Android (Fennec)||true||true||true|
|Firefox Preview||Android||firefox_preview||Firefox Preview||Firefox Preview for Android||true||true||true|
|Fennec||iOS||firefox_ios||Firefox iOS||Firefox for iOS||true||true||true|
|FirefoxForFireTV||Android||firefox_fire_tv||Firefox Fire TV||Firefox for Fire TV||false||false||false|
|FirefoxConnect||Android||firefox_connect||Firefox Echo||Firefox for Echo Show||true||true||false|
|Zerda||Android||firefox_lite||Firefox Lite||Firefox Lite||true||true||false|
|Zerda_cn||Android||firefox_lite_cn||Firefox Lite CN||Firefox Lite (China)||false||false||false|
|Focus||Android||focus_android||Focus Android||Firefox Focus for Android||true||true||true|
|Focus||iOS||focus_ios||Focus iOS||Firefox Focus for iOS||true||true||true|
|Klar||Android||klar_android||Klar Android||Firefox Klar for Android||false||false||false|
|Klar||iOS||klar_ios||Klar iOS||Firefox Klar for iOS||false||false||false|
|Lockbox||Android||lockwise_android||Lockwise Android||Lockwise for Android||true||true||false|
|Lockbox||iOS||lockwise_ios||Lockwise iOS||Lockwise for iOS||true||true||false|
|FirefoxReality*||Android||firefox_reality||Firefox Reality||Firefox Reality||false||false||false|
Adds metadata related to the browser version in a struct.
This is a temporary solution that allows browser version analysis. It should eventually be replaced with one or more browser version tables that serves as a source of truth for version releases.
Determine the number of whole months after grace period between start and end. Month is dependent on timezone, so start and end must both be datetimes, or both be dates, in the correct timezone. Grace period can be used to account for billing delay, usually 1 day, and is counted after months. When inclusive is FALSE, start and end are not included in whole months. For example, diff_months(start => '2021-01-01', end => '2021-03-01', grace_period => INTERVAL 0 day, inclusive => FALSE) returns 1, because start plus two months plus grace period is not less than end. Changing inclusive to TRUE returns 2, because start plus two months plus grace period is less than or equal to end. diff_months(start => '2021-01-01', end => '2021-03-02 00:00:00.000001', grace_period => INTERVAL 1 DAY, inclusive => FALSE) returns 2, because start plus two months plus grace period is less than end.