Skip to content

Udf

active_n_weeks_ago (UDF)

Parameters

INPUTS

x INT64, n INT64

OUTPUTS

BOOLEAN

Source | Edit

active_values_from_days_seen_map (UDF)

Given a map of representing activity for STRING keys, this function returns an array of which keys were active for the time period in question. start_offset should be at most 0. n_bits should be at most the remaining bits.

Parameters

INPUTS

days_seen_bits_map ARRAY<STRUCT<key STRING, value INT64>>, start_offset INT64, n_bits INT64

Source | Edit

add_monthly_engine_searches (UDF)

This function specifically windows searches into calendar-month windows. This means groups are not necessarily directly comparable, since different months have different numbers of days. On the first of each month, a new month is appended, and the first month is dropped. If the date is not the first of the month, the new entry is added to the last element in the array. For example, if we were adding 12 to [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]: On the first of the month, the result would be [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 12] On any other day of the month, the result would be [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 24] This happens for every aggregate (searches, ad clicks, etc.)

Parameters

INPUTS

prev STRUCT<total_searches ARRAY<INT64>, tagged_searches ARRAY<INT64>, search_with_ads ARRAY<INT64>, ad_click ARRAY<INT64>>, curr STRUCT<total_searches ARRAY<INT64>, tagged_searches ARRAY<INT64>, search_with_ads ARRAY<INT64>, ad_click ARRAY<INT64>>, submission_date DATE

Source | Edit

add_monthly_searches (UDF)

Adds together two engine searches structs. Each engine searches struct has a MAP[engine -> search_counts_struct]. We want to add add together the prev and curr's values for a certain engine. This allows us to be flexible with the number of engines we're using.

Parameters

INPUTS

prev ARRAY<STRUCT<key STRING, value STRUCT<total_searches ARRAY<INT64>, tagged_searches ARRAY<INT64>, search_with_ads ARRAY<INT64>, ad_click ARRAY<INT64>>>>, curr ARRAY<STRUCT<key STRING, value STRUCT<total_searches ARRAY<INT64>, tagged_searches ARRAY<INT64>, search_with_ads ARRAY<INT64>, ad_click ARRAY<INT64>>>>, submission_date DATE

OUTPUTS

value

Source | Edit

add_searches_by_index (UDF)

Return sums of each search type grouped by the index. Results are ordered by index.

Parameters

INPUTS

searches ARRAY<STRUCT<total_searches INT64, tagged_searches INT64, search_with_ads INT64, ad_click INT64, index INT64>>

Source | Edit

aggregate_active_addons (UDF)

This function selects most frequently occuring value for each addon_id, using the latest value in the input among ties. The type for active_addons is ARRAY>, i.e. the output of SELECT ARRAY_CONCAT_AGG(active_addons) FROM telemetry.main_summary_v4, and is left unspecified to allow changes to the fields of the STRUCT.

Parameters

INPUTS

active_addons ANY TYPE

Source | Edit

aggregate_map_first (UDF)

Returns an aggregated map with all the keys and the first corresponding value from the given maps

Parameters

INPUTS

maps ANY TYPE

Source | Edit

aggregate_search_counts (UDF)

Parameters

INPUTS

search_counts ARRAY<STRUCT<engine STRING, source STRING, count INT64>>

Source | Edit

aggregate_search_map (UDF)

Aggregates the total counts of the given search counters

Parameters

INPUTS

engine_searches_list ANY TYPE

Source | Edit

array_11_zeroes_then (UDF)

An array of 11 zeroes, followed by a supplied value

Parameters

INPUTS

val INT64

Source | Edit

array_drop_first_and_append (UDF)

Drop the first element of an array, and append the given element. Result is an array with the same length as the input.

Parameters

INPUTS

arr ANY TYPE, append ANY TYPE

Source | Edit

array_of_12_zeroes (UDF)

An array of 12 zeroes

Parameters

INPUTS

) AS ( [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Source | Edit

array_slice (UDF)

Parameters

INPUTS

arr ANY TYPE, start_index INT64, end_index INT64

Source | Edit

bitcount_lowest_7 (UDF)

This function counts the 1s in lowest 7 bits of an INT64

Parameters

INPUTS

x INT64

Source | Edit

bitmask_365 (UDF)

A bitmask for 365 bits

Parameters

INPUTS

) AS ( CONCAT(b'\x1F', REPEAT(b'\xFF', 45

Source | Edit

bitmask_lowest_28 (UDF)

Parameters

INPUTS

) AS ( 0x0FFFFFFF

Source | Edit

bitmask_lowest_7 (UDF)

Parameters

INPUTS

) AS ( 0x7F

Source | Edit

bitmask_range (UDF)

Returns a bitmask that can be used to return a subset of an integer representing a bit array. The start_ordinal argument is an integer specifying the starting position of the slice, with start_ordinal = 1 indicating the first bit. The length argument is the number of bits to include in the mask. The arguments were chosen to match the semantics of the SUBSTR function; see https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#substr

Parameters

INPUTS

start_ordinal INT64, _length INT64

Source | Edit

bits28_active_in_range (UDF)

Parameters

INPUTS

bits INT64, start_offset INT64, n_bits INT64

Source | Edit

bits28_days_since_seen (UDF)

Parameters

INPUTS

bits INT64

Source | Edit

bits28_from_string (UDF)

Parameters

INPUTS

s STRING

Source | Edit

bits28_range (UDF)

Parameters

INPUTS

bits INT64, start_offset INT64, n_bits INT64

OUTPUTS

INT64

Source | Edit

bits28_retention (UDF)

Parameters

INPUTS

bits INT64, submission_date DATE

Source | Edit

bits28_to_dates (UDF)

Parameters

INPUTS

bits INT64, submission_date DATE

OUTPUTS

ARRAY<DATE>

Source | Edit

bits28_to_string (UDF)

Parameters

INPUTS

bits INT64

Source | Edit

bits_from_offsets (UDF)

Returns a bit pattern of type BYTES compactly encoding the given array of positive integer offsets. This is primarily useful to generate a compact encoding of dates on which a feature was used, with arbitrarily long history. Example aggregation: sql bits_from_offsets( ARRAY_AGG(IF(foo, DATE_DIFF(anchor_date, submission_date, DAY), NULL) IGNORE NULLS) ) The resulting value can be cast to an INT64 representing the most recent 64 days via: sql CAST(CONCAT('0x', TO_HEX(RIGHT(bits >> i, 4))) AS INT64) Or representing the most recent 28 days (compatible with bits28 functions) via: sql CAST(CONCAT('0x', TO_HEX(RIGHT(bits >> i, 4))) AS INT64) << 36 >> 36

Parameters

INPUTS

offsets ARRAY<INT64>

Source | Edit

bits_to_active_n_weeks_ago (UDF)

Given a BYTE and an INT64, return whether the user was active that many weeks ago. NULL input returns NULL output.

Parameters

INPUTS

b BYTES, n INT64

OUTPUTS

BOOL

Source | Edit

bits_to_days_seen (UDF)

Given a BYTE, get the number of days the user was seen. NULL input returns NULL output.

Parameters

INPUTS

b BYTES

Source | Edit

bits_to_days_since_first_seen (UDF)

Given a BYTES, return the number of days since the client was first seen. If no bits are set, returns NULL, indicating we don't know. Otherwise the result is 0-indexed, meaning that for \x01, it will return 0. Results showed this being between 5-10x faster than the simpler alternative: CREATE OR REPLACE FUNCTION udf.bits_to_days_since_first_seen(b BYTES) AS (( SELECT MAX(n) FROM UNNEST(GENERATE_ARRAY( 0, 8 * BYTE_LENGTH(b))) AS n WHERE BIT_COUNT(SUBSTR(b >> n, -1) & b'\x01') > 0)); See also: bits_to_days_since_seen.sql

Parameters

INPUTS

b BYTES

Source | Edit

bits_to_days_since_seen (UDF)

Given a BYTES, return the number of days since the client was last seen. If no bits are set, returns NULL, indicating we don't know. Otherwise the results are 0-indexed, meaning \x01 will return 0. Tests showed this being 5-10x faster than the simpler alternative: CREATE OR REPLACE FUNCTION udf.bits_to_days_since_seen(b BYTES) AS (( SELECT MIN(n) FROM UNNEST(GENERATE_ARRAY(0, 364)) AS n WHERE BIT_COUNT(SUBSTR(b >> n, -1) & b'\x01') > 0)); See also: bits_to_days_since_first_seen.sql

Parameters

INPUTS

b BYTES

Source | Edit

bool_to_365_bits (UDF)

Convert a boolean to 365 bit byte array

Parameters

INPUTS

val BOOLEAN

Source | Edit

boolean_histogram_to_boolean (UDF)

Given histogram h, return TRUE if it has a value in the "true" bucket, or FALSE if it has a value in the "false" bucket, or NULL otherwise. https://github.com/mozilla/telemetry-batch-view/blob/ea0733c/src/main/scala/com/mozilla/telemetry/utils/MainPing.scala#L309-L317

Parameters

INPUTS

histogram STRING

Source | Edit

coalesce_adjacent_days_28_bits (UDF)

We generally want to believe only the first reasonable profile creation date that we receive from a client. Given bits representing usage from the previous day and the current day, this function shifts the first argument by one day and returns either that value if non-zero and non-null, the current day value if non-zero and non-null, or else 0.

Parameters

INPUTS

prev INT64, curr INT64

Source | Edit

coalesce_adjacent_days_365_bits (UDF)

Coalesce previous data's PCD with the new data's PCD. We generally want to believe only the first reasonable profile creation date that we receive from a client. Given bytes representing usage from the previous day and the current day, this function shifts the first argument by one day and returns either that value if non-zero and non-null, the current day value if non-zero and non-null, or else 0.

Parameters

INPUTS

prev BYTES, curr BYTES

Source | Edit

combine_adjacent_days_28_bits (UDF)

Combines two bit patterns. The first pattern represents activity over a 28-day period ending "yesterday". The second pattern represents activity as observed today (usually just 0 or 1). We shift the bits in the first pattern by one to set the new baseline as "today", then perform a bitwise OR of the two patterns.

Parameters

INPUTS

prev INT64, curr INT64

Source | Edit

combine_adjacent_days_365_bits (UDF)

Parameters

INPUTS

prev BYTES, curr BYTES

Source | Edit

combine_days_seen_maps (UDF)

The "clients_last_seen" class of tables represent various types of client activity within a 28-day window as bit patterns. This function takes in two arrays of structs (aka maps) where each entry gives the bit pattern for days in which we saw a ping for a given user in a given key. We combine the bit patterns for the previous day and the current day, returning a single map. See udf.combine_experiment_days for a more specific example of this approach.

Parameters

INPUTS

-- prev ARRAY<STRUCT<key STRING, value INT64>>, -- curr ARRAY<STRUCT<key STRING, value INT64>>

Source | Edit

combine_experiment_days (UDF)

The "clients_last_seen" class of tables represent various types of client activity within a 28-day window as bit patterns. This function takes in two arrays of structs where each entry gives the bit pattern for days in which we saw a ping for a given user in a given experiment. We combine the bit patterns for the previous day and the current day, returning a single array of experiment structs.

Parameters

INPUTS

-- prev ARRAY<STRUCT<experiment STRING, branch STRING, bits INT64>>, -- curr ARRAY<STRUCT<experiment STRING, branch STRING, bits INT64>>

Source | Edit

country_code_to_flag (UDF)

For a given two-letter ISO 3166-1 alpha-2 country code, returns a string consisting of two Unicode regional indicator symbols, which is rendered in supporting fonts (such as in the BigQuery console or STMO) as flag emoji. This is just for fun. See: - https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2 - https://en.wikipedia.org/wiki/Regional_Indicator_Symbol

Parameters

INPUTS

country_code string

Source | Edit

days_seen_bytes_to_rfm (UDF)

Return the frequency, recency, and T from a BYTE array, as defined in https://lifetimes.readthedocs.io/en/latest/Quickstart.html#the-shape-of-your-data RFM refers to Recency, Frequency, and Monetary value.

Parameters

INPUTS

days_seen_bytes BYTES

Source | Edit

days_since_created_profile_as_28_bits (UDF)

Takes in a difference between submission date and profile creation date and returns a bit pattern representing the profile creation date IFF the profile date is the same as the submission date or no more than 6 days earlier. Analysis has shown that client-reported profile creation dates are much less reliable outside of this range and cannot be used as reliable indicators of new profile creation.

Parameters

INPUTS

days_since_created_profile INT64

Source | Edit

deanonymize_event (UDF)

Rename struct fields in anonymous event tuples to meaningful names.

Parameters

INPUTS

tuple STRUCT<f0_ INT64, f1_ STRING, f2_ STRING, f3_ STRING, f4_ STRING, f5_ ARRAY<STRUCT<key STRING, value STRING>>>

Source | Edit

decode_int64 (UDF)

Parameters

INPUTS

raw BYTES

Source | Edit

dedupe_array (UDF)

Return an array containing only distinct values of the given array

Parameters

INPUTS

list ANY TYPE

Source | Edit

distribution_model_clients (UDF)

This is a stub implementation for use with tests; real implementation is in private-bigquery-etl

Parameters

INPUTS

distribution_id STRING

OUTPUTS

STRING

Source | Edit

distribution_model_ga_metrics (UDF)

This is a stub implementation for use with tests; real implementation is in private-bigquery-etl

Parameters

INPUTS

) RETURNS STRING AS ( 'helloworld'

OUTPUTS

STRING

Source | Edit

distribution_model_installs (UDF)

This is a stub implementation for use with tests; real implementation is in private-bigquery-etl

Parameters

INPUTS

distribution_id STRING

OUTPUTS

STRING

Source | Edit

event_code_points_to_string (UDF)

Parameters

INPUTS

code_points ANY TYPE

OUTPUTS

ARRAY<INT64>

experiment_search_metric_to_array (UDF)

Used for testing only. Reproduces the string transformations done in experiment_search_events_live_v1 materialized views.

Parameters

INPUTS

metric ARRAY<STRUCT<key STRING, value INT64>>

Source | Edit

extract_count_histogram_value (UDF)

Parameters

INPUTS

input STRING

Source | Edit

extract_document_type (UDF)

Extract the document type from a table name e.g. _TABLE_SUFFIX.

Parameters

INPUTS

table_name STRING

Source | Edit

extract_document_version (UDF)

Extract the document version from a table name e.g. _TABLE_SUFFIX.

Parameters

INPUTS

table_name STRING

Source | Edit

extract_histogram_sum (UDF)

This is a performance optimization compared to the more general mozfun.hist.extract for cases where only the histogram sum is needed. It must support all the same format variants as mozfun.hist.extract but this simplification is necessary to keep the main_summary query complexity in check.

Parameters

INPUTS

input STRING

OUTPUTS

INT64

Source | Edit

extract_schema_validation_path (UDF)

Return a path derived from an error message in payload_bytes_error

Parameters

INPUTS

error_message STRING

OUTPUTS

STRING

Source | Edit

fenix_build_to_datetime (UDF)

Convert the Fenix client_info.app_build-format string to a DATETIME. May return NULL on failure.

Fenix originally used an 8-digit app_build format>

In short it is yDDDHHmm:

  • y is years since 2018
  • DDD is day of year, 0-padded, 001-366
  • HH is hour of day, 00-23
  • mm is minute of hour, 00-59

The last date seen with an 8-digit build ID is 2020-08-10.

Newer builds use a 10-digit format> where the integer represents a pattern consisting of 32 bits. The 17 bits starting 13 bits from the left represent a number of hours since UTC midnight beginning 2014-12-28.

This function tolerates both formats.

After using this you may wish to DATETIME_TRUNC(result, DAY) for grouping by build date.

Parameters

INPUTS

app_build STRING

Source | Edit

funnel_derived_clients (UDF)

This is a stub implementation for use with tests; real implementation is in private-bigquery-etl

Parameters

INPUTS

os STRING, first_seen_date DATE, build_id STRING, attribution_source STRING, attribution_ua STRING, startup_profile_selection_reason STRING, distribution_id STRING

OUTPUTS

STRING

Source | Edit

funnel_derived_ga_metrics (UDF)

This is a stub implementation for use with tests; real implementation is in private-bigquery-etl

Parameters

INPUTS

device_category STRING, browser STRING, operating_system STRING

OUTPUTS

STRING

Source | Edit

funnel_derived_installs (UDF)

This is a stub implementation for use with tests; real implementation is in private-bigquery-etl

Parameters

INPUTS

silent BOOLEAN, submission_timestamp TIMESTAMP, build_id STRING, attribution STRING, distribution_id STRING

OUTPUTS

STRING

Source | Edit

ga_is_mozilla_browser (UDF)

Determine if a browser in a Google Analytics data is produced by Mozilla

Parameters

INPUTS

browser STRING

OUTPUTS

BOOLEAN

Source | Edit

geo_struct (UDF)

Convert geoip lookup fields to a struct, replacing '??' with NULL. Returns NULL if if required field country would be NULL. Replaces '??' with NULL because '??' is a placeholder that may be used if there was an issue during geoip lookup in hindsight.

Parameters

INPUTS

country STRING, city STRING, geo_subdivision1 STRING, geo_subdivision2 STRING

Source | Edit

get_key (UDF)

Parameters

INPUTS

map ANY TYPE, k ANY TYPE

Source | Edit

get_key_with_null (UDF)

Parameters

INPUTS

map ANY TYPE, k ANY TYPE

OUTPUTS

STRING

Source | Edit

glean_timespan_nanos (UDF)

Parameters

INPUTS

timespan STRUCT<time_unit STRING, value INT64>

Source | Edit

glean_timespan_seconds (UDF)

Parameters

INPUTS

timespan STRUCT<time_unit STRING, value INT64>

Source | Edit

Given a gzip compressed byte string, extract the uncompressed size from the footer. WARNING: THIS FUNCTION IS NOT RELIABLE FOR ARBITRARY GZIP STREAMS. It should, however, be safe to use for checking the decompressed size of payload in payload_bytes_decoded (and NOT payload_bytes_raw) because that payload is produced by the decoder and limited to conditions where the footer is accurate. From https://stackoverflow.com/a/9213826 First, the only information about the uncompressed length is four bytes at the end of the gzip file (stored in little-endian order). By necessity, that is the length modulo 232. So if the uncompressed length is 4 GB or more, you won't know what the length is. You can only be certain that the uncompressed length is less than 4 GB if the compressed length is less than something like 232 / 1032 + 18, or around 4 MB. (1032 is the maximum compression factor of deflate.) Second, and this is worse, a gzip file may actually be a concatenation of multiple gzip streams. Other than decoding, there is no way to find where each gzip stream ends in order to look at the four-byte uncompressed length of that piece. (Which may be wrong anyway due to the first reason.) Third, gzip files will sometimes have junk after the end of the gzip stream (usually zeros). Then the last four bytes are not the length.

Parameters

INPUTS

compressed BYTES

Source | Edit

histogram_max_key_with_nonzero_value (UDF)

Find the largest numeric bucket that contains a value greater than zero. https://github.com/mozilla/telemetry-batch-view/blob/ea0733c/src/main/scala/com/mozilla/telemetry/utils/MainPing.scala#L253-L266

Parameters

INPUTS

histogram STRING

Source | Edit

histogram_merge (UDF)

Parameters

INPUTS

histogram_list ANY TYPE

Source | Edit

histogram_normalize (UDF)

Parameters

INPUTS

histogram STRUCT<bucket_count INT64, `sum` INT64, histogram_type INT64, `range` ARRAY<INT64>, `values` ARRAY<STRUCT<key INT64, value INT64>>>

OUTPUTS

STRUCT<bucket_count INT64, `sum` INT64, histogram_type INT64, `range` ARRAY<INT64>, `values` ARRAY<STRUCT<key INT64, value FLOAT64>>>

Source | Edit

histogram_percentiles (UDF)

Parameters

INPUTS

histogram ANY TYPE, percentiles ARRAY<FLOAT64>

Source | Edit

histogram_to_mean (UDF)

Parameters

INPUTS

histogram ANY TYPE

Source | Edit

histogram_to_threshold_count (UDF)

Parameters

INPUTS

histogram STRING, threshold INT64

Source | Edit

hmac_sha256 (UDF)

Given a key and message, return the HMAC-SHA256 hash. This algorithm can be found in Wikipedia: https://en.wikipedia.org/wiki/HMAC#Implementation This implentation is validated against the NIST test vectors. See test/validation/hmac_sha256.py for more information.

Parameters

INPUTS

key BYTES, message BYTES

Source | Edit

int_to_365_bits (UDF)

Parameters

INPUTS

value INT64

OUTPUTS

BYTES

Source | Edit

int_to_hex_string (UDF)

Parameters

INPUTS

value INT64

OUTPUTS

STRING

Source | Edit

json_extract_histogram (UDF)

Parameters

INPUTS

input STRING

Source | Edit

json_extract_int_map (UDF)

Parameters

INPUTS

input STRING

Source | Edit

json_mode_last (UDF)

Parameters

INPUTS

list ANY TYPE

Source | Edit

keyed_histogram_get_sum (UDF)

Take a keyed histogram of type STRUCT, extract the histogram of the given key, and return the sum value

Parameters

INPUTS

keyed_histogram ANY TYPE, target_key STRING

Source | Edit

kv_array_append_to_json_string (UDF)

Returns a JSON string which has the pair appended to the provided input JSON string. NULL is also valid for input. Examples: udf.kv_array_append_to_json_string('{"foo":"bar"}', [STRUCT("baz" AS key, "boo" AS value)]) '{"foo":"bar","baz":"boo"}' udf.kv_array_append_to_json_string('{}', [STRUCT("baz" AS key, "boo" AS value)]) '{"baz": "boo"}'

Parameters

INPUTS

input STRING, arr ANY TYPE

OUTPUTS

STRING

Source | Edit

kv_array_to_json_string (UDF)

Returns a JSON string representing the input key-value array. Value type must be able to be represented as a string - this function will cast to a string. At Mozilla, the schema for a map is STRUCT>>. To use this with that representation, it should be as udf.kv_array_to_json_string(struct.key_value).

Parameters

INPUTS

kv_arr ANY TYPE

OUTPUTS

STRING

Source | Edit

main_summary_scalars (UDF)

Parameters

INPUTS

processes ANY TYPE

Source | Edit

map_bing_revenue_country_to_country_code (UDF)

For use by LTV revenue join only. Maps the Bing country to a country code. Only keeps the country codes we want to aggregate on.

Parameters

INPUTS

country STRING

Source | Edit

map_mode_last (UDF)

Parameters

INPUTS

entries ANY TYPE

Source | Edit

map_revenue_country (UDF)

Only for use by the LTV Revenue join. Maps country codes to the codes we have in the revenue dataset. Buckets small Bing countries into "other".

Parameters

INPUTS

engine STRING, country STRING

OUTPUTS

STRING

Source | Edit

map_sum (UDF)

Parameters

INPUTS

entries ANY TYPE

Source | Edit

merge_scalar_user_data (UDF)

Given an array of scalar metric data that might have duplicate values for a metric, merge them into one value.

Parameters

INPUTS

aggs ARRAY<STRUCT<metric STRING, metric_type STRING, key STRING, process STRING, agg_type STRING, value FLOAT64>>

OUTPUTS

ARRAY<STRUCT<metric STRING, metric_type STRING, key STRING, process STRING, agg_type STRING, value FLOAT64>>

Source | Edit

mod_uint128 (UDF)

This function returns "dividend mod divisor" where the dividend and the result is encoded in bytes, and divisor is an integer.

Parameters

INPUTS

dividend BYTES, divisor INT64

Source | Edit

mode_last (UDF)

Parameters

INPUTS

list ANY TYPE

Source | Edit

mode_last_retain_nulls (UDF)

Parameters

INPUTS

list ANY TYPE

Source | Edit

monetized_search (UDF)

Stub monetized_search UDF for tests

Parameters

INPUTS

engine STRING, country STRING, distribution_id STRING, submission_date DATE

Source | Edit

new_monthly_engine_searches_struct (UDF)

This struct represents the past year's worth of searches. Each month has its own entry, hence 12.

Parameters

INPUTS

) AS ( STRUCT( udf.array_of_12_zeroes(

Source | Edit

normalize_fenix_metrics (UDF)

Accepts a glean metrics struct as input and returns a modified struct that nulls out histograms for older versions of the Glean SDK that reported pathological binning; see https://bugzilla.mozilla.org/show_bug.cgi?id=1592930

Parameters

INPUTS

telemetry_sdk_build STRING, metrics ANY TYPE

Source | Edit

normalize_glean_baseline_client_info (UDF)

Parameters

INPUTS

client_info ANY TYPE, metrics ANY TYPE

Source | Edit

normalize_glean_ping_info (UDF)

Parameters

INPUTS

ping_info ANY TYPE

Source | Edit

normalize_main_payload (UDF)

Accepts a pipeline metadata struct as input and returns a modified struct that includes a few parsed or normalized variants of the input metadata fields.

Parameters

INPUTS

payload ANY TYPE

Source | Edit

normalize_metadata (UDF)

Parameters

INPUTS

metadata ANY TYPE

Source | Edit

normalize_monthly_searches (UDF)

Sum up the monthy search count arrays by normalized engine

Parameters

INPUTS

engine_searches ARRAY<STRUCT<key STRING, value STRUCT<total_searches ARRAY<INT64>, tagged_searches ARRAY<INT64>, search_with_ads ARRAY<INT64>, ad_click ARRAY<INT64>>>>

OUTPUTS

STRING

Source | Edit

normalize_os (UDF)

Parameters

INPUTS

os STRING

Source | Edit

normalize_search_engine (UDF)

Return normalized engine name for recognized engines This is a stub implementation for use with tests; real implementation is in private-bigquery-etl

Parameters

INPUTS

engine STRING

Source | Edit

null_if_empty_list (UDF)

Return NULL if list is empty, otherwise return list. This cannot be done with NULLIF because NULLIF does not support arrays.

Parameters

INPUTS

list ANY TYPE

Source | Edit

one_as_365_bits (UDF)

One represented as a byte array of 365 bits

Parameters

INPUTS

) AS ( CONCAT(REPEAT(b'\x00', 45

Source | Edit

pack_event_properties (UDF)

Parameters

INPUTS

event_properties ANY TYPE, indices ANY TYPE

OUTPUTS

ARRAY<STRUCT<key STRING, value STRING>>

parquet_array_sum (UDF)

Sum an array from a parquet-derived field. These are lists of an element that contain the field value.

Parameters

INPUTS

list ANY TYPE

Source | Edit

parse_desktop_telemetry_uri (UDF)

Parses and labels the components of a telemetry desktop ping submission uri Per https://docs.telemetry.mozilla.org/concepts/pipeline/http_edge_spec.html#special-handling-for-firefox-desktop-telemetry the format is /submit/telemetry/docId/docType/appName/appVersion/appUpdateChannel/appBuildID e.g. /submit/telemetry/ce39b608-f595-4c69-b6a6-f7a436604648/main/Firefox/61.0a1/nightly/20180328030202

Parameters

INPUTS

uri STRING

OUTPUTS

STRUCT<namespace STRING, document_id STRING, document_type STRING, app_name STRING, app_version STRING, app_update_channel STRING, app_build_id STRING>

Source | Edit

parse_iso8601_date (UDF)

Take a ISO 8601 date or date and time string and return a DATE. Return null if parse fails. Possible formats: 2019-11-04, 2019-11-04T21:15:00+00:00, 2019-11-04T21:15:00Z, 20191104T211500Z

Parameters

INPUTS

date_str STRING

OUTPUTS

DATE

Source | Edit

partner_org_clients (UDF)

This is a stub implementation for use with tests; real implementation is in private-bigquery-etl

Parameters

INPUTS

distribution_id STRING

OUTPUTS

STRING

Source | Edit

partner_org_ga_metrics (UDF)

This is a stub implementation for use with tests; real implementation is in private-bigquery-etl

Parameters

INPUTS

) RETURNS STRING AS ( (SELECT 'hola_world' AS partner_org

OUTPUTS

STRING

Source | Edit

partner_org_installs (UDF)

This is a stub implementation for use with tests; real implementation is in private-bigquery-etl

Parameters

INPUTS

distribution_id STRING

OUTPUTS

STRING

Source | Edit

pos_of_leading_set_bit (UDF)

Returns the 0-based index of the first set bit. No set bits returns NULL.

Parameters

INPUTS

i INT64

Source | Edit

pos_of_trailing_set_bit (UDF)

Identical to bits28_days_since_seen. Returns a 0-based index of the rightmost set bit in the passed bit pattern or null if no bits are set (bits = 0). To determine this position, we take a bitwise AND of the bit pattern and its complement, then we determine the position of the bit via base-2 logarithm; see https://stackoverflow.com/a/42747608/1260237

Parameters

INPUTS

bits INT64

OUTPUTS

INT64

Source | Edit

product_info_with_baseline (UDF)

Similar to mozfun.norm.product_info(), but this UDF also handles "baseline" apps that were introduced differentiate for certain apps whether data is sent through Glean or core pings. This UDF has been temporarily introduced as part of https://bugzilla.mozilla.org/show_bug.cgi?id=1775216

Parameters

INPUTS

legacy_app_name STRING, normalized_os STRING

OUTPUTS

STRUCT<app_name STRING, product STRING, canonical_app_name STRING, canonical_name STRING, contributes_to_2019_kpi BOOLEAN, contributes_to_2020_kpi BOOLEAN, contributes_to_2021_kpi BOOLEAN>

Source | Edit

pseudonymize_ad_id (UDF)

Pseudonymize Ad IDs, handling opt-outs.

Parameters

INPUTS

hashed_ad_id STRING, key BYTES

OUTPUTS

STRING

Source | Edit

quantile_search_metric_contribution (UDF)

This function returns how much of one metric is contributed by the quantile of another metric. Quantile variable should add an offset to get the requried percentile value. Example: udf.quantile_search_metric_contribution(sap, search_with_ads, sap_percentiles[OFFSET(9)]) It returns search_with_ads if sap value in top 10% volumn else null.

Parameters

INPUTS

metric1 FLOAT64, metric2 FLOAT64, quantile FLOAT64

Source | Edit

round_timestamp_to_minute (UDF)

Floor a timestamp object to the given minute interval.

Parameters

INPUTS

timestamp_expression TIMESTAMP, minute INT64

Source | Edit

safe_crc32_uuid (UDF)

Calculate the CRC-32 hash of a 36-byte UUID, or NULL if the value isn't 36 bytes. This implementation is limited to an exact length because recursion does not work. Based on https://stackoverflow.com/a/18639999/1260237 See https://en.wikipedia.org/wiki/Cyclic_redundancy_check

Parameters

INPUTS

) AS ( [ 0, 1996959894, 3993919788, 2567524794, 124634137, 1886057615, 3915621685, 2657392035, 249268274, 2044508324, 3772115230, 2547177864, 162941995, 2125561021, 3887607047, 2428444049, 498536548, 1789927666, 4089016648, 2227061214, 450548861, 1843258603, 4107580753, 2211677639, 325883990, 1684777152, 4251122042, 2321926636, 335633487, 1661365465, 4195302755, 2366115317, 997073096, 1281953886, 3579855332, 2724688242, 1006888145, 1258607687, 3524101629, 2768942443, 901097722, 1119000684, 3686517206, 2898065728, 853044451, 1172266101, 3705015759, 2882616665, 651767980, 1373503546, 3369554304, 3218104598, 565507253, 1454621731, 3485111705, 3099436303, 671266974, 1594198024, 3322730930, 2970347812, 795835527, 1483230225, 3244367275, 3060149565, 1994146192, 31158534, 2563907772, 4023717930, 1907459465, 112637215, 2680153253, 3904427059, 2013776290, 251722036, 2517215374, 3775830040, 2137656763, 141376813, 2439277719, 3865271297, 1802195444, 476864866, 2238001368, 4066508878, 1812370925, 453092731, 2181625025, 4111451223, 1706088902, 314042704, 2344532202, 4240017532, 1658658271, 366619977, 2362670323, 4224994405, 1303535960, 984961486, 2747007092, 3569037538, 1256170817, 1037604311, 2765210733, 3554079995, 1131014506, 879679996, 2909243462, 3663771856, 1141124467, 855842277, 2852801631, 3708648649, 1342533948, 654459306, 3188396048, 3373015174, 1466479909, 544179635, 3110523913, 3462522015, 1591671054, 702138776, 2966460450, 3352799412, 1504918807, 783551873, 3082640443, 3233442989, 3988292384, 2596254646, 62317068, 1957810842, 3939845945, 2647816111, 81470997, 1943803523, 3814918930, 2489596804, 225274430, 2053790376, 3826175755, 2466906013, 167816743, 2097651377, 4027552580, 2265490386, 503444072, 1762050814, 4150417245, 2154129355, 426522225, 1852507879, 4275313526, 2312317920, 282753626, 1742555852, 4189708143, 2394877945, 397917763, 1622183637, 3604390888, 2714866558, 953729732, 1340076626, 3518719985, 2797360999, 1068828381, 1219638859, 3624741850, 2936675148, 906185462, 1090812512, 3747672003, 2825379669, 829329135, 1181335161, 3412177804, 3160834842, 628085408, 1382605366, 3423369109, 3138078467, 570562233, 1426400815, 3317316542, 2998733608, 733239954, 1555261956, 3268935591, 3050360625, 752459403, 1541320221, 2607071920, 3965973030, 1969922972, 40735498, 2617837225, 3943577151, 1913087877, 83908371, 2512341634, 3803740692, 2075208622, 213261112, 2463272603, 3855990285, 2094854071, 198958881, 2262029012, 4057260610, 1759359992, 534414190, 2176718541, 4139329115, 1873836001, 414664567, 2282248934, 4279200368, 1711684554, 285281116, 2405801727, 4167216745, 1634467795, 376229701, 2685067896, 3608007406, 1308918612, 956543938, 2808555105, 3495958263, 1231636301, 1047427035, 2932959818, 3654703836, 1088359270, 936918000, 2847714899, 3736837829, 1202900863, 817233897, 3183342108, 3401237130, 1404277552, 615818150, 3134207493, 3453421203, 1423857449, 601450431, 3009837614, 3294710456, 1567103746, 711928724, 3020668471, 3272380065, 1510334235, 755167117 ]

Source | Edit

safe_sample_id (UDF)

Stably hash a client_id to an integer between 0 and 99, or NULL if client_id isn't 36 bytes

Parameters

INPUTS

client_id STRING

OUTPUTS

BYTES

Source | Edit

search_counts_map_sum (UDF)

Calculate the sums of search counts per source and engine

Parameters

INPUTS

entries ARRAY<STRUCT<engine STRING, source STRING, count INT64>>

Source | Edit

shift_28_bits_one_day (UDF)

Shift input bits one day left and drop any bits beyond 28 days.

Parameters

INPUTS

x INT64

Source | Edit

shift_365_bits_one_day (UDF)

Shift input bits one day left and drop any bits beyond 365 days.

Parameters

INPUTS

x BYTES

Source | Edit

shift_one_day (UDF)

Returns the bitfield shifted by one day, 0 for NULL

Parameters

INPUTS

x INT64

Source | Edit

smoot_usage_from_28_bits (UDF)

Calculates a variety of metrics based on bit patterns of daily usage for the smoot_usage_* tables.

Parameters

INPUTS

bit_arrays ARRAY<STRUCT<days_created_profile_bits INT64, days_active_bits INT64>>

Source | Edit

vector_add (UDF)

This function adds two vectors. The two vectors can have different length. If one vector is null, the other vector will be returned directly.

Parameters

INPUTS

a ARRAY<INT64>, b ARRAY<INT64>

Source | Edit

zero_as_365_bits (UDF)

Zero represented as a 365-bit byte array

Parameters

INPUTS

) AS ( REPEAT(b'\x00', 46

Source | Edit

zeroed_array (UDF)

Generates an array if all zeroes, of arbitrary length

Parameters

INPUTS

len INT64

Source | Edit