Udf
active_n_weeks_ago (UDF)
Parameters
INPUTS
x INT64, n INT64
OUTPUTS
BOOLEAN
active_values_from_days_seen_map (UDF)
Given a map of representing activity for STRING key
s, this function returns an array of which key
s were active for the time period in question. start_offset should be at most 0. n_bits should be at most the remaining bits.
Parameters
INPUTS
days_seen_bits_map ARRAY<STRUCT<key STRING, value INT64>>, start_offset INT64, n_bits INT64
add_monthly_engine_searches (UDF)
This function specifically windows searches into calendar-month windows. This means groups are not necessarily directly comparable, since different months have different numbers of days. On the first of each month, a new month is appended, and the first month is dropped. If the date is not the first of the month, the new entry is added to the last element in the array. For example, if we were adding 12 to [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]: On the first of the month, the result would be [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 12] On any other day of the month, the result would be [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 24] This happens for every aggregate (searches, ad clicks, etc.)
Parameters
INPUTS
prev STRUCT<total_searches ARRAY<INT64>, tagged_searches ARRAY<INT64>, search_with_ads ARRAY<INT64>, ad_click ARRAY<INT64>>, curr STRUCT<total_searches ARRAY<INT64>, tagged_searches ARRAY<INT64>, search_with_ads ARRAY<INT64>, ad_click ARRAY<INT64>>, submission_date DATE
add_monthly_searches (UDF)
Adds together two engine searches structs. Each engine searches struct has a MAP[engine -> search_counts_struct]. We want to add add together the prev and curr's values for a certain engine. This allows us to be flexible with the number of engines we're using.
Parameters
INPUTS
prev ARRAY<STRUCT<key STRING, value STRUCT<total_searches ARRAY<INT64>, tagged_searches ARRAY<INT64>, search_with_ads ARRAY<INT64>, ad_click ARRAY<INT64>>>>, curr ARRAY<STRUCT<key STRING, value STRUCT<total_searches ARRAY<INT64>, tagged_searches ARRAY<INT64>, search_with_ads ARRAY<INT64>, ad_click ARRAY<INT64>>>>, submission_date DATE
OUTPUTS
value
add_searches_by_index (UDF)
Return sums of each search type grouped by the index. Results are ordered by index.
Parameters
INPUTS
searches ARRAY<STRUCT<total_searches INT64, tagged_searches INT64, search_with_ads INT64, ad_click INT64, index INT64>>
aggregate_active_addons (UDF)
This function selects most frequently occuring value for each addon_id, using the latest value in the input among ties. The type for active_addons is ARRAYSELECT ARRAY_CONCAT_AGG(active_addons) FROM telemetry.main_summary_v4
, and is left unspecified to allow changes to the fields of the STRUCT.
Parameters
INPUTS
active_addons ANY TYPE
aggregate_map_first (UDF)
Returns an aggregated map with all the keys and the first corresponding value from the given maps
Parameters
INPUTS
maps ANY TYPE
aggregate_search_counts (UDF)
Parameters
INPUTS
search_counts ARRAY<STRUCT<engine STRING, source STRING, count INT64>>
aggregate_search_map (UDF)
Aggregates the total counts of the given search counters
Parameters
INPUTS
engine_searches_list ANY TYPE
array_11_zeroes_then (UDF)
An array of 11 zeroes, followed by a supplied value
Parameters
INPUTS
val INT64
array_drop_first_and_append (UDF)
Drop the first element of an array, and append the given element. Result is an array with the same length as the input.
Parameters
INPUTS
arr ANY TYPE, append ANY TYPE
array_of_12_zeroes (UDF)
An array of 12 zeroes
Parameters
INPUTS
) AS ( [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
array_slice (UDF)
Parameters
INPUTS
arr ANY TYPE, start_index INT64, end_index INT64
bitcount_lowest_7 (UDF)
This function counts the 1s in lowest 7 bits of an INT64
Parameters
INPUTS
x INT64
bitmask_365 (UDF)
A bitmask for 365 bits
Parameters
INPUTS
) AS ( CONCAT(b'\x1F', REPEAT(b'\xFF', 45
bitmask_lowest_28 (UDF)
Parameters
INPUTS
) AS ( 0x0FFFFFFF
bitmask_lowest_7 (UDF)
Parameters
INPUTS
) AS ( 0x7F
bitmask_range (UDF)
Returns a bitmask that can be used to return a subset of an integer representing a bit array. The start_ordinal argument is an integer specifying the starting position of the slice, with start_ordinal = 1 indicating the first bit. The length argument is the number of bits to include in the mask. The arguments were chosen to match the semantics of the SUBSTR function; see https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#substr
Parameters
INPUTS
start_ordinal INT64, _length INT64
bits28_active_in_range (UDF)
Parameters
INPUTS
bits INT64, start_offset INT64, n_bits INT64
bits28_days_since_seen (UDF)
Parameters
INPUTS
bits INT64
bits28_from_string (UDF)
Parameters
INPUTS
s STRING
bits28_range (UDF)
Parameters
INPUTS
bits INT64, start_offset INT64, n_bits INT64
OUTPUTS
INT64
bits28_retention (UDF)
Parameters
INPUTS
bits INT64, submission_date DATE
bits28_to_dates (UDF)
Parameters
INPUTS
bits INT64, submission_date DATE
OUTPUTS
ARRAY<DATE>
bits28_to_string (UDF)
Parameters
INPUTS
bits INT64
bits_from_offsets (UDF)
Returns a bit pattern of type BYTES compactly encoding the given array of positive integer offsets.
This is primarily useful to generate a compact encoding of dates on which a feature was used, with arbitrarily long history. Example aggregation:
sql bits_from_offsets(
ARRAY_AGG(IF(foo, DATE_DIFF(anchor_date, submission_date, DAY), NULL)
IGNORE NULLS)
)
The resulting value can be cast to an INT64 representing the most recent 64 days via:
sql CAST(CONCAT('0x', TO_HEX(RIGHT(bits >> i, 4))) AS INT64)
Or representing the most recent 28 days (compatible with bits28 functions) via:
sql CAST(CONCAT('0x', TO_HEX(RIGHT(bits >> i, 4))) AS INT64) << 36 >> 36
Parameters
INPUTS
offsets ARRAY<INT64>
bits_to_active_n_weeks_ago (UDF)
Given a BYTE and an INT64, return whether the user was active that many weeks ago. NULL input returns NULL output.
Parameters
INPUTS
b BYTES, n INT64
OUTPUTS
BOOL
bits_to_days_seen (UDF)
Given a BYTE, get the number of days the user was seen. NULL input returns NULL output.
Parameters
INPUTS
b BYTES
bits_to_days_since_first_seen (UDF)
Given a BYTES, return the number of days since the client was first seen. If no bits are set, returns NULL, indicating we don't know. Otherwise the result is 0-indexed, meaning that for \x01, it will return 0. Results showed this being between 5-10x faster than the simpler alternative: CREATE OR REPLACE FUNCTION udf.bits_to_days_since_first_seen(b BYTES) AS (( SELECT MAX(n) FROM UNNEST(GENERATE_ARRAY( 0, 8 * BYTE_LENGTH(b))) AS n WHERE BIT_COUNT(SUBSTR(b >> n, -1) & b'\x01') > 0)); See also: bits_to_days_since_seen.sql
Parameters
INPUTS
b BYTES
bits_to_days_since_seen (UDF)
Given a BYTES, return the number of days since the client was last seen. If no bits are set, returns NULL, indicating we don't know. Otherwise the results are 0-indexed, meaning \x01 will return 0. Tests showed this being 5-10x faster than the simpler alternative: CREATE OR REPLACE FUNCTION udf.bits_to_days_since_seen(b BYTES) AS (( SELECT MIN(n) FROM UNNEST(GENERATE_ARRAY(0, 364)) AS n WHERE BIT_COUNT(SUBSTR(b >> n, -1) & b'\x01') > 0)); See also: bits_to_days_since_first_seen.sql
Parameters
INPUTS
b BYTES
bool_to_365_bits (UDF)
Convert a boolean to 365 bit byte array
Parameters
INPUTS
val BOOLEAN
boolean_histogram_to_boolean (UDF)
Given histogram h, return TRUE if it has a value in the "true" bucket, or FALSE if it has a value in the "false" bucket, or NULL otherwise. https://github.com/mozilla/telemetry-batch-view/blob/ea0733c/src/main/scala/com/mozilla/telemetry/utils/MainPing.scala#L309-L317
Parameters
INPUTS
histogram STRING
coalesce_adjacent_days_28_bits (UDF)
We generally want to believe only the first reasonable profile creation date that we receive from a client. Given bits representing usage from the previous day and the current day, this function shifts the first argument by one day and returns either that value if non-zero and non-null, the current day value if non-zero and non-null, or else 0.
Parameters
INPUTS
prev INT64, curr INT64
coalesce_adjacent_days_365_bits (UDF)
Coalesce previous data's PCD with the new data's PCD. We generally want to believe only the first reasonable profile creation date that we receive from a client. Given bytes representing usage from the previous day and the current day, this function shifts the first argument by one day and returns either that value if non-zero and non-null, the current day value if non-zero and non-null, or else 0.
Parameters
INPUTS
prev BYTES, curr BYTES
combine_adjacent_days_28_bits (UDF)
Combines two bit patterns. The first pattern represents activity over a 28-day period ending "yesterday". The second pattern represents activity as observed today (usually just 0 or 1). We shift the bits in the first pattern by one to set the new baseline as "today", then perform a bitwise OR of the two patterns.
Parameters
INPUTS
prev INT64, curr INT64
combine_adjacent_days_365_bits (UDF)
Parameters
INPUTS
prev BYTES, curr BYTES
combine_days_seen_maps (UDF)
The "clients_last_seen" class of tables represent various types of client activity within a 28-day window as bit patterns. This function takes in two arrays of structs (aka maps) where each entry gives the bit pattern for days in which we saw a ping for a given user in a given key. We combine the bit patterns for the previous day and the current day, returning a single map. See udf.combine_experiment_days
for a more specific example of this approach.
Parameters
INPUTS
-- prev ARRAY<STRUCT<key STRING, value INT64>>, -- curr ARRAY<STRUCT<key STRING, value INT64>>
combine_experiment_days (UDF)
The "clients_last_seen" class of tables represent various types of client activity within a 28-day window as bit patterns. This function takes in two arrays of structs where each entry gives the bit pattern for days in which we saw a ping for a given user in a given experiment. We combine the bit patterns for the previous day and the current day, returning a single array of experiment structs.
Parameters
INPUTS
-- prev ARRAY<STRUCT<experiment STRING, branch STRING, bits INT64>>, -- curr ARRAY<STRUCT<experiment STRING, branch STRING, bits INT64>>
country_code_to_flag (UDF)
For a given two-letter ISO 3166-1 alpha-2 country code, returns a string consisting of two Unicode regional indicator symbols, which is rendered in supporting fonts (such as in the BigQuery console or STMO) as flag emoji. This is just for fun. See: - https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2 - https://en.wikipedia.org/wiki/Regional_Indicator_Symbol
Parameters
INPUTS
country_code string
days_seen_bytes_to_rfm (UDF)
Return the frequency, recency, and T from a BYTE array, as defined in https://lifetimes.readthedocs.io/en/latest/Quickstart.html#the-shape-of-your-data RFM refers to Recency, Frequency, and Monetary value.
Parameters
INPUTS
days_seen_bytes BYTES
days_since_created_profile_as_28_bits (UDF)
Takes in a difference between submission date and profile creation date and returns a bit pattern representing the profile creation date IFF the profile date is the same as the submission date or no more than 6 days earlier. Analysis has shown that client-reported profile creation dates are much less reliable outside of this range and cannot be used as reliable indicators of new profile creation.
Parameters
INPUTS
days_since_created_profile INT64
deanonymize_event (UDF)
Rename struct fields in anonymous event tuples to meaningful names.
Parameters
INPUTS
tuple STRUCT<f0_ INT64, f1_ STRING, f2_ STRING, f3_ STRING, f4_ STRING, f5_ ARRAY<STRUCT<key STRING, value STRING>>>
decode_int64 (UDF)
Parameters
INPUTS
raw BYTES
dedupe_array (UDF)
Return an array containing only distinct values of the given array
Parameters
INPUTS
list ANY TYPE
distribution_model_clients (UDF)
This is a stub implementation for use with tests; real implementation is in private-bigquery-etl
Parameters
INPUTS
distribution_id STRING
OUTPUTS
STRING
distribution_model_ga_metrics (UDF)
This is a stub implementation for use with tests; real implementation is in private-bigquery-etl
Parameters
INPUTS
) RETURNS STRING AS ( 'helloworld'
OUTPUTS
STRING
distribution_model_installs (UDF)
This is a stub implementation for use with tests; real implementation is in private-bigquery-etl
Parameters
INPUTS
distribution_id STRING
OUTPUTS
STRING
event_code_points_to_string (UDF)
Parameters
INPUTS
code_points ANY TYPE
OUTPUTS
ARRAY<INT64>
experiment_search_metric_to_array (UDF)
Used for testing only. Reproduces the string transformations done in experiment_search_events_live_v1 materialized views.
Parameters
INPUTS
metric ARRAY<STRUCT<key STRING, value INT64>>
extract_count_histogram_value (UDF)
Parameters
INPUTS
input STRING
extract_document_type (UDF)
Extract the document type from a table name e.g. _TABLE_SUFFIX.
Parameters
INPUTS
table_name STRING
extract_document_version (UDF)
Extract the document version from a table name e.g. _TABLE_SUFFIX.
Parameters
INPUTS
table_name STRING
extract_histogram_sum (UDF)
This is a performance optimization compared to the more general mozfun.hist.extract for cases where only the histogram sum is needed. It must support all the same format variants as mozfun.hist.extract but this simplification is necessary to keep the main_summary query complexity in check.
Parameters
INPUTS
input STRING
OUTPUTS
INT64
extract_schema_validation_path (UDF)
Return a path derived from an error message in payload_bytes_error
Parameters
INPUTS
error_message STRING
OUTPUTS
STRING
fenix_build_to_datetime (UDF)
Convert the Fenix client_info.app_build-format string to a DATETIME. May return NULL on failure.
Fenix originally used an 8-digit app_build format>
In short it is yDDDHHmm
:
- y is years since 2018
- DDD is day of year, 0-padded, 001-366
- HH is hour of day, 00-23
- mm is minute of hour, 00-59
The last date seen with an 8-digit build ID is 2020-08-10.
Newer builds use a 10-digit format> where the integer represents a pattern consisting of 32 bits. The 17 bits starting 13 bits from the left represent a number of hours since UTC midnight beginning 2014-12-28.
This function tolerates both formats.
After using this you may wish to DATETIME_TRUNC(result, DAY)
for grouping
by build date.
Parameters
INPUTS
app_build STRING
funnel_derived_clients (UDF)
This is a stub implementation for use with tests; real implementation is in private-bigquery-etl
Parameters
INPUTS
os STRING, first_seen_date DATE, build_id STRING, attribution_source STRING, attribution_ua STRING, startup_profile_selection_reason STRING, distribution_id STRING
OUTPUTS
STRING
funnel_derived_ga_metrics (UDF)
This is a stub implementation for use with tests; real implementation is in private-bigquery-etl
Parameters
INPUTS
device_category STRING, browser STRING, operating_system STRING
OUTPUTS
STRING
funnel_derived_installs (UDF)
This is a stub implementation for use with tests; real implementation is in private-bigquery-etl
Parameters
INPUTS
silent BOOLEAN, submission_timestamp TIMESTAMP, build_id STRING, attribution STRING, distribution_id STRING
OUTPUTS
STRING
ga_is_mozilla_browser (UDF)
Determine if a browser in a Google Analytics data is produced by Mozilla
Parameters
INPUTS
browser STRING
OUTPUTS
BOOLEAN
geo_struct (UDF)
Convert geoip lookup fields to a struct, replacing '??' with NULL. Returns NULL if if required field country would be NULL. Replaces '??' with NULL because '??' is a placeholder that may be used if there was an issue during geoip lookup in hindsight.
Parameters
INPUTS
country STRING, city STRING, geo_subdivision1 STRING, geo_subdivision2 STRING
geo_struct_set_defaults (UDF)
Convert geoip lookup fields to a struct, replacing NULLs with "??". This allows for better joins on those fields, but needs to be changed back to NULL at the end of the query.
Parameters
INPUTS
country STRING, city STRING, geo_subdivision1 STRING, geo_subdivision2 STRING
get_key (UDF)
Parameters
INPUTS
map ANY TYPE, k ANY TYPE
get_key_with_null (UDF)
Parameters
INPUTS
map ANY TYPE, k ANY TYPE
OUTPUTS
STRING
glean_timespan_nanos (UDF)
Parameters
INPUTS
timespan STRUCT<time_unit STRING, value INT64>
glean_timespan_seconds (UDF)
Parameters
INPUTS
timespan STRUCT<time_unit STRING, value INT64>
gzip_length_footer (UDF)
Given a gzip compressed byte string, extract the uncompressed size from the footer. WARNING: THIS FUNCTION IS NOT RELIABLE FOR ARBITRARY GZIP STREAMS. It should, however, be safe to use for checking the decompressed size of payload in payload_bytes_decoded (and NOT payload_bytes_raw) because that payload is produced by the decoder and limited to conditions where the footer is accurate. From https://stackoverflow.com/a/9213826 First, the only information about the uncompressed length is four bytes at the end of the gzip file (stored in little-endian order). By necessity, that is the length modulo 232. So if the uncompressed length is 4 GB or more, you won't know what the length is. You can only be certain that the uncompressed length is less than 4 GB if the compressed length is less than something like 232 / 1032 + 18, or around 4 MB. (1032 is the maximum compression factor of deflate.) Second, and this is worse, a gzip file may actually be a concatenation of multiple gzip streams. Other than decoding, there is no way to find where each gzip stream ends in order to look at the four-byte uncompressed length of that piece. (Which may be wrong anyway due to the first reason.) Third, gzip files will sometimes have junk after the end of the gzip stream (usually zeros). Then the last four bytes are not the length.
Parameters
INPUTS
compressed BYTES
histogram_max_key_with_nonzero_value (UDF)
Find the largest numeric bucket that contains a value greater than zero. https://github.com/mozilla/telemetry-batch-view/blob/ea0733c/src/main/scala/com/mozilla/telemetry/utils/MainPing.scala#L253-L266
Parameters
INPUTS
histogram STRING
histogram_merge (UDF)
Parameters
INPUTS
histogram_list ANY TYPE
histogram_normalize (UDF)
Parameters
INPUTS
histogram STRUCT<bucket_count INT64, `sum` INT64, histogram_type INT64, `range` ARRAY<INT64>, `values` ARRAY<STRUCT<key INT64, value INT64>>>
OUTPUTS
STRUCT<bucket_count INT64, `sum` INT64, histogram_type INT64, `range` ARRAY<INT64>, `values` ARRAY<STRUCT<key INT64, value FLOAT64>>>
histogram_percentiles (UDF)
Parameters
INPUTS
histogram ANY TYPE, percentiles ARRAY<FLOAT64>
histogram_to_mean (UDF)
Parameters
INPUTS
histogram ANY TYPE
histogram_to_threshold_count (UDF)
Parameters
INPUTS
histogram STRING, threshold INT64
hmac_sha256 (UDF)
Given a key and message, return the HMAC-SHA256 hash. This algorithm can be found in Wikipedia: https://en.wikipedia.org/wiki/HMAC#Implementation This implentation is validated against the NIST test vectors. See test/validation/hmac_sha256.py for more information.
Parameters
INPUTS
key BYTES, message BYTES
int_to_365_bits (UDF)
Parameters
INPUTS
value INT64
OUTPUTS
BYTES
int_to_hex_string (UDF)
Parameters
INPUTS
value INT64
OUTPUTS
STRING
json_extract_histogram (UDF)
Parameters
INPUTS
input STRING
json_extract_int_map (UDF)
Parameters
INPUTS
input STRING
json_mode_last (UDF)
Parameters
INPUTS
list ANY TYPE
keyed_histogram_get_sum (UDF)
Take a keyed histogram of type STRUCT
Parameters
INPUTS
keyed_histogram ANY TYPE, target_key STRING
kv_array_append_to_json_string (UDF)
Returns a JSON string which has the pair
appended to the provided input
JSON string. NULL is also valid for input
. Examples: udf.kv_array_append_to_json_string('{"foo":"bar"}', [STRUCT("baz" AS key, "boo" AS value)]) '{"foo":"bar","baz":"boo"}' udf.kv_array_append_to_json_string('{}', [STRUCT("baz" AS key, "boo" AS value)]) '{"baz": "boo"}'
Parameters
INPUTS
input STRING, arr ANY TYPE
OUTPUTS
STRING
kv_array_to_json_string (UDF)
Returns a JSON string representing the input key-value array. Value type must be able to be represented as a string - this function will cast to a string. At Mozilla, the schema for a map is STRUCTudf.kv_array_to_json_string(struct.key_value)
.
Parameters
INPUTS
kv_arr ANY TYPE
OUTPUTS
STRING
main_summary_scalars (UDF)
Parameters
INPUTS
processes ANY TYPE
map_bing_revenue_country_to_country_code (UDF)
For use by LTV revenue join only. Maps the Bing country to a country code. Only keeps the country codes we want to aggregate on.
Parameters
INPUTS
country STRING
map_mode_last (UDF)
Parameters
INPUTS
entries ANY TYPE
map_revenue_country (UDF)
Only for use by the LTV Revenue join. Maps country codes to the codes we have in the revenue dataset. Buckets small Bing countries into "other".
Parameters
INPUTS
engine STRING, country STRING
OUTPUTS
STRING
map_sum (UDF)
Parameters
INPUTS
entries ANY TYPE
marketing_attributable_desktop (UDF)
This is a UDF to help distinguish if acquired desktop clients are attributable to marketing efforts or not
Parameters
INPUTS
medium STRING
OUTPUTS
BOOLEAN
merge_scalar_user_data (UDF)
Given an array of scalar metric data that might have duplicate values for a metric, merge them into one value.
Parameters
INPUTS
aggs ARRAY<STRUCT<metric STRING, metric_type STRING, key STRING, process STRING, agg_type STRING, value FLOAT64>>
OUTPUTS
ARRAY<STRUCT<metric STRING, metric_type STRING, key STRING, process STRING, agg_type STRING, value FLOAT64>>
mod_uint128 (UDF)
This function returns "dividend mod divisor" where the dividend and the result is encoded in bytes, and divisor is an integer.
Parameters
INPUTS
dividend BYTES, divisor INT64
mode_last (UDF)
Parameters
INPUTS
list ANY TYPE
mode_last_retain_nulls (UDF)
Parameters
INPUTS
list ANY TYPE
monetized_search (UDF)
Stub monetized_search UDF for tests
Parameters
INPUTS
engine STRING, country STRING, distribution_id STRING, submission_date DATE
new_monthly_engine_searches_struct (UDF)
This struct represents the past year's worth of searches. Each month has its own entry, hence 12.
Parameters
INPUTS
) AS ( STRUCT( udf.array_of_12_zeroes(
normalize_fenix_metrics (UDF)
Accepts a glean metrics struct as input and returns a modified struct that nulls out histograms for older versions of the Glean SDK that reported pathological binning; see https://bugzilla.mozilla.org/show_bug.cgi?id=1592930
Parameters
INPUTS
telemetry_sdk_build STRING, metrics ANY TYPE
normalize_glean_baseline_client_info (UDF)
Parameters
INPUTS
client_info ANY TYPE, metrics ANY TYPE
normalize_glean_ping_info (UDF)
Parameters
INPUTS
ping_info ANY TYPE
normalize_main_payload (UDF)
Accepts a pipeline metadata struct as input and returns a modified struct that includes a few parsed or normalized variants of the input metadata fields.
Parameters
INPUTS
payload ANY TYPE
normalize_metadata (UDF)
Parameters
INPUTS
metadata ANY TYPE
normalize_monthly_searches (UDF)
Sum up the monthy search count arrays by normalized engine
Parameters
INPUTS
engine_searches ARRAY<STRUCT<key STRING, value STRUCT<total_searches ARRAY<INT64>, tagged_searches ARRAY<INT64>, search_with_ads ARRAY<INT64>, ad_click ARRAY<INT64>>>>
OUTPUTS
STRING
normalize_os (UDF)
Parameters
INPUTS
os STRING
normalize_search_engine (UDF)
Return normalized engine name for recognized engines This is a stub implementation for use with tests; real implementation is in private-bigquery-etl
Parameters
INPUTS
engine STRING
null_if_empty_list (UDF)
Return NULL if list is empty, otherwise return list. This cannot be done with NULLIF because NULLIF does not support arrays.
Parameters
INPUTS
list ANY TYPE
one_as_365_bits (UDF)
One represented as a byte array of 365 bits
Parameters
INPUTS
) AS ( CONCAT(REPEAT(b'\x00', 45
organic_vs_paid_desktop (UDF)
This is a UDF to help distinguish desktop client attribution as being organic or paid
Parameters
INPUTS
medium STRING
OUTPUTS
STRING
organic_vs_paid_mobile (UDF)
This is a UDF to help distinguish mobile client attribution as being organic or paid
Parameters
INPUTS
adjust_network STRING
OUTPUTS
STRING
pack_event_properties (UDF)
Parameters
INPUTS
event_properties ANY TYPE, indices ANY TYPE
OUTPUTS
ARRAY<STRUCT<key STRING, value STRING>>
parquet_array_sum (UDF)
Sum an array from a parquet-derived field. These are lists of an element
that contain the field value.
Parameters
INPUTS
list ANY TYPE
parse_desktop_telemetry_uri (UDF)
Parses and labels the components of a telemetry desktop ping submission uri Per https://docs.telemetry.mozilla.org/concepts/pipeline/http_edge_spec.html#special-handling-for-firefox-desktop-telemetry the format is /submit/telemetry/docId/docType/appName/appVersion/appUpdateChannel/appBuildID e.g. /submit/telemetry/ce39b608-f595-4c69-b6a6-f7a436604648/main/Firefox/61.0a1/nightly/20180328030202
Parameters
INPUTS
uri STRING
OUTPUTS
STRUCT<namespace STRING, document_id STRING, document_type STRING, app_name STRING, app_version STRING, app_update_channel STRING, app_build_id STRING>
parse_iso8601_date (UDF)
Take a ISO 8601 date or date and time string and return a DATE. Return null if parse fails. Possible formats: 2019-11-04, 2019-11-04T21:15:00+00:00, 2019-11-04T21:15:00Z, 20191104T211500Z
Parameters
INPUTS
date_str STRING
OUTPUTS
DATE
partner_org_clients (UDF)
This is a stub implementation for use with tests; real implementation is in private-bigquery-etl
Parameters
INPUTS
distribution_id STRING
OUTPUTS
STRING
partner_org_ga_metrics (UDF)
This is a stub implementation for use with tests; real implementation is in private-bigquery-etl
Parameters
INPUTS
) RETURNS STRING AS ( (SELECT 'hola_world' AS partner_org
OUTPUTS
STRING
partner_org_installs (UDF)
This is a stub implementation for use with tests; real implementation is in private-bigquery-etl
Parameters
INPUTS
distribution_id STRING
OUTPUTS
STRING
pos_of_leading_set_bit (UDF)
Returns the 0-based index of the first set bit. No set bits returns NULL.
Parameters
INPUTS
i INT64
pos_of_trailing_set_bit (UDF)
Identical to bits28_days_since_seen. Returns a 0-based index of the rightmost set bit in the passed bit pattern or null if no bits are set (bits = 0). To determine this position, we take a bitwise AND of the bit pattern and its complement, then we determine the position of the bit via base-2 logarithm; see https://stackoverflow.com/a/42747608/1260237
Parameters
INPUTS
bits INT64
OUTPUTS
INT64
product_info_with_baseline (UDF)
Similar to mozfun.norm.product_info(), but this UDF also handles "baseline" apps that were introduced differentiate for certain apps whether data is sent through Glean or core pings. This UDF has been temporarily introduced as part of https://bugzilla.mozilla.org/show_bug.cgi?id=1775216
Parameters
INPUTS
legacy_app_name STRING, normalized_os STRING
OUTPUTS
STRUCT<app_name STRING, product STRING, canonical_app_name STRING, canonical_name STRING, contributes_to_2019_kpi BOOLEAN, contributes_to_2020_kpi BOOLEAN, contributes_to_2021_kpi BOOLEAN>
pseudonymize_ad_id (UDF)
Pseudonymize Ad IDs, handling opt-outs.
Parameters
INPUTS
hashed_ad_id STRING, key BYTES
OUTPUTS
STRING
quantile_search_metric_contribution (UDF)
This function returns how much of one metric is contributed by the quantile of another metric. Quantile variable should add an offset to get the requried percentile value. Example: udf.quantile_search_metric_contribution(sap, search_with_ads, sap_percentiles[OFFSET(9)]) It returns search_with_ads if sap value in top 10% volumn else null.
Parameters
INPUTS
metric1 FLOAT64, metric2 FLOAT64, quantile FLOAT64
round_timestamp_to_minute (UDF)
Floor a timestamp object to the given minute interval.
Parameters
INPUTS
timestamp_expression TIMESTAMP, minute INT64
safe_crc32_uuid (UDF)
Calculate the CRC-32 hash of a 36-byte UUID, or NULL if the value isn't 36 bytes. This implementation is limited to an exact length because recursion does not work. Based on https://stackoverflow.com/a/18639999/1260237 See https://en.wikipedia.org/wiki/Cyclic_redundancy_check
Parameters
INPUTS
) AS ( [ 0, 1996959894, 3993919788, 2567524794, 124634137, 1886057615, 3915621685, 2657392035, 249268274, 2044508324, 3772115230, 2547177864, 162941995, 2125561021, 3887607047, 2428444049, 498536548, 1789927666, 4089016648, 2227061214, 450548861, 1843258603, 4107580753, 2211677639, 325883990, 1684777152, 4251122042, 2321926636, 335633487, 1661365465, 4195302755, 2366115317, 997073096, 1281953886, 3579855332, 2724688242, 1006888145, 1258607687, 3524101629, 2768942443, 901097722, 1119000684, 3686517206, 2898065728, 853044451, 1172266101, 3705015759, 2882616665, 651767980, 1373503546, 3369554304, 3218104598, 565507253, 1454621731, 3485111705, 3099436303, 671266974, 1594198024, 3322730930, 2970347812, 795835527, 1483230225, 3244367275, 3060149565, 1994146192, 31158534, 2563907772, 4023717930, 1907459465, 112637215, 2680153253, 3904427059, 2013776290, 251722036, 2517215374, 3775830040, 2137656763, 141376813, 2439277719, 3865271297, 1802195444, 476864866, 2238001368, 4066508878, 1812370925, 453092731, 2181625025, 4111451223, 1706088902, 314042704, 2344532202, 4240017532, 1658658271, 366619977, 2362670323, 4224994405, 1303535960, 984961486, 2747007092, 3569037538, 1256170817, 1037604311, 2765210733, 3554079995, 1131014506, 879679996, 2909243462, 3663771856, 1141124467, 855842277, 2852801631, 3708648649, 1342533948, 654459306, 3188396048, 3373015174, 1466479909, 544179635, 3110523913, 3462522015, 1591671054, 702138776, 2966460450, 3352799412, 1504918807, 783551873, 3082640443, 3233442989, 3988292384, 2596254646, 62317068, 1957810842, 3939845945, 2647816111, 81470997, 1943803523, 3814918930, 2489596804, 225274430, 2053790376, 3826175755, 2466906013, 167816743, 2097651377, 4027552580, 2265490386, 503444072, 1762050814, 4150417245, 2154129355, 426522225, 1852507879, 4275313526, 2312317920, 282753626, 1742555852, 4189708143, 2394877945, 397917763, 1622183637, 3604390888, 2714866558, 953729732, 1340076626, 3518719985, 2797360999, 1068828381, 1219638859, 3624741850, 2936675148, 906185462, 1090812512, 3747672003, 2825379669, 829329135, 1181335161, 3412177804, 3160834842, 628085408, 1382605366, 3423369109, 3138078467, 570562233, 1426400815, 3317316542, 2998733608, 733239954, 1555261956, 3268935591, 3050360625, 752459403, 1541320221, 2607071920, 3965973030, 1969922972, 40735498, 2617837225, 3943577151, 1913087877, 83908371, 2512341634, 3803740692, 2075208622, 213261112, 2463272603, 3855990285, 2094854071, 198958881, 2262029012, 4057260610, 1759359992, 534414190, 2176718541, 4139329115, 1873836001, 414664567, 2282248934, 4279200368, 1711684554, 285281116, 2405801727, 4167216745, 1634467795, 376229701, 2685067896, 3608007406, 1308918612, 956543938, 2808555105, 3495958263, 1231636301, 1047427035, 2932959818, 3654703836, 1088359270, 936918000, 2847714899, 3736837829, 1202900863, 817233897, 3183342108, 3401237130, 1404277552, 615818150, 3134207493, 3453421203, 1423857449, 601450431, 3009837614, 3294710456, 1567103746, 711928724, 3020668471, 3272380065, 1510334235, 755167117 ]
safe_sample_id (UDF)
Stably hash a client_id to an integer between 0 and 99, or NULL if client_id isn't 36 bytes
Parameters
INPUTS
client_id STRING
OUTPUTS
BYTES
search_counts_map_sum (UDF)
Calculate the sums of search counts per source and engine
Parameters
INPUTS
entries ARRAY<STRUCT<engine STRING, source STRING, count INT64>>
shift_28_bits_one_day (UDF)
Shift input bits one day left and drop any bits beyond 28 days.
Parameters
INPUTS
x INT64
shift_365_bits_one_day (UDF)
Shift input bits one day left and drop any bits beyond 365 days.
Parameters
INPUTS
x BYTES
shift_one_day (UDF)
Returns the bitfield shifted by one day, 0 for NULL
Parameters
INPUTS
x INT64
smoot_usage_from_28_bits (UDF)
Calculates a variety of metrics based on bit patterns of daily usage for the smoot_usage_* tables.
Parameters
INPUTS
bit_arrays ARRAY<STRUCT<days_created_profile_bits INT64, days_active_bits INT64>>
vector_add (UDF)
This function adds two vectors. The two vectors can have different length. If one vector is null, the other vector will be returned directly.
Parameters
INPUTS
a ARRAY<INT64>, b ARRAY<INT64>
zero_as_365_bits (UDF)
Zero represented as a 365-bit byte array
Parameters
INPUTS
) AS ( REPEAT(b'\x00', 46
zeroed_array (UDF)
Generates an array if all zeroes, of arbitrary length
Parameters
INPUTS
len INT64