Skip to content

User-Defined Functions (Javascript)

Persistent user-defined functions written in Javascript

Source Directory

bootstrap_percentile_ci

Bootstrap Percentile CI

udf_js.bootstrap_percentile_ci

Calculate a confidence interval using an efficient bootstrap sampling technique for a given percentile of a histogram. This implementation relies on the stdlib.js library and the binomial quantile function (https://github.com/stdlib-js/stats-base-dists-binomial-quantile/) for randomly sampling from a binomial distribution.

Source Directory | Metadata File


crc32

CRC-32

udf_js.crc32

Calculate the CRC-32 hash of an input string. The implementation here could be optimized. In particular, it calculates a lookup table on every invocation which could be cached and reused. In practice, though, this implementation appears to be fast enough that further optimization is not yet warranted. Based on https://stackoverflow.com/a/18639999/1260237 See https://en.wikipedia.org/wiki/Cyclic_redundancy_check

Source Directory | Metadata File


decode_uri_attribution

Decode Uri Attribution

udf_js.decode_uri_attribution

URL decodes the raw firefox_installer.install.attribution string to a STRUCT. The fields campaign, content, dlsource, dltoken, experiment, medium, source, ua, variation the string are extracted. If any value is (not+set) it is converted to (not set) to match the text from GA when the fields are not set.

Source Directory | Metadata File


extract_string_from_bytes

Extract string from bytes

udf_js.extract_string_from_bytes

Related to https://mozilla-hub.atlassian.net/browse/RS-682. The function extracts string data from payload which is in bytes.

Source Directory | Metadata File


gunzip

Gunzip

udf_js.gunzip

Unzips a GZIP string. This implementation relies on the zlib.js library (https://github.com/imaya/zlib.js) and the atob function for decoding base64.

Source Directory | Metadata File


jackknife_mean_ci

Jackknife Mean Ci

udf_js.jackknife_mean_ci

Calculates a confidence interval using a jackknife resampling technique for the mean of an array of values for various buckets; see https://en.wikipedia.org/wiki/Jackknife_resampling Users must specify the number of expected buckets as the first parameter to guard against the case where empty buckets lead to an array with missing elements. Usage generally involves first calculating an aggregate per bucket, then aggregating over buckets, passing ARRAY_AGG(metric) to this function.

Source Directory | Metadata File


jackknife_percentile_ci

Jackknife Percentile CI

udf_js.jackknife_percentile_ci

Calculate a confidence interval using a jackknife resampling technique for a given percentile of a histogram.

Source Directory | Metadata File


jackknife_ratio_ci

Jackknife Ratio Ci

udf_js.jackknife_ratio_ci

Calculates a confidence interval using a jackknife resampling technique for the weighted mean of an array of ratios for various buckets; see https://en.wikipedia.org/wiki/Jackknife_resampling Users must specify the number of expected buckets as the first parameter to guard against the case where empty buckets lead to an array with missing elements. Usage generally involves first calculating an aggregate per bucket, then aggregating over buckets, passing ARRAY_AGG(metric) to this function. Example: WITH bucketed AS ( SELECT submission_date, SUM(active_days_in_week) AS active_days_in_week, SUM(wau) AS wau FROM mytable GROUP BY submission_date, bucket_id ) SELECT submission_date, udf_js.jackknife_ratio_ci(20, ARRAY_AGG(STRUCT(CAST(active_days_in_week AS float64), CAST(wau as FLOAT64)))) AS intensity FROM bucketed GROUP BY submission_date

Source Directory | Metadata File


jackknife_sum_ci

Jackknife Sum Ci

udf_js.jackknife_sum_ci

Calculates a confidence interval using a jackknife resampling technique for the sum of an array of counts for various buckets; see https://en.wikipedia.org/wiki/Jackknife_resampling Users must specify the number of expected buckets as the first parameter to guard against the case where empty buckets lead to an array with missing elements. Usage generally involves first calculating an aggregate count per bucket, then aggregating over buckets, passing ARRAY_AGG(metric) to this function. Example: WITH bucketed AS ( SELECT submission_date, SUM(dau) AS dau_sum FROM mytable GROUP BY submission_date, bucket_id ) SELECT submission_date, udf_js.jackknife_sum_ci(ARRAY_AGG(dau_sum)).* FROM bucketed GROUP BY submission_date

Source Directory | Metadata File


json_extract_events

Json Extract Events

udf_js.json_extract_events

Source Directory | Metadata File


json_extract_histogram

Json Extract Histogram

udf_js.json_extract_histogram

Returns a parsed struct from a JSON string representing a histogram. This implementation uses JavaScript and is provided for performance comparison; see udf/udf_json_extract_histogram for a pure SQL implementation that will likely be more usable in practice.

Source Directory | Metadata File


json_extract_keyed_histogram

Json Extract Keyed Histogram

udf_js.json_extract_keyed_histogram

Returns an array of parsed structs from a JSON string representing a keyed histogram. This is likely only useful for histograms that weren't properly parsed to fields, so ended up embedded in an additional_properties JSON blob. Normally, keyed histograms will be modeled as a key/value struct where the values are JSON representations of single histograms. There is no pure SQL equivalent to this function, since BigQuery does not provide any functions for listing or iterating over keysn in a JSON map.

Source Directory | Metadata File


json_extract_missing_cols

Json Extract Missing Cols

udf_js.json_extract_missing_cols

Extract missing columns from additional properties. More generally, get a list of nodes from a JSON blob. Array elements are indicated as [...]. param input: The JSON blob to explode param indicates_node: An array of strings. If a key's value is an object, and contains one of these values, that key is returned as a node. param known_nodes: An array of strings. If a key is in this array, it is returned as a node. Notes: - Use indicates_node for things like histograms. For example ['histogram_type'] will ensure that each histogram will be returned as a missing node, rather than the subvalues within the histogram (e.g. values, sum, etc.) - Use known_nodes if you're aware of a missing section, like ['simpleMeasurements'] See here for an example usage https://sql.telemetry.mozilla.org/queries/64460/source

Source Directory | Metadata File


main_summary_active_addons

Main Summary Active Addons

udf_js.main_summary_active_addons

Add fields from additional_attributes to active_addons in main pings. Return an array instead of a "map" for backwards compatibility. The INT64 columns from BigQuery may be passed as strings, so parseInt before returning them if they will be coerced to BOOL. The fields from additional_attributes due to union types: integer or boolean for foreignInstall and userDisabled; string or number for version. https://github.com/mozilla/telemetry-batch-view/blob/ea0733c00df191501b39d2c4e2ece3fe703a0ef3/src/main/scala/com/mozilla/telemetry/views/MainSummaryView.scala#L422-L449

Source Directory | Metadata File


main_summary_addon_scalars

Main Summary Addon Scalars

udf_js.main_summary_addon_scalars

Parse scalars from payload.processes.dynamic into map columns for each value type. https://github.com/mozilla/telemetry-batch-view/blob/ea0733c00df191501b39d2c4e2ece3fe703a0ef3/src/main/scala/com/mozilla/telemetry/utils/MainPing.scala#L385-L399

Source Directory | Metadata File


main_summary_disabled_addons

Main Summary Disabled Addons

udf_js.main_summary_disabled_addons

Report the ids of the addons which are in the addonDetails but not in the activeAddons. They are the disabled addons (possibly because they are legacy). We need this as addonDetails may contain both disabled and active addons. https://github.com/mozilla/telemetry-batch-view/blob/ea0733c00df191501b39d2c4e2ece3fe703a0ef3/src/main/scala/com/mozilla/telemetry/views/MainSummaryView.scala#L451-L464

Source Directory | Metadata File


parse_sponsored_interaction

Extract string from bytes

udf_js.parse_sponsored_interaction

Related to https://mozilla-hub.atlassian.net/browse/RS-682. The function parses the sponsored interaction column from payload_error_bytes.contextual_services table.

Source Directory | Metadata File


sample_id

Sample Id

udf_js.sample_id

Stably hash a client_id to an integer between 0 and 99. This function is technically defined in SQL, but it calls a JS UDF implementation of a CRC-32 hash, so we defined it here to make it clear that its performance may be limited by BigQuery's JavaScript UDF environment.

Source Directory | Metadata File