User-Defined Functions (Javascript)
Persistent user-defined functions written in Javascript
bootstrap_percentile_ci
Bootstrap Percentile CI
udf_js.bootstrap_percentile_ci
Calculate a confidence interval using an efficient bootstrap sampling technique for a given percentile of a histogram. This implementation relies on the stdlib.js library and the binomial quantile function (https://github.com/stdlib-js/stats-base-dists-binomial-quantile/) for randomly sampling from a binomial distribution.
Source Directory | Metadata File
crc32
CRC-32
udf_js.crc32
Calculate the CRC-32 hash of an input string. The implementation here could be optimized. In particular, it calculates a lookup table on every invocation which could be cached and reused. In practice, though, this implementation appears to be fast enough that further optimization is not yet warranted. Based on https://stackoverflow.com/a/18639999/1260237 See https://en.wikipedia.org/wiki/Cyclic_redundancy_check
Source Directory | Metadata File
decode_uri_attribution
Decode Uri Attribution
udf_js.decode_uri_attribution
URL decodes the raw firefox_installer.install.attribution string to a STRUCT. The fields campaign, content, dlsource, dltoken, experiment, medium, source, ua, variation the string are extracted. If any value is (not+set) it is converted to (not set) to match the text from GA when the fields are not set.
Source Directory | Metadata File
extract_string_from_bytes
Extract string from bytes
udf_js.extract_string_from_bytes
Related to https://mozilla-hub.atlassian.net/browse/RS-682. The function extracts string data from payload
which is in bytes.
Source Directory | Metadata File
gunzip
Gunzip
udf_js.gunzip
Unzips a GZIP string. This implementation relies on the zlib.js library (https://github.com/imaya/zlib.js) and the atob function for decoding base64.
Source Directory | Metadata File
jackknife_mean_ci
Jackknife Mean Ci
udf_js.jackknife_mean_ci
Calculates a confidence interval using a jackknife resampling technique for the mean of an array of values for various buckets; see https://en.wikipedia.org/wiki/Jackknife_resampling Users must specify the number of expected buckets as the first parameter to guard against the case where empty buckets lead to an array with missing elements. Usage generally involves first calculating an aggregate per bucket, then aggregating over buckets, passing ARRAY_AGG(metric) to this function.
Source Directory | Metadata File
jackknife_percentile_ci
Jackknife Percentile CI
udf_js.jackknife_percentile_ci
Calculate a confidence interval using a jackknife resampling technique for a given percentile of a histogram.
Source Directory | Metadata File
jackknife_ratio_ci
Jackknife Ratio Ci
udf_js.jackknife_ratio_ci
Calculates a confidence interval using a jackknife resampling technique for the weighted mean of an array of ratios for various buckets; see https://en.wikipedia.org/wiki/Jackknife_resampling Users must specify the number of expected buckets as the first parameter to guard against the case where empty buckets lead to an array with missing elements. Usage generally involves first calculating an aggregate per bucket, then aggregating over buckets, passing ARRAY_AGG(metric) to this function. Example: WITH bucketed AS ( SELECT submission_date, SUM(active_days_in_week) AS active_days_in_week, SUM(wau) AS wau FROM mytable GROUP BY submission_date, bucket_id ) SELECT submission_date, udf_js.jackknife_ratio_ci(20, ARRAY_AGG(STRUCT(CAST(active_days_in_week AS float64), CAST(wau as FLOAT64)))) AS intensity FROM bucketed GROUP BY submission_date
Source Directory | Metadata File
jackknife_sum_ci
Jackknife Sum Ci
udf_js.jackknife_sum_ci
Calculates a confidence interval using a jackknife resampling technique for the sum of an array of counts for various buckets; see https://en.wikipedia.org/wiki/Jackknife_resampling Users must specify the number of expected buckets as the first parameter to guard against the case where empty buckets lead to an array with missing elements. Usage generally involves first calculating an aggregate count per bucket, then aggregating over buckets, passing ARRAY_AGG(metric) to this function. Example: WITH bucketed AS ( SELECT submission_date, SUM(dau) AS dau_sum FROM mytable GROUP BY submission_date, bucket_id ) SELECT submission_date, udf_js.jackknife_sum_ci(ARRAY_AGG(dau_sum)).* FROM bucketed GROUP BY submission_date
Source Directory | Metadata File
json_extract_events
Json Extract Events
udf_js.json_extract_events
Source Directory | Metadata File
json_extract_histogram
Json Extract Histogram
udf_js.json_extract_histogram
Returns a parsed struct from a JSON string representing a histogram. This implementation uses JavaScript and is provided for performance comparison; see udf/udf_json_extract_histogram for a pure SQL implementation that will likely be more usable in practice.
Source Directory | Metadata File
json_extract_keyed_histogram
Json Extract Keyed Histogram
udf_js.json_extract_keyed_histogram
Returns an array of parsed structs from a JSON string representing a keyed histogram. This is likely only useful for histograms that weren't properly parsed to fields, so ended up embedded in an additional_properties JSON blob. Normally, keyed histograms will be modeled as a key/value struct where the values are JSON representations of single histograms. There is no pure SQL equivalent to this function, since BigQuery does not provide any functions for listing or iterating over keysn in a JSON map.
Source Directory | Metadata File
json_extract_missing_cols
Json Extract Missing Cols
udf_js.json_extract_missing_cols
Extract missing columns from additional properties. More generally, get a list of nodes from a JSON blob. Array elements are indicated as [...]. param input: The JSON blob to explode param indicates_node: An array of strings. If a key's value is an object, and contains one of these values, that key is returned as a node. param known_nodes: An array of strings. If a key is in this array, it is returned as a node. Notes: - Use indicates_node for things like histograms. For example ['histogram_type'] will ensure that each histogram will be returned as a missing node, rather than the subvalues within the histogram (e.g. values, sum, etc.) - Use known_nodes if you're aware of a missing section, like ['simpleMeasurements'] See here for an example usage https://sql.telemetry.mozilla.org/queries/64460/source
Source Directory | Metadata File
main_summary_active_addons
Main Summary Active Addons
udf_js.main_summary_active_addons
Add fields from additional_attributes to active_addons in main pings. Return an array instead of a "map" for backwards compatibility. The INT64 columns from BigQuery may be passed as strings, so parseInt before returning them if they will be coerced to BOOL. The fields from additional_attributes due to union types: integer or boolean for foreignInstall and userDisabled; string or number for version. https://github.com/mozilla/telemetry-batch-view/blob/ea0733c00df191501b39d2c4e2ece3fe703a0ef3/src/main/scala/com/mozilla/telemetry/views/MainSummaryView.scala#L422-L449
Source Directory | Metadata File
main_summary_addon_scalars
Main Summary Addon Scalars
udf_js.main_summary_addon_scalars
Parse scalars from payload.processes.dynamic into map columns for each value type. https://github.com/mozilla/telemetry-batch-view/blob/ea0733c00df191501b39d2c4e2ece3fe703a0ef3/src/main/scala/com/mozilla/telemetry/utils/MainPing.scala#L385-L399
Source Directory | Metadata File
main_summary_disabled_addons
Main Summary Disabled Addons
udf_js.main_summary_disabled_addons
Report the ids of the addons which are in the addonDetails but not in the activeAddons. They are the disabled addons (possibly because they are legacy). We need this as addonDetails may contain both disabled and active addons. https://github.com/mozilla/telemetry-batch-view/blob/ea0733c00df191501b39d2c4e2ece3fe703a0ef3/src/main/scala/com/mozilla/telemetry/views/MainSummaryView.scala#L451-L464
Source Directory | Metadata File
parse_sponsored_interaction
Extract string from bytes
udf_js.parse_sponsored_interaction
Related to https://mozilla-hub.atlassian.net/browse/RS-682. The function parses the sponsored interaction column from payload_error_bytes.contextual_services table.
Source Directory | Metadata File
sample_id
Sample Id
udf_js.sample_id
Stably hash a client_id to an integer between 0 and 99. This function is technically defined in SQL, but it calls a JS UDF implementation of a CRC-32 hash, so we defined it here to make it clear that its performance may be limited by BigQuery's JavaScript UDF environment.
Source Directory | Metadata File