Glam
build_hour_to_datetime (UDF)
Parses the custom build id used for Fenix builds in GLAM to a datetime.
Parameters
INPUTS
build_hour STRING
OUTPUTS
DATETIME
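As an illustration only, the parsing step can be sketched in Python. The format string below assumes the build hour is encoded as a four-digit year followed by two-digit month, day, and hour (for example `2020060514`); this layout is an assumption, not something stated on this page, so check the UDF source before relying on it.

```python
from datetime import datetime


def build_hour_to_datetime(build_hour: str) -> datetime:
    """Parse a build-hour string into a datetime.

    Assumption (not confirmed by this page): the string is laid out as
    YYYYMMDDHH, e.g. "2020060514" for 2020-06-05 14:00.
    """
    return datetime.strptime(build_hour, "%Y%m%d%H")


if __name__ == "__main__":
    print(build_hour_to_datetime("2020060514"))
```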
build_seconds_to_hour (UDF)
Returns a custom build id generated from the build seconds of a FOG build.
Parameters
INPUTS
build_hour STRING
OUTPUTS
STRING
fenix_build_to_build_hour (UDF)
Returns a custom build id generated from the build hour of a Fenix build.
Parameters
INPUTS
app_build_id STRING
OUTPUTS
STRING
histogram_bucket_from_value (UDF)
Parameters
INPUTS
buckets ARRAY<STRING>, val FLOAT64
OUTPUTS
FLOAT64
histogram_buckets_cast_string_array (UDF)
Cast histogram buckets into a string array.
Parameters
INPUTS
buckets ARRAY<INT64>
OUTPUTS
ARRAY<STRING>
histogram_cast_json (UDF)
Cast a histogram into a JSON blob.
Parameters
INPUTS
histogram ARRAY<STRUCT<key STRING, value FLOAT64>>
OUTPUTS
STRING
histogram_cast_struct (UDF)
Cast a string-based JSON histogram to an array of structs.
Parameters
INPUTS
json_str STRING
OUTPUTS
ARRAY<STRUCT<KEY STRING, value FLOAT64>>
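The two casts, histogram_cast_json and histogram_cast_struct, are inverses of each other in spirit. A minimal Python sketch of that round trip follows; it models a histogram as a list of `(key, value)` tuples, which is an illustrative stand-in for `ARRAY<STRUCT<key STRING, value FLOAT64>>`, and it does not claim to reproduce the exact JSON formatting the UDFs emit.

```python
import json


def histogram_cast_json(histogram):
    """Serialize [(key, value), ...] as a JSON object string."""
    return json.dumps({key: value for key, value in histogram})


def histogram_cast_struct(json_str):
    """Parse a JSON object string back into [(key, float(value)), ...]."""
    return [(key, float(value)) for key, value in json.loads(json_str).items()]


if __name__ == "__main__":
    blob = histogram_cast_json([("0", 1.0), ("1", 2.0)])
    print(blob)
    print(histogram_cast_struct(blob))
```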
histogram_fill_buckets (UDF)
Interpolate missing histogram buckets with empty buckets.
Parameters
INPUTS
input_map ARRAY<STRUCT<key STRING, value FLOAT64>>, buckets ARRAY<STRING>
OUTPUTS
ARRAY<STRUCT<key STRING, value FLOAT64>>
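The idea behind histogram_fill_buckets can be sketched in a few lines of Python: every bucket in the expected list appears in the output, and buckets with no observation get a value of zero. The tuple-based representation and the choice to drop keys that are not in `buckets` are assumptions of this sketch.

```python
def fill_buckets(input_map, buckets):
    """Return one (key, value) pair per expected bucket.

    input_map: list of (key, value) pairs that were actually observed.
    buckets:   full list of expected bucket keys, in output order.
    Missing buckets are filled with 0.0. Keys not listed in `buckets`
    are dropped (an assumption of this sketch).
    """
    observed = dict(input_map)
    return [(key, observed.get(key, 0.0)) for key in buckets]


if __name__ == "__main__":
    print(fill_buckets([("1", 2.0)], ["0", "1", "2"]))
```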
histogram_fill_buckets_dirichlet (UDF)
Interpolate missing histogram buckets with empty buckets so it becomes a valid estimator for the dirichlet distribution.
See: https://docs.google.com/document/d/1ipy1oFIKDvHr3R6Ku0goRjS11R1ZH1z2gygOGkSdqUg
To use this, you must first aggregate the histograms to the client level to get a histogram {k1: p1, k2: p2, ..., kK: pK}, where the p's are proportions (p1, p2, ... sum to 1) and K is the number of buckets.
This is then the client's estimated density, and every client has been reduced to one row (i.e., the client's histograms are reduced to this single one and normalized).
Then add all of these across clients to get {k1: P1, k2: P2, ..., kK: PK}, where P1 = sum(p1 across N clients) and P2 = sum(p2 across N clients).
Calculate the total number of buckets K, as well as the total number of profiles N reporting.
Then our estimate for the final density is: {k1: ((P1 + 1/K) / (N + 1)), k2: ((P2 + 1/K) / (N + 1)), ...}
Parameters
INPUTS
input_map ARRAY<STRUCT<key STRING, value FLOAT64>>, buckets ARRAY<STRING>, total_users INT64
OUTPUTS
ARRAY<STRUCT<key STRING, value FLOAT64>>
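The estimate described for histogram_fill_buckets_dirichlet can be worked through with a small Python sketch. It takes the per-bucket sums of client-level proportions, the full bucket list (so K is its length), and the number of reporting profiles, and applies (P + 1/K) / (N + 1) to every bucket. The tuple-based data model is an assumption made for illustration; this is not the UDF's SQL.

```python
def fill_buckets_dirichlet(input_map, buckets, total_users):
    """Apply (P_k + 1/K) / (N + 1) to every expected bucket.

    input_map:   list of (key, P_k), where P_k is the sum over clients of
                 each client's normalized proportion for bucket k.
    buckets:     full list of bucket keys; K = len(buckets).
    total_users: N, the number of profiles reporting.
    Buckets missing from input_map are treated as P_k = 0.
    """
    k = len(buckets)
    summed = dict(input_map)
    return [
        (key, (summed.get(key, 0.0) + 1.0 / k) / (total_users + 1))
        for key in buckets
    ]


if __name__ == "__main__":
    # Two clients: one puts all its mass in bucket "0", the other in "2".
    density = fill_buckets_dirichlet([("0", 1.0), ("2", 1.0)], ["0", "1", "2"], 2)
    print(density)
```

Because each client's proportions sum to 1, the P values sum to N, and the K added terms of 1/K sum to 1, so the output sums to (N + 1) / (N + 1) = 1. Empty buckets also receive a small non-zero probability.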
histogram_filter_high_values (UDF)
Prevent overflows by only keeping buckets where the value is less than 2^40, allowing 2^24 entries. This value was chosen somewhat arbitrarily; typically the max histogram value is somewhere on the order of ~20 bits. Negative values are incorrect and should not happen but were observed, probably due to some bit flips.
Parameters
INPUTS
aggs ARRAY<STRUCT<key STRING, value INT64>>
OUTPUTS
ARRAY<STRUCT<key STRING, value INT64>>
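A minimal Python sketch of the filter described for histogram_filter_high_values is shown below. It keeps only buckets whose value is below 2^40. Dropping negative values as well is an assumption based on the note that they are incorrect; the description does not say explicitly how the UDF treats them.

```python
MAX_VALUE = 2 ** 40  # threshold named in the description


def filter_high_values(aggs):
    """Keep (key, value) pairs with 0 <= value < 2**40.

    Rejecting negative values is an assumption of this sketch.
    """
    return [(key, value) for key, value in aggs if 0 <= value < MAX_VALUE]


if __name__ == "__main__":
    print(filter_high_values([("0", 5), ("1", 2 ** 41), ("2", -1)]))
```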
histogram_from_buckets_uniform (UDF)
Create an empty histogram from an array of buckets.
Parameters
INPUTS
buckets ARRAY<STRING>
OUTPUTS
ARRAY<STRUCT<key STRING, value FLOAT64>>
histogram_generate_exponential_buckets (UDF)
Generate exponential buckets for a histogram.
Parameters
INPUTS
min FLOAT64, max FLOAT64, nBuckets FLOAT64
OUTPUTS
ARRAY<FLOAT64>
histogram_generate_functional_buckets (UDF)
Generate functional buckets for a histogram. This is specific to Glean.
See: https://github.com/mozilla/glean/blob/main/glean-core/src/histogram/functional.rs
A functional bucketing algorithm. The bucket index of a given sample is determined with the following function:
$$ i = \lfloor n \log_{\text{base}}(x) \rfloor $$
In other words, there are n buckets for each power of base magnitude. Here n corresponds to buckets_per_magnitude and base to log_base.
Parameters
INPUTS
log_base INT64, buckets_per_magnitude INT64, range_max INT64
OUTPUTS
ARRAY<FLOAT64>
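The bucket-index formula given for histogram_generate_functional_buckets can be tried out with a short Python sketch. The function names are hypothetical, and the sketch follows the formula exactly as written on this page for x > 0; the linked Glean source is authoritative for details such as how a sample of zero is handled.

```python
import math


def bucket_index(x, log_base, buckets_per_magnitude):
    """i = floor(n * log_base(x)), with n = buckets_per_magnitude. Needs x > 0."""
    return math.floor(buckets_per_magnitude * math.log(x, log_base))


def bucket_minimum(index, log_base, buckets_per_magnitude):
    """Smallest x that maps to `index`: base ** (index / n)."""
    return log_base ** (index / buckets_per_magnitude)


if __name__ == "__main__":
    i = bucket_index(1000, 2, 8)
    print(i, bucket_minimum(i, 2, 8), bucket_minimum(i + 1, 2, 8))
```

With base 2 and 8 buckets per magnitude, each doubling of x advances the index by 8, which is the "n buckets for each power of base" property.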
histogram_generate_linear_buckets (UDF)
Generate linear buckets for a histogram.
Parameters
INPUTS
min FLOAT64, max FLOAT64, nBuckets FLOAT64
OUTPUTS
ARRAY<FLOAT64>
histogram_generate_scalar_buckets (UDF)
Generate scalar buckets for a histogram using a fixed number of buckets.
Parameters
INPUTS
min_bucket FLOAT64, max_bucket FLOAT64, num_buckets INT64
OUTPUTS
ARRAY<FLOAT64>
histogram_normalized_sum (UDF)
Compute the normalized sum of an array of histograms.
Parameters
INPUTS
arrs ARRAY<STRUCT<key STRING, value INT64>>, weight FLOAT64
OUTPUTS
ARRAY<STRUCT<key STRING, value FLOAT64>>
histogram_normalized_sum_with_original (UDF)
Compute the normalized and the non-normalized sum of an array of histograms.
Parameters
INPUTS
arrs ARRAY<STRUCT<key STRING, value INT64>>, weight FLOAT64
OUTPUTS
ARRAY<STRUCT<key STRING, value FLOAT64, non_norm_value FLOAT64>>
map_from_array_offsets (UDF)
Parameters
INPUTS
required ARRAY<FLOAT64>, `values` ARRAY<FLOAT64>
OUTPUTS
ARRAY<STRUCT<key STRING, value FLOAT64>>
map_from_array_offsets_precise (UDF)
Parameters
INPUTS
required ARRAY<FLOAT64>, `values` ARRAY<FLOAT64>
OUTPUTS
ARRAY<STRUCT<key STRING, value FLOAT64>>
percentile (UDF)
Get the value of the approximate CDF at the given percentile.
Parameters
INPUTS
pct FLOAT64, histogram ARRAY<STRUCT<key STRING, value FLOAT64>>, type STRING
OUTPUTS
FLOAT64
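Reading a percentile off an approximate CDF can be sketched as follows: sort buckets by numeric key, accumulate the normalized values, and return the first bucket key whose cumulative share reaches pct / 100. This Python sketch is an assumption about the general technique, not the UDF's SQL, and it leaves out the `type` parameter because its meaning is not described here.

```python
def percentile(pct, histogram):
    """Return the first bucket key whose cumulative share reaches pct percent.

    histogram: list of (key, value) pairs; keys are numeric strings.
    Returns None for an empty or all-zero histogram.
    """
    if not 0 <= pct <= 100:
        raise ValueError("percentile must be between 0 and 100")
    ordered = sorted((float(key), value) for key, value in histogram)
    total = sum(value for _, value in ordered)
    if total == 0:
        return None
    running = 0.0
    for key, value in ordered:
        running += value
        if running / total >= pct / 100:
            return key
    return None


if __name__ == "__main__":
    print(percentile(50, [("0", 1.0), ("1", 2.0), ("2", 1.0)]))
```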