Glam

build_hour_to_datetime (UDF)

Parses the custom build id used for Fenix builds in GLAM into a datetime.

Parameters

INPUTS

build_hour STRING

OUTPUTS

DATETIME
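
A minimal Python sketch of the parsing, assuming the custom build id (the "build hour") is a `YYYYMMDDHH` string such as `2021040112`. The format string is an assumption for illustration, not taken from the UDF source.

```python
from datetime import datetime


def build_hour_to_datetime(build_hour: str) -> datetime:
    # Assumed format: 4-digit year, month, day, then the hour on a 24-hour clock.
    return datetime.strptime(build_hour, "%Y%m%d%H")


# build_hour_to_datetime("2021040112") -> datetime(2021, 4, 1, 12, 0)
```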

build_seconds_to_hour (UDF)

Returns a custom build id generated from the build seconds of a FOG build.

Parameters

INPUTS

build_hour STRING

OUTPUTS

STRING
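
A rough Python equivalent, assuming the input is a string holding whole seconds since the Unix epoch (UTC) and that the output uses the same hour-truncated `YYYYMMDDHH` form. The parameter is called `build_seconds` here for readability; the UDF's own input is named `build_hour`.

```python
from datetime import datetime, timezone


def build_seconds_to_hour(build_seconds: str) -> str:
    # Interpret the string as seconds since the Unix epoch, in UTC (assumption).
    ts = datetime.fromtimestamp(int(build_seconds), tz=timezone.utc)
    # Truncate to the hour by formatting only year, month, day and hour.
    return ts.strftime("%Y%m%d%H")
```

Dropping minutes and seconds means every build produced within the same hour maps to the same id.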

fenix_build_to_build_hour (UDF)

Returns a custom build id generated from the build hour of a Fenix build.

Parameters

INPUTS

app_build_id STRING

OUTPUTS

STRING

histogram_bucket_from_value (UDF)

Parameters

INPUTS

buckets ARRAY<STRING>, val FLOAT64

OUTPUTS

FLOAT64
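
This entry has no description. Going only by the name and signature, the UDF appears to map a value onto one of the given bucket boundaries. The Python sketch below assumes it returns the largest bucket lower bound that does not exceed `val` (and nothing when `val` is below every bucket); treat that behaviour as an assumption to check against the UDF source.

```python
from typing import Optional


def histogram_bucket_from_value(buckets: list[str], val: float) -> Optional[float]:
    # Bucket keys arrive as strings, so compare them numerically.
    lower_bounds = [float(b) for b in buckets if float(b) <= val]
    # Assumed behaviour: pick the largest lower bound <= val; None if val is below all buckets.
    return max(lower_bounds) if lower_bounds else None
```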

histogram_buckets_cast_string_array (UDF)

Cast histogram buckets into a string array.

Parameters

INPUTS

buckets ARRAY<INT64>

OUTPUTS

ARRAY<STRING>

histogram_cast_json (UDF)

Cast a histogram into a JSON blob.

Parameters

INPUTS

histogram ARRAY<STRUCT<key STRING, value FLOAT64>>

OUTPUTS

STRING
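
A Python sketch of the cast, representing each `STRUCT<key, value>` as a `(key, value)` tuple and assuming the JSON blob is an object mapping each bucket key to its value. The exact text the UDF emits (spacing, number formatting) may differ.

```python
import json


def histogram_cast_json(histogram: list[tuple[str, float]]) -> str:
    # Each (key, value) pair becomes one member of a JSON object.
    return json.dumps({key: value for key, value in histogram})
```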

histogram_cast_struct (UDF)

Cast a string-based JSON histogram to an array of structs.

Parameters

INPUTS

json_str STRING

OUTPUTS

ARRAY<STRUCT<key STRING, value FLOAT64>>
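
The inverse direction, sketched in Python under the assumption that the input is a JSON object whose members are bucket keys and numeric values; structs are again modelled as `(key, value)` tuples.

```python
import json


def histogram_cast_struct(json_str: str) -> list[tuple[str, float]]:
    # Parse the JSON object; member order is preserved by json.loads.
    parsed = json.loads(json_str)
    # Coerce every value to float to match the FLOAT64 output field.
    return [(key, float(value)) for key, value in parsed.items()]
```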

histogram_fill_buckets (UDF)

Interpolate missing histogram buckets with empty buckets.

Parameters

INPUTS

input_map ARRAY<STRUCT<key STRING, value FLOAT64>>, buckets ARRAY<STRING>

OUTPUTS

ARRAY<STRUCT<key STRING, value FLOAT64>>
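
A Python sketch of the interpolation: every bucket in `buckets` appears exactly once, taking its value from `input_map` when present and `0.0` otherwise. Sorting by the numeric value of the key, and dropping keys that are not in `buckets`, are assumptions of this sketch.

```python
def histogram_fill_buckets(
    input_map: list[tuple[str, float]], buckets: list[str]
) -> list[tuple[str, float]]:
    present = dict(input_map)
    # One entry per requested bucket; missing buckets become empty (0.0) buckets.
    # Keys in input_map that are not listed in buckets are dropped.
    filled = [(b, present.get(b, 0.0)) for b in buckets]
    # Assumed ordering: numeric order of the bucket keys.
    return sorted(filled, key=lambda kv: float(kv[0]))
```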

histogram_fill_buckets_dirichlet (UDF)

Interpolate missing histogram buckets with empty buckets so the result becomes a valid estimator for the Dirichlet distribution.

See: https://docs.google.com/document/d/1ipy1oFIKDvHr3R6Ku0goRjS11R1ZH1z2gygOGkSdqUg

To use this, first aggregate the histograms to the client level to get a histogram {k1: p1, k2: p2, ..., kK: pK}, where the p's are proportions (p1, p2, ..., pK sum to 1) and K is the number of buckets.

This is the client's estimated density, and every client has been reduced to one row (i.e., the client's histograms are reduced to this single one and normalized).

Then add these histograms across clients to get {k1: P1, k2: P2, ..., kK: PK}, where P1 = sum(p1 across N clients), P2 = sum(p2 across N clients), and so on.

Calculate the total number of buckets K and the total number of reporting profiles N (the total_users input).

The estimate for the final density is then:

$$ \hat{p}_i = \frac{P_i + 1/K}{N + 1}, \quad i = 1, \ldots, K $$

Because each client's proportions sum to 1, the P's sum to N, so these estimates sum to (N + 1) / (N + 1) = 1.

Parameters

INPUTS

input_map ARRAY<STRUCT<key STRING, value FLOAT64>>, buckets ARRAY<STRING>, total_users INT64

OUTPUTS

ARRAY<STRUCT<key STRING, value FLOAT64>>
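
The density estimate described above can be sketched in Python as follows. The sketch assumes `input_map` already holds the per-bucket sums P_i across clients, that `total_users` is N, and that the output is ordered by the numeric value of the key; structs are modelled as `(key, value)` tuples.

```python
def histogram_fill_buckets_dirichlet(
    input_map: list[tuple[str, float]], buckets: list[str], total_users: int
) -> list[tuple[str, float]]:
    k = len(buckets)  # K: number of buckets
    sums = dict(input_map)  # P_i: per-bucket sums of client proportions
    # Missing buckets count as P_i = 0 before the 1/K pseudo-count is added.
    return [
        (b, (sums.get(b, 0.0) + 1.0 / k) / (total_users + 1))
        for b in sorted(buckets, key=float)
    ]
```

With two clients whose proportions sum to P = {0: 1.25, 1: 0.75} over three buckets, the empty bucket "2" still receives (0 + 1/3) / 3 = 1/9, and the three estimates sum to 1.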

histogram_filter_high_values (UDF)

Prevent overflows by keeping only buckets whose value is less than 2^40, allowing 2^24 entries. This threshold was chosen somewhat arbitrarily; the maximum histogram value is typically on the order of ~20 bits. Negative values are incorrect and should not happen, but they have been observed, probably due to bit flips.

Parameters

INPUTS

aggs ARRAY<STRUCT<key STRING, value INT64>>

OUTPUTS

ARRAY<STRUCT<key STRING, value INT64>>
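
A Python sketch of the filter with `(key, value)` tuples standing in for structs. That negative values are dropped as well is an assumption drawn from the last sentence of the description, not from the UDF source.

```python
def histogram_filter_high_values(aggs: list[tuple[str, int]]) -> list[tuple[str, int]]:
    limit = 2 ** 40
    # Keep buckets below 2^40; also dropping negative values is an assumption here.
    return [(key, value) for key, value in aggs if 0 <= value < limit]
```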

histogram_from_buckets_uniform (UDF)

Create an empty histogram from an array of buckets.

Parameters

INPUTS

buckets ARRAY<STRING>

OUTPUTS

ARRAY<STRUCT<key STRING, value FLOAT64>>

histogram_generate_exponential_buckets (UDF)

Generate exponential buckets for a histogram.

Parameters

INPUTS

min FLOAT64, max FLOAT64, nBuckets FLOAT64

OUTPUTS

ARRAY<FLOAT64>
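
A Python sketch modelled on the exponential bucketing used by Firefox Telemetry's histogram parsing scripts: bucket 0 starts at 0, bucket 1 at `min`, and the remaining boundaries are spread evenly in log space up to `max`, always advancing by at least 1. That this UDF reproduces that scheme exactly is an assumption. The sketch requires `min >= 1` and at least three buckets.

```python
import math


def exponential_buckets(dmin: float, dmax: float, n_buckets: float) -> list[float]:
    n = int(n_buckets)
    buckets = [0.0] * n  # bucket 0 is the underflow bucket starting at 0
    current = dmin
    buckets[1] = float(current)
    log_max = math.log(dmax)
    for i in range(2, n):
        log_current = math.log(current)
        # Spread the remaining log-distance evenly over the remaining buckets.
        log_ratio = (log_max - log_current) / (n - i)
        next_value = math.floor(math.exp(log_current + log_ratio) + 0.5)
        # Boundaries must strictly increase, so advance by at least 1.
        current = next_value if next_value > current else current + 1
        buckets[i] = float(current)
    return buckets


# exponential_buckets(1, 10, 5) -> [0.0, 1.0, 2.0, 4.0, 10.0]
```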

histogram_generate_functional_buckets (UDF)

Generate functional buckets for a histogram. This is specific to Glean.

See: https://github.com/mozilla/glean/blob/main/glean-core/src/histogram/functional.rs

It uses a functional bucketing algorithm: the bucket index of a given sample is determined by the following function:

$$ i = \left\lfloor n \log_{\text{base}}(x) \right\rfloor $$

In other words, there are n buckets for each order of magnitude (each power of base). Here n corresponds to buckets_per_magnitude and base to log_base.

Parameters

INPUTS

log_base INT64, buckets_per_magnitude INT64, range_max INT64

OUTPUTS

ARRAY<FLOAT64>
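
The index formula above implies that bucket i covers base^(i/n) <= x < base^((i+1)/n), so its lower bound is base^(i/n). The Python sketch below generates integer lower bounds from that inverse up to `range_max`. It follows the formula as written, not necessarily Glean's exact integer handling; including a 0 bucket for zero-valued samples is an assumption.

```python
import math


def bucket_index(x: float, log_base: int, buckets_per_magnitude: int) -> int:
    # i = floor(n * log_base(x)), the formula quoted above.
    return math.floor(buckets_per_magnitude * math.log(x, log_base))


def functional_buckets(log_base: int, buckets_per_magnitude: int, range_max: int) -> list[float]:
    buckets = {0.0}  # assumed: a 0 bucket for zero-valued samples
    i = 0
    while True:
        # Lower bound of bucket i is floor(base^(i/n)).
        lower = math.floor(log_base ** (i / buckets_per_magnitude))
        if lower > range_max:
            break
        buckets.add(float(lower))
        i += 1
    # Small indices can floor to the same integer; the set removes duplicates.
    return sorted(buckets)


# functional_buckets(2, 2, 4) -> [0.0, 1.0, 2.0, 4.0]
```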

histogram_generate_linear_buckets (UDF)

Generate linear buckets for a histogram.

Parameters

INPUTS

min FLOAT64, max FLOAT64, nBuckets FLOAT64

OUTPUTS

ARRAY<FLOAT64>
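
A Python sketch modelled on Firefox Telemetry's linear bucketing: bucket 0 starts at 0, and buckets 1 through nBuckets - 1 are spaced evenly from `min` to `max`, rounded half up to integers. Whether the UDF rounds the same way is an assumption. The sketch requires more than two buckets.

```python
def linear_buckets(dmin: float, dmax: float, n_buckets: float) -> list[float]:
    n = int(n_buckets)
    buckets = [0.0] * n  # bucket 0 is the underflow bucket starting at 0
    for i in range(1, n):
        # Interpolate so bucket 1 starts at dmin and bucket n - 1 at dmax.
        value = (dmin * (n - 1 - i) + dmax * (i - 1)) / (n - 2)
        buckets[i] = float(int(value + 0.5))  # round half up
    return buckets


# linear_buckets(1, 10, 11) -> [0.0, 1.0, 2.0, ..., 10.0]
```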

histogram_generate_scalar_buckets (UDF)

Generate scalar buckets for a histogram using a fixed number of buckets.

Parameters

INPUTS

min_bucket FLOAT64, max_bucket FLOAT64, num_buckets INT64

OUTPUTS

ARRAY<FLOAT64>

histogram_normalized_sum (UDF)

Compute the normalized sum of an array of histograms.

Parameters

INPUTS

arrs ARRAY<STRUCT<key STRING, value INT64>>, weight FLOAT64

OUTPUTS

ARRAY<STRUCT<key STRING, value FLOAT64>>
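
A Python sketch of the normalized sum, with `(key, value)` tuples for structs. It assumes counts are first summed per bucket key, each bucket is then divided by the grand total (yielding 0 for an empty histogram), the result is scaled by `weight`, and the output is ordered by numeric key.

```python
from collections import defaultdict


def histogram_normalized_sum(arrs: list[tuple[str, int]], weight: float) -> list[tuple[str, float]]:
    # Sum the counts per bucket key across all input entries.
    summed: dict[str, int] = defaultdict(int)
    for key, value in arrs:
        summed[key] += value
    total = sum(summed.values())
    # Divide each bucket by the grand total (0 if empty), then scale by weight.
    return [
        (key, (count / total if total else 0.0) * weight)
        for key, count in sorted(summed.items(), key=lambda kv: float(kv[0]))
    ]
```

With a weight of 1.0 the output values sum to 1, which matches the per-client normalization step described for histogram_fill_buckets_dirichlet.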

histogram_normalized_sum_with_original (UDF)

Compute the normalized and the non-normalized sum of an array of histograms.

Parameters

INPUTS

arrs ARRAY<STRUCT<key STRING, value INT64>>, weight FLOAT64

OUTPUTS

ARRAY<STRUCT<key STRING, value FLOAT64, non_norm_value FLOAT64>>

map_from_array_offsets (UDF)

Parameters

INPUTS

required ARRAY<FLOAT64>, `values` ARRAY<FLOAT64>

OUTPUTS

ARRAY<STRUCT<key STRING, value FLOAT64>>

map_from_array_offsets_precise (UDF)

Parameters

INPUTS

required ARRAY<FLOAT64>, `values` ARRAY<FLOAT64>

OUTPUTS

ARRAY<STRUCT<key STRING, value FLOAT64>>

percentile (UDF)

Get the value at which the approximate CDF reaches the given percentile.

Parameters

INPUTS

pct FLOAT64, histogram ARRAY<STRUCT<key STRING, value FLOAT64>>, type STRING

OUTPUTS

FLOAT64
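
A Python sketch of the lookup: walk the buckets in numeric key order, accumulate their values, and return the first bucket key whose cumulative share reaches `pct`. It assumes `pct` is on a 0-100 scale and ignores the `type` argument, whose handling is not described here; structs are modelled as `(key, value)` tuples.

```python
from typing import Optional


def percentile(pct: float, histogram: list[tuple[str, float]]) -> Optional[float]:
    # Order buckets by the numeric value of their keys.
    ordered = sorted(histogram, key=lambda kv: float(kv[0]))
    total = sum(value for _, value in ordered)
    if not ordered or total <= 0:
        return None
    target = pct / 100.0 * total  # assumed: pct is on a 0-100 scale
    cumulative = 0.0
    for key, value in ordered:
        cumulative += value
        # The first bucket whose cumulative share reaches the target is the answer.
        if cumulative >= target:
            return float(key)
    return float(ordered[-1][0])  # guard against floating-point shortfall near pct = 100
```

Because the result is always a bucket key, the answer is only as precise as the bucket boundaries, which is why the CDF is described as approximate.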
