hist

Functions for working with string encodings of histograms from desktop telemetry.

count (UDF)

Given histogram h, return the count of all measurements across all buckets.

Extracts the values from the histogram and sums them, returning the total_count.

Parameters

INPUTS

histogram STRING

OUTPUTS

INT64

Source | Edit

extract (UDF)

Return a parsed struct from a string-encoded histogram.

We support a variety of compact encodings as well as the classic JSON representation as sent in main pings.

The built-in BigQuery JSON parsing functions are not powerful enough to handle all the logic here, so we resort to some string processing. This function could behave unexpectedly on poorly-formatted histogram JSON, but we expect that payload validation in the data pipeline should ensure that histograms are well formed, which gives us some flexibility.

For more on desktop telemetry histogram structure, see:

https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/collection/histograms.html

The compact encodings were originally proposed in:

https://docs.google.com/document/d/1k_ji_1DB6htgtXnPpMpa7gX0klm-DGV5NMY7KkvVB00/edit#

SELECT
  mozfun.hist.extract(
    '{"bucket_count":3,"histogram_type":4,"sum":1,"range":[1,2],"values":{"0":1,"1":0}}'
  ).sum
-- 1

SELECT
  mozfun.hist.extract('5').sum
-- 5

Parameters

INPUTS

input STRING

OUTPUTS

INT64

Source | Edit

extract_histogram_sum (UDF)

Extract a histogram sum from a JSON str representation

Parameters

INPUTS

input STRING

OUTPUTS

INT64

Source | Edit

extract_keyed_hist_sum (UDF)

Sum of a keyed histogram, across all keys it contains.

Extract Keyed Histogram Sum

Takes a keyed histogram and returns a single number: the sum of all keys it contains. The expected input type is ARRAY<STRUCT<key STRING, value STRING>>

The return type is INT64.

The key field will be ignored, and the `value is expected to be the compact histogram representation.

Parameters

INPUTS

keyed_histogram ARRAY<STRUCT<key STRING, value STRING>>

OUTPUTS

INT64

Source | Edit

mean (UDF)

Given histogram h, return floor(mean) of the measurements in the bucket. That is, the histogram sum divided by the number of measurements taken.

https://github.com/mozilla/telemetry-batch-view/blob/ea0733c/src/main/scala/com/mozilla/telemetry/utils/MainPing.scala#L292-L307

Parameters

INPUTS

histogram ANY TYPE

OUTPUTS

STRUCT<sum INT64, VALUES ARRAY<STRUCT<value INT64>>>

Source | Edit

merge (UDF)

Merge an array of histograms into a single histogram.

The histogram values will be summed per-bucket
The count will be summed
Other fields will take the mode_last

Parameters

INPUTS

histogram_list ANY TYPE

Source | Edit

normalize (UDF)

Normalize a histogram. Set sum to 1, and normalize to 1 the histogram bucket counts.

Parameters

INPUTS

histogram STRUCT<bucket_count INT64, `sum` INT64, histogram_type INT64, `range` ARRAY<INT64>, `values` ARRAY<STRUCT<key INT64, value INT64>>>

OUTPUTS

STRUCT<bucket_count INT64, `sum` INT64, histogram_type INT64, `range` ARRAY<INT64>, `values` ARRAY<STRUCT<key INT64, value FLOAT64>>>

Source | Edit

percentiles (UDF)

Given histogram and list of percentiles,calculate what those percentiles are for the histogram. If the histogram is empty, returns NULL.

Parameters

INPUTS

histogram ANY TYPE, percentiles ARRAY<FLOAT64>

OUTPUTS

ARRAY<STRUCT<percentile FLOAT64, value INT64>>

Source | Edit

string_to_json (UDF)

Convert a histogram string (in JSON or compact format) to a full histogram JSON blob.

Parameters

INPUTS

input STRING

OUTPUTS

INT64

Source | Edit

threshold_count (UDF)

Return the number of recorded observations greater than threshold for the histogram. CAUTION: Does not count any buckets that have any values less than the threshold. For example, a bucket with range (1, 10) will not be counted for a threshold of 2. Use threshold that are not bucket boundaries with caution.

https://github.com/mozilla/telemetry-batch-view/blob/ea0733c/src/main/scala/com/mozilla/telemetry/utils/MainPing.scala#L213-L239

Parameters

INPUTS

histogram STRING, threshold INT64

Source | Edit