Metrics YAML Registry Format

Metrics sent by an application or library are defined in YAML files which follow the metrics.yaml JSON schema.

This files must be parsed by glean_parser at build time in order to generate code in the target language (e.g. Kotlin, Swift, ...). The generated code is what becomes the public API to access the project's metrics.

For more information on how to introduce the glean_parser build step for a specific language / environment, refer to the "Adding Glean to your project" section of this book.

Note on the naming of these files

Although we refer to metrics definitions YAML files as metrics.yaml throughout Glean documentation this files may be named whatever makes the most sense for each project and may even be broken down into multiple files, if necessary.

File structure


---
# Schema
$schema: moz://mozilla.org/schemas/glean/metrics/2-0-0

$tags:
  - frontend

# Category
toolbar:
  # Name
  click:
    # Metric Parameters
    type: event
    description: |
      Event to record toolbar clicks.
    metadata:
      tags:
        - Interaction
    notification_emails:
      - CHANGE-ME@example.com
    bugs:
      - https://bugzilla.mozilla.org/123456789/
    data_reviews:
      - http://example.com/path/to/data-review
    expires: 2019-06-01

  double_click:
    ...

Schema

Declaring the schema at the top of a metrics definitions file is required, as it is what indicates that the current file is a metrics definitions file.

`$tags`

You may optionally declare tags at the file level that apply to all metrics in that file.

Name

Metric names are the second-level keys on metrics definition files.

Names may contain alphanumeric lower case characters as well as the _ character. Metric name lengths may not exceed 70 characters.

"Capitalization" rules also apply to metric names on generated code.

Metric parameters

Specific metric types may have special required parameters in their definition, these parameters are documented in each "Metric Type" reference page.

Following are the parameters common to all metric types.

Required parameters

`type`

Specifies the type of a metric, like "counter" or "event". This defines which operations are valid for the metric, how it is stored and how data analysis tooling displays it. See the list of supported metric types.

Types should not be changed after release

Once a metric is defined in a product, its type should only be changed in rare circumstances. It's better to rename the metric with the new type instead. The ingestion pipeline will create a new column for a metric with a changed type. Any new analysis will need to use the new column going forward. The old column will still be populated with data from old clients.

`description`

A textual description of the metric for humans. It should describe what the metric does, what it means for analysts, and its edge cases or any other helpful information.

The description field may contain markdown syntax.

Imposed limits on line length

The Glean linter uses a line length limit of 80 characters. If your description is longer, e.g. because it includes longer links, you can disable yamllint using the following annotations (and make sure to enable yamllint again as well):
# yamllint disable
description: |
  Your extra long description, that's longer than 80 characters by far.
# yamllint enable

`notification_emails`

A list of email addresses to notify for important events with the metric or when people with context or ownership for the metric need to be contacted.

For example when a metric's expiration is within in 14 days, emails will be sent from telemetry-alerts@mozilla.com to the notification_emails addresses associated with the metric.

Consider adding both a group email address and an individual who is responsible for this metric.

`bugs`

A list of bugs (e.g. Bugzilla or GitHub) that are relevant to this metric. For example, bugs that track its original implementation or later changes to it.

Each entry should be the full URL to the bug in an issue tracker. The use of numbers alone is deprecated and will be an error in the future.

`data_reviews`

A list of URIs to any data collection review responses relevant to the metric.

`expires`

When the metric is set to expire.

After a metric expires, an application will no longer collect or send data related to it. May be one of the following values:

<build date>: An ISO date yyyy-mm-dd in UTC on which the metric expires. For example, 2019-03-13. This date is checked at build time. Except in special cases, this form should be used so that the metric automatically "sunsets" after a period of time. Emails will be sent to the notification_emails addresses when the metric is about to expire. Generally, when a metric is no longer needed, it should simply be removed. This does not affect the availability of data already collected by the pipeline.
<major version>: An integer greater than 0 representing the major version the metric expires in, For example, 11. The version is checked at build time against the major provided to the glean_parser (see e.g. Build configuration for Android, Build configuration for iOS) and is only valid if a major version is provided at built time. If no major version is provided at build time and expiration by major version is used for a metric, an error is raised. Note that mixing expiration by date and version is not allowed within a product.
never: This metric never expires.
expired: This metric is manually expired.

Optional parameters

`tags`

default: []

A list of tag names associated with this metric. Must correspond to an entry specified in a tags file.

`lifetime`

default: ping

Defines the lifetime of the metric. Different lifetimes affect when the metrics value is reset.

`ping` (default)

The metric is cleared each time it is submitted in the ping. This is the most common case, and should be used for metrics that are highly dynamic, such as things computed in response to the user's interaction with the application.

`application`

The metric is related to an application run, and is cleared after the application restarts and any Glean-owned ping, due at startup, is submitted. This should be used for things that are constant during the run of an application, such as the operating system version. In practice, these metrics are generally set during application startup. A common mistake--- using the ping lifetime for these type of metrics---means that they will only be included in the first ping sent during a particular run of the application.

`user`

NOTE: Reach out to the Glean team before using this.

The metric is part of the user's profile and will live as long as the profile lives. This is often not the best choice unless the metric records a value that really needs to be persisted for the full lifetime of the user profile, e.g. an identifier like the client_id, the day the product was first executed. It is rare to use this lifetime outside of some metrics that are built in to the Glean SDK.

`send_in_pings`

default: events|metrics

Defines which pings the metric should be sent on. If not specified, the metric is sent on the default ping, which is the events ping for events and the metrics ping for everything else.

Most metrics don't need to specify this unless they are sent on custom pings.

The special value default may be used, in case it's required for a metric to be sent on the default ping as well as in a custom ping.

Adding metrics to every ping

For the small number of metrics that should be in every ping the Glean SDKs will eventually provide a solution. See bug 1695236 for details.


send_in_pings:
  - my-custom-ping
  - default

`disabled`

default: false

Data collection for this metric is disabled.

This is useful when you want to temporarily disable the collection for a specific metric without removing references to it in your source code.

Generally, when a metric is no longer needed, it should simply be removed. This does not affect the availability of data already collected by the pipeline.

`version`

default: 0

The version of the metric. A monotonically increasing integer value. This should be bumped if the metric changes in a backward-incompatible way.

`data_sensitivity`

default: []

A list of data sensitivity categories that the metric falls under. There are four data collection categories related to data sensitivity defined in Mozilla's data collection review process:

Category 1: Technical Data (`technical`)

Information about the machine or Firefox itself. Examples include OS, available memory, crashes and errors, outcome of automated processes like updates, safe browsing, activation, versions, and build id. This also includes compatibility information about features and APIs used by websites, add-ons, and other 3rd-party software that interact with Firefox during usage.

Category 2: Interaction Data (`interaction`)

Information about the user’s direct engagement with Firefox. Examples include how many tabs, add-ons, or windows a user has open; uses of specific Firefox features; session length, scrolls and clicks; and the status of discrete user preferences. It also includes information about the user's in-product journeys and product choices helpful to understand engagement (attitudes). For example, selections of add-ons or tiles to determine potential interest categories etc.

Category 3: Stored Content & Communications (`stored_content`)

(formerly Web activity data, web_activity)

Information about what people store, sync, communicate or connect to where the information is generally considered to be more sensitive and personal in nature. Examples include users' saved URLs or URL history, specific web browsing history, general information about their web browsing history (such as TLDs or categories of webpages visited over time) and potentially certain types of interaction data about specific web pages or stories visited (such as highlighted portions of a story). It also includes information such as content saved by users to an individual account like saved URLs, tags, notes, passwords and files as well as communications that users have with one another through a Mozilla service.

Category 4: Highly sensitive data or clearly identifiable personal data (`highly_sensitive`)

Information that directly identifies a person, or if combined with other data could identify a person. This data may be embedded within specific website content, such as memory contents, dumps, captures of screen data, or DOM data. Examples include account registration data like name, password, and email address associated with an account, payment data in connection with subscriptions or donations, contact information such as phone numbers or mailing addresses, email addresses associated with surveys, promotions and customer support contacts. It also includes any data from different categories that, when combined, can identify a person, device, household or account. For example Category 1 log data combined with Category 3 saved URLs. Additional examples are: voice audio commands (including a voice audio file), speech-to-text or text-to-speech (including transcripts), biometric data, demographic information, and precise location data associated with a persistent identifier, individual or small population cohorts. This is location inferred or determined from mechanisms other than IP such as wi-fi access points, Bluetooth beacons, cell phone towers or provided directly to us, such as in a survey or a profile.

Glean SDKs - Cross-platform Telemetry Libraries