Metrics sent by an application or library are defined in YAML files which follow
the `metrics.yaml` JSON schema.
These files must be parsed by
`glean_parser` at build time
in order to generate code in the target language (e.g. Kotlin, Swift, ...). The generated code is
what becomes the public API to access the project's metrics.
For more information on how to introduce the
glean_parser build step for a specific language /
environment, refer to the "Adding Glean to your project"
section of this book.
Although we refer to metrics definitions YAML files as
`metrics.yaml` throughout the Glean documentation, these files may be named whatever makes the most sense for each project and may even be broken down into multiple files, if necessary.
```yaml
---
# Schema
$schema: moz://mozilla.org/schemas/glean/metrics/2-0-0

$tags:
  - frontend

# Category
toolbar:
  # Name
  click:
    # Metric Parameters
    type: event
    description: |
      Event to record toolbar clicks.
    metadata:
      tags:
        - Interaction
    notification_emails:
      - CHANGE-ME@example.com
    bugs:
      - https://bugzilla.mozilla.org/123456789/
    data_reviews:
      - http://example.com/path/to/data-review
    expires: 2019-06-01

  double_click:
    ...
```
Declaring the schema at the top of a metrics definitions file is required, as it is what indicates that the current file is a metrics definitions file.
You may optionally declare tags at the file level that apply to all metrics in that file.
Categories are the top-level keys on metrics definition files. One single definition file may contain multiple categories grouping multiple metrics. They serve the purpose of grouping related metrics in a project.
Categories can contain alphanumeric lower case characters as well as the
`.` character, which can be used to provide extra structure; for example,
`category.subcategory` is a valid category.
Category lengths may not exceed 40 characters.
Categories may not start with the string
`glean`. That prefix is reserved for Glean internal metrics.
See the "Capitalization" note to understand how the category is formatted in generated code.
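As a sketch, a definitions file using a nested category might look like the following (the category and metric names are illustrative, not part of any real schema):

```yaml
# `urlbar.autocomplete` uses the `.` separator for extra structure.
urlbar.autocomplete:
  impressions:
    type: counter
    ...
```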
Metric names are the second-level keys on metrics definition files.
Names may contain alphanumeric lower case characters as well as the
_ character. Metric name
lengths may not exceed 30 characters.
"Capitalization" rules also apply to metric names on generated code.
Specific metric types may have special required parameters in their definition; these parameters are documented in each "Metric Type" reference page.
The following parameters are common to all metric types.
Specifies the type of a metric, like "counter" or "event". This defines which operations are valid for the metric, how it is stored and how data analysis tooling displays it. See the list of supported metric types.
Once a metric is defined in a product, its
`type` must not be changed. The ingestion pipeline will not be able to handle such a change. If a type change is required, a new metric must be added with a new name and the new type. This will require an additional data review, in which you can also reference the old data review.
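For instance, if a metric recorded as a counter needs to become a quantity, the type is never edited in place; instead a new metric with a new name is added alongside it (the names below are hypothetical):

```yaml
tabs:
  # Existing metric: its type stays `counter` forever.
  open_count:
    type: counter
    ...
  # New metric with a new name carrying the new type.
  open_quantity:
    type: quantity
    ...
```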
A textual description of the metric for humans. It should describe what the metric does, what it means for analysts, and its edge cases or any other helpful information.
The description field may contain markdown syntax.
The Glean linter uses a line length limit of 80 characters. If your description is longer, e.g. because it includes longer links, you can disable
`yamllint` using the following annotations (and make sure to enable
`yamllint` again as well):
```yaml
# yamllint disable
description: |
  Your extra long description, that's longer
  than 80 characters by far.
# yamllint enable
```
A list of email addresses to notify for important events with the metric or when people with context or ownership for the metric need to be contacted.
For example, when a metric's expiration is within 14 days, emails will be sent
from email@example.com to the
`notification_emails` addresses associated with the metric.
Consider adding both a group email address and an individual who is responsible for this metric.
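A minimal sketch of that recommendation (the addresses are placeholders):

```yaml
notification_emails:
  - frontend-telemetry@example.com  # group alias
  - jane.doe@example.com            # individual owner
```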
A list of bugs (e.g. Bugzilla or GitHub) that are relevant to this metric. For example, bugs that track its original implementation or later changes to it.
Each entry should be the full URL to the bug in an issue tracker. The use of numbers alone is deprecated and will be an error in the future.
A list of URIs to any data collection review responses relevant to the metric.
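Taken together, the two parameters might look like this (the URLs are placeholders, not real bug or review links):

```yaml
bugs:
  - https://bugzilla.mozilla.org/show_bug.cgi?id=123456789
data_reviews:
  - https://example.com/path/to/data-review
```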
When the metric is set to expire.
After a metric expires, an application will no longer collect or send data related to it. May be one of the following values:
`<build date>`: An ISO date
`yyyy-mm-dd` in UTC on which the metric expires. For example,
`2019-03-13`. This date is checked at build time. Except in special cases, this form should be used so that the metric automatically "sunsets" after a period of time. Emails will be sent to the
`notification_emails` addresses when the metric is about to expire. Generally, when a metric is no longer needed, it should simply be removed. This does not affect the availability of data already collected by the pipeline.
`<major version>`: An integer greater than 0 representing the major version the metric expires in. For example,
`11`. The version is checked at build time against the major version provided to the `glean_parser` (see e.g. Build configuration for Android, Build configuration for iOS) and is only valid if a major version is provided at build time. If no major version is provided at build time and expiration by major version is used for a metric, an error is raised. Note that mixing expiration by date and version is not allowed within a product.
`never`: This metric never expires.
`expired`: This metric is manually expired.
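The accepted forms can be sketched as follows (the values are illustrative, and recall that date and version forms must not be mixed within one product):

```yaml
expires: 2025-01-01    # ISO date, checked at build time
# expires: 120         # or: a major version number
# expires: never       # or: no expiration
# expires: expired     # or: manually expired
```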
A list of tag names associated with this metric. Must correspond to an entry specified in a tags file.
Defines the lifetime of the metric. Different lifetimes affect when the metrics value is reset.
The metric is cleared each time it is submitted in the ping. This is the most common case, and should be used for metrics that are highly dynamic, such as things computed in response to the user's interaction with the application.
The metric is related to an application run, and is cleared after the application restarts and any Glean-owned ping, due at startup, is submitted. This should be used for things that are constant during the run of an application, such as the operating system version. In practice, these metrics are generally set during application startup. A common mistake is using the ping lifetime for these types of metrics, which means they will only be included in the first ping sent during a particular run of the application.
NOTE: Reach out to the Glean team before using this.
The metric is part of the user's profile and will live as long as the profile lives.
This is often not the best choice unless the metric records a value that really needs
to be persisted for the full lifetime of the user profile, e.g. an identifier, or
the day the product was first executed. It is rare to use this lifetime outside of some metrics
that are built in to the Glean SDK.
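Assuming the lifetime keywords correspond to the three cases described above (`ping`, `application`, `user`), a metric that is constant for an application run might be declared as follows (the metric itself is hypothetical):

```yaml
os_version:
  type: string
  lifetime: application  # cleared after restart, once startup pings are sent
  ...
```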
Defines which pings the metric should be sent on.
If not specified, the metric is sent on the default ping,
which is the
`events` ping for events and the
`metrics` ping for everything else.
Most metrics don't need to specify this unless they are sent on custom pings.
The special value
default may be used, in case it's required for a metric to be sent
on the default ping as well as in a custom ping.
For the small number of metrics that should be in every ping the Glean SDKs will eventually provide a solution. See bug 1695236 for details.
```yaml
send_in_pings:
  - my-custom-ping
  - default
```
Data collection for this metric is disabled.
This is useful when you want to temporarily disable the collection for a specific metric without removing references to it in your source code.
Generally, when a metric is no longer needed, it should simply be removed. This does not affect the availability of data already collected by the pipeline.
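A minimal sketch of temporarily disabling a metric while keeping its definition and code references intact (the category and metric names are illustrative):

```yaml
toolbar:
  click:
    type: event
    disabled: true  # collection paused; re-enable by removing this line
    ...
```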
The version of the metric. A monotonically increasing integer value. This should be bumped if the metric changes in a backward-incompatible way.
A list of data sensitivity categories that the metric falls under. There are four data collection categories related to data sensitivity defined in Mozilla's data collection review process:
Information about the machine or Firefox itself. Examples include OS, available memory, crashes and errors, outcome of automated processes like updates, safe browsing, activation, versions, and build id. This also includes compatibility information about features and APIs used by websites, add-ons, and other 3rd-party software that interact with Firefox during usage.
Information about the user’s direct engagement with Firefox. Examples include how many tabs, add-ons, or windows a user has open; uses of specific Firefox features; session length, scrolls and clicks; and the status of discrete user preferences. It also includes information about the user's in-product journeys and product choices helpful to understand engagement (attitudes). For example, selections of add-ons or tiles to determine potential interest categories etc.
(formerly Web activity data)
Information about what people store, sync, communicate or connect to where the information is generally considered to be more sensitive and personal in nature. Examples include users' saved URLs or URL history, specific web browsing history, general information about their web browsing history (such as TLDs or categories of webpages visited over time) and potentially certain types of interaction data about specific web pages or stories visited (such as highlighted portions of a story). It also includes information such as content saved by users to an individual account like saved URLs, tags, notes, passwords and files as well as communications that users have with one another through a Mozilla service.
Information that directly identifies a person, or if combined with other data could identify a person. This data may be embedded within specific website content, such as memory contents, dumps, captures of screen data, or DOM data. Examples include account registration data like name, password, and email address associated with an account, payment data in connection with subscriptions or donations, contact information such as phone numbers or mailing addresses, email addresses associated with surveys, promotions and customer support contacts. It also includes any data from different categories that, when combined, can identify a person, device, household or account. For example Category 1 log data combined with Category 3 saved URLs. Additional examples are: voice audio commands (including a voice audio file), speech-to-text or text-to-speech (including transcripts), biometric data, demographic information, and precise location data associated with a persistent identifier, individual or small population cohorts. This is location inferred or determined from mechanisms other than IP such as wi-fi access points, Bluetooth beacons, cell phone towers or provided directly to us, such as in a survey or a profile.
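Assuming the schema expresses these review categories as lowercase keywords (e.g. `technical` for Category 1 and `interaction` for Category 2; treat the exact keyword names as an assumption to verify against the schema), a declaration might look like:

```yaml
data_sensitivity:
  - technical
  - interaction
```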