Metrics sent by an application or library are defined in YAML files which follow
metrics.yaml JSON schema.
This files must be parsed by
glean_parser at build time
in order to generate code in the target language (e.g. Kotlin, Swift, ...). The generated code is
what becomes the public API to access the project's metrics.
For more information on how to introduce the
glean_parser build step for a specific language /
environment, refer to the "Adding Glean to your project"
section of this book.
Although we refer to metrics definitions YAML files as
metrics.yamlthroughout Glean documentation this files may be named whatever makes the most sense for each project and may even be broken down into multiple files, if necessary.
--- # Schema $schema: moz://mozilla.org/schemas/glean/metrics/2-0-0 # Category toolbar: # Name click: # Metric Parameters type: event description: | Event to record toolbar clicks. notification_emails: - CHANGE-ME@example.com bugs: - https://bugzilla.mozilla.org/123456789/ data_reviews: - http://example.com/path/to/data-review expires: 2019-06-01 double_click: ...
Declaring the schema at the top of a metrics definitions file is required, as it is what indicates that the current file is a metrics definitions file.
Categories are the top-level keys on metrics definition files. One single definition file may contain multiple categories grouping multiple metrics. They serve the purpose of grouping related metrics in a project.
Categories can contain alphanumeric lower case characters as well as the
which can be used to provide extra structure, for example
category.subcategory is a valid category.
Category lengths may not exceed 40 characters.
Categories may not start with the string
glean. That prefix is reserved for Glean internal metrics.
See the "Capitalization" note to understand how the category is formatted in generated code.
Metric names are the second-level keys on metrics definition files.
Names may contain alphanumeric lower case characters as well as the
_ character. Metric name
lengths may not exceed 30 characters.
"Capitalization" rules also apply to metric names on generated code.
Specific metric types may have special required parameters in their definition, these parameters are documented in each "Metric Type" reference page.
Following are the parameters common to all metric types.
Specifies the type of a metric, like "counter" or "event". This defines which operations are valid for the metric, how it is stored and how data analysis tooling displays it. See the list of supported metric types.
Once a metric is released in a product, its
typeshould not be changed. If any data was collected locally with the older
type, and hasn't yet been sent in a ping, recording data with the new
typemay cause any old persisted data to be lost for that metric. See this comment for an extended explanation of the different scenarios.
A textual description of the metric for humans. It should describe what the metric does, what it means for analysts, and its edge cases or any other helpful information.
The description field may contain markdown syntax.
The Glean linter uses a line length limit of 80 characters. If your description is longer, e.g. because it includes longer links, you can disable
yamllintusing the following annotations (and make sure to enable
yamllintagain as well):
# yamllint disable description: | Your extra long description, that's longer than 80 characters by far. # yamllint enable
A list of email addresses to notify for important events with the metric or when people with context or ownership for the metric need to be contacted.
For example when a metric's expiration is within in 14 days, emails will be sent
email@example.com to the
notification_emails addresses associated with the metric.
Consider adding both a group email address and an individual who is responsible for this metric.
A list of bugs (e.g. Bugzilla or GitHub) that are relevant to this metric. For example, bugs that track its original implementation or later changes to it.
Each entry should be the full URL to the bug in an issue tracker. The use of numbers alone is deprecated and will be an error in the future.
A list of URIs to any data collection review responses relevant to the metric.
When the metric is set to expire.
After a metric expires, an application will no longer collect or send data related to it. May be one of the following values:
<build date>: An ISO date
yyyy-mm-ddin UTC on which the metric expires. For example,
2019-03-13. This date is checked at build time. Except in special cases, this form should be used so that the metric automatically "sunsets" after a period of time. Emails will be sent to the
notification_emailsaddresses when the metric is about to expire. Generally, when a metric is no longer needed, it should simply be removed. This does not affect the availability of data already collected by the pipeline.
never: This metric never expires.
expired: This metric is manually expired.
Defines the lifetime of the metric. Different lifetimes affect when the metrics value is reset.
ping(default): The metric is cleared each time it is submitted in the ping. This is the most common case, and should be used for metrics that are highly dynamic, such as things computed in response to the user's interaction with the application.
application: The metric is related to an application run, and is cleared after the application restarts and any Glean-owned ping, due at startup, is submitted. This should be used for things that are constant during the run of an application, such as the operating system version. In practice, these metrics are generally set during application startup. A common mistake---using the ping lifetime for these type of metrics---means that they will only be included in the first ping sent during a particular run of the application.
user: Reach out to the Glean team before using this.. The metric is part of the user's profile and will live as long as the profile lives. This is often not the best choice unless the metric records a value that really needs to be persisted for the full lifetime of the user profile, e.g. an identifier like the
client_id, the day the product was first executed. It is rare to use this lifetime outside of some metrics that are built in to the Glean SDK.
Defines which pings the metric should be sent on.
If not specified, the metric is sent on the "default ping",
which is the
events ping for events and the
metrics ping for everything else.
Most metrics don't need to specify this unless they are sent on custom pings.
Data collection for this metric is disabled.
This is useful when you want to temporarily disable the collection for a specific metric without removing references to it in your source code.
Generally, when a metric is no longer needed, it should simply be removed. This does not affect the availability of data already collected by the pipeline.
The version of the metric. A monotonically increasing integer value. This should be bumped if the metric changes in a backward-incompatible way.
A list of data sensitivity categories that the metric falls under. There are four data collection categories related to data sensitivity defined in Mozilla's data collection review process:
Information about the machine or Firefox itself. Examples include OS, available memory, crashes and errors, outcome of automated processes like updates, safe browsing, activation, versions, and build id. This also includes compatibility information about features and APIs used by websites, add-ons, and other 3rd-party software that interact with Firefox during usage.
Information about the user’s direct engagement with Firefox. Examples include how many tabs, add-ons, or windows a user has open; uses of specific Firefox features; session length, scrolls and clicks; and the status of discrete user preferences.
Information about user web browsing that could be considered sensitive. Examples include users’ specific web browsing history; general information about their web browsing history (such as TLDs or categories of webpages visited over time); and potentially certain types of interaction data about specific webpages visited.
Information that directly identifies a person, or if combined with other data could identify a person. Examples include e-mail, usernames, identifiers such as google ad id, apple id, Firefox account, city or country (unless small ones are explicitly filtered out), or certain cookies. It may be embedded within specific website content, such as memory contents, dumps, captures of screen data, or DOM data.