Adding Glean to your project

Before using Glean

Products using the Glean SDK to collect telemetry must:

  • add documentation for any new metric collected with the library in its repository (see an example);
  • include the markdown-formatted documentation generated from the metrics.yaml and pings.yaml files in the project's documentation;
  • go through data review for the newly collected data by following this process;
  • provide a way for users to turn data collection off (e.g. providing settings to control Glean.setUploadEnabled()).

Usage

Integrating with your project

Setting up the dependency

Glean is published on maven.mozilla.org. To use it, you need to add the following to your project's top-level build file, in the allprojects block (see e.g. Glean's own build.gradle):

repositories {
    maven {
       url "https://maven.mozilla.org/maven2"
    }
}

Each module that uses Glean needs to specify it in its build file, in the dependencies block. Add this to your Gradle configuration:

implementation "org.mozilla.components:service-glean:{latest-version}"

Important: the {latest-version} placeholder in the line above should be replaced with the version of Android Components used by the project.

The Glean SDK is released as part of android-components. Therefore, it follows android-components' versions. The android-components release page can be used to determine the latest version.

For example, if version 33.0.0 is used, then the include directive becomes:

implementation "org.mozilla.components:service-glean:33.0.0"

Requirements

  • Python >= 3.6

Setting up the dependency

Glean can be consumed through Carthage, a dependency manager for macOS and iOS. To consume the latest version of Glean, add the following line to your Cartfile:

github "mozilla/glean" "{latest-version}"

Important: the {latest-version} placeholder should be replaced with the version number of the latest Glean SDK release. You can find the version number on the release page.

Then check out and build the new dependency:

carthage update --platform iOS

Integrating with the build system

For integration with the build system you can follow the Carthage Quick Start steps.

  1. After building the dependency, drag the built .framework binaries from Carthage/Build/iOS into your application's Xcode project.

  2. On your application targets' Build Phases settings tab, click the + icon and choose New Run Script Phase. If you already use Carthage for other dependencies, extend the existing step. In the Run Script, specify your shell (e.g. /bin/sh) and add the following to the script area below the shell:

    /usr/local/bin/carthage copy-frameworks
    
  3. Add the path to the Glean framework under "Input Files":

    $(SRCROOT)/Carthage/Build/iOS/Glean.framework
    
  4. Add the path of the copied framework under "Output Files":

    $(BUILT_PRODUCTS_DIR)/$(FRAMEWORKS_FOLDER_PATH)/Glean.framework
    

We recommend using a virtual environment for your work to isolate the dependencies for your project. There are many popular abstractions on top of virtual environments in the Python ecosystem which can help manage your project dependencies.

The Python Glean bindings currently have prebuilt wheels on PyPI for x86_64 Windows, Linux and macOS.

If you're running one of those platforms and have your virtual environment set up and activated, you can install Glean into it using:

$ python -m pip install glean_sdk

If you are not on one of these platforms, you will need to build the Glean Python bindings from source using these instructions.
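As a rough aid, the platform check described above could be sketched with the standard library alone. The function name `has_prebuilt_wheel` is hypothetical, and the platform list simply mirrors the sentence above (x86_64 Windows, Linux and macOS), so it may be out of date for newer releases.

```python
import platform

# Operating systems named in the text above; platform.system() reports
# macOS as "Darwin".
WHEEL_SYSTEMS = {"Windows", "Linux", "Darwin"}

# platform.machine() reports x86_64 as "AMD64" on Windows, so compare
# case-insensitively against both spellings.
X86_64_NAMES = {"x86_64", "amd64"}


def has_prebuilt_wheel(system=None, machine=None):
    """Return True if the given (or current) platform is one of those listed above."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system in WHEEL_SYSTEMS and machine.lower() in X86_64_NAMES


if __name__ == "__main__":
    if has_prebuilt_wheel():
        print("A prebuilt wheel is likely available: python -m pip install glean_sdk")
    else:
        print("No prebuilt wheel listed for this platform; build from source.")
```

pip performs the real compatibility check when installing; this sketch is only useful for printing a friendlier message in setup scripts.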

TODO. To be implemented in bug 1643568.

Adding new metrics

All metrics that your project collects must be defined in a metrics.yaml file.

The format of that file is documented with glean_parser. To learn more, see adding new metrics.

Important: as stated before, any new data collection requires documentation and data-review. This is also required for any new metric automatically collected by the Glean SDK.
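To make the shape of a metrics.yaml file more concrete, here is a minimal sketch that writes one to disk. Everything in it is an assumption for illustration: the category and metric names, the placeholder URLs and email address, and the schema version line. The field names reflect our understanding of the glean_parser format, so please check the glean_parser documentation before relying on them.

```python
import os
import tempfile

# Hypothetical example content. The names, URLs and email address are
# placeholders, and the $schema version may differ for your Glean release.
EXAMPLE_METRICS_YAML = """\
$schema: moz://mozilla.org/schemas/glean/metrics/1-0-0

your_category:
  your_metric:
    type: string
    description: >
      A hypothetical example metric.
    bugs:
      - https://example.com/bug-placeholder
    data_reviews:
      - https://example.com/data-review-placeholder
    notification_emails:
      - nobody@example.com
    expires: never
"""


def write_example_metrics(path):
    """Write the example metrics definition to `path` and return the path."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(EXAMPLE_METRICS_YAML)
    return path


if __name__ == "__main__":
    # Write into a temporary directory so the demo leaves no files behind.
    with tempfile.TemporaryDirectory() as directory:
        written = write_example_metrics(os.path.join(directory, "metrics.yaml"))
        print("Wrote", written)
```

A real file needs genuine bug and data-review links, since data review is required for every new metric.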

In order for the Glean SDK to generate an API for your metrics, two Gradle plugins must be included in your build: the Glean Gradle plugin and the JetBrains Python plugin.

The Glean Gradle plugin is distributed through Mozilla's Maven, so we need to tell your build where to look for it by adding the following to the top of your build.gradle:

buildscript {
    repositories {
        // Include the next clause if you are tracking snapshots of android components
        maven {
            url "https://snapshots.maven.mozilla.org/maven2"
        }
        maven {
            url "https://maven.mozilla.org/maven2"
        }
    }

    dependencies {
        classpath "org.mozilla.components:tooling-glean-gradle:{android-components-version}"
    }
}

Important: as above, the {android-components-version} placeholder in the snippet above should be replaced with the version number of android components used in your project.

The JetBrains Python plugin is distributed in the Gradle plugin repository, so it can be included with:

plugins {
    id "com.jetbrains.python.envs" version "0.0.26"
}

Right before the end of the same file, we need to apply the Glean Gradle plugin. Set any additional parameters to control the behavior of the Glean Gradle plugin before calling apply plugin.

// Optionally, set any parameters to send to the plugin.
ext.gleanGenerateMarkdownDocs = true
apply plugin: "org.mozilla.telemetry.glean-gradle-plugin"

Note: Earlier versions of Glean used a Gradle script (sdk_generator.gradle) rather than a Gradle plugin. Its use is deprecated and projects should be updated to use the Gradle plugin as described above.

The metrics.yaml file is parsed at build time and Swift code is generated. Add a new metrics.yaml file to your Xcode project.

Follow these steps to automatically run the parser at build time:

  1. Download the sdk_generator.sh script from the Glean repository:

    https://raw.githubusercontent.com/mozilla/glean/{latest-release}/glean-core/ios/sdk_generator.sh
    

    Important: as above, the {latest-release} placeholder should be replaced with the version number of the Glean SDK release used in this project.

  2. Add the sdk_generator.sh file to your Xcode project.

  3. On your application targets' Build Phases settings tab, click the + icon and choose New Run Script Phase. In the Run Script, specify your shell (e.g. /bin/sh) and add the following to the script area below the shell:

    bash $PWD/sdk_generator.sh
    
  4. Add the path to your metrics.yaml and (optionally) pings.yaml under "Input files":

    $(SRCROOT)/{project-name}/metrics.yaml
    $(SRCROOT)/{project-name}/pings.yaml
    
  5. Add the path to the generated code file to the "Output Files":

    $(SRCROOT)/{project-name}/Generated/Metrics.swift
    

    Important: The parser now generates a single file called Metrics.swift (since Glean v31.0.0).

  6. If you are using Git, add the following lines to your .gitignore file:

    .venv/
    {project-name}/Generated
    

    This will ignore files that are generated at build time by the sdk_generator.sh script. They don't need to be kept in version control, as they can be re-generated from your metrics.yaml and pings.yaml files.

Important information about Glean and embedded extensions: Metric collection is a no-op in application extensions and Glean will not run. Since extensions run in a separate sandbox and process from the application, Glean would run in an extension as if it were a completely separate application with different client ids and storage. This complicates things because Glean doesn’t know or care about other processes. Because of this, Glean is purposefully prevented from running in an application extension and if metrics need to be collected from extensions, it's up to the integrating application to pass the information to the base application to record in Glean.

For Python, the metrics.yaml file must be available and loaded at runtime.

If your project is a script (i.e. just Python files in a directory), you can load the metrics.yaml using:

from glean import load_metrics

metrics = load_metrics("metrics.yaml")

# Use a metric on the returned object
metrics.your_category.your_metric.set("value")

If your project is a distributable Python package, you need to include the metrics.yaml file using one of the myriad ways to include data in a Python package and then use pkg_resources.resource_filename() to get the filename at runtime.

from glean import load_metrics
from pkg_resources import resource_filename

metrics = load_metrics(resource_filename(__name__, "metrics.yaml"))

# Use a metric on the returned object
metrics.your_category.your_metric.set("value")

The documentation for your application or library's metrics and pings is written in metrics.yaml and pings.yaml. However, you should also provide human-readable markdown files based on this information, and this is a requirement for Mozilla projects using Glean. For other languages and platforms, this transformation is done automatically as part of the build. For Python, however, automatically generating the docs is an additional step.

Glean provides a commandline tool for automatically generating markdown documentation from your metrics.yaml and pings.yaml files. To perform that translation, run glean_parser's translate command:

python3 -m glean_parser translate -f markdown -o docs metrics.yaml pings.yaml

To get more help about the commandline options:

python3 -m glean_parser translate --help

We recommend integrating this step into your project's documentation build. The details of that integration are left to you, since they depend on the documentation tool being used and how your project is set up.
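One possible way to hook the translate step into a Python-driven documentation build is sketched below. The function names `translate_command` and `generate_docs` are hypothetical. The command-line arguments are exactly those shown above, and the sketch assumes glean_parser is installed in the interpreter running the script.

```python
import subprocess
import sys


def translate_command(output_dir="docs", inputs=("metrics.yaml", "pings.yaml")):
    """Build the glean_parser translate command shown above as an argument list."""
    return [
        sys.executable, "-m", "glean_parser", "translate",
        "-f", "markdown",
        "-o", output_dir,
        *inputs,
    ]


def generate_docs(output_dir="docs", inputs=("metrics.yaml", "pings.yaml")):
    """Run the translation; raises CalledProcessError if glean_parser fails."""
    subprocess.run(translate_command(output_dir, inputs), check=True)


if __name__ == "__main__":
    # Print the command rather than running it, so the demo has no side effects.
    print(" ".join(translate_command()))
```

Using `sys.executable` instead of a bare `python3` keeps the call inside the active virtual environment, which is usually what a docs build wants.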

TODO. To be implemented in bug 1643568.

Adding custom pings

Please refer to the custom pings documentation.

Important: as stated before, any new data collection requires documentation and data-review. This is also required for any new metric automatically collected by the Glean SDK.

Parallelism

All of Glean's target languages use a separate worker thread to do most of Glean's work, including any I/O. This thread is fully managed by Glean as an implementation detail. Therefore, users should be free to use the Glean API wherever it is most convenient, without worrying about the performance impact of updating metrics and sending pings.

Since Glean performs disk and networking I/O, it tries to do as much of its work as possible on separate threads and processes. Since there are complex trade-offs and corner cases to support Python parallelism, it is hard to design a one-size-fits-all approach.

Default behavior

When using the Python bindings, most of Glean's work is done on a separate thread, managed by Glean itself. Glean releases the Global Interpreter Lock (GIL), so your application's threads should not be in contention with Glean's thread.

Glean installs an atexit handler so the Glean thread can cleanly finish when your application exits. This handler will wait up to 30 seconds for any pending work to complete.

In addition, by default ping uploading is performed in a separate child process. This process will continue to upload any pending pings even after the main process shuts down. This is important for commandline tools where you want to return control to the shell as soon as possible and not be delayed by network connectivity.
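The worker-thread-plus-atexit arrangement described above can be pictured with a generic, standard-library sketch. This is not Glean's implementation: the `Worker` class and its method names are hypothetical, and only the 30-second default comes from the text above.

```python
import atexit
import queue
import threading


class Worker:
    """A background thread that runs submitted tasks in order."""

    def __init__(self, shutdown_timeout=30.0):
        self._tasks = queue.Queue()
        self._timeout = shutdown_timeout
        # Daemon thread: it never keeps the interpreter alive by itself.
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()
        # At interpreter exit, give pending work a bounded time to finish.
        atexit.register(self.shutdown)

    def _run(self):
        while True:
            task = self._tasks.get()
            if task is None:  # sentinel: stop the loop
                break
            task()

    def submit(self, task):
        """Queue a zero-argument callable to run on the worker thread."""
        self._tasks.put(task)

    def shutdown(self):
        """Ask the thread to stop and wait up to the timeout; True if it stopped."""
        self._tasks.put(None)
        self._thread.join(self._timeout)
        return not self._thread.is_alive()


if __name__ == "__main__":
    worker = Worker()
    worker.submit(lambda: print("recorded on the worker thread"))
    # No explicit shutdown: the atexit handler drains the queue on exit.
```

Because the sentinel is queued behind any pending tasks, work submitted before shutdown still runs, as long as it completes within the timeout.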

Cases where subprocesses aren't possible

The default approach may not work with applications built using PyInstaller or similar tools which bundle an application together with a Python interpreter making it impossible to spawn new subprocesses of that interpreter. For these cases, there is an option to ensure that ping uploading occurs in the main process. To do this, set the allow_multiprocessing parameter on the glean.Configuration object to False.
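A small sketch of how an application might choose this setting automatically is shown below. It assumes the bundling tool sets `sys.frozen`, as PyInstaller does. The helper names are hypothetical, and `make_configuration` imports glean lazily so that the rest of the module runs even where glean_sdk is not installed.

```python
import sys


def allow_multiprocessing_default(frozen=None):
    """Return False for bundled ("frozen") applications, True otherwise.

    `frozen` may be passed explicitly; by default it is read from
    sys.frozen, which bundlers such as PyInstaller set on the sys module.
    """
    if frozen is None:
        frozen = getattr(sys, "frozen", False)
    return not frozen


def make_configuration():
    """Build a glean.Configuration with the setting chosen above."""
    # Imported here so this module can be loaded without glean_sdk installed.
    from glean import Configuration

    return Configuration(allow_multiprocessing=allow_multiprocessing_default())


if __name__ == "__main__":
    print("allow_multiprocessing =", allow_multiprocessing_default())
```

Detecting the bundled case at runtime lets the same code keep the child-process uploader during development and fall back to in-process uploading only in the packaged build.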

Using the multiprocessing module

Additionally, the default approach does not work if your application uses the multiprocessing module for parallelism. Glean cannot wait to finish its work in a multiprocessing subprocess, since atexit handlers are not supported in that context.
Therefore, if Glean detects that it is running in a multiprocessing subprocess, all of its work that would normally run on a worker thread will run on the main thread. In practice, this should not be a performance issue: since the work is already in a subprocess, it will not block the main process of your application.

Testing metrics

To make testing easier out of the box, all metrics include a set of test API functions that facilitate unit testing. These include functions to test whether a value has been stored, and functions to retrieve the stored value for validation. For more information, please refer to Unit testing Glean metrics.

Adding metadata about your project to the pipeline

In order for data to be collected from your project, its application id must be registered in the pipeline.

File a data engineering bug to enable your product's application id.