## Glean integration checklist

The Glean integration checklist can help to ensure your Glean SDK-using product is meeting all of the recommended guidelines.

Products (applications or libraries) using the Glean SDK to collect telemetry must:

1. Integrate the Glean SDK into the build system. Since the Glean SDK does some code generation for your metrics at build time, this requires a few more steps than just adding a library.

2. Include the markdown-formatted documentation generated from the metrics.yaml and pings.yaml files in the project's documentation.

3. Go through data review process for all newly collected data.

4. Ensure that telemetry coming from automated testing or continuous integration is either not sent to the telemetry server or tagged with the automation tag using the sourceTag feature.

Additionally, applications (but not libraries) must:

1. File a data engineering bug to enable your product's application id.

2. Provide a way for users to turn data collection off (e.g. providing settings to control Glean.setUploadEnabled()). The exact method used is application-specific.

## Usage

#### Setting up the dependency

The Glean SDK is published on maven.mozilla.org. To use it, you need to add the following to your project's top-level build file, in the allprojects block (see e.g. Glean SDK's own build.gradle):

repositories {
maven {
url "https://maven.mozilla.org/maven2"
}
}


Each module that uses Glean SDK needs to specify it in its build file, in the dependencies block. Add this to your Gradle configuration:

implementation "org.mozilla.components:service-glean:{latest-version}"


Important: the {latest-version} placeholder in the above link should be replaced with the version of Android Components used by the project.

The Glean SDK is released as part of android-components. Therefore, it follows android-components' versions. The android-components release page can be used to determine the latest version.

For example, if version 33.0.0 is used, then the include directive becomes:

implementation "org.mozilla.components:service-glean:33.0.0"


Size impact on the application APK: the Glean SDK APK ships binary libraries for all the supported platforms. Each library file measures about 600KB. If the final APK size of the consuming project is a concern, please enable ABI splits.

#### Requirements

• Python >= 3.6.

#### Setting up the dependency

The Glean SDK can be consumed through Carthage, a dependency manager for macOS and iOS. For consuming the latest version of the Glean SDK, add the following line to your Cartfile:

github "mozilla/glean" "{latest-version}"


Important: the {latest-version} placeholder should be replaced with the version number of the latest Glean SDK release. You can find the version number on the release page.

Then check out and build the new dependency:

carthage update --platform iOS


#### Integrating with the build system

For integration with the build system you can follow the Carthage Quick Start steps.

1. After building the dependency one drag the built .framework binaries from Carthage/Build/iOS into your application's Xcode project.

2. On your application targets' Build Phases settings tab, click the + icon and choose New Run Script Phase. If you already use Carthage for other dependencies, extend the existing step. Create a Run Script in which you specify your shell (ex: /bin/sh), add the following contents to the script area below the shell:

/usr/local/bin/carthage copy-frameworks

3. Add the path to the Glean framework under "Input Files":

$(SRCROOT)/Carthage/Build/iOS/Glean.framework  4. Add the paths to the copied framework to the "Output Files": $(BUILT_PRODUCTS_DIR)/$(FRAMEWORKS_FOLDER_PATH)/Glean.framework  We recommend using a virtual environment for your work to isolate the dependencies for your project. There are many popular abstractions on top of virtual environments in the Python ecosystem which can help manage your project dependencies. The Glean SDK Python bindings currently have prebuilt wheels on PyPI for Windows (i686 and x86_64), Linux (x86_64) and macOS (x86_64). For other platforms, the glean_sdk package will be built from source on your machine. This requires that Cargo and Rust are already installed. The easiest way to do this is through rustup. Once you have your virtual environment set up and activated, you can install the Glean SDK into it using: $ python -m pip install glean_sdk


The Glean SDK Python bindings make extensive use of type annotations to catch type related errors at build time. We highly recommend adding mypy to your continuous integration workflow to catch errors related to type mismatches early.

TODO. To be implemented in bug 1643568.

All metrics that your project collects must be defined in a metrics.yaml file.

Important: as stated before, any new data collection requires documentation and data-review. This is also required for any new metric automatically collected by the Glean SDK.

In order for the Glean SDK to generate an API for your metrics, two Gradle plugins must be included in your build:

The Glean Gradle plugin is distributed through Mozilla's Maven, so we need to tell your build where to look for it by adding the following to the top of your build.gradle:

buildscript {
repositories {
// Include the next clause if you are tracking snapshots of android components
maven {
url "https://snapshots.maven.mozilla.org/maven2"
}
maven {
url "https://maven.mozilla.org/maven2"
}

dependencies {
}
}
}


Important: as above, the {android-components-version} placeholder in the above link should be replaced with the version number of android components used in your project.

The JetBrains Python plugin is distributed in the Gradle plugin repository, so it can be included with:

plugins {
id "com.jetbrains.python.envs" version "0.0.26"
}


Right before the end of the same file, we need to apply the Glean Gradle plugin. Set any additional parameters to control the behavior of the Glean Gradle plugin before calling apply plugin.

// Optionally, set any parameters to send to the plugin.
ext.gleanGenerateMarkdownDocs = true


Note: Earlier versions of Glean used a Gradle script (sdk_generator.gradle) rather than a Gradle plugin. Its use is deprecated and projects should be updated to use the Gradle plugin as described above.

Note: The Glean Gradle plugin has limited support for offline builds of applications that use the Glean SDK.

The metrics.yaml file is parsed at build time and Swift code is generated. Add a new metrics.yaml file to your Xcode project.

Follow these steps to automatically run the parser at build time:

1. Download the sdk_generator.sh script from the Glean repository:

https://raw.githubusercontent.com/mozilla/glean/{latest-release}/glean-core/ios/sdk_generator.sh


Important: as above, the {latest-version} placeholder should be replaced with the version number of Glean SDK release used in this project.

2. Add the sdk_generator.sh file to your Xcode project.

3. On your application targets' Build Phases settings tab, click the + icon and choose New Run Script Phase. Create a Run Script in which you specify your shell (ex: /bin/sh), add the following contents to the script area below the shell:

bash $PWD/sdk_generator.sh  4. Add the path to your metrics.yaml and (optionally) pings.yaml under "Input files": $(SRCROOT)/{project-name}/metrics.yaml
$(SRCROOT)/{project-name}/pings.yaml  5. Add the path to the generated code file to the "Output Files": $(SRCROOT)/{project-name}/Generated/Metrics.swift


Important: The parser now generates a single file called Metrics.swift (since Glean v31.0.0).

6. If you are using Git, add the following lines to your .gitignore file:

.venv/
{project-name}/Generated


This will ignore files that are generated at build time by the sdk_generator.sh script. They don't need to be kept in version control, as they can be re-generated from your metrics.yaml and pings.yaml files.

Important information about Glean and embedded extensions: Metric collection is a no-op in application extensions and Glean will not run. Since extensions run in a separate sandbox and process from the application, Glean would run in an extension as if it were a completely separate application with different client ids and storage. This complicates things because Glean doesn’t know or care about other processes. Because of this, Glean is purposefully prevented from running in an application extension and if metrics need to be collected from extensions, it's up to the integrating application to pass the information to the base application to record in Glean.

For Python, the metrics.yaml file must be available and loaded at runtime.

If your project is a script (i.e. just Python files in a directory), you can load the metrics.yaml using:

from glean import load_metrics

# Use a metric on the returned object
metrics.your_category.your_metric.set("value")


If your project is a distributable Python package, you need to include the metrics.yaml file using one of the myriad ways to include data in a Python package and then use pkg_resources.resource_filename() to get the filename at runtime.

from glean import load_metrics
from pkg_resources import resource_filename

# Use a metric on the returned object
metrics.your_category.your_metric.set("value")


### Automation steps

#### Documentation

The documentation for your application or library's metrics and pings are written in metrics.yaml and pings.yaml. However, you should also provide human-readable markdown files based on this information, and this is a requirement for Mozilla projects using the Glean SDK. For other languages and platforms, this transformation is done automatically as part of the build. However, for Python the integration to automatically generate docs is an additional step.

The Glean SDK provides a commandline tool for automatically generating markdown documentation from your metrics.yaml and pings.yaml files. To perform that translation, run glean_parser's translate command:

python3 -m glean_parser translate -f markdown -o docs metrics.yaml pings.yaml


To get more help about the commandline options:

python3 -m glean_parser translate --help


We recommend integrating this step into your project's documentation build. The details of that integration is left to you, since it depends on the documentation tool being used and how your project is set up.

#### Metrics linting

Glean includes a "linter" for metrics.yaml and pings.yaml files called the glinter that catches a number of common mistakes in these files.

As part of your continuous integration, you should run the following on your metrics.yaml and pings.yaml files:

python3 -m glean_parser glinter metrics.yaml pings.yaml


A new build target needs to be added to the project csproj file in order to generate the metrics and pings APIs from the registry files (e.g. metrics.yaml, pings.yaml).

<Project>
<!-- ... other directives ... -->

<Target Name="GleanIntegration" BeforeTargets="CoreCompile">
<ItemGroup>
<!--
Note that the two files are not required: Glean will work just fine
with just the 'metrics.yaml'. A 'pings.yaml' is only required if custom
pings are defined.
Please also note that more than one metrics file can be added.
-->
<GleanRegistryFiles Include="metrics.yaml" />
<GleanRegistryFiles Include="pings.yaml" />
</ItemGroup>
<!-- This is what actually runs the parser. -->
<GleanParser RegistryFiles="@(GleanRegistryFiles)" OutputPath="$(IntermediateOutputPath)Glean" Namespace="csharp.GleanMetrics" /> <!-- And this adds the generated files to the project, so that they can be found by the compiler and Intellisense. --> <ItemGroup> <Compile Include="$(IntermediateOutputPath)Glean\**\*.cs" />
</ItemGroup>
</Target>
</Project>


This is using the Python 3 interpreter found in PATH under the hood. The GLEAN_PYTHON environment variable can be used to provide the location of the Python 3 interpreter.

Please refer to the custom pings documentation.

Important: as stated before, any new data collection requires documentation and data-review. This is also required for any new metric automatically collected by the Glean SDK.

### Parallelism

All of the Glean SDK's target languages use a separate worker thread to do most of its work, including any I/O. This thread is fully managed by the Glean SDK as an implementation detail. Therefore, users should feel free to use the Glean SDK wherever it is most convenient, without worrying about the performance impact of updating metrics and sending pings.

Since the Glean SDK performs disk and networking I/O, it tries to do as much of its work as possible on separate threads and processes. Since there are complex trade-offs and corner cases to support Python parallelism, it is hard to design a one-size-fits-all approach.

#### Default behavior

When using the Python bindings, most of the Glean SDK's work is done on a separate thread, managed by the Glean SDK itself. The Glean SDK releases the Global Interpreter Lock (GIL) for most of its operations, therefore your application's threads should not be in contention with the Glean SDK's worker thread.

The Glean SDK installs an atexit handler so that its worker thread can cleanly finish when your application exits. This handler will wait up to 30 seconds for any pending work to complete.

By default, ping uploading is performed in a separate child process. This process will continue to upload any pending pings even after the main process shuts down. This is important for commandline tools where you want to return control to the shell as soon as possible and not be delayed by network connectivity.

#### Cases where subprocesses aren't possible

The default approach may not work with applications built using PyInstaller or similar tools which bundle an application together with a Python interpreter making it impossible to spawn new subprocesses of that interpreter. For these cases, there is an option to ensure that ping uploading occurs in the main process. To do this, set the allow_multiprocessing parameter on the glean.Configuration object to False.

#### Using the multiprocessing module

Additionally, the default approach does not work if your application uses the multiprocessing module for parallelism. The Glean SDK can not wait to finish its work in a multiprocessing subprocess, since atexit handlers are not supported in that context.
Therefore, if the Glean SDK detects that it is running in a multiprocessing subprocess, all of its work that would normally run on a worker thread will run on the main thread. In practice, this should not be a performance issue: since the work is already in a subprocess, it will not block the main process of your application.

### Testing metrics

In order to make testing metrics easier 'out of the box', all metrics include a set of test API functions in order to facilitate unit testing. These include functions to test whether a value has been stored, and functions to retrieve the stored value for validation. For more information, please refer to Unit testing Glean metrics.