Adding Glean to your Python project

This page provides a step-by-step guide on how to integrate the Glean library into a Python project.

Note that this is just one of the required steps for integrating Glean successfully into a project. Check out the full Glean integration checklist for a comprehensive list of all the steps involved.

Setting up the dependency

We recommend using a virtual environment for your work to isolate the dependencies for your project. There are many popular abstractions on top of virtual environments in the Python ecosystem which can help manage your project dependencies.
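
For example, a virtual environment can be created and activated with the standard library's venv module (the .venv directory name is just a common convention):

python3 -m venv .venv
source .venv/bin/activate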

The Glean Python SDK currently has prebuilt wheels on PyPI for Windows (i686 and x86_64), Linux/glibc (x86_64) and macOS (x86_64). For other platforms, including BSD or Linux distributions that don't use glibc, such as Alpine Linux, the glean_sdk package will be built from source on your machine. This requires that Cargo and Rust are already installed. The easiest way to do this is through rustup.
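
If you need to install Rust and Cargo, rustup provides an installer script; check rustup.rs for the current instructions, which at the time of writing look like:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh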

Once you have your virtual environment set up and activated, you can install the Glean Python SDK into it using:

$ python -m pip install glean_sdk
Important

Installing Python wheels is still a rapidly evolving feature of the Python package ecosystem. If the above command fails, try upgrading pip:

python -m pip install --upgrade pip
Important

The Glean Python SDK makes extensive use of type annotations to catch type-related errors at build time. We highly recommend adding mypy to your continuous integration workflow to catch errors related to type mismatches early.
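
For example, assuming your code lives in a package directory named my_app (substitute your own package or source directory), a CI step could run:

python3 -m mypy my_app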

Consuming YAML registry files

For Python, the metrics.yaml file must be available and loaded at runtime.

If your project is a script (i.e. just Python files in a directory), you can load the metrics.yaml using:

from glean import load_metrics

metrics = load_metrics("metrics.yaml")

# Use a metric on the returned object
metrics.your_category.your_metric.set("value")

If your project is a distributable Python package, you need to include the metrics.yaml file using one of the myriad ways to include data in a Python package and then use pkg_resources.resource_filename() to get the filename at runtime.

from glean import load_metrics
from pkg_resources import resource_filename

metrics = load_metrics(resource_filename(__name__, "metrics.yaml"))

# Use a metric on the returned object
metrics.your_category.your_metric.set("value")
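
One possible way to ship the YAML file with your package is setuptools' package_data option. This is only a sketch, assuming a hypothetical package named my_app that bundles metrics.yaml and pings.yaml alongside its Python modules; adapt it to your own packaging setup:

# setup.py (sketch): bundle the Glean registry files inside the my_app package
from setuptools import setup, find_packages

setup(
    name="my_app",
    version="1.0.0",
    packages=find_packages(),
    # Ship the registry files as package data so that
    # pkg_resources.resource_filename() can find them at runtime.
    package_data={"my_app": ["metrics.yaml", "pings.yaml"]},
)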

Automation steps

Documentation

The documentation for your application or library's metrics and pings is written in metrics.yaml and pings.yaml.

For Mozilla projects, this documentation is automatically published on the Glean Dictionary. For non-Mozilla products, we recommend generating markdown-based documentation of your metrics and pings and checking it into the repository. For most languages and platforms, this transformation can be done automatically as part of the build. However, for some SDKs the integration to automatically generate docs is an additional step.

The Glean Python SDK provides a commandline tool for automatically generating markdown documentation from your metrics.yaml and pings.yaml files. To perform that translation, run glean_parser's translate command:

python3 -m glean_parser translate -f markdown -o docs metrics.yaml pings.yaml

To get more help about the commandline options:

python3 -m glean_parser translate --help

We recommend integrating this step into your project's documentation build. The details of that integration are left to you, since they depend on the documentation tool being used and how your project is set up.
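
As a minimal sketch, a small helper script can regenerate the metrics documentation before the rest of your docs build runs (the docs output directory below is just an example):

# build_docs.py (sketch): regenerate metrics docs before building the site
import subprocess

subprocess.run(
    [
        "python3", "-m", "glean_parser", "translate",
        "-f", "markdown",
        "-o", "docs",
        "metrics.yaml", "pings.yaml",
    ],
    check=True,
)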

Metrics linting

Glean includes a "linter" for metrics.yaml and pings.yaml files called the glinter that catches a number of common mistakes in these files.

As part of your continuous integration, you should run the following on your metrics.yaml and pings.yaml files:

python3 -m glean_parser glinter metrics.yaml pings.yaml

Parallelism

Most Glean SDKs use a separate worker thread to do most of their work, including any I/O. This thread is fully managed by the SDK as an implementation detail. Therefore, users should feel free to use the Glean SDKs wherever they are most convenient, without worrying about the performance impact of updating metrics and sending pings.

Because the Glean SDKs perform disk and network I/O, they try to do as much of their work as possible on separate threads and processes. However, there are complex trade-offs and corner cases in supporting Python parallelism, so it is hard to design a one-size-fits-all approach.

Default behavior

When using the Python SDK, most of Glean's work is done on a separate thread, managed by the SDK itself. The SDK releases the Global Interpreter Lock (GIL) for most of its operations, so your application's threads should not contend with Glean's worker thread.

The Glean Python SDK installs an atexit handler so that its worker thread can cleanly finish when your application exits. This handler will wait up to 30 seconds for any pending work to complete.

By default, ping uploading is performed in a separate child process. This process will continue to upload any pending pings even after the main process shuts down. This is important for commandline tools where you want to return control to the shell as soon as possible and not be delayed by network connectivity.

Cases where subprocesses aren't possible

The default approach may not work with applications built using PyInstaller or similar tools, which bundle an application together with a Python interpreter, making it impossible to spawn new subprocesses of that interpreter. For these cases, there is an option to ensure that ping uploading occurs in the main process. To do this, set the allow_multiprocessing parameter on the glean.Configuration object to False.
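
A hedged sketch of what that might look like at initialization time (the application id and data directory here are placeholders, and the exact Glean.initialize arguments vary between SDK versions, so check the API reference for your release):

from pathlib import Path

from glean import Configuration, Glean

Glean.initialize(
    application_id="my-app-id",          # hypothetical application id
    application_version="1.0.0",
    upload_enabled=True,
    data_dir=Path("/tmp/glean"),         # example data directory
    # Keep ping uploading in the main process instead of a subprocess.
    configuration=Configuration(allow_multiprocessing=False),
)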

Using the multiprocessing module

Additionally, the default approach does not work if your application uses the multiprocessing module for parallelism. The Glean Python SDK cannot wait for its work to finish in a multiprocessing subprocess, since atexit handlers are not supported in that context.

Therefore, if the Glean Python SDK detects that it is running in a multiprocessing subprocess, all of the work that would normally run on a worker thread will run on the main thread. In practice, this should not be a performance issue: since the work is already in a subprocess, it will not block the main process of your application.