Enabling data to be ingested by the data platform

This page provides a step-by-step guide on how to enable data from your product to be ingested by the data platform.

This is just one of the required steps for integrating Glean successfully into a product. Check the full Glean integration checklist for a comprehensive list of all the steps involved in doing so.

Requirements

  • GitHub Workflows

Add your product to probe scraper

At least one week before releasing your product, file a data engineering bug to enable your product's application id. This will result in your product being added to probe scraper's repositories.yaml.

Validate and publish metrics

After your product has been enabled, you must submit commits to probe scraper to validate and publish metrics. Metrics will only be published from branches defined in probe scraper's repositories.yaml, or the Git default branch if not explicitly configured. This should happen on every CI run to the specified branches. Nightly jobs will then automatically add published metrics to the Glean Dictionary and other data platform tools.

Enable the GitHub Workflow by creating a new file .github/workflows/glean-probe-scraper.yml with the following content:

---
name: Glean probe-scraper
on: [push, pull_request]
jobs:
  glean-probe-scraper:
    uses: mozilla/probe-scraper/.github/workflows/glean.yaml@main

Add your library to probe scraper

At least one week before releasing your product, file a data engineering bug to add your library to probe scraper and be scraped for metrics as a dependency of another product. This will result in your library being added to probe scraper's repositories.yaml.