Introduction

https://github.com/mozilla/ecosystem-test-scripts/

Command Line Tool

COMMANDS


install

Install dependencies.

USAGE

make install

SEE ALSO

  • clean -- Clean up installation and cache files.

clean

Clean up installation and cache files.

USAGE

make clean


check

Run linting, formatting, security, and type checks.

This script uses the following tools:

  • ruff -- check for linting issues and code formatting.
  • bandit -- check for security issues.
  • mypy -- check for type issues.

USAGE

make check


format

Apply formatting.

This script will use ruff to automatically fix linting issues and format the code.

USAGE

make format

SEE ALSO

  • check -- Run linting, formatting, security, and type checks.

test

Run tests.

USAGE

make test


test_coverage

Run tests with coverage reporting.

USAGE

make test_coverage

test_coverage_html

Run tests and generate HTML coverage report.

USAGE

make test_coverage_html


run_circleci_scraper

Run the CircleCI scraper.

USAGE

make run_circleci_scraper

To use this command, you must set your personal CircleCI token in your local config.ini file, as shown below:

[circleci_scraper]
token = <YoUr_tOkEn_hErE>

If the days_of_data option is not present in the config.ini file, all available data is fetched by default. To customize how many previous days of data are fetched, set the days_of_data option in your local config.ini file:

[circleci_scraper]
;(optional) Get data starting from x days past from now (default: all available data)
days_of_data = 7

If data from previous days is already stored locally, the cached data is used rather than being re-fetched from CircleCI.


run_google_sheet_uploader

Run the Google Sheet Uploader.

USAGE

make run_google_sheet_uploader

run_metric_reporter

Run the Test Metric Reporter.

USAGE

make run_metric_reporter


run_report_merger

Run the Report Merger.

USAGE

make run_report_merger

Developer Setup

Below are step-by-step instructions for setting up a development environment in which you can contribute to and run the ecosystem test scripts.

1. Clone the ecosystem-test-scripts repository

The ecosystem test scripts are hosted on the Mozilla GitHub and can be cloned using the method of your choice (see Cloning a repository). Contributors should follow the Contributing Guidelines and Community Participation Guidelines for the repository.

2. Create a CircleCI API Token

In order to execute the circleci_scraper script, a personal CircleCI API token is needed. To create a token, follow the CircleCI instructions for creating a personal API token. Store the token value in a safe place.

DO NOT SHARE YOUR CIRCLECI API TOKEN

3. Copy the Google Sheet Service Account JSON Key

The google_sheet_uploader script is set up using the ecosystem-test-eng GCP project with the metric-gsheet service account. In order to execute the google_sheet_uploader script, a JSON key for this service account needs to be copied from the 1Password Ecosystem Test Engineering Team Vault into the root directory of the ecosystem-test-scripts project.

4. Set up the config.ini

All settings for the ecosystem-test-scripts are defined in the config.ini file. To set up a local config.ini file:

4.1 Make a copy of the config.ini.sample file found in the root directory of the ecosystem-test-scripts project and rename it to config.ini
4.2 Under the [circleci_scraper] section of the file, set the token value to the CircleCI API token created in step 2

5. Copy the latest raw data locally

By default, CircleCI has a retention policy of 30 days for artifacts and 90 days for uploaded test results; however, we have over a year's worth of data gathered for some projects. In order to produce reports with full trend data and reduce scraping time, copy the latest raw_data from the ETE team folder to the root directory of the ecosystem-test-scripts project.

6. Set up the python virtual environment

This project uses Poetry for dependency management in conjunction with a pyproject.toml file. While you can use virtualenv to set up the dev environment, it is recommended to use pyenv and pyenv-virtualenv, as they work nicely with Poetry. Once Poetry is installed, dependencies can be installed using the following Make command from the root directory:

make install

For more information on Make commands, run:

make help
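
As an illustration, one possible way to create the virtual environment with pyenv and pyenv-virtualenv is sketched below; the Python version and environment name are examples, not project requirements:

# Install an interpreter and create a dedicated virtualenv (version and name are examples)
pyenv install 3.12
pyenv virtualenv 3.12 ecosystem-test-scripts
pyenv local ecosystem-test-scripts

# Install Poetry into the environment, then install the project dependencies
pip install poetry
make install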

7. Start Developing!

Metric Interpretation Guide

The purpose of this document is to help individuals understand and interpret the metrics that represent the health of their test suites and ensure that tests contribute to rapid development and high product quality. Test metrics provide insights into test performance, helping teams address potential issues early and monitor improvement efforts. While these metrics offer valuable data about the health of test suites, they do not necessarily measure the effectiveness of the test cases themselves.

Test Suite Size & Success Rates

Supported Test Frameworks: jest, mocha, nextest, playwright, pytest, tap
Supported CI: CircleCI

Test Suite Size

Test suite size refers to the number of tests in a suite and serves as a control measure. Unexplained changes, such as sudden growth or shrinkage, may indicate test runner issues or attempts to manipulate the scope or quality of the suite. Test suite size should correspond to the state of the product. For example, a product under active development should show a gradual increase in the size of its test suite, while a product in maintenance should exhibit more stable trends.

Success Rates

Success rates provide a quick indication of the test suite's health. A low success rate or high failure rate signals potential quality issues, either in the product or within the test suite itself. This metric can be tracked on both a test-by-test basis and for the entire test suite.

Average Success Rates

To avoid noise from isolated failures and spot trends more easily, it's helpful to calculate averaged success rates over time. This allows teams to act early if trends toward failure begin to emerge, preventing the test suite from becoming unreliable or mistrusted. Success rate averages are calculated as:

100 x (Successful Runs / (All Runs - Cancelled Runs))
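
For example, a suite with 200 total runs, 10 of which were cancelled and 183 of which succeeded, has an average success rate of 100 x (183 / 190) ≈ 96.3%.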

These averages can be calculated over 30-day, 60-day, and 90-day periods, with the 90-day trend being preferred. Average success rates are typically interpreted as follows, though teams may adjust thresholds based on their specific needs:

Threshold   Interpretation
>= 95%      Healthy - Tests pass the majority of the time
90% - 95%   Caution - Tests show signs of instability, requiring investigation
< 90%       Critical - Tests are faulty and need intervention

Time Measurements

Supported Test Frameworks: jest, mocha, nextest, playwright, pytest
Supported CI: CircleCI

Time measurements track how long it takes for tests to run. Ideally, these times should be proportional to the size of the test suite and remain stable over time. Significant increases or variations in execution time may indicate performance issues or inefficiencies within the test suite. Monitoring execution times allows teams to identify and address bottlenecks to keep test suites efficient.

Run Time

The cumulative time of all test runs in a suite.

Execution Time

The total time taken for the test suite to execute. If tests are not run in parallel, the execution time should match the run time.
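
For example, a suite of 100 tests that each take 3 seconds has a run time of 300 seconds; if the tests are split evenly across 4 parallel workers, the execution time could be closer to 75 seconds.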

Job Time

The time taken for the test job to complete in CI. Job time thresholds are typically interpreted as follows:

Threshold   Interpretation
> 10m       Slow - The test suite may require optimization
<= 10m      Fast - The test suite runs within an acceptable time frame

Coverage Metrics

Supported Coverage Frameworks: pytest-cov, llvm-cov

Coverage metrics measure the percentage of the codebase covered by tests. They help identify untested areas of the code, allowing teams to determine whether critical paths are adequately covered.

While high coverage percentages are generally good, they don’t always guarantee that the tests are meaningful. The quality and relevance of tests should be balanced with coverage goals. The following thresholds for line coverage provide general guidance but can be adjusted according to project needs:

Threshold   Interpretation
>= 95%      High - Potential for metric gaming or diminishing returns*
80% - 95%   Good - Suitable for high-risk or high-incident projects
60% - 79%   Acceptable - Suitable for low-risk or low-incident projects
< 60%       Low - Coverage should be improved

* For teams using pytest-cov, the line excluded measure may offer further insight into coverage gaming.
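
For example, a project with 10,000 executable lines of which 8,200 are exercised by tests has 82% line coverage, which falls in the Good band.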

Skip Rates

Supported Test Frameworks: jest, mocha, nextest, playwright, pytest
Supported CI: CircleCI

Skip rates indicate how often tests are temporarily excluded from execution. While skipping tests can be a necessary short-term solution to prevent flaky tests from disrupting workflows, high or sustained skip rates can signal deeper issues with the test suite's sustainability.

Long-term skips may indicate that tests have fallen into disrepair, and an increasing skip rate can point to team capacity or prioritization problems. Monitoring skip rates ensures that skipped tests are revisited and resolved promptly. Thresholds for skip rates are typically interpreted as follows:

Threshold   Interpretation
> 2%        Critical - Test coverage is compromised, requiring immediate intervention
1% - 2%     Caution - Test coverage is at risk, and the suite may become prone to silent failures
<= 1%       Healthy - Most of the test suite is running, ensuring comprehensive coverage
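
For example, assuming the skip rate is calculated as the share of tests skipped in a run, a suite of 400 tests that skips 10 of them has a skip rate of 100 x (10 / 400) = 2.5%, which falls in the Critical band.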

Note: Playwright offers both skip and fixme annotations, allowing for further refinement of this metric.

Retry Rates

Supported Test Frameworks: playwright

Retry rates track how often tests are re-executed following a failure. While retries can help address transient issues, such as network errors, elevated retry rates may indicate flakiness in the test suite or performance regressions in the product. High retry rates can increase execution times and negatively impact developer workflows. Monitoring retry rates helps teams identify and fix unstable tests, ensuring predictable test execution.

Metric Update Procedure

Please follow the steps below to update the three test metric Google Sheets.

The sheets are typically updated on Monday mornings (North America ET/PT) to ensure the values are available for team check-in meetings. The process of updating all three Google Sheets can take up to 10 minutes.

Prerequisites

Before updating the test metrics, ensure that:

  • Your development environment is set up with the proper permissions (see Developer Setup).
  • You are on the latest version of the main branch.
  • Your config.ini file in the root directory is up to date.
  • You have the latest raw data in the ecosystem-test-scripts root directory.
    • The raw data should be found in the test_result_dir specified in the config.ini file and is typically named raw_data.
    • The latest raw data is available in the ETE team folder.

1. Scrape for New Raw Test Data

To retrieve the latest test and coverage results for local parsing, execute the following command from the ecosystem-test-scripts root directory:

make run_circleci_scraper

Notes:

  • Set the days_of_data option in the config.ini file to the appropriate number of days. This is typically 8 days since the update cadence is weekly on Mondays.
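
For example, with the weekly Monday cadence, the relevant section of your local config.ini might look like:

[circleci_scraper]
days_of_data = 8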

2. Create new CSV reports

To generate CSV reports with the latest test results, test averages, and test coverages, execute the following command from the ecosystem-test-scripts root directory:

make run_metric_reporter

Notes:

  • The reports will be output to the reports_dir specified in the config.ini file. Typically, this is a reports directory in the ecosystem-test-scripts root.
  • Average reports are produced only after 90 days of data is available. Therefore, some test suites may not have these reports.
  • Coverage reports are produced only for Autopush-rs unit tests and Merino-py unit and integration tests.

3. Import CSVs to Google Spreadsheets & Update Trend Table Dates

3.1 Import CSVs to Google Spreadsheets

To import the generated CSVs into the 3 different Google Spreadsheets, execute the following make command:

make run_google_sheet_uploader

This goes through the reports folder and imports each CSV file into the appropriate Google spreadsheet and tab, based on the mapping in the config.ini file.

Notes:

  • If the report being imported is a results or coverage report, it may be necessary to convert the type of the Date column to 'plain text' so that the graphs display at an even cadence.
    • Highlight the Date column and, in the top menu, select Format > Number > Plain text.

3.2 Update Trend Table Dates

  • At the beginning of each week, in the Weekly Trends table:

    • Update the End date to the current Monday's date
    • Update the Start date to the previous Monday's date
    • Increment the week number
    • Example:
      • For week 40 in 2024, the Start value is 2024-09-30 and the End value is 2024-10-07
  • At the end of each quarter, in the Quarterly Trends table:

    • Update the End date to the last date of the quarter
    • Update the Start date to the first date of the quarter
    • Increment the quarter number
    • Example:
      • For Q3 in 2024, the Start value is 2024-07-01 and the End value is 2024-09-30

4. Back up the latest test_result_dir to the ETE team folder

Compress the contents of the test_result_dir, typically called 'raw_data,' and replace the file located in the ETE team folder.
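
For example, one way to compress the directory (assuming it is named raw_data; the archive format your team uses may differ) is:

tar -czf raw_data.tar.gz raw_data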

Contributors

The ecosystem-test-scripts repository is owned by the Ecosystem Test Engineering Team.

Contributors graph. Avatars provided by https://contrib.rocks

See https://github.com/mozilla/ecosystem-test-scripts/blob/main/CONTRIBUTING.md for contribution guidelines.