Guide

Basic experiment: get the data & crunch the stats

Let’s start by analysing a straightforward pref-flip experiment on the desktop browser.

If we’re using a Colab notebook, we begin by installing mozanalysis into the notebook. It’s a good idea to pin a specific version (from PyPI), for reproducibility:

!pip install mozanalysis=='{current_version}'

Then comes the daily per-notebook authentication trudge:

from google.colab import auth
auth.authenticate_user()
print('Authenticated')

Then we import the classes we need for getting the data, analysing it, and interacting with BigQuery:

import mozanalysis.bayesian_stats.binary as mabsbin
from mozanalysis.experiment import Experiment
from mozanalysis.bq import BigQueryContext

And create a mozanalysis.bq.BigQueryContext (a client plus some config):

bq_context = BigQueryContext(
    dataset_id='your_dataset_id',  # e.g. mine's 'flawrence'
    project_id=...,  # Defaults to moz-fx-data-bq-data-science
)

If you do not have a dataset, you will need to create one. Mozanalysis saves data into this dataset; if you want to access the tables directly (i.e. not through mozanalysis), they live at project_id.dataset_id.table_name, where table_name is printed by mozanalysis when it saves or retrieves the data.
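
Creating a dataset is a one-off step. One way to do it, sketched with the google-cloud-bigquery client (this assumes you have permission to create datasets in the project):

from google.cloud import bigquery

client = bigquery.Client(project='moz-fx-data-bq-data-science')
client.create_dataset('your_dataset_id')  # one-off setup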

To bill queries to a project other than moz-fx-data-bq-data-science, pass the project_id as an argument when initializing your mozanalysis.bq.BigQueryContext.
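
For example, with a hypothetical billing project:

bq_context = BigQueryContext(
    dataset_id='your_dataset_id',
    project_id='your-billing-project',  # hypothetical: queries are billed here
)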

For querying data, the internal approach of mozanalysis is to start by obtaining a list of who was enrolled in what branch, when. Then we try to quantify what happened to each client: for a given analysis window (a specified period of time defined with respect to the client’s enrollment date), we seek to obtain a value for each client for each metric. We end up with a results (pandas) DataFrame with one row per client and one column per metric.
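
For illustration, a miniature results DataFrame in that shape (with made-up values) would look like:

import pandas as pd

# One row per enrolled client, one column per metric; values are invented.
pd.DataFrame({
    'branch': ['control', 'treatment', 'control'],
    'active_hours': [6.2, 6.8, 0.0],
    'uri_count': [110, 123, 0],
})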

We start by instantiating our mozanalysis.experiment.Experiment object:

exp = Experiment(
    experiment_slug='pref-fingerprinting-protections-retention-study-release-70',
    start_date='2019-10-29',
    num_dates_enrollment=8,
    app_name="firefox_desktop"
)

start_date is the submission_date of the first enrollment (submission_date is in UTC). If you intend to study one week’s worth of enrollments, set num_dates_enrollment=8: Normandy experiments typically go live in the evening, UTC time, so 8 days of data are a better approximation to one week of enrollments than 7. Here, that means enrollments with submission dates from 2019-10-29 through 2019-11-05 inclusive.

We now gather a list of who was enrolled in what branch and when, and try to quantify what happened to each client. In many cases, the metrics you’re interested in will already be in a metrics library, metric-hub. If not, you can define your own (see mozanalysis.metrics.Metric for examples, and the sketch below) and ideally submit a PR to add them to metric-hub for the next experiment. To load a Metric from metric-hub, for example:

from mozanalysis.config import ConfigLoader
active_hours = ConfigLoader.get_metric(slug="active_hours", app_name="firefox_desktop")
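
And if a metric isn’t in metric-hub yet, you can hand-roll one. A minimal sketch (the table and select expression here are illustrative assumptions; see mozanalysis.metrics.Metric for the authoritative signature):

from mozanalysis.metrics import DataSource, Metric

# Hypothetical example: the table and column names are illustrative only.
clients_daily = DataSource(
    name='clients_daily',
    from_expr='mozdata.telemetry.clients_daily',
)

my_active_hours = Metric(
    name='my_active_hours',
    data_source=clients_daily,
    select_expr='SUM(active_hours_sum)',  # aggregated per client over the analysis window
)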

In this example, we’ll compute four metrics from metric-hub:

  • active hours

  • uri count

  • ad clicks

  • search count

As it happens, the first three metrics all come from the clients_daily dataset, whereas “search count” comes from search_clients_daily. These details are taken care of in the metric-hub definitions so that we don’t have to think about them here.
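
We already loaded active_hours above; the other three come from metric-hub the same way:

uri_count = ConfigLoader.get_metric(slug="uri_count", app_name="firefox_desktop")
ad_clicks = ConfigLoader.get_metric(slug="ad_clicks", app_name="firefox_desktop")
search_count = ConfigLoader.get_metric(slug="search_count", app_name="firefox_desktop")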

A metric must be computed over some analysis window, a period of time defined with respect to the enrollment date. We could use mozanalysis.experiment.Experiment.get_single_window_data() to compute our metrics over a specific analysis window. But here, let’s create time series data: let’s have an analysis window for each of the first three weeks of the experiment, and measure the data for each of these analysis windows:

ts_res = exp.get_time_series_data(
    bq_context=bq_context,
    metric_list=[
        active_hours,
        uri_count,
        ad_clicks,
        search_count,
    ],
    last_date_full_data='2019-11-28',
    time_series_period='weekly'
)

The first two arguments to mozanalysis.experiment.Experiment.get_time_series_data() should be clear by this point. last_date_full_data is the last date for which we want to use data. For a currently-running experiment, it would typically be yesterday’s date (we have incomplete data for incomplete days!).

Each metric definition is pulled from metric-hub by its slug, as shown above with ConfigLoader.get_metric().

time_series_period can be 'daily', 'weekly' or '28_day'. A 'weekly' time series neatly sidesteps (or masks) weekly seasonality issues: most experiment subjects enroll within a day of the experiment launching, which is typically a Tuesday, so a 'daily' time series reflects a non-uniform convolution of each metric’s weekly seasonality with the uneven enrollment numbers across the week.

mozanalysis.experiment.Experiment.get_time_series_data() returns a mozanalysis.experiment.TimeSeriesResult object, which can return DataFrames keyed by the start of their analysis windows (measured in days after enrollment):

>>> ts_res.keys()
[0, 7, 14]

Each key is an analysis window’s start, in days after enrollment. Enrollments ran from 2019-10-29 to 2019-11-05, and last_date_full_data='2019-11-28' means even the last enrollee has complete data for their first three weeks (days 0-6, 7-13 and 14-20), but not for a fourth.

If RAM permits, we can dump all the results into a dict of DataFrames keyed by the start of their analysis windows:

res = dict(ts_res.items(bq_context))

Each value in res is a pandas DataFrame in “the standard format”, with one row per enrolled client and one column per metric.

Otherwise you might want to load one analysis window at a time, by calling ts_res.get(bq_context, analysis_window_start) for each analysis window in ts_res.keys(), processing the resulting DataFrame, then discarding the DataFrame from RAM before moving onto the next analysis window.
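
That pattern looks roughly like this:

for window_start in ts_res.keys():
    df = ts_res.get(bq_context, window_start)  # one analysis window's DataFrame
    # ... compute whatever summaries you need from df ...
    del df  # free the memory before fetching the next window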

Here are the columns of each result DataFrame:

>>> res[7].columns
Index(['branch', 'enrollment_date', 'num_enrollment_events', 'active_hours',
       'uri_count', 'clients_daily_has_contradictory_branch',
       'clients_daily_has_non_enrolled_data', 'ad_clicks', 'search_count'],
      dtype='object')

The ‘branch’ column contains the client’s branch:

>>> res[7].branch.unique()
array(['Cohort_1', 'Cohort_2', 'Cohort_3'], dtype=object)

And we can do the usual pandas DataFrame things - e.g. calculate the mean active hours per branch:

>>> res[7].groupby('branch').active_hours.mean()
branch
Cohort_1    6.246536
Cohort_2    6.719880
Cohort_3    6.468948
Name: active_hours, dtype: float64

Suppose we want to see whether each user had any active hours in their second week of the experiment. This information can be derived from the active_hours metric: we add it as a new column to the results pandas DataFrame, then use mozanalysis.bayesian_stats.binary to analyse it:

res[7]['active_hours_gt_0'] = res[7]['active_hours'] > 0

retention_week_2 = mabsbin.compare_branches(res[7], 'active_hours_gt_0', ref_branch_label='Cohort_1')

Like most of the stats functions in mozanalysis, mozanalysis.bayesian_stats.binary.compare_branches() accepts a pandas DataFrame in “the standard format” and returns credible (or confidence) intervals for various quantities. By default it expects the reference branch to be named ‘control’; since this experiment used non-standard branch naming, we have to tell it that the reference branch is ‘Cohort_1’. The function returns credible intervals (CIs) for the fraction of active users in each branch:

>>> retention_week_2['individual']
{'Cohort_1':
     0.005    0.733865
     0.025    0.734265
     0.5      0.735536
     0.975    0.736803
     0.995    0.737201
     mean     0.735535
     dtype: float64,
 'Cohort_2':
     0.005    0.732368
     0.025    0.732769
     0.5      0.734041
     0.975    0.735312
     0.995    0.735710
     mean     0.734041
     dtype: float64,
 'Cohort_3':
     0.005    0.732289
     0.025    0.732690
     0.5      0.733962
     0.975    0.735232
     0.995    0.735630
     mean     0.733962
     dtype: float64}

(output re-wrapped for clarity)

For example, we can see that the fraction of users in Cohort_2 with >0 active hours in week 2 has an expectation value of 0.734, with a 95% CI of (0.7328, 0.7353).

And the function also returns credible intervals for the uplift in this quantity for each branch with respect to a reference branch:

>>> retention_week_2['comparative']
{'Cohort_3':
    rel_uplift    0.005   -0.005222
                  0.025   -0.004568
                  0.5     -0.002173
                  0.975    0.000277
                  0.995    0.001056
                  exp     -0.002166
    abs_uplift    0.005   -0.003850
                  0.025   -0.003365
                  0.5     -0.001598
                  0.975    0.000204
                  0.995    0.000774
                  exp     -0.001594
    max_abs_diff  0.95     0.003092
    prob_win      NaN      0.041300
    dtype: float64,
 'Cohort_2':
    rel_uplift    0.005   -0.005215
                  0.025   -0.004502
                  0.5     -0.002065
                  0.975    0.000359
                  0.995    0.001048
                  exp     -0.002066
    abs_uplift    0.005   -0.003840
                  0.025   -0.003314
                  0.5     -0.001520
                  0.975    0.000264
                  0.995    0.000769
                  exp     -0.001520
    max_abs_diff  0.95     0.003043
    prob_win      NaN      0.046800
    dtype: float64}

(output re-wrapped for clarity)

rel_uplift contains quantities related to the relative uplift of a branch with respect to the reference branch (as given by ref_branch_label); for example, assuming a uniform prior, there is a 95% probability that Cohort_3 had between 0.457% fewer and 0.028% more users with >0 active hours in the second week, compared to Cohort_1. abs_uplift refers to the absolute uplifts, and prob_win gives the probability that the branch is better than the reference branch.
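
Each branch’s comparative results are a pandas Series with a two-level index, so individual numbers can be pulled out directly. For example (based on the structure shown above):

cohort_2 = retention_week_2['comparative']['Cohort_2']

# 95% credible interval for the relative uplift vs Cohort_1
rel_ci = (cohort_2['rel_uplift'][0.025], cohort_2['rel_uplift'][0.975])

# Probability that Cohort_2 beats Cohort_1 on this metric
p_win = cohort_2['prob_win'].iloc[0]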

Since mozanalysis is designed around this “standard format”, you can pass any of the values in res to any of the statistics functions, as long as the statistics are suited to the column’s type (i.e. binary vs real-valued data):

import mozanalysis.bayesian_stats.binary as mabsbin
retention_week_2 = mabsbin.compare_branches(res[7], 'active_hours_gt_0', ref_branch_label='Cohort_1')

import mozanalysis.frequentist_stats.bootstrap as mafsboot
boot_uri_week_1 = mafsboot.compare_branches(res[0], 'uri_count', threshold_quantile=0.9999, ref_branch_label='Cohort_1')

import mozanalysis.bayesian_stats.survival_func as mabssf
sf_search_week_2 = mabssf.compare_branches(res[7], 'search_count', ref_branch_label='Cohort_1')

dscontrib.flawrence.plot_experiments has some (shaky) support for visualising stats over time series experiment results.

Get the data: cookbook

Time series (of analysis windows)

Condensing the above example for simpler copying and pasting:

!pip install mozanalysis=='{current_version}'

from google.colab import auth
auth.authenticate_user()
print('Authenticated')

import mozanalysis.bayesian_stats.binary as mabsbin
from mozanalysis.experiment import Experiment
from mozanalysis.bq import BigQueryContext
from mozanalysis.config import ConfigLoader

bq_context = BigQueryContext(dataset_id='your_dataset_id')

active_hours = ConfigLoader.get_metric(slug="active_hours", app_name="firefox_desktop")
uri_count = ConfigLoader.get_metric(slug="uri_count", app_name="firefox_desktop")
ad_clicks = ConfigLoader.get_metric(slug="ad_clicks", app_name="firefox_desktop")
search_count = ConfigLoader.get_metric(slug="search_count", app_name="firefox_desktop")

exp = Experiment(
    experiment_slug='pref-fingerprinting-protections-retention-study-release-70',
    start_date='2019-10-29',
    num_dates_enrollment=8,
    app_name="firefox_desktop"
)

ts_res = exp.get_time_series_data(
    bq_context=bq_context,
    metric_list=[
        active_hours,
        uri_count,
        ad_clicks,
        search_count,
    ],
    last_date_full_data='2019-11-28',
    time_series_period='weekly'
)

res = dict(ts_res.items(bq_context))

One analysis window

If we’re only interested in users’ (say) second week in the experiment, then we don’t need to get a full time series.

!pip install mozanalysis=='{current_version}'

from google.colab import auth
auth.authenticate_user()
print('Authenticated')

import mozanalysis.bayesian_stats.binary as mabsbin
from mozanalysis.experiment import Experiment
from mozanalysis.bq import BigQueryContext
from mozanalysis.config import ConfigLoader

bq_context = BigQueryContext(dataset_id='your_dataset_id')

active_hours = ConfigLoader.get_metric(slug="active_hours", app_name="firefox_desktop")

exp = Experiment(
    experiment_slug='pref-fingerprinting-protections-retention-study-release-70',
    start_date='2019-10-29',
    num_dates_enrollment=8,
    app_name="firefox_desktop"
)

res = exp.get_single_window_data(
    bq_context=bq_context,
    metric_list=[
        active_hours,
    ],
    last_date_full_data='2019-11-28',
    analysis_start_days=7,
    analysis_length_days=7
)

last_date_full_data is less important for mozanalysis.experiment.Experiment.get_single_window_data() than for mozanalysis.experiment.Experiment.get_time_series_data(): whereas last_date_full_data determines the length of a time series, here it simply provides a sanity check that the specified analysis window doesn’t stretch into the future for any enrolled users. (analysis_start_days=7 and analysis_length_days=7 select each client’s days 7-13 after enrollment, i.e. their second week.)

Crunch the stats

Each stats technique has a module in mozanalysis.bayesian_stats or mozanalysis.frequentist_stats, and a function compare_branches(); for example mozanalysis.bayesian_stats.binary.compare_branches(). This function accepts a pandas DataFrame in “the standard format”, and must be passed the name of the column containing the metric to be studied.

import mozanalysis.bayesian_stats.binary as mabsbin
import mozanalysis.bayesian_stats.bayesian_bootstrap as mabsboot
import mozanalysis.bayesian_stats.survival_func as mabssf
import mozanalysis.frequentist_stats.bootstrap as mafsboot

# res_from_ts: dict of DataFrames from get_time_series_data() (like `res` above)
res_from_ts[7]['active_hours_gt_0'] = res_from_ts[7].active_hours > 0
mabsbin.compare_branches(res_from_ts[7], 'active_hours_gt_0')
mabsbin.compare_branches(res_from_ts[7], 'active_hours_gt_0', ref_branch_label='Cohort_1')

# gpcd_res: DataFrame from get_single_window_data()
gpcd_res['active_hours_gt_0'] = gpcd_res.active_hours > 0
mabsbin.compare_branches(gpcd_res, 'active_hours_gt_0')

mafsboot.compare_branches(gpcd_res, 'active_hours', threshold_quantile=0.9999)

sf_search_week_2 = mabssf.compare_branches(gpcd_res, 'search_count')