Guide
Basic experiment: get the data & crunch the stats
Let’s start by analysing a straightforward pref-flip experiment on the desktop browser.
If we’re using a Colab notebook, we begin by installing mozanalysis into the notebook. It’s a good idea to pin a specific version (from PyPI), for reproducibility:
!pip install mozanalysis=='{current_version}'
Then we go through the per-notebook authentication trudge:
from google.colab import auth
auth.authenticate_user()
print('Authenticated')
Then we import the classes we need for getting the data, analysing it, and interacting with BigQuery:
import mozanalysis.bayesian_stats.binary as mabsbin
from mozanalysis.experiment import Experiment
from mozanalysis.bq import BigQueryContext
And get a mozanalysis.bq.BigQueryContext (a client, and some config):
bq_context = BigQueryContext(
    dataset_id='your_dataset_id',  # e.g. mine's 'flawrence'
    project_id=...,  # Defaults to moz-fx-data-bq-data-science
)
If you do not have a dataset, you will need to create one. Mozanalysis will save data into this dataset; if you want to access the tables directly (i.e. not through mozanalysis), they live at project_id.dataset_id.table_name, where table_name will be printed by mozanalysis when it saves/retrieves the data.
To bill queries to a project other than moz-fx-data-bq-data-science, pass the project_id as an argument when initializing your mozanalysis.bq.BigQueryContext.
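For example, a context that bills a different project might look like this (a sketch; 'your-billing-project' is a placeholder, not a real project ID):
bq_context = BigQueryContext(
    dataset_id='your_dataset_id',
    project_id='your-billing-project',  # placeholder: substitute a project you can bill queries to
)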
For querying data, the internal approach of mozanalysis is to start by obtaining a list of who was enrolled in what branch, and when. Then we try to quantify what happened to each client: for a given analysis window (a specified period of time defined with respect to the client’s enrollment date), we seek to obtain a value for each client for each metric. We end up with a results (pandas) DataFrame with one row per client and one column per metric.
We start by instantiating our mozanalysis.experiment.Experiment object:
exp = Experiment(
    experiment_slug='pref-fingerprinting-protections-retention-study-release-70',
    start_date='2019-10-29',
    num_dates_enrollment=8,
    app_name="firefox_desktop"
)
start_date is the submission_date of the first enrollment (submission_date is in UTC). If you intend to study one week’s worth of enrollments, then set num_dates_enrollment=8: Normandy experiments typically go live in the evening UTC-time, so 8 days of data is a better approximation than 7.
We now gather a list of who was enrolled in what branch and when, and try to quantify what happened to each client. In many cases, the metrics in which you’re interested will already be in a metrics library, metric-hub. If not, then you can define your own (see mozanalysis.metrics.Metric for examples) and ideally submit a PR to add them to metric-hub for the next experiment. To load a Metric from metric-hub, for example:
from mozanalysis.config import ConfigLoader
active_hours = ConfigLoader.get_metric(slug="active_hours", app_name="firefox_desktop")
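If the metric you need isn’t in metric-hub yet, a custom definition looks roughly like this (a hedged sketch: the attribute names follow mozanalysis.metrics, but check the current API, and the from_expr and column here are illustrative):
from mozanalysis.metrics import DataSource, Metric

# assumption: an illustrative data source; real definitions live in metric-hub
clients_daily = DataSource(
    name='clients_daily',
    from_expr='mozdata.telemetry.clients_daily',
)

my_active_hours = Metric(
    name='active_hours',
    data_source=clients_daily,
    select_expr='SUM(active_hours_sum)',  # aggregated per client over the analysis window
)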
In this example, we’ll compute four metrics from metric-hub:
active hours
uri count
ad clicks
search count
As it happens, the first three metrics all come from the clients_daily dataset, whereas “search count” comes from search_clients_daily. These details are taken care of in the metric-hub definitions, so we don’t have to think about them here.
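The remaining three metrics load from metric-hub in the same way as active_hours:
uri_count = ConfigLoader.get_metric(slug="uri_count", app_name="firefox_desktop")
ad_clicks = ConfigLoader.get_metric(slug="ad_clicks", app_name="firefox_desktop")
search_count = ConfigLoader.get_metric(slug="search_count", app_name="firefox_desktop")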
A metric must be computed over some analysis window, a period of time defined with respect to the enrollment date. We could use mozanalysis.experiment.Experiment.get_single_window_data() to compute our metrics over a specific analysis window. But here, let’s create time series data: one analysis window for each of the first three weeks of the experiment, with the data measured over each of these windows:
ts_res = exp.get_time_series_data(
    bq_context=bq_context,
    metric_list=[
        active_hours,
        uri_count,
        ad_clicks,
        search_count,
    ],
    last_date_full_data='2019-11-28',
    time_series_period='weekly'
)
The first two arguments to mozanalysis.experiment.Experiment.get_time_series_data() should be clear by this point. last_date_full_data is the last date for which we want to use data. For a currently-running experiment, it would typically be yesterday’s date (we have incomplete data for incomplete days!).
Metrics are pulled in from metric-hub based on the provided metric slugs.
time_series_period can be 'daily', 'weekly' or '28_day'. A 'weekly' time series neatly sidesteps/masks weekly seasonality issues: most of the experiment subjects will enroll within a day of the experiment launching (typically a Tuesday), so a 'daily' time series reflects a non-uniform convolution of the metrics’ weekly seasonality with the uneven enrollment numbers across the week.
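If you do want per-day resolution anyway, the same call works with a different period (a sketch using the objects defined above):
ts_res_daily = exp.get_time_series_data(
    bq_context=bq_context,
    metric_list=[active_hours],
    last_date_full_data='2019-11-28',
    time_series_period='daily'
)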
mozanalysis.experiment.Experiment.get_time_series_data() returns a mozanalysis.experiment.TimeSeriesResult object, which can return DataFrames keyed by the start of their analysis windows (measured in days after enrollment):
>>> ts_res.keys()
[0, 7, 14]
If RAM permits, we can dump all the results into a dict of DataFrames keyed by the start of their analysis windows:
res = dict(ts_res.items(bq_context))
Each value in res is a pandas DataFrame in “the standard format”, with one row per enrolled client and one column per metric.
Otherwise, you might want to load one analysis window at a time: call ts_res.get(bq_context, analysis_window_start) for each analysis window in ts_res.keys(), process the resulting DataFrame, then discard it from RAM before moving on to the next analysis window.
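For example, a RAM-friendly loop built from those two methods might look like this:
for window_start in ts_res.keys():
    df = ts_res.get(bq_context, window_start)
    # process the window, e.g. compute per-branch summaries
    print(window_start, df.groupby('branch').active_hours.mean())
    del df  # discard before fetching the next window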
Here are the columns of each result DataFrame:
>>> res[7].columns
Index(['branch', 'enrollment_date', 'num_enrollment_events', 'active_hours',
'uri_count', 'clients_daily_has_contradictory_branch',
'clients_daily_has_non_enrolled_data', 'ad_clicks', 'search_count'],
dtype='object')
The ‘branch’ column contains the client’s branch:
>>> res[7].branch.unique()
array(['Cohort_1', 'Cohort_2', 'Cohort_3'], dtype=object)
And we can do the usual pandas DataFrame things - e.g. calculate the mean active hours per branch:
>>> res[7].groupby('branch').active_hours.mean()
branch
Cohort_1 6.246536
Cohort_2 6.719880
Cohort_3 6.468948
Name: active_hours, dtype: float64
Suppose we want to see whether the user had any active hours in their second week in the experiment. This information can be calculated from the active_hours metric: we add it as a column to the results pandas DataFrame, then use mozanalysis.bayesian_stats.binary to analyse this data:
res[7]['active_hours_gt_0'] = res[7]['active_hours'] > 0
retention_week_2 = mabsbin.compare_branches(res[7], 'active_hours_gt_0', ref_branch_label='Cohort_1')
Like most of the stats functions in mozanalysis, mozanalysis.bayesian_stats.binary.compare_branches() accepts a pandas DataFrame in “the standard format” and returns credible (or confidence) intervals for various quantities. It expects the reference branch to be named ‘control’; since this experiment used non-standard branch naming, we need to tell it that the reference branch is named ‘Cohort_1’. The function returns credible intervals (CIs) for the fraction of active users in each branch:
>>> retention_week_2['individual']
{'Cohort_1':
0.005 0.733865
0.025 0.734265
0.5 0.735536
0.975 0.736803
0.995 0.737201
mean 0.735535
dtype: float64,
'Cohort_2':
0.005 0.732368
0.025 0.732769
0.5 0.734041
0.975 0.735312
0.995 0.735710
mean 0.734041
dtype: float64,
'Cohort_3':
0.005 0.732289
0.025 0.732690
0.5 0.733962
0.975 0.735232
0.995 0.735630
mean 0.733962
dtype: float64}
(output re-wrapped for clarity)
For example, we can see that the fraction of users in Cohort_2 with >0 active hours in week 2 has an expectation value of 0.734, with a 95% CI of (0.7328, 0.7353).
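These summaries are pandas Series, so the same numbers can be read off programmatically (a sketch, assuming the quantiles are indexed as floats, as printed above):
cohort_2 = retention_week_2['individual']['Cohort_2']
ci_low, ci_high = cohort_2[0.025], cohort_2[0.975]
print(f'95% CI: ({ci_low:.4f}, {ci_high:.4f})')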
And the function also returns credible intervals for the uplift in this quantity for each branch with respect to a reference branch:
>>> retention_week_2['comparative']
{'Cohort_3':
rel_uplift 0.005 -0.005222
0.025 -0.004568
0.5 -0.002173
0.975 0.000277
0.995 0.001056
exp -0.002166
abs_uplift 0.005 -0.003850
0.025 -0.003365
0.5 -0.001598
0.975 0.000204
0.995 0.000774
exp -0.001594
max_abs_diff 0.95 0.003092
prob_win NaN 0.041300
dtype: float64,
'Cohort_2':
rel_uplift 0.005 -0.005215
0.025 -0.004502
0.5 -0.002065
0.975 0.000359
0.995 0.001048
exp -0.002066
abs_uplift 0.005 -0.003840
0.025 -0.003314
0.5 -0.001520
0.975 0.000264
0.995 0.000769
exp -0.001520
max_abs_diff 0.95 0.003043
prob_win NaN 0.046800
dtype: float64}
(output re-wrapped for clarity)
rel_uplift contains quantities related to the relative uplift of a branch with respect to the reference branch (as given by ref_branch_label); for example, assuming a uniform prior, there is a 95% probability that Cohort_3 had between 0.457% fewer and 0.028% more users with >0 active hours in the second week, compared to Cohort_1. abs_uplift refers to the absolute uplifts, and prob_win gives the probability that the branch is better than the reference branch.
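The comparative summaries are Series with a two-level index, so individual quantities can be pulled out directly (a sketch, assuming the structure printed above):
cohort_3 = retention_week_2['comparative']['Cohort_3']
print(cohort_3['rel_uplift']['exp'])  # expected relative uplift vs the reference branch
print(cohort_3['abs_uplift']['exp'])  # expected absolute uplift vs the reference branch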
Since mozanalysis is designed around this “standard format”, you can pass any of the values in res to any of the statistics functions, as long as the statistics are suited to the column’s type (i.e. binary vs real-valued data):
# Bayesian stats for a binary metric
import mozanalysis.bayesian_stats.binary as mabsbin
retention_week_2 = mabsbin.compare_branches(res[7], 'active_hours_gt_0')

# Frequentist bootstrap for a real-valued metric, clipping extreme outliers
import mozanalysis.frequentist_stats.bootstrap as mafsboot
boot_uri_week_1 = mafsboot.compare_branches(res[0], 'uri_count', threshold_quantile=0.9999)

# Bayesian survival function for a heavy-tailed metric
import mozanalysis.bayesian_stats.survival_func as mabssf
sf_search_week_2 = mabssf.compare_branches(res[7], 'search_count')
dscontrib.flawrence.plot_experiments has some (shaky) support for visualising stats over time series experiment results.
Get the data: cookbook
Time series (of analysis windows)
Condensing the above example for simpler copying and pasting:
!pip install mozanalysis=='{current_version}'
from google.colab import auth
auth.authenticate_user()
print('Authenticated')
import mozanalysis.bayesian_stats.binary as mabsbin
from mozanalysis.experiment import Experiment
from mozanalysis.bq import BigQueryContext
from mozanalysis.config import ConfigLoader
bq_context = BigQueryContext(dataset_id='your_dataset_id')
active_hours = ConfigLoader.get_metric(slug="active_hours", app_name="firefox_desktop")
uri_count = ConfigLoader.get_metric(slug="uri_count", app_name="firefox_desktop")
ad_clicks = ConfigLoader.get_metric(slug="ad_clicks", app_name="firefox_desktop")
search_count = ConfigLoader.get_metric(slug="search_count", app_name="firefox_desktop")
exp = Experiment(
    experiment_slug='pref-fingerprinting-protections-retention-study-release-70',
    start_date='2019-10-29',
    num_dates_enrollment=8,
    app_name="firefox_desktop"
)

ts_res = exp.get_time_series_data(
    bq_context=bq_context,
    metric_list=[
        active_hours,
        uri_count,
        ad_clicks,
        search_count,
    ],
    last_date_full_data='2019-11-28',
    time_series_period='weekly'
)
res = dict(ts_res.items(bq_context))
One analysis window
If we’re only interested in users’ (say) second week in the experiment, then we don’t need to get a full time series.
!pip install mozanalysis=='{current_version}'
from google.colab import auth
auth.authenticate_user()
print('Authenticated')
import mozanalysis.bayesian_stats.binary as mabsbin
from mozanalysis.experiment import Experiment
from mozanalysis.bq import BigQueryContext
from mozanalysis.config import ConfigLoader
bq_context = BigQueryContext(dataset_id='your_dataset_id')
active_hours = ConfigLoader.get_metric(slug="active_hours", app_name="firefox_desktop")
exp = Experiment(
    experiment_slug='pref-fingerprinting-protections-retention-study-release-70',
    start_date='2019-10-29',
    num_dates_enrollment=8,
    app_name="firefox_desktop"
)

res = exp.get_single_window_data(
    bq_context=bq_context,
    metric_list=[
        active_hours,
    ],
    last_date_full_data='2019-11-28',
    analysis_start_days=7,
    analysis_length_days=7
)
last_date_full_data is less important for mozanalysis.experiment.Experiment.get_single_window_data() than for mozanalysis.experiment.Experiment.get_time_series_data(): while last_date_full_data determines the length of the time series, here it simply sanity-checks that the specified analysis window doesn’t stretch into the future for any enrolled users.
Crunch the stats
Each stats technique has a module in mozanalysis.bayesian_stats or mozanalysis.frequentist_stats, and each module has a function compare_branches(); for example, mozanalysis.bayesian_stats.binary.compare_branches(). This function accepts a pandas DataFrame in “the standard format”, and must be passed the name of the column containing the metric to be studied.
import mozanalysis.bayesian_stats.binary as mabsbin
import mozanalysis.bayesian_stats.bayesian_bootstrap as mabsboot
import mozanalysis.bayesian_stats.survival_func as mabssf
import mozanalysis.frequentist_stats.bootstrap as mafsboot

# res_from_ts: a dict of DataFrames from get_time_series_data()
res_from_ts[7]['active_hours_gt_0'] = res_from_ts[7].active_hours > 0
mabsbin.compare_branches(res_from_ts[7], 'active_hours_gt_0')
mabsbin.compare_branches(res_from_ts[7], 'active_hours_gt_0', ref_branch_label='Cohort_1')

# gpcd_res: a DataFrame from get_single_window_data()
gpcd_res['active_hours_gt_0'] = gpcd_res.active_hours > 0
mabsbin.compare_branches(gpcd_res, 'active_hours_gt_0')
mafsboot.compare_branches(gpcd_res, 'active_hours', threshold_quantile=0.9999)
sf_search_week_2 = mabssf.compare_branches(gpcd_res, 'search_count')