=====
Guide
=====

Basic experiment: get the data & crunch the stats
=================================================

Let's start by analysing a straightforward pref-flip experiment on the desktop browser.

If we're using a Colab notebook, we begin by installing :mod:`mozanalysis` into the notebook. It's a good idea to pin a specific version (`from pypi <https://pypi.org/project/mozanalysis/>`_), for reproducibility::

    !pip install mozanalysis=='{current_version}'

Then we go through the per-notebook daily trudge of authenticating::

    from google.colab import auth
    auth.authenticate_user()
    print('Authenticated')

Then we import the classes we need for getting the data, analysing it, and interacting with BigQuery::

    import mozanalysis.bayesian_stats.binary as mabsbin
    from mozanalysis.experiment import Experiment
    from mozanalysis.bq import BigQueryContext


And get a :class:`mozanalysis.bq.BigQueryContext` (a client, and some config)
::

    bq_context = BigQueryContext(
        dataset_id='your_dataset_id',  # e.g. mine's 'flawrence'
        project_id=...,  # Defaults to moz-fx-data-bq-data-science
    )

If you do not have a dataset, you will need to `create one <https://cloud.google.com/bigquery/docs/datasets#create-dataset>`_. Mozanalysis will save data into this dataset - if you want to access the tables directly (i.e. not through mozanalysis), they live at ``project_id.dataset_id.table_name``, where ``table_name`` is printed by mozanalysis when it saves or retrieves the data.
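
For instance, one way to read a saved table back directly is with the BigQuery client library (a sketch; the dataset and table names below are placeholders for whatever mozanalysis prints)::

    from google.cloud import bigquery

    client = bigquery.Client(project='moz-fx-data-bq-data-science')
    # Placeholder names: mozanalysis prints the real table name when it
    # saves/retrieves the data.
    df = client.query(
        'SELECT * FROM `moz-fx-data-bq-data-science.your_dataset_id.your_table_name`'
    ).to_dataframe()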

To bill queries to a project other than ``moz-fx-data-bq-data-science``, pass the ``project_id`` as an argument when initializing your :class:`mozanalysis.bq.BigQueryContext`.

For querying data, the internal approach of :mod:`mozanalysis` is to start by obtaining a list of who was enrolled in what branch, when. Then we try to quantify what happened to each client: for a given analysis window (a specified period of time defined with respect to the client's enrollment date), we seek to obtain a value for each client for each metric. We end up with a results (pandas) DataFrame with one row per client and one column per metric.


We start by instantiating our :class:`mozanalysis.experiment.Experiment` object::

    exp = Experiment(
        experiment_slug='pref-fingerprinting-protections-retention-study-release-70',
        start_date='2019-10-29',
        num_dates_enrollment=8,
        app_name="firefox_desktop"
    )

``start_date`` is the ``submission_date`` of the first enrollment (``submission_date`` is in UTC). If you intend to study one week's worth of enrollments, set ``num_dates_enrollment=8``: Normandy experiments typically go live in the evening UTC-time, so 8 days of data is a better approximation to one week than 7.
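
Concretely, eight enrollment dates starting on ``start_date`` cover the following range (plain-Python arithmetic, just to make the window explicit)::

    from datetime import date, timedelta

    start = date(2019, 10, 29)            # start_date
    last = start + timedelta(days=8 - 1)  # num_dates_enrollment = 8
    # Enrollments are counted for 2019-10-29 through 2019-11-05 inclusive.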


We now gather a list of who was enrolled in what branch and when, and try to quantify what happened to each client. In many cases, the metrics you're interested in will already be in a metrics library, `metric-hub <https://github.com/mozilla/metric-hub>`_. If not, you can define your own (see :class:`mozanalysis.metrics.Metric` for examples, and the sketch below) and ideally submit a PR to add them to metric-hub for the next experiment. To load Metrics from metric-hub, for example::

    from mozanalysis.config import ConfigLoader
    active_hours = ConfigLoader.get_metric(slug="active_hours", app_name="firefox_desktop")
    uri_count = ConfigLoader.get_metric(slug="uri_count", app_name="firefox_desktop")
    ad_clicks = ConfigLoader.get_metric(slug="ad_clicks", app_name="firefox_desktop")
    search_count = ConfigLoader.get_metric(slug="search_count", app_name="firefox_desktop")
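
If a metric doesn't exist in metric-hub yet, it can be constructed directly. Here's a minimal sketch, assuming the :class:`mozanalysis.metrics.Metric` and :class:`mozanalysis.metrics.DataSource` interfaces; the table and column names are illustrative::

    from mozanalysis.metrics import DataSource, Metric

    # Illustrative data source: the table the metric is computed from.
    clients_daily = DataSource(
        name='clients_daily',
        from_expr='mozdata.telemetry.clients_daily',
    )

    # Illustrative metric: a per-client aggregation over that table.
    my_active_hours = Metric(
        name='my_active_hours',
        data_source=clients_daily,
        select_expr='COALESCE(SUM(active_hours_sum), 0)',
    )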

In this example, we'll compute four metrics from metric-hub:

* active hours
* uri count
* ad clicks
* search count

As it happens, the first three metrics all come from the ``clients_daily`` dataset, whereas "search count" comes from ``search_clients_daily``. These details are taken care of in the `metric-hub definitions <https://github.com/mozilla/metric-hub/tree/main/definitions>`_ so that we don't have to think about them here.

A metric must be computed over some `analysis window`, a period of time defined with respect to the enrollment date. We could use :meth:`mozanalysis.experiment.Experiment.get_single_window_data()` to compute our metrics over a specific analysis window. But here, let's create a time series: one analysis window for each of the first three weeks of the experiment, with the metrics measured over each window::

    ts_res = exp.get_time_series_data(
        bq_context=bq_context,
        metric_list=[
            active_hours,
            uri_count,
            ad_clicks,
            search_count,
        ],
        last_date_full_data='2019-11-28',
        time_series_period='weekly'
    )

The first two arguments to :meth:`mozanalysis.experiment.Experiment.get_time_series_data()` should be clear by this point. ``last_date_full_data`` is the last date for which we want to use data. For a currently-running experiment, it would typically be yesterday's date (we have incomplete data for incomplete days!).

Metrics are pulled in from `metric-hub <https://github.com/mozilla/metric-hub>`_ based on the provided metric slugs.

``time_series_period`` can be ``'daily'``, ``'weekly'`` or ``'28_day'``. A ``'weekly'`` time series neatly sidesteps (or masks) weekly seasonality issues: most experiment subjects enroll within a day of the experiment launching, typically a Tuesday, so a ``'daily'`` time series reflects a non-uniform convolution of each metric's weekly seasonality with the uneven enrollment numbers across the week.

:meth:`mozanalysis.experiment.Experiment.get_time_series_data()` returns a :class:`mozanalysis.experiment.TimeSeriesResult` object, which can return DataFrames keyed by the start of their analysis windows (measured in days after enrollment)::

    >>> ts_res.keys()
    [0, 7, 14]

If RAM permits, we can dump all the results into a ``dict`` of DataFrames keyed by the start of their analysis windows::

    res = dict(ts_res.items(bq_context))

Each value in ``res`` is a pandas DataFrame in "the standard format", with one row per enrolled client and one column per metric.

Otherwise you might want to load one analysis window at a time, by calling ``ts_res.get(bq_context, analysis_window_start)`` for each analysis window in ``ts_res.keys()``, processing the resulting DataFrame, then discarding it from RAM before moving on to the next analysis window.
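
That pattern looks something like this (a sketch; the per-branch mean stands in for whatever processing you actually need)::

    for window_start in ts_res.keys():
        window_res = ts_res.get(bq_context, window_start)
        # Process the window, keeping only a small summary...
        print(window_start, window_res.groupby('branch').active_hours.mean())
        # ...then free the RAM before fetching the next window.
        del window_res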

Here are the columns of each result DataFrame::

    >>> res[7].columns
    Index(['branch', 'enrollment_date', 'num_enrollment_events', 'active_hours',
           'uri_count', 'clients_daily_has_contradictory_branch',
           'clients_daily_has_non_enrolled_data', 'ad_clicks', 'search_count'],
          dtype='object')

The 'branch' column contains the client's branch::

    >>> res[7].branch.unique()
    array(['Cohort_1', 'Cohort_2', 'Cohort_3'], dtype=object)

And we can do the usual pandas DataFrame things - e.g. calculate the mean active hours per branch::

    >>> res[7].groupby('branch').active_hours.mean()
    branch
    Cohort_1    6.246536
    Cohort_2    6.719880
    Cohort_3    6.468948
    Name: active_hours, dtype: float64

Suppose we want to see whether each user had any active hours in their second week in the experiment. This information can be calculated from the ``active_hours`` metric - we add it as a column to the results pandas DataFrame, then use :mod:`mozanalysis.bayesian_stats.binary` to analyse the data::

    res[7]['active_hours_gt_0'] = res[7]['active_hours'] > 0

    retention_week_2 = mabsbin.compare_branches(res[7], 'active_hours_gt_0', ref_branch_label='Cohort_1')

Like most of the stats in :mod:`mozanalysis`, :func:`mozanalysis.bayesian_stats.binary.compare_branches()` accepts a pandas DataFrame in "the standard format" and returns credible (or confidence) intervals for various quantities. By default it expects the reference branch to be named 'control'; since this experiment used non-standard branch naming, we need to tell it that the control branch is named 'Cohort_1'. The function returns credible intervals (CIs) for the fraction of active users in each branch::

    >>> retention_week_2['individual']
    {'Cohort_1':
         0.005    0.733865
         0.025    0.734265
         0.5      0.735536
         0.975    0.736803
         0.995    0.737201
         mean     0.735535
         dtype: float64,
     'Cohort_2':
         0.005    0.732368
         0.025    0.732769
         0.5      0.734041
         0.975    0.735312
         0.995    0.735710
         mean     0.734041
         dtype: float64,
     'Cohort_3':
         0.005    0.732289
         0.025    0.732690
         0.5      0.733962
         0.975    0.735232
         0.995    0.735630
         mean     0.733962
         dtype: float64}

(output re-wrapped for clarity)

For example, we can see that the fraction of users in Cohort_2 with >0 active hours in week 2 has an expectation value of 0.734, with a 95% CI of (0.7328, 0.7353).

And the function also returns credible intervals for the uplift in this quantity for each branch with respect to a reference branch::

    >>> retention_week_2['comparative']
    {'Cohort_3':
        rel_uplift    0.005   -0.005222
                      0.025   -0.004568
                      0.5     -0.002173
                      0.975    0.000277
                      0.995    0.001056
                      exp     -0.002166
        abs_uplift    0.005   -0.003850
                      0.025   -0.003365
                      0.5     -0.001598
                      0.975    0.000204
                      0.995    0.000774
                      exp     -0.001594
        max_abs_diff  0.95     0.003092
        prob_win      NaN      0.041300
        dtype: float64,
     'Cohort_2':
        rel_uplift    0.005   -0.005215
                      0.025   -0.004502
                      0.5     -0.002065
                      0.975    0.000359
                      0.995    0.001048
                      exp     -0.002066
        abs_uplift    0.005   -0.003840
                      0.025   -0.003314
                      0.5     -0.001520
                      0.975    0.000264
                      0.995    0.000769
                      exp     -0.001520
        max_abs_diff  0.95     0.003043
        prob_win      NaN      0.046800
        dtype: float64}

(output re-wrapped for clarity)

``rel_uplift`` contains quantities related to the relative uplift of a branch with respect to the reference branch (as given by ``ref_branch_label``); for example, assuming a uniform prior, there is a 95% probability that Cohort_3 had between 0.457% fewer and 0.028% more users with >0 active hours in the second week, compared to Cohort_1. ``abs_uplift`` refers to the absolute uplifts, and ``prob_win`` gives the probability that the branch is better than the reference branch.
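
The returned values are pandas Series indexed by statistic and quantile, so specific bounds can be pulled out directly. For example (a sketch based on the output above)::

    cohort_3_rel = retention_week_2['comparative']['Cohort_3']['rel_uplift']
    ci_low, ci_high = cohort_3_rel[0.025], cohort_3_rel[0.975]
    # ci_low is about -0.0046 and ci_high about 0.0003: the bounds quoted above.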

Since :mod:`mozanalysis` is designed around this "standard format", you can pass any of the values in ``res`` to any of the statistics functions, as long as the statistics are suited to the column's type (i.e. binary vs real-valued data)::

    import mozanalysis.bayesian_stats.binary as mabsbin
    retention_week_2 = mabsbin.compare_branches(res[7], 'active_hours_gt_0', ref_branch_label='Cohort_1')

    import mozanalysis.frequentist_stats.bootstrap as mafsboot
    boot_uri_week_1 = mafsboot.compare_branches(res[0], 'uri_count', threshold_quantile=0.9999, ref_branch_label='Cohort_1')

    import mozanalysis.bayesian_stats.survival_func as mabssf
    sf_search_week_2 = mabssf.compare_branches(res[7], 'search_count', ref_branch_label='Cohort_1')

:mod:`dscontrib.flawrence.plot_experiments` has some (shaky) support for visualising stats over time series experiment results.


Get the data: cookbook
======================

Time series (of analysis windows)
---------------------------------
Condensing the above example for simpler copying and pasting::

    !pip install mozanalysis=='{current_version}'

    from google.colab import auth
    auth.authenticate_user()
    print('Authenticated')

    import mozanalysis.bayesian_stats.binary as mabsbin
    from mozanalysis.experiment import Experiment
    from mozanalysis.bq import BigQueryContext
    from mozanalysis.config import ConfigLoader

    bq_context = BigQueryContext(dataset_id='your_dataset_id')

    active_hours = ConfigLoader.get_metric(slug="active_hours", app_name="firefox_desktop")
    uri_count = ConfigLoader.get_metric(slug="uri_count", app_name="firefox_desktop")
    ad_clicks = ConfigLoader.get_metric(slug="ad_clicks", app_name="firefox_desktop")
    search_count = ConfigLoader.get_metric(slug="search_count", app_name="firefox_desktop")

    exp = Experiment(
        experiment_slug='pref-fingerprinting-protections-retention-study-release-70',
        start_date='2019-10-29',
        num_dates_enrollment=8,
        app_name="firefox_desktop"
    )
    ts_res = exp.get_time_series_data(
        bq_context=bq_context,
        metric_list=[
            active_hours,
            uri_count,
            ad_clicks,
            search_count,
        ],
        last_date_full_data='2019-11-28',
        time_series_period='weekly'
    )

    res = dict(ts_res.items(bq_context))

One analysis window
-------------------

If we're only interested in users' (say) second week in the experiment, then we don't need to get a full time series.
::

    !pip install mozanalysis=='{current_version}'

    from google.colab import auth
    auth.authenticate_user()
    print('Authenticated')

    import mozanalysis.bayesian_stats.binary as mabsbin
    from mozanalysis.experiment import Experiment
    from mozanalysis.bq import BigQueryContext
    from mozanalysis.config import ConfigLoader

    bq_context = BigQueryContext(dataset_id='your_dataset_id')
    
    active_hours = ConfigLoader.get_metric(slug="active_hours", app_name="firefox_desktop")

    exp = Experiment(
        experiment_slug='pref-fingerprinting-protections-retention-study-release-70',
        start_date='2019-10-29',
        num_dates_enrollment=8,
        app_name="firefox_desktop"
    )

    res = exp.get_single_window_data(
        bq_context=bq_context,
        metric_list=[
            active_hours,
        ],
        last_date_full_data='2019-11-28',
        analysis_start_days=7,
        analysis_length_days=7
    )

``last_date_full_data`` is less important for :meth:`mozanalysis.experiment.Experiment.get_single_window_data` than for :meth:`mozanalysis.experiment.Experiment.get_time_series_data`: while ``last_date_full_data`` determines the length of the time series, here it simply sanity checks that the specified analysis window doesn't stretch into the future for any enrolled users.
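
To make that check concrete for this example (a plain-Python sketch of the arithmetic)::

    from datetime import date, timedelta

    last_enrollment = date(2019, 10, 29) + timedelta(days=8 - 1)  # 2019-11-05
    # The last enrollee's window spans days 7..13 after enrollment,
    # so full data is needed through:
    min_last_date = last_enrollment + timedelta(days=7 + 7 - 1)   # 2019-11-18
    # last_date_full_data='2019-11-28' comfortably covers this.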


Crunch the stats
================

Each stats technique lives in a module under :mod:`mozanalysis.bayesian_stats` or :mod:`mozanalysis.frequentist_stats` and exposes a ``compare_branches()`` function, for example :func:`mozanalysis.bayesian_stats.binary.compare_branches`. This function accepts a pandas DataFrame in "the standard format", and must be passed the name of the column containing the metric to be studied.
::

    import mozanalysis.bayesian_stats.binary as mabsbin
    import mozanalysis.bayesian_stats.bayesian_bootstrap as mabsboot
    import mozanalysis.bayesian_stats.survival_func as mabssf
    import mozanalysis.frequentist_stats.bootstrap as mafsboot

    # res_from_ts: a dict of DataFrames from get_time_series_data()
    res_from_ts[7]['active_hours_gt_0'] = res_from_ts[7].active_hours > 0
    mabsbin.compare_branches(res_from_ts[7], 'active_hours_gt_0')
    mabsbin.compare_branches(res_from_ts[7], 'active_hours_gt_0', ref_branch_label='Cohort_1')

    # gpcd_res: a DataFrame from get_single_window_data()
    gpcd_res['active_hours_gt_0'] = gpcd_res.active_hours > 0
    mabsbin.compare_branches(gpcd_res, 'active_hours_gt_0')

    mafsboot.compare_branches(gpcd_res, 'active_hours', threshold_quantile=0.9999)

    sf_search_week_2 = mabssf.compare_branches(gpcd_res, 'search_count')