mozanalysis.frequentist_stats.sample_size
- mozanalysis.frequentist_stats.sample_size.sample_size_curves(df: DataFrame, metrics_list: list, solver, effect_size: float | ndarray | Series | List[float] = 0.01, power: float | ndarray | Series | List[float] = 0.8, alpha: float | ndarray | Series | List[float] = 0.05, **solver_kwargs) Dict[str, DataFrame] [source]
Loop over a list of parameter values to produce sample size estimates for each. Exactly one parameter among effect_size, power, and alpha should be passed as a list; the sample size curve will be calculated with that parameter as the variable.
- Parameters:
df – A pandas DataFrame of queried historical data.
metrics_list (list of mozanalysis.metrics.Metric) – List of metrics used to construct the results df from HistoricalTarget. The names of these metrics are used to return results for sample size calculation for each.
solver (any function that returns sample size as a function of effect_size, power, and alpha) – The solver used to calculate sample size.
effect_size (float or ArrayLike, default .01) – For test of differences in proportions, the absolute difference; for tests of differences in mean, the percent change.
alpha (float or ArrayLike, default .05) – Significance level for the experiment.
power (float or ArrayLike, default .8) – Probability of detecting an effect, when a significant effect exists.
**solver_kwargs (dict) – Arguments necessary for the provided solver.
- Returns:
A dictionary of pd.DataFrame objects. An item in the dictionary is created for each metric in metrics_list, containing a DataFrame of sample size per branch, number of clients that satisfied targeting, and population proportion per branch at each value of the iterable parameter.
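The looping behavior can be sketched in plain Python. The solver below is a hypothetical stand-in (a two-sided z-test formula, not mozanalysis's implementation); the dictionary comprehension mirrors how sample_size_curves varies one parameter while holding the others fixed.

```python
from statistics import NormalDist

def z_sample_size(effect_size, power, alpha, sd=1.0):
    # Hypothetical stand-in solver: per-branch sample size for a
    # two-sided z test of a difference in means with known sd.
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return (z_a + z_b) ** 2 * 2 * sd ** 2 / effect_size ** 2

# Vary effect_size while holding power and alpha fixed, as
# sample_size_curves does when effect_size is passed as a list.
curve = {es: z_sample_size(es, power=0.8, alpha=0.05)
         for es in (0.01, 0.02, 0.05)}
```

Smaller effect sizes require larger samples, so the curve is decreasing in effect_size.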
- mozanalysis.frequentist_stats.sample_size.difference_of_proportions_sample_size_calc(df: DataFrame, metrics_list: List[Metric], effect_size: float = 0.01, alpha: float = 0.05, power: float = 0.9, outlier_percentile: float = 99.5) dict [source]
Perform sample size calculation for an experiment to test for a difference in proportions.
- Parameters:
df – A pandas DataFrame of queried historical data.
metrics_list (list of mozanalysis.metrics.Metric) – List of metrics used to construct the results df from HistoricalTarget. The names of these metrics are used to return results for sample size calculation for each.
effect_size (float, default .01) – Difference in proportion for the minimum detectable effect – effect_size = p(event under alt) - p(event under null)
alpha (float, default .05) – Significance level for the experiment.
power (float, default .90) – Probability of detecting an effect, when a significant effect exists.
outlier_percentile (float, default 99.5) – Percentile at which to trim each column.
- Returns:
A dictionary. Keys in the dictionary are the metrics column names from the DataFrame; values are the required sample size per branch to achieve the desired power for that metric.
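The underlying normal-approximation formula can be sketched in a few lines. This helper is illustrative (not mozanalysis's implementation) and assumes a two-sided test where effect_size is the absolute difference in proportions, matching the parameter description above.

```python
from statistics import NormalDist

def prop_diff_sample_size(p_null, effect_size, alpha=0.05, power=0.9):
    # Per-branch sample size for a two-sided test of a difference in
    # proportions, using the normal approximation with unpooled variance.
    p_alt = p_null + effect_size
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    var = p_null * (1 - p_null) + p_alt * (1 - p_alt)
    return (z_a + z_b) ** 2 * var / effect_size ** 2
```

For example, detecting an absolute lift of 0.01 over a 10% baseline at alpha=0.05 and power=0.9 requires roughly 20,000 clients per branch.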
- mozanalysis.frequentist_stats.sample_size.z_or_t_ind_sample_size_calc(df: DataFrame, metrics_list: List[Metric], test: str = 'z', effect_size: float = 0.01, alpha: float = 0.05, power: float = 0.9, outlier_percentile: float = 99.5) dict [source]
Perform sample size calculation for an experiment based on independent samples t or z tests.
- Parameters:
df – A pandas DataFrame of queried historical data.
metrics_list (list of mozanalysis.metrics.Metric) – List of metrics used to construct the results df from HistoricalTarget. The names of these metrics are used to return results for sample size calculation for each.
test (str, default 'z') – 'z' or 't' to indicate which solver to use.
effect_size (float, default .01) – Percent change in metrics expected as a result of the experiment treatment.
alpha (float, default .05) – Significance level for the experiment.
power (float, default .90) – Probability of detecting an effect, when a significant effect exists.
outlier_percentile (float, default 99.5) – Percentile at which to trim each column.
- Returns:
A dictionary. Keys in the dictionary are the metrics column names from the DataFrame; values are the required sample size per branch to achieve the desired power for that metric.
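Because effect_size is a percent change here, the absolute detectable difference depends on each metric's observed mean. The sketch below shows the z-test case only (the t-test case additionally iterates on degrees of freedom); it is an illustration under that assumption, not the library's code.

```python
from statistics import NormalDist

def z_ind_sample_size(mean, sd, rel_effect=0.01, alpha=0.05, power=0.9):
    # Convert the relative effect into an absolute difference, then apply
    # the standard two-sample z-test sample size formula per branch.
    delta = rel_effect * mean
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return (z_a + z_b) ** 2 * 2 * sd ** 2 / delta ** 2
```

Note how a noisier metric (larger sd relative to its mean) drives the required sample size up quadratically.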
- mozanalysis.frequentist_stats.sample_size.empirical_effect_size_sample_size_calc(res: TimeSeriesResult, bq_context: BigQueryContext, metric_list: list, quantile: float = 0.9, power: float = 0.8, alpha: float = 0.05, parent_distribution: str = 'normal', plot_effect_sizes: bool = False) dict [source]
Perform sample size calculation with empirical effect size and the asymptotic approximation of the Wilcoxon-Mann-Whitney U test. The empirical effect size is estimated using a quantile of week-to-week changes over the course of the study, and the variance in the test statistic is estimated as a quantile of the weekly variance in metrics. Sample size calculation is based on the asymptotic relative efficiency (ARE) of the U test to the t test (see Stapleton 2008, pg. 266, or https://www.psychologie.hhu.de/fileadmin/redaktion/Fakultaeten/Mathematisch-Naturwissenschaftliche_Fakultaet/Psychologie/AAP/gpower/GPowerManual.pdf)
- Parameters:
res – A TimeSeriesResult, generated by mozanalysis.sizing.HistoricalTarget.get_time_series_data.
bq_context – A mozanalysis.bq.BigQueryContext object that handles downloading time series data from BigQuery.
metric_list (list of mozanalysis.metrics.Metric) – List of metrics used to construct the results df from HistoricalTarget. The names of these metrics are used to return results for sample size calculation for each.
quantile (float, default .90) – Quantile used to calculate the effect size as the quantile of week-to-week metric changes and the variance of the mean.
alpha (float, default .05) – Significance level for the experiment.
power (float, default .8) – Probability of detecting an effect, when a significant effect exists.
parent_distribution (str, default "normal") – Distribution of the parent data; must be normal, uniform, logistic, or laplace.
plot_effect_sizes (bool, default False) – Whether or not to plot the distribution of effect sizes observed in historical data.
- Returns:
A dictionary. Keys in the dictionary are the metrics column names from the DataFrame; values are the required sample size per branch to achieve the desired power for that metric.
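The ARE correction mentioned above can be illustrated directly: a t-test sample size is divided by the ARE of the U test for the assumed parent distribution. The constants below are the standard ARE values from the cited references; the helper itself is a sketch, not the library's implementation.

```python
import math

# Asymptotic relative efficiency of the Mann-Whitney U test relative to
# the t test, by parent distribution (standard textbook values).
ARE = {
    "normal": 3 / math.pi,        # ~0.955
    "uniform": 1.0,
    "logistic": math.pi ** 2 / 9, # ~1.097
    "laplace": 1.5,
}

def u_test_sample_size(t_test_n, parent_distribution="normal"):
    # Adjust a t-test sample size by the U test's relative efficiency.
    return t_test_n / ARE[parent_distribution]
```

Under a normal parent the U test needs slightly more clients than the t test; under heavy-tailed parents such as Laplace it needs fewer.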
- mozanalysis.frequentist_stats.sample_size.poisson_diff_solve_sample_size(df: DataFrame, metrics_list: List[Metric], effect_size: float = 0.01, alpha: float = 0.05, power: float = 0.9, outlier_percentile: float = 99.5) dict [source]
Sample size for test of difference of Poisson rates, based on Poisson rate’s asymptotic normality.
- Parameters:
df – A pandas DataFrame of queried historical data.
metrics_list (list of mozanalysis.metrics.Metric) – List of metrics used to construct the results df from HistoricalTarget. The names of these metrics are used to return results for sample size calculation for each.
effect_size (float, default .01) – Percent change in metrics expected as a result of the experiment treatment
alpha (float, default .05) – Significance level for the experiment.
power (float, default .90) – Probability of detecting an effect, when a significant effect exists.
outlier_percentile (float, default 99.5) – Percentile at which to trim each column.
- Returns:
A dictionary. Keys in the dictionary are the metrics column names from the DataFrame; values are the required sample size per branch to achieve the desired power for that metric.
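The asymptotic-normality argument can be sketched as follows: the variance of an estimated Poisson rate is the rate itself divided by n, so the usual z-formula applies with the two rates supplying the variance terms. This is an illustration (shown here with an absolute rate difference for simplicity), not the library's code.

```python
from statistics import NormalDist

def poisson_diff_sample_size(rate_null, rate_diff, alpha=0.05, power=0.9):
    # Per-branch sample size for a two-sided test of a difference in
    # Poisson rates, using Var(rate_hat) ~ rate / n for each branch.
    rate_alt = rate_null + rate_diff
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return (z_a + z_b) ** 2 * (rate_null + rate_alt) / rate_diff ** 2
```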
- mozanalysis.frequentist_stats.sample_size.variable_enrollment_length_sample_size_calc(bq_context: BigQueryContext, start_date: str | datetime, max_enrollment_days: int, analysis_length: int, metric_list: List[Metric], target_list: List[Segment], variable_window_length: int = 7, experiment_name: str | None = '', app_id: str | None = '', to_pandas: bool = True, **sizing_kwargs) Dict[str, Dict[str, int] | DataFrame] [source]
Sample size calculation over a variable enrollment window. This function will fetch a DataFrame with metrics defined in metric_list for a target population defined in target_list over an enrollment window of length max_enrollment_days. Sample size calculation is performed using clients enrolled in the first variable_window_length dates of that enrollment window; the window is then incrementally widened by variable_window_length and the sample size calculation is performed again, until the last enrollment date is reached.
- Parameters:
bq_context – A mozanalysis.bq.BigQueryContext object that handles downloading data from BigQuery.
start_date (str or datetime in %Y-%m-%d format) – First date of enrollment for sizing job.
max_enrollment_days (int) – Maximum number of dates to consider for the enrollment period for the experiment in question.
analysis_length (int) – Number of days to record metrics for each client in the experiment in question.
metric_list (list of mozanalysis.metrics.Metric) – List of metrics used to construct the results df from HistoricalTarget. The names of these metrics are used to return results for sample size calculation for each.
target_list (list of mozanalysis.segments.Segment) – List of segments used to identify clients to include in the study.
variable_window_length (int) – Length of the intervals used to extend the enrollment period incrementally. Sample sizes are recalculated over each variable enrollment period.
experiment_name (str) – Optional name used to name the target and metric tables in BigQuery.
app_id (str) – Application that experiment will be run on.
**sizing_kwargs – Arguments to pass to z_or_t_ind_sample_size_calc.
- Returns:
A dictionary. Keys in the dictionary are the metrics column names from the DataFrame; values are the required sample size per branch to achieve the desired power for that metric.
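The widening-window schedule described above can be sketched with stdlib dates. This helper only generates the sequence of enrollment windows over which the sample size calculation would be repeated; it is a simplified illustration, not the function's actual logic.

```python
from datetime import date, timedelta

def enrollment_windows(start_date, max_enrollment_days, variable_window_length=7):
    # Each window starts at start_date and extends by another
    # variable_window_length days until the full enrollment period
    # (max_enrollment_days) is covered.
    windows = []
    end_offset = variable_window_length
    while end_offset <= max_enrollment_days:
        windows.append((start_date, start_date + timedelta(days=end_offset - 1)))
        end_offset += variable_window_length
    return windows
```

For a 28-day enrollment period with weekly increments, this yields four progressively wider windows, each sharing the same start date.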