mozanalysis.segments
- class mozanalysis.segments.SegmentDataSource(name, from_expr, window_start: int = 0, window_end: int = 0, client_id_column: str = 'client_id', submission_date_column: str = 'submission_date', default_dataset: str | None = None, app_name: str | None = None, group_id_column: str = 'profile_group_id')[source]
Represents a table or view, from which segments may be defined.
window_start and window_end define the window of data used to determine whether each client fits a segment. Ideally this window ends at or before the moment of enrollment, so that users’ branches can’t bias the segment assignment. window_start and window_end are integers, representing the number of days before or after enrollment.
- Parameters:
name (str) – Name for the Data Source. Should be unique to avoid confusion.
from_expr (str) – FROM expression, often just a fully-qualified table name and sometimes a subquery. May contain the string {dataset}, which will be replaced with an app-specific dataset for Glean apps. If the expression is templated on dataset, default_dataset is mandatory.
window_start (int, optional) – See above.
window_end (int, optional) – See above.
client_id_column (str, optional) – Name of the column that contains the client_id (join key). Defaults to ‘client_id’.
submission_date_column (str, optional) – Name of the column that contains the submission date (as a date, not timestamp). Defaults to ‘submission_date’.
default_dataset (str, optional) – The value to use for {dataset} in from_expr if a value is not provided at runtime. Mandatory if from_expr contains a {dataset} parameter.
app_name (str, optional) – The app_name used with metric-hub; used for validation.
group_id_column (str, optional) – Name of the column that contains the group_id (join key). Defaults to ‘profile_group_id’.
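To make the window_start and window_end semantics concrete, here is a minimal sketch of how the two offsets translate into calendar dates for one client. It is not mozanalysis code: the helper name segment_window is hypothetical, and it assumes that negative offsets mean days before enrollment and that both ends of the window are inclusive.

```python
from datetime import date, timedelta


def segment_window(enrollment_date, window_start, window_end):
    """Return (first_day, last_day) of the data window for one client.

    Hypothetical helper for illustration only. Assumes negative offsets
    are days before enrollment and that both ends are inclusive.
    """
    first_day = enrollment_date + timedelta(days=window_start)
    last_day = enrollment_date + timedelta(days=window_end)
    return first_day, last_day


# A window covering the 28 days that end the day before enrollment,
# so that nothing observed after branch assignment affects the segment.
first_day, last_day = segment_window(
    date(2024, 3, 15), window_start=-28, window_end=-1
)
print(first_day, last_day)
```

With the defaults (window_start=0, window_end=0) the same helper returns the enrollment date for both ends, that is, a one-day window on the day of enrollment under these assumptions.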
- from_expr_for(dataset: str | None) → str [source]
Expands the from_expr template for the given dataset. If from_expr is not a template, returns from_expr.
- Parameters:
dataset (str or None) – Dataset name to substitute into the from expression.
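The expansion described above can be sketched as a standalone function. This is a simplified stand-in, not the library's implementation: the function name expand_from_expr and the table names are hypothetical, and falling back to default_dataset when no dataset is passed is an assumption based on the default_dataset parameter description.

```python
def expand_from_expr(from_expr, dataset=None, default_dataset=None):
    """Substitute a dataset into a FROM expression template.

    Hypothetical stand-in for illustration. Assumes the runtime dataset
    takes precedence and default_dataset is the fallback.
    """
    effective_dataset = dataset or default_dataset
    if effective_dataset is None:
        # Nothing to substitute; return the expression unchanged.
        return from_expr
    return from_expr.format(dataset=effective_dataset)


# Templated expression (illustrative table name) with a runtime dataset.
templated = "`my-project.{dataset}.baseline_clients_daily`"
print(expand_from_expr(templated, dataset="org_mozilla_firefox"))

# No runtime dataset: the default is used instead.
print(expand_from_expr(templated, default_dataset="org_mozilla_fenix"))

# A plain table name is not a template and comes back as-is.
print(expand_from_expr("`my-project.telemetry.clients_daily`", dataset="ignored"))
```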
- build_query(segment_list, time_limits, experiment_slug, from_expr_dataset=None, analysis_unit: AnalysisUnit = AnalysisUnit.CLIENT)[source]
Return a nearly self-contained SQL query.
The query takes a list of {analysis_id}s from raw_enrollments, and adds one non-NULL boolean column per segment: True if the client is in the segment, False otherwise.
- build_query_target(target, time_limits, from_expr_dataset=None, analysis_unit: AnalysisUnit = AnalysisUnit.CLIENT)[source]
Return a nearly self-contained SQL query, for use with mozanalysis.sizing.HistoricalTarget.
This query returns all distinct client IDs that satisfy the criteria for inclusion in a historical analysis using this datasource. Separate sub-queries are constructed for each additional Segment in the analysis.
- class mozanalysis.segments.Segment(name: str, data_source, select_expr: str, friendly_name: str | None = None, description: str | None = None, app_name: str | None = None)[source]
Represents an experiment Segment.
- Parameters:
name (str) – The segment’s name; will be a column name.
data_source (SegmentDataSource) – Data source that provides the columns referenced in select_expr.
select_expr (str) – A SQL select expression that includes an aggregation function (we GROUP BY {analysis_unit}). Returns a non-NULL BOOL: True if the user is in the segment, False otherwise.
friendly_name (str) – A human-readable dashboard title for this segment.
description (str) – A paragraph of Markdown-formatted text describing the segment in more detail, to be shown on dashboards
app_name (str, optional) – The app_name used with metric-hub; used for validation.
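As an illustration of the select_expr requirement (an aggregation that yields a non-NULL BOOL), the sketch below builds such an expression as a string. It assumes BigQuery-style SQL; the helper agg_any_sketch, the column is_regular_user, and the segment name are hypothetical and chosen only for this example. The keyword arguments mirror the Segment signature documented above, but the snippet does not import mozanalysis.

```python
def agg_any_sketch(column):
    """Build an aggregate expression that is never NULL.

    LOGICAL_OR is TRUE if any row in the group is TRUE; COALESCE maps a
    NULL result (for example, no matching rows) to FALSE.
    Hypothetical helper for illustration only.
    """
    return f"COALESCE(LOGICAL_OR({column}), FALSE)"


# Arguments one might pass to Segment(...); data_source is omitted here
# because it would be a SegmentDataSource instance.
segment_kwargs = {
    "name": "regular_users_example",
    "select_expr": agg_any_sketch("is_regular_user"),
    "friendly_name": "Regular users (example)",
    "description": "Clients flagged as regular users during the window.",
}
print(segment_kwargs["select_expr"])
```

Wrapping the aggregate in COALESCE is one way to meet the non-NULL requirement when a client has no rows in the window.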