mozetl.addon_aggregates package
Submodules
mozetl.addon_aggregates.addon_aggregates module
ETL code for the addon_aggregates dataset
-
mozetl.addon_aggregates.addon_aggregates.add_addon_columns(df)
Constructs additional indicator columns describing the add-on/theme present in a given record. The columns are:
  is_self_install
  is_shield_addon
  is_foreign_install
  is_system
  is_web_extension
Each column maps True -> 1 and False -> 0.
- Parameters
df – SparkDF, exploded on active_addons, so that each record maps to a single add-on
- Returns
df with the above columns added
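
The True -> 1 / False -> 0 mapping can be illustrated with a minimal plain-Python sketch. This is not the Spark implementation: the real function operates on a SparkDF, and the rules that decide each flag are not described here, so the sketch takes the boolean flags as a hypothetical `flags` input. The function name `add_indicator_columns` is also an assumption made for illustration.

```python
# Column names taken from the description above.
INDICATOR_COLUMNS = (
    "is_self_install",
    "is_shield_addon",
    "is_foreign_install",
    "is_system",
    "is_web_extension",
)


def add_indicator_columns(record, flags):
    """Return a copy of `record` with every indicator stored as 1 or 0.

    `record` is a dict standing in for one exploded add-on row.
    `flags` is a hypothetical dict of booleans; a missing flag counts as False.
    """
    out = dict(record)
    for name in INDICATOR_COLUMNS:
        out[name] = int(bool(flags.get(name, False)))
    return out
```

Storing the indicators as integers rather than booleans means a later aggregation step can simply sum them.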
-
mozetl.addon_aggregates.addon_aggregates.aggregate_addons(df)
Aggregates add-on indicators by client, channel, version and locale. The result is a DataFrame with the following additional aggregate columns for each combination of those facets:
  n_self_installed_addons (int)
  n_shield_addons (int)
  n_foreign_installed_addons (int)
  n_system_addons (int)
  n_web_extensions (int)
  first_addon_install_date (str, %Y%m%d)
  profile_creation_date (str, %Y%m%d)
- Parameters
df – an exploded instance of main_summary by active_addons, with various additional indicator columns
- Returns
SparkDF, an aggregated dataset with each of the above columns
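
As a rough illustration of the grouping step, the plain-Python sketch below sums the 0/1 indicators per facet combination. It rests on several assumptions: the counts are taken to be sums of the indicator columns, the key names in `KEYS` are hypothetical stand-ins for client, channel, version and locale, and the two date columns are left out because their derivation is not described here. The real function works on a SparkDF.

```python
from collections import defaultdict

# Output count column -> indicator column it is assumed to sum.
SUMS = {
    "n_self_installed_addons": "is_self_install",
    "n_shield_addons": "is_shield_addon",
    "n_foreign_installed_addons": "is_foreign_install",
    "n_system_addons": "is_system",
    "n_web_extensions": "is_web_extension",
}

# Hypothetical field names for the four grouping facets.
KEYS = ("client_id", "channel", "app_version", "locale")


def aggregate_records(records):
    """Sum the indicator columns for each distinct combination of KEYS."""
    totals = defaultdict(lambda: dict.fromkeys(SUMS, 0))
    for rec in records:
        key = tuple(rec[k] for k in KEYS)
        for out_col, in_col in SUMS.items():
            totals[key][out_col] += rec[in_col]
    # Rebuild one flat dict per group: facet values plus the counts.
    return [dict(zip(KEYS, key), **counts) for key, counts in totals.items()]
```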
-
mozetl.addon_aggregates.addon_aggregates.get_dest(output_bucket, output_prefix, output_version, date=None, sample_id=None)
Stitches together an s3 destination.
- Parameters
output_bucket – s3 output_bucket
output_prefix – s3 output_prefix (within output_bucket)
output_version – dataset output_version
- Returns
str -> s3://output_bucket/output_prefix/output_version/submission_date_s3=[date]/sample_id=[sid]
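
The path layout in the return description can be sketched as follows. This is a hedged reconstruction, not the module's code: the name `get_dest_sketch` is hypothetical, and dropping the `submission_date_s3` and `sample_id` components when the corresponding argument is None is an assumption based only on their default values. Whether the real function appends a trailing slash is not stated here.

```python
def get_dest_sketch(output_bucket, output_prefix, output_version,
                    date=None, sample_id=None):
    """Join the pieces into an s3:// path, adding partition components if given."""
    parts = [output_bucket, output_prefix, output_version]
    if date is not None:
        # Assumption: the date partition is skipped when no date is passed.
        parts.append("submission_date_s3={}".format(date))
    if sample_id is not None:
        # Assumption: the sample_id partition is skipped when not passed.
        parts.append("sample_id={}".format(sample_id))
    return "s3://" + "/".join(parts)


# Usage with made-up bucket and prefix names:
dest = get_dest_sketch("example-bucket", "addons/agg", "v2",
                       date="20180101", sample_id=42)
# dest == "s3://example-bucket/addons/agg/v2/submission_date_s3=20180101/sample_id=42"
```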
-
mozetl.addon_aggregates.addon_aggregates.load_main_summary(spark, input_bucket, input_prefix, input_version)
Loads main_summary from the s3 location constructed from input_bucket, input_prefix and input_version.
- Parameters
spark – SparkSession object
input_bucket – s3 bucket (telemetry-parquet)
input_prefix – s3 prefix (main_summary)
input_version – dataset version (v4)
- Returns
SparkDF
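
A minimal sketch of how such a loader might look, assuming the dataset is stored as Parquet (suggested by the bucket name but not confirmed here) under an `s3://` path built from the three inputs. The name `load_main_summary_sketch` is hypothetical; `spark.read.parquet` is the standard PySpark reader call, and any reader options the real function sets are not shown.

```python
def load_main_summary_sketch(spark, input_bucket, input_prefix, input_version):
    """Build the input path and read it with the session's Parquet reader."""
    # Assumption: the three parts are joined in this order under an s3:// scheme.
    path = "s3://{}/{}/{}".format(input_bucket, input_prefix, input_version)
    return spark.read.parquet(path)
```

With the example values from the parameter list, the path read would be `s3://telemetry-parquet/main_summary/v4`.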