mozetl package

Subpackages

Submodules

mozetl.cli module

mozetl.constants module

mozetl.main module

mozetl.main.etl_job(sc, sqlContext): This is the function that will be executed on the cluster.

mozetl.main.get_data(sc)

mozetl.main.ping_to_row(ping)

mozetl.main.transform_pings(pings): Take a dataframe of main pings and summarize OS share.
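The OS-share summary that transform_pings describes can be illustrated in plain Python, with a list of dicts standing in for the pings that get_data loads. This is a sketch under assumptions, not mozetl's implementation: the helper names to_client_os and summarize_os_share, the ping keys "clientId" and "os", and the choice to measure share over distinct clients are all hypothetical.

```python
from collections import Counter


def to_client_os(ping):
    """Reduce a ping to a (client_id, os_name) pair.

    Hypothetical keys: "clientId" and "os" stand in for the real ping fields.
    """
    return (ping["clientId"], ping["os"])


def summarize_os_share(pings):
    """Return {os_name: fraction of distinct clients reporting that OS}."""
    # A set drops repeated pings from the same client on the same OS.
    pairs = {to_client_os(p) for p in pings}
    counts = Counter(os_name for _, os_name in pairs)
    total = sum(counts.values())
    return {os_name: n / total for os_name, n in counts.items()}


pings = [
    {"clientId": "a", "os": "Linux"},
    {"clientId": "a", "os": "Linux"},  # duplicate ping, counted once
    {"clientId": "b", "os": "Windows_NT"},
    {"clientId": "c", "os": "Windows_NT"},
]
print(summarize_os_share(pings))
```

In a Spark job the same steps would typically become a distinct() followed by a per-OS count; the set and Counter here only mirror that logic on a single machine.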

mozetl.schemas module

mozetl.utils module

mozetl.utils.delete_from_s3(bucket_name, keys_to_delete)
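One way a function like delete_from_s3 could work is on top of the S3 DeleteObjects API, which accepts at most 1,000 keys per request, so long key lists have to be split into batches. This sketch assumes a boto3 S3 client (boto3.client("s3")) is passed in explicitly; the function names build_delete_requests and delete_keys are hypothetical and not part of mozetl.

```python
def build_delete_requests(bucket_name, keys_to_delete, batch_size=1000):
    """Split keys into DeleteObjects requests of at most batch_size keys."""
    keys = list(keys_to_delete)
    return [
        {
            "Bucket": bucket_name,
            "Delete": {"Objects": [{"Key": k} for k in keys[i:i + batch_size]]},
        }
        for i in range(0, len(keys), batch_size)
    ]


def delete_keys(client, bucket_name, keys_to_delete):
    """Issue each batch through a boto3 S3 client's delete_objects call."""
    return [
        client.delete_objects(**request)
        for request in build_delete_requests(bucket_name, keys_to_delete)
    ]
```

DeleteObjects reports per-key failures in the "Errors" list of each response instead of raising, so a caller may want to inspect the returned responses.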

mozetl.utils.extract_submission_window_for_activity_day(frame, date, lag_days)

Extract rows with an activity_date of date minus lag_days and a submission_date between activity_date and date (inclusive).

:date: ‘Y-m-d’ string marking the end of the target period
:lag_days: number of days after the activity date included in the target period
:frame: DataFrame homologous with main_summary

Note that the start_date will be lag_days days before date. In other words, if you pass in 2017-01-20 and set lag_days to 5, the aggregation will be processed for activity day 2017-01-15 (the resulting data covers submission dates from the activity day itself plus 5 days of lag, for a total of 6 days).
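The date arithmetic in the note above can be checked with a small sketch. It uses plain Python dates and a list of dicts as a stand-in for a main_summary DataFrame; the names submission_window and extract_window and the row layout (datetime.date values under "activity_date" and "submission_date") are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def submission_window(date_str, lag_days):
    """Return the activity day and the inclusive list of submission days.

    date_str is the 'Y-m-d' end of the target period, e.g. "2017-01-20".
    """
    end = datetime.strptime(date_str, "%Y-%m-%d").date()
    activity_day = end - timedelta(days=lag_days)
    # The activity day itself plus lag_days days of lag: lag_days + 1 days.
    days = [activity_day + timedelta(days=i) for i in range(lag_days + 1)]
    return activity_day, days


def extract_window(rows, date_str, lag_days):
    """Keep rows for the activity day that were submitted within the window.

    `rows` is a hypothetical stand-in for main_summary: dicts holding
    datetime.date values under "activity_date" and "submission_date".
    """
    activity_day, days = submission_window(date_str, lag_days)
    end = days[-1]
    return [
        r for r in rows
        if r["activity_date"] == activity_day
        and activity_day <= r["submission_date"] <= end
    ]


activity_day, days = submission_window("2017-01-20", 5)
print(activity_day, len(days))  # 2017-01-15 6
```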

mozetl.utils.format_as_submission_date(date)

mozetl.utils.format_spark_path(bucket, prefix)

mozetl.utils.generate_filter_parameters(end_date, days_back)

mozetl.utils.parse_as_submission_date(date_string)
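format_as_submission_date and parse_as_submission_date convert between date objects and a submission-date string. The sketch below assumes the compact YYYYMMDD format (for example 20170115); that format and the names to_submission_string and from_submission_string are assumptions, not details confirmed by this page.

```python
from datetime import datetime


def to_submission_string(d):
    """Format a date or datetime as an assumed YYYYMMDD submission date."""
    return d.strftime("%Y%m%d")


def from_submission_string(date_string):
    """Parse an assumed YYYYMMDD submission date into a datetime."""
    return datetime.strptime(date_string, "%Y%m%d")


print(to_submission_string(from_submission_string("20170115")))  # 20170115
```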

mozetl.utils.send_ses

Example:

```python
from mozetl.utils import send_ses

send_ses('me@example.com', 'greetings', "Hi!", 'you@example.com')
```

Raises a RuntimeError if the message did not send correctly.
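A send_ses-style helper could be built on the SES SendEmail API through boto3. In this sketch the boto3 SES client is passed in explicitly, the parameter order (source, subject, body, recipients) is inferred from the example call rather than taken from the real signature, and the helper names build_ses_request, send_succeeded and send_email_sketch are hypothetical.

```python
def build_ses_request(source, subject, body, recipients):
    """Build keyword arguments for a boto3 SES client's send_email call."""
    if isinstance(recipients, str):
        recipients = [recipients]
    return {
        "Source": source,
        "Destination": {"ToAddresses": list(recipients)},
        "Message": {
            "Subject": {"Data": subject},
            "Body": {"Text": {"Data": body}},
        },
    }


def send_succeeded(response):
    """Treat anything other than HTTP 200 as a failed send (an assumption)."""
    return response.get("ResponseMetadata", {}).get("HTTPStatusCode") == 200


def send_email_sketch(client, source, subject, body, recipients):
    """Send a plain-text email; `client` is assumed to be boto3.client("ses")."""
    response = client.send_email(
        **build_ses_request(source, subject, body, recipients)
    )
    if not send_succeeded(response):
        raise RuntimeError("SES did not accept the message: {}".format(response))
    return response["MessageId"]
```

boto3 normally raises botocore.exceptions.ClientError when SES rejects a request, so the status check here is a second line of defense that turns any other unexpected response into a RuntimeError.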

mozetl.utils.stop_session_safely(spark_session): Safely stops the Spark session. This is a no-op when running on Databricks: the session there is managed by the platform, and stopping it fails the job.
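The Databricks check can be sketched as an environment test. This sketch assumes that Databricks clusters set the DATABRICKS_RUNTIME_VERSION environment variable and uses it as the detection signal; mozetl may detect the platform differently, and the helper names running_on_databricks and stop_session_if_unmanaged are hypothetical.

```python
import os


def running_on_databricks(environ=None):
    """Assumed detection: Databricks sets DATABRICKS_RUNTIME_VERSION."""
    environ = os.environ if environ is None else environ
    return "DATABRICKS_RUNTIME_VERSION" in environ


def stop_session_if_unmanaged(spark_session, environ=None):
    """Stop the session unless the platform manages it; return True if stopped."""
    if running_on_databricks(environ):
        # No-op: stopping a platform-managed session fails the job.
        return False
    spark_session.stop()
    return True
```

Outside Databricks, stop_session_if_unmanaged(spark) simply calls SparkSession.stop(); the optional environ argument only exists so the check can be exercised without a real cluster.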

mozetl.utils.upload_file_to_s3(client, filepath, bucket, key, ACL='bucket-owner-full-control')
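A minimal sketch of an upload with a canned ACL, assuming client is a boto3 S3 client as the signature suggests. The default bucket-owner-full-control grants the bucket owner full control over the new object, which matters when the uploader and the bucket belong to different AWS accounts. The function name upload_with_acl and its s3:// return value are hypothetical.

```python
def upload_with_acl(client, filepath, bucket, key, ACL="bucket-owner-full-control"):
    """Upload a local file to S3 with a canned ACL.

    `client` is assumed to be a boto3 S3 client; its upload_file method
    switches to multipart uploads for large files automatically.
    """
    client.upload_file(filepath, bucket, key, ExtraArgs={"ACL": ACL})
    return "s3://{}/{}".format(bucket, key)
```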

mozetl.utils.write_csv(dataframe, path, header=True)

Write a dataframe to local disk.

Disclaimer: Do not write CSV files larger than driver memory. This is ~15GB for an EC2 c3.xlarge instance (due to caching overhead).
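The driver-memory limit follows from where the rows end up: writing to local disk means pulling every row onto the driver. The sketch below assumes a PySpark DataFrame, whose columns attribute and collect() method are standard; the helper names rows_to_csv_text and write_local_csv are hypothetical, and the real function may move rows differently (for example with toLocalIterator()).

```python
import csv
import io


def rows_to_csv_text(columns, rows, header=True):
    """Render an optional header row plus data rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    if header:
        writer.writerow(columns)
    writer.writerows(rows)  # PySpark Row objects are tuples, so this works
    return buf.getvalue()


def write_local_csv(dataframe, path, header=True):
    """Collect a Spark DataFrame onto the driver and write it to `path`.

    collect() materializes every row in driver memory, which is why the
    output must fit there.
    """
    text = rows_to_csv_text(dataframe.columns, dataframe.collect(), header)
    with open(path, "w", newline="") as fout:
        fout.write(text)
```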

mozetl.utils.write_csv_to_s3(dataframe, bucket, key, header=True)
