Public Data

For background, see Accessing Public Data on

  • To make query results publicly available, the public_bigquery flag must be set in metadata.yaml
    • Tables are published in the mozilla-public-data GCP project, which is accessible to everyone, including external users
  • To make query results publicly available as JSON, the public_json flag must be set in metadata.yaml
    • Data will be accessible under
      • A list of all available datasets is published under
    • For example:
    • Output JSON files have a maximum size of 1 GB; data can be split across multiple files (000000000000.json, 000000000001.json, ...)
    • incremental_export controls how data should be exported as JSON:
      • false: all data of the source table gets exported to a single location
      • true: only data that matches the submission_date parameter is exported as JSON to a separate directory for this date
  • For each dataset, a metadata.json file is published listing all available files, for example:
  • The timestamp when the dataset was last updated is recorded in last_updated, e.g.:
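
Separately from the published examples, the file naming and the incremental_export behaviour described in the list above can be illustrated with a small Python sketch. It is not the actual export code: the function names split_file_names and export_prefix, the "files" base prefix, and the date-named subdirectory layout are all hypothetical and chosen only for illustration. The only details taken from the text are the 12-digit, zero-padded file names and the rule that incremental exports go to a separate directory per submission_date.

```python
from typing import List, Optional


def split_file_names(num_files: int) -> List[str]:
    """Return names for an export split into num_files parts.

    Mirrors the 12-digit, zero-padded pattern from the text
    (000000000000.json, 000000000001.json, ...).
    """
    if num_files < 1:
        raise ValueError("num_files must be at least 1")
    return [f"{index:012d}.json" for index in range(num_files)]


def export_prefix(
    base: str,
    incremental_export: bool,
    submission_date: Optional[str] = None,
) -> str:
    """Pick the location that an export is written to.

    Hypothetical layout: the real directory structure is not given here.
    - incremental_export false: everything goes to one location (base).
    - incremental_export true: data for submission_date goes to its own
      directory, modelled here as base/<submission_date>.
    """
    if not incremental_export:
        return base
    if submission_date is None:
        raise ValueError(
            "submission_date is required when incremental_export is true"
        )
    return f"{base}/{submission_date}"


if __name__ == "__main__":
    # Two parts of a split export, written to a per-date directory.
    prefix = export_prefix("files", True, "2020-03-15")
    for name in split_file_names(2):
        print(f"{prefix}/{name}")
```

With incremental_export set to false, export_prefix ignores the date and returns the base location unchanged, which matches the "single location" behaviour described above.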