# bigquery-etl and Bigeye
Monitors can be defined alongside derived datasets in bigquery-etl. Monitoring in Bigeye for a specific table can be enabled by adding monitoring metadata to the `metadata.yaml` file:
```yaml
friendly_name: Some Table [warn]
monitoring:
  enabled: true     # Enables monitoring for the table in Bigeye and deploys freshness and volume metrics
  collection: Test  # An existing collection these monitors should be part of in Bigeye
```
Enabling monitoring automatically deploys freshness and volume metrics for the table.
Bigeye monitors are triggered automatically via Airflow for queries that have `monitoring` set to `enabled: true`. The checks are executed after the ETL run for the table has completed.
To indicate whether a failing check should block downstream Airflow tasks, a `[warn]` or `[fail]` tag can be added to the name of the Bigeye metric. By default, all metrics without either of those tags are treated as `[warn]`. These metrics won't block any downstream Airflow tasks when checks fail, but any failing check will still appear in the Bigeye dashboard. Metrics that have `[fail]` in their names will block the execution of downstream Airflow tasks if a check fails.
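For example, a `[fail]` tag could be placed in a metric name in the table's `bigconfig.yml` (see the Bigconfig section below). This is a minimal sketch that assumes Bigconfig's `metric_name` attribute is used for naming; the table name is illustrative:

```yaml
type: BIGCONFIG_FILE
table_deployments:
  - deployments:
      - fq_table_name: some_project.some_dataset.some_table_v1  # illustrative
        table_metrics:
          - metric_type:
              predefined_metric: VOLUME
            # [fail] makes a failing check block downstream Airflow tasks;
            # omitting the tag (or using [warn]) only warns.
            metric_name: "volume [fail]"
```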
## Bigconfig
Additional and custom monitors can be defined in a Bigconfig `bigconfig.yml` file that is stored in the same directory as the table's query. Bigconfig allows users to deploy other pre-defined monitors, such as row counts or null checks, on a table or column level.
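For illustration, a sketch of such a file deploying a table-level row count and a column-level null check. The table name, column name, and choice of predefined metrics are assumptions made for this example; see Bigeye's Bigconfig documentation for the supported metric names:

```yaml
type: BIGCONFIG_FILE
table_deployments:
  - deployments:
      - fq_table_name: some_project.some_dataset.some_table_v1  # illustrative
        # Table-level metric: monitor the row count.
        table_metrics:
          - metric_type:
              predefined_metric: COUNT_ROWS
        # Column-level metric: alert on null values in client_id.
        columns:
          - column_name: client_id
            metrics:
              - metric_type:
                  predefined_metric: PERCENT_NULL
```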
## Custom SQL Rules
This is a temporary workaround until custom SQL rules are supported in Bigconfig, which is currently being worked on.
Custom SQL rules can be configured in a separate `bigeye_custom_rules.sql` file alongside the query. This file can contain multiple rules:
```sql
-- {
--   "name": "Fenix releases version format",
--   "alert_conditions": "value",
--   "range": {
--     "min": 0,
--     "max": 1
--   },
--   "collections": ["Test"],
--   "owner": "",
--   "schedule": "Default Schedule - 13:00 UTC"
-- }
SELECT
  ROUND((COUNTIF(NOT REGEXP_CONTAINS(version, r"^[0-9]+\..+$"))) / COUNT(*) * 100, 2) AS perc
FROM
  `{{ project_id }}.{{ dataset_id }}.{{ table_name }}`;

-- {
--   "name": "Fenix releases product check",
--   "alert_conditions": "value",
--   "range": {
--     "min": 0,
--     "max": 1
--   },
--   "collections": ["Test"],
--   "owner": "",
--   "schedule": "Default Schedule - 13:00 UTC"
-- }
SELECT
  ROUND((COUNTIF(product != "fenix")) / COUNT(*) * 100, 2) AS perc
FROM
  `{{ project_id }}.{{ dataset_id }}.{{ table_name }}`;
```
The SQL comment preceding each rule has to be a JSON object that contains the configuration parameters for the rule:

- `name`: the name of the SQL rule. Specify `[warn]` or `[fail]` to indicate whether a rule failure should block downstream Airflow tasks
- `alert_conditions`: one of `value` (alerts based on the returned value) or `count` (alerts based on whether the query returns rows)
- `collections`: list of collections this rule should be a part of
- `owner`: email address of the rule owner
- `schedule`: optional schedule of when this rule should be triggered. The rule will also get triggered as part of Airflow
- `range`: optional range of allowed values when `"alert_conditions": "value"` is set
## Deployment
To generate a `bigconfig.yml` file with the default metrics when monitoring is enabled, run `bqetl monitoring update [PATH]`.
The created file can be manually edited. For tables that do not have a `bigconfig.yml` checked into the repository, the file will get generated automatically before deployment to Bigeye. Files only need to be checked in if they contain customizations.
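For illustration, a generated file with the default freshness and volume metrics might look roughly like the following sketch (assumed table name, not verbatim `bqetl` output):

```yaml
type: BIGCONFIG_FILE
table_deployments:
  - deployments:
      - fq_table_name: some_project.some_dataset.some_table_v1  # illustrative
        table_metrics:
          - metric_type:
              predefined_metric: FRESHNESS
          - metric_type:
              predefined_metric: VOLUME
```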
The deployment of Bigconfig files runs automatically as part of the artifact deployment process, after tables and views have been deployed.