Events Daily
- Introduction
- Contents
- Limitations
- Accessing the Data
- Scheduling
- Code Reference
- Background and Caveats
Introduction
The events_daily
datasets can help answer questions about sequences of events (sometimes called "funnels").
It is used by the Event Funnel Analysis Explore in Looker, but can also be queried directly using BigQuery.
As of this writing, variations of the events_daily
dataset are available for Firefox Desktop, Firefox for Android, and Mozilla VPN.
Contents
events_daily
has one row per-client per-day, much the same as clients_daily
. The table is created in a two-step process:
- An ancillary table,
event_types
, is updated with the new events seen on that day. Each event is mapped to a unique unicode character, and each event property (theextras
fields) is also mapped to a unique unicode character. - For every user, that day's events are mapped to their associated unicode characters (including
event_properties
). The strings are aggregated and comma-separated, giving a single ordered string that represents all of that user's events on that day.
For most products, only events in the Glean events ping are aggregated (Firefox Desktop currently aggregates events in the legacy desktop "event" ping). If you're looking for events sent in other pings, you'll need to query them directly.
Included in this data is a set of dimensional information about the user, also derived from the events ping. The full list of fields is available in the templated source query.
Limitations
This approach makes some queries fast and easy, but has some limits:
- Each product is limited to at most 1 million unique event types
- Each event property is limited to a set number of values (currently set to 1000 for most products). As a result, some properties will not be accessible in the table.
- Queries do not know the amount of time that passed between events, only that they occurred on the same day. This can be alleviated by sessionizing and splitting the events string using an event which indicates the start of a session. For example, for Firefox for Android, this could be
events.app_opened
.
Accessing the Data
While it is possible to build queries that access this events data directly, the Data Platform instead recommends using a set of stored procedures we have available.
These procedures create BigQuery views that hide the complexity of the event representation.
The mozfun
library documentation has information about these procedures and examples of their usage.
Example Queries
This query gives the event-count and client-counts per-event per-day.
SELECT
submission_date,
category,
event,
COUNT(*) AS client_count,
SUM(count) AS event_count
FROM
`moz-fx-data-shared-prod`.fenix.events_daily
CROSS JOIN
UNNEST(mozfun.event_analysis.extract_event_counts(events))
JOIN
`moz-fx-data-shared-prod`.fenix.event_types
USING (index)
WHERE
submission_date >= DATE_SUB(current_date, INTERVAL 28 DAY)
GROUP BY
submission_date,
category,
event
Link to a dashboard using this query in STMO.
Scheduling
This dataset is scheduled on Airflow and updated daily.
Code Reference
The source for events daily is defined inside bigquery-etl as a set of templated queries which generate the events_daily
tables as well as the dependent event_types
tables for each supported application.
You can find the source under bigquery_etl/sql_generators
.
Background and Caveats
See this presentation for background.