event_analysis
These functions are specific for use with the events_daily
and event_types
tables.
By themselves, these two tables are nearly impossible to use since the event history
is compressed; however, these stored procedures should make the data accessible.
The events_daily
table is created as a result of two steps:
1. Map each event to a single UTF8 char which will represent it
2. Group each client-day and store a string that records, using the
compressed format, that clients' event history for that day.
The characters are ordered by the timestamp which they appeared
that day.
The best way to access this data is to create a view to do the heavy lifting. For example, to see which clients completed a certain action, you can create a view using these functions that knows what that action's representation is (using the compressed mapping from 1.) and create a regex string that checks for the presence of that event. The view makes this transparent, and allows users to simply query a boolean field representing the presence of that event on that day.
aggregate_match_strings (UDF)
Given an array of strings that each match a single event, aggregate those into a single regex string that will match any of the events.
Parameters
INPUTS
match_strings ARRAY<STRING>
OUTPUTS
STRING
create_count_steps_query (Stored Procedure)
Generate the SQL statement that can be used to create an easily queryable view on events data.
Parameters
INPUTS
project STRING, dataset STRING, events ARRAY<STRUCT<category STRING, event_name STRING>>
OUTPUTS
sql STRING
create_events_view (Stored Procedure)
Create a view that queries the events_daily
table. This view currently supports both funnels and event counts.
Funnels are created as a struct, with each step in the funnel as a boolean column in the struct, indicating whether the user completed that step on that day.
Event counts are simply integers.
Usage
create_events_view(
view_name STRING,
project STRING,
dataset STRING,
funnels ARRAY<STRUCT<
funnel_name STRING,
funnel ARRAY<STRUCT<
step_name STRING,
events ARRAY<STRUCT<
category STRING,
event_name STRING>>>>>>,
counts ARRAY<STRUCT<
count_name STRING,
events ARRAY<STRUCT<
category STRING,
event_name STRING>>>>
)
view_name
: The name of the view that will be created. This view will be in the shared-prod project, in the analysis bucket, and so will be queryable at:`moz-fx-data-shared-prod`.analysis.{view_name}
project
: The project where thedataset
is located.
dataset
: The dataset that must contain both theevents_daily
andevent_types
tables.
funnels
: An array of funnels that will be created. Each funnel has two parts: 1.funnel_name
: The name of the funnel is what the column representing the funnel will be named in the view. For example, with the value"onboarding"
, the view can be selected as follows:2.SELECT onboarding FROM `moz-fx-data-shared-prod`.analysis.{view_name}
funnel
: The ordered series of steps that make up a funnel. Each step also has: 1.step_name
: Used to name the column within the funnel and represents whether the user completed that step on that day. For example, withinonboarding
a user may havecompleted_first_card
as a step; this can be queried at2.SELECT onboarding.completed_first_step FROM `moz-fx-data-shared-prod`.analysis.{view_name}
events
: The set of events which indicate the user completed that step of the funnel. Most of the time this is a single event. Each event has acategory
andevent_name
.
counts
: An array of counts. Each count has two parts, similar to funnel steps: 1.count_name
: Used to name the column representing the event count. E.g."clicked_settings_count"
would be queried at2.SELECT clicked_settings_count FROM `moz-fx-data-shared-prod`.analysis.{view_name}
events
: The set of events you want to count. Each event has acategory
andevent_name
.
Recommended Pattern
Because the view definitions themselves are not informative about the contents of the events fields, it is best to put your query immediately after the procedure invocation, rather than invoking the procedure and running a separate query.
This STMO query is an example of doing so. This allows viewers of the query to easily interpret what the funnel and count columns represent.
Structure of the Resulting View
The view will be created at
`moz-fx-data-shared-prod`.analysis.{event_name}.
The view will have a schema roughly matching the following:
root
|-- submission_date: date
|-- client_id: string
|-- {funnel_1_name}: record
| |-- {funnel_step_1_name} boolean
| |-- {funnel_step_2_name} boolean
...
|-- {funnel_N_name}: record
| |-- {funnel_step_M_name}: boolean
|-- {count_1_name}: integer
...
|-- {count_N_name}: integer
...dimensions...
Funnels
Each funnel will be a STRUCT
with nested columns representing completion of each step
The types of those columns are boolean, and represent whether the user completed that
step on that day.
STRUCT(
completed_step_1 BOOLEAN,
completed_step_2 BOOLEAN,
...
) AS funnel_name
With one row per-user per-day, you can use COUNTIF(funnel_name.completed_step_N)
to query
these fields. See below for an example.
Event Counts
Each event count is simply an INT64
representing the number of times the user completed
those events on that day. If there are multiple events represented within one count,
the values are summed. For example, if you wanted to know the number of times a user
opened or closed the app, you could create a single event count with those two
events.
event_count_name INT64
Examples
The following creates a few fields:
- collection_flow
is a funnel for those that started creating
a collection within Fenix, and then finished, either by adding
those tabs to an existing collection or saving it as a new
collection.
- collection_flow_saved
represents users who started the collection
flow then saved it as a new collection.
- number_of_collections_created
is the number of collections created
- number_of_collections_deleted
is the number of collections deleted
CALL mozfun.event_analysis.create_events_view(
'fenix_collection_funnels',
'moz-fx-data-shared-prod',
'org_mozilla_firefox',
-- Funnels
[
STRUCT(
"collection_flow" AS funnel_name,
[STRUCT(
"started_collection_creation" AS step_name,
[STRUCT('collections' AS category, 'tab_select_opened' AS event_name)] AS events),
STRUCT(
"completed_collection_creation" AS step_name,
[STRUCT('collections' AS category, 'saved' AS event_name),
STRUCT('collections' AS category, 'tabs_added' AS event_name)] AS events)
] AS funnel),
STRUCT(
"collection_flow_saved" AS funnel_name,
[STRUCT(
"started_collection_creation" AS step_name,
[STRUCT('collections' AS category, 'tab_select_opened' AS event_name)] AS events),
STRUCT(
"saved_collection" AS step_name,
[STRUCT('collections' AS category, 'saved' AS event_name)] AS events)
] AS funnel)
],
-- Event Counts
[
STRUCT(
"number_of_collections_created" AS count_name,
[STRUCT('collections' AS category, 'saved' AS event_name)] AS events
),
STRUCT(
"number_of_collections_deleted" AS count_name,
[STRUCT('collections' AS category, 'removed' AS event_name)] AS events
)
]
);
From there, you can query a few things. For example, the fraction of users who completed each step of the collection flow over time:
SELECT
submission_date,
COUNTIF(collection_flow.started_collection_creation) / COUNT(*) AS started_collection_creation,
COUNTIF(collection_flow.completed_collection_creation) / COUNT(*) AS completed_collection_creation,
FROM
`moz-fx-data-shared-prod`.analysis.fenix_collection_funnels
WHERE
submission_date >= DATE_SUB(current_date, INTERVAL 28 DAY)
GROUP BY
submission_date
Or you can see the number of collections created and deleted:
SELECT
submission_date,
SUM(number_of_collections_created) AS number_of_collections_created,
SUM(number_of_collections_deleted) AS number_of_collections_deleted,
FROM
`moz-fx-data-shared-prod`.analysis.fenix_collection_funnels
WHERE
submission_date >= DATE_SUB(current_date, INTERVAL 28 DAY)
GROUP BY
submission_date
Parameters
INPUTS
view_name STRING, project STRING, dataset STRING, funnels ARRAY<STRUCT<funnel_name STRING, funnel ARRAY<STRUCT<step_name STRING, events ARRAY<STRUCT<category STRING, event_name STRING>>>>>>, counts ARRAY<STRUCT<count_name STRING, events ARRAY<STRUCT<category STRING, event_name STRING>>>>
create_funnel_regex (UDF)
Given an array of match strings, each representing a single funnel step, aggregate them into a regex string that will match only against the entire funnel. If intermediate_steps is TRUE, this allows for there to be events that occur between the funnel steps.
Parameters
INPUTS
step_regexes ARRAY<STRING>, intermediate_steps BOOLEAN
OUTPUTS
STRING
create_funnel_steps_query (Stored Procedure)
Generate the SQL statement that can be used to create an easily queryable view on events data.
Parameters
INPUTS
project STRING, dataset STRING, funnel ARRAY<STRUCT<list ARRAY<STRUCT<category STRING, event_name STRING>>>>
OUTPUTS
sql STRING
escape_metachars (UDF)
Escape all metachars from a regex string. This will make the string an exact match, no matter what it contains.
Parameters
INPUTS
s STRING
OUTPUTS
STRING
event_index_to_match_string (UDF)
Given an event index string, create a match string that is an exact match in the events_daily table.
Parameters
INPUTS
index STRING
OUTPUTS
STRING
event_property_index_to_match_string (UDF)
Given an event index and property index from an event_types
table, returns a regular expression to match corresponding events within an events_daily
table's events
string that aren't missing the specified property.
Parameters
INPUTS
event_index STRING, property_index INTEGER
OUTPUTS
STRING
event_property_value_to_match_string (UDF)
Given an event index, property index, and property value from an event_types
table, returns a regular expression to match corresponding events within an events_daily
table's events
string.
Parameters
INPUTS
event_index STRING, property_index INTEGER, property_value STRING
OUTPUTS
STRING
extract_event_counts (UDF)
Extract the events and their counts from an events string. This function explicitly ignores event properties, and retrieves just the counts of the top-level events.
Usage
extract_event_counts(
events STRING
)
events
- A comma-separated events string,
where each event is represented as a string
of unicode chars.
Example
See this dashboard for example usage.
Parameters
INPUTS
events STRING
OUTPUTS
ARRAY<STRUCT<index STRING, count INT64>>
extract_event_counts_with_properties (UDF)
Extract events with event properties and their associated counts. Also extracts raw events and their counts. This allows for querying with and without properties in the same dashboard.
Usage
extract_event_counts_with_properties(
events STRING
)
events
- A comma-separated events string,
where each event is represented as a string
of unicode chars.
Example
See this query for example usage.
Caveats
This function extracts both counts for events with each property, and for all events without their properties.
This allows us to include both total counts for an event (with any property value), and events that don't have properties.
Parameters
INPUTS
events STRING
OUTPUTS
ARRAY<STRUCT<event_index STRING, property_index INT64, property_value_index STRING, count INT64>>
get_count_sql (Stored Procedure)
For a given funnel, get a SQL statement that can be used to determine if an events string contains that funnel.
Parameters
INPUTS
project STRING, dataset STRING, count_name STRING, events ARRAY<STRUCT<category STRING, event_name STRING>>
OUTPUTS
count_sql STRING
get_funnel_steps_sql (Stored Procedure)
For a given funnel, get a SQL statement that can be used to determine if an events string contains that funnel.
Parameters
INPUTS
project STRING, dataset STRING, funnel_name STRING, funnel ARRAY<STRUCT<step_name STRING, list ARRAY<STRUCT<category STRING, event_name STRING>>>>
OUTPUTS
funnel_sql STRING