"Real-Time" Events
Defining "real-time" events within the Glean SDK
For the purposes of the Glean SDK and its capabilities, "real-time" is limited to: minimizing the time between instrumentation and reporting. It does not imply or describe how quickly received data is made available for querying.
Methods to achieve this with Glean
Option 1: Configuring Glean to send all events as soon as they are recorded
Glean "events" ping submission can be configured either during initialization or through Server Knobs.
Setting the maximum event threshold to a value of 1
will configure the Glean SDK to submit an "events" ping for each and every event as they
are recorded. By default, the Glean SDK will batch 500 events per "events" ping.
Option 2: Using a custom ping and submitting it immediately ("Pings-as-Events")
If it isn't necessary to receive all Glean SDK events that are instrumented in an application in "real-time", it may be preferable to create a custom ping which contains the relevant information to capture the context around the event and submit it as soon as the application event occurs.
This has some additional advantages over using just an event in that custom pings are less restrictive than the extras attached to the event in what data and Glean SDK metric types can be used.
If it is important to see the event that is being represented as a custom ping in context with other application events, then you only need to
define an event metric and use the send_in_pings
parameter to send it in both the custom ping and the Glean built-in "events" ping. It can
then be seen in sequence and within context of all of the application events, and still be sent in "real-time" as needed.
Considerations
What "real-time" Glean events/pings are not
Configuring the Glean SDK to submit events as soon as they are recorded or using custom pings to submit data immediately does not mean that the data is available for analysis in real time. There are networks to traverse, ingestion pipelines, etl, etc. that are all factors to keep in mind when considering how soon the data is available for analysis purposes. This documentation only purports to cover configuring the Glean SDK to send the data in a real-time fashion and does not make any assumptions about the analysis of data in real-time.
More network requests
For every event recorded or custom ping submitted, a network request will be generated as the ping is submitted for ingestion. By default, the Glean SDK batches up to 500 events per "events" ping, so this has the potential to generate up to 500 times as many network requests than the current defaults for the Glean SDK "events" ping.
More ingestion endpoint traffic
As a result of the increased network requests, the ingestion endpoint will need to handle this additional traffic. This increases the load of all the processing steps that are involved with ingesting event data from an application.
Storage space requirements
Typically the raw dataset for Glean events contains 1-500 events in a single row of the database. This row also includes metadata such as information about the client application and the ping itself. With only a single event per "events" ping, the replication of this metadata across the database will use additional space to house this repeated information that should rarely if ever change between events