Telemetry/Data Bug Investigation Recommendations
This document outlines several diagnostic categories and the insights they may offer when investigating unusual telemetry patterns or data anomalies.
1. Countries
- Purpose: Identify geographical patterns that could explain anomalies.
- Column Name: metadata.geo.country
- Considerations:
  - Are there ongoing national holidays or similar events that could affect data?
  - Is the region known for bot activity or unusual behavior?
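One way to check whether a single country drives the anomaly is to compare each country's share of pings in a baseline period with its share during the anomaly. The following is a minimal Python sketch. It assumes the metadata.geo.country values for both periods have already been pulled from the ping table into plain lists; the function name share_shift and the sample values are hypothetical.

```python
from collections import Counter


def share_shift(baseline, anomaly):
    """Compare how much of the ping volume each value accounts for in two periods.

    baseline, anomaly: iterables of dimension values, e.g. the
    metadata.geo.country of every ping in a "normal" period and in the
    anomalous period (hypothetical inputs pulled from the ping table).

    Returns (value, baseline_share, anomaly_share, delta) tuples, sorted by
    the absolute change in share, largest first.
    """
    base_counts = Counter(baseline)
    anom_counts = Counter(anomaly)
    # Guard against empty inputs so the divisions below never fail.
    base_total = sum(base_counts.values()) or 1
    anom_total = sum(anom_counts.values()) or 1

    rows = []
    for value in set(base_counts) | set(anom_counts):
        # Counter returns 0 for values missing from one of the periods.
        before = base_counts[value] / base_total
        after = anom_counts[value] / anom_total
        rows.append((value, before, after, after - before))
    rows.sort(key=lambda row: abs(row[3]), reverse=True)
    return rows


# Made-up example: a sudden burst of pings from one country.
baseline = ["US"] * 50 + ["DE"] * 30 + ["FR"] * 20
anomaly = ["US"] * 50 + ["DE"] * 25 + ["FR"] * 25 + ["BR"] * 100
for country, before, after, delta in share_shift(baseline, anomaly):
    print(f"{country}: {before:.1%} -> {after:.1%} ({delta:+.1%})")
```

A country whose share jumps sharply, like the synthetic BR row here, is a natural candidate for the holiday and bot-activity questions.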
2. ISP (Internet Service Provider)
- Purpose: Analyze data at a more granular level than countries to identify potential automation or bot activity.
- Column Name: metadata.isp.name
- Considerations:
  - Could the anomaly be traced back to a single ISP, potentially indicating automation?
  - There are many ISPs, so consider filtering out the small ones (e.g., with a HAVING clause on the client or ping count).
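The HAVING filter for small ISPs can be sketched as a small query builder. This assumes a BigQuery Glean ping table that exposes metadata.isp.name, client_info.client_id, and submission_timestamp; the table name passed in is a placeholder, the function name is hypothetical, and the minimum-client threshold is an arbitrary starting point to tune.

```python
from datetime import date


def isp_breakdown_sql(table, start_date, end_date, min_clients=100):
    """Build a BigQuery SQL query counting clients and pings per ISP per day.

    ISPs with fewer than `min_clients` distinct clients on a given day are
    dropped by the HAVING clause, which keeps the long tail of small ISPs
    out of the result. `table` is a placeholder "project.dataset.table" name.
    """
    # Validate inputs before interpolating them into SQL text.
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    if "`" in table:
        raise ValueError("table name must not contain backticks")
    if int(min_clients) < 1:
        raise ValueError("min_clients must be at least 1")

    return f"""
SELECT
  DATE(submission_timestamp) AS submission_date,
  metadata.isp.name AS isp,
  COUNT(DISTINCT client_info.client_id) AS clients,
  COUNT(*) AS pings
FROM `{table}`
WHERE DATE(submission_timestamp) BETWEEN '{start.isoformat()}' AND '{end.isoformat()}'
GROUP BY submission_date, isp
HAVING COUNT(DISTINCT client_info.client_id) >= {int(min_clients)}
ORDER BY submission_date, clients DESC
""".strip()


print(isp_breakdown_sql("my-project.my_dataset.my_ping_table", "2024-01-01", "2024-01-07"))
```

Filtering on distinct clients rather than raw pings means one very chatty client cannot make a small ISP look large. For anything beyond ad hoc exploration, BigQuery query parameters are a safer choice than string formatting.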
3. Product Version / Build ID
- Purpose: Check if issues began with a specific product version or build.
- Column Names: client_info.app_display_version, client_info.app_build
- Considerations:
  - Did the issue arise after a particular version update? If so, work with the product team to identify what changed.
  - Check that the build ID matches a known Mozilla build. If it does not, the client may be running a clone, fork, or sideloaded build.
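Checking builds against a known list can be sketched as follows. This assumes you already have a set of official build IDs from some trusted source; the known_builds input, the function name, and the sample build IDs are all hypothetical. The function counts pings whose client_info.app_build is not in that set, grouped by the reported display version.

```python
from collections import Counter


def unknown_builds(pings, known_builds):
    """Count pings from builds that are not in a known set of build IDs.

    pings: iterable of (app_display_version, app_build) pairs taken from
        client_info.app_display_version and client_info.app_build.
    known_builds: set of build IDs believed to be official builds
        (hypothetical input supplied by the caller).

    Returns a Counter keyed by (app_display_version, app_build), containing
    only the unrecognized builds.
    """
    return Counter(
        (version, build)
        for version, build in pings
        if build not in known_builds
    )


# Made-up data: one display version reported with an unexpected build ID.
known = {"1001", "1002"}
pings = [("120.0", "1001")] * 40 + [("120.0", "9999")] * 7 + [("121.0", "1002")] * 30
for (version, build), count in unknown_builds(pings, known).most_common():
    print(f"version {version}, build {build}: {count} pings")
```

A familiar display version paired with an unrecognized build ID may suggest a fork or repackaged build rather than an official release.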
4. Glean SDK Version
- Purpose: Determine whether the issue is tied to a specific Glean SDK version.
- Column Name: client_info.telemetry_sdk_build
- Considerations:
  - Did the anomaly start after an update to Glean? Work with the Glean team to verify version changes.
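To line up the anomaly with Glean releases, one approach is to find the first date each client_info.telemetry_sdk_build value appears and list the versions that first showed up shortly before the anomaly began. This is a sketch assuming the (date, SDK version) pairs have already been extracted; the function name is hypothetical, the 14-day lookback is an arbitrary default, and the version strings are made up.

```python
from datetime import date, timedelta


def versions_first_seen_before(rows, anomaly_start, lookback_days=14):
    """List SDK versions first seen within `lookback_days` before the anomaly.

    rows: iterable of (submission_date, telemetry_sdk_build) pairs, with
        submission_date as a datetime.date.
    anomaly_start: datetime.date on which the anomaly was first observed.

    Returns (version, first_seen_date) pairs ordered by first_seen_date.
    """
    first_seen = {}
    for day, version in rows:
        if version not in first_seen or day < first_seen[version]:
            first_seen[version] = day

    window_start = anomaly_start - timedelta(days=lookback_days)
    candidates = [
        (version, day)
        for version, day in first_seen.items()
        if window_start <= day <= anomaly_start
    ]
    candidates.sort(key=lambda pair: pair[1])
    return candidates


rows = [
    (date(2024, 1, 1), "60.0.0"),
    (date(2024, 1, 10), "60.1.0"),
    (date(2024, 1, 12), "60.1.0"),
    (date(2024, 1, 20), "61.0.0"),
]
print(versions_first_seen_before(rows, anomaly_start=date(2024, 1, 12), lookback_days=7))
```

First appearance is only a rough proxy for rollout: pre-release channels may report a version well before most users receive it, so it is also worth checking how quickly each version's share grew.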
5. Other Library Version Changes
- Purpose: Identify possible regressions due to library updates.
- Considerations:
  - Review updates to Application Services, Gecko, and other dependencies (e.g., Viaduct, rkv) that could affect telemetry collection.
6. OS/Platform SDK Version
- Purpose: Check whether operating system or platform SDK changes are affecting data collection.
- Column Names: client_info.os_version (Android only: client_info.android_sdk_version)
- Considerations:
  - Have there been changes to platform lifecycle events or background task behavior? These can surface as symptoms such as 0-duration pings or ping submission issues.
  - Has the OS changed the behavior of system APIs?
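For the 0-duration symptom, a quick check is the fraction of pings, per client_info.os_version, whose parsed start and end times are identical. This sketch assumes the rows have already been extracted with both timestamps as datetime objects; the function name and sample data are hypothetical, and the comparison is only as precise as the recorded timestamps.

```python
from collections import defaultdict
from datetime import datetime


def zero_duration_rate(pings):
    """Per OS version, count pings whose start and end times are equal.

    pings: iterable of (os_version, parsed_start_time, parsed_end_time)
        tuples, with both times as datetime objects.

    Returns {os_version: (zero_duration_count, total_count, rate)}.
    """
    totals = defaultdict(int)
    zeros = defaultdict(int)
    for os_version, start, end in pings:
        totals[os_version] += 1
        if end == start:
            zeros[os_version] += 1
    return {
        version: (zeros[version], totals[version], zeros[version] / totals[version])
        for version in totals
    }


t0 = datetime(2024, 1, 1, 12, 0)
t1 = datetime(2024, 1, 1, 12, 5)
sample = [("14", t0, t0), ("14", t0, t0), ("14", t0, t1), ("13", t0, t1)]
for version, (zero, total, rate) in sorted(zero_duration_rate(sample).items()):
    print(f"OS {version}: {zero}/{total} zero-duration pings ({rate:.0%})")
```

A rate that jumps for one OS version while staying flat for others points toward a platform change rather than an application change.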
7. Time Differences: start/end_time vs. submission_timestamp
- Purpose: Assess the delay between telemetry collection and submission.
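- Column Names: ping_info.parsed_start_time, ping_info.parsed_end_time, submission_timestamp
- Considerations:
  - Are the recorded timestamps reasonable, both in terms of the ping time window and the delay from collection to submission?
  - Start and end times come from the client clock, while submission_timestamp is recorded by the ingestion pipeline, so a skewed client clock can also produce odd differences.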
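The reasonableness question can be turned into a few mechanical checks per ping. Below is a minimal sketch; the function name is hypothetical and the seven-day max_delay is an arbitrary assumption, since the acceptable delay depends on the ping's schedule and on how long clients may stay offline before uploading.

```python
from datetime import datetime, timedelta, timezone


def timestamp_issues(start, end, submitted, max_delay=timedelta(days=7)):
    """Return a list of problems found in one ping's timestamps.

    start, end: ping_info.parsed_start_time and parsed_end_time (client clock).
    submitted: submission_timestamp (recorded at ingestion).
    max_delay: hypothetical cutoff for an unusually late submission.
    All three timestamps must be timezone-aware datetimes.
    """
    issues = []
    if end < start:
        issues.append("end_time is before start_time")
    if submitted < end:
        # The client reports a time later than when the server received the ping.
        issues.append("end_time is after submission_timestamp (client clock ahead?)")
    elif submitted - end > max_delay:
        issues.append(f"submitted {submitted - end} after end_time")
    return issues


t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(timestamp_issues(t0, t0 + timedelta(hours=1), t0 + timedelta(hours=2)))
print(timestamp_issues(t0, t0 + timedelta(hours=1), t0 + timedelta(days=30)))
```

Counting how often each issue occurs per day, rather than inspecting pings one by one, makes it easier to see whether a problem started at the same time as the anomaly.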
8. Glean Errors
- Purpose: Identify telemetry or network errors related to data collection.
- Considerations:
  - Are there networking errors, ingestion issues, or other telemetry failures that could be related to the anomaly?
9. Hardware Details (Manufacturer/Model; mobile platforms only)
- Purpose: Determine if the issue is hardware-specific.
- Column Names: client_info.device_manufacturer, client_info.device_model
- Considerations:
  - Does the anomaly occur primarily on older or newer hardware models, or on devices from a particular manufacturer?
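To see whether the anomaly is concentrated on particular hardware, one option is to compare each (manufacturer, model) group's share of anomalous pings with its share of all pings, a ratio often called lift. This sketch assumes you can already flag which pings are anomalous; the function name, the min_count cutoff, and the vendor names are hypothetical.

```python
from collections import Counter


def overrepresentation(all_pings, anomalous_pings, min_count=10):
    """Rank device groups by how overrepresented they are among anomalous pings.

    all_pings: iterable of (device_manufacturer, device_model) for every ping.
    anomalous_pings: the same tuples for the subset of pings flagged as
        anomalous (how pings are flagged is up to the investigation).
    min_count: hypothetical cutoff that skips groups with too few pings overall.

    Returns (group, lift) pairs, highest lift first, where
    lift = share of anomalous pings / share of all pings. A lift well above
    1.0 means the group shows up in the anomaly more than its size predicts.
    """
    total = Counter(all_pings)
    anomalous = Counter(anomalous_pings)
    n_total = sum(total.values())
    n_anomalous = sum(anomalous.values())
    if n_total == 0 or n_anomalous == 0:
        return []

    ranked = []
    for group, count in total.items():
        if count < min_count:
            continue
        share_all = count / n_total
        share_anomalous = anomalous[group] / n_anomalous
        ranked.append((group, share_anomalous / share_all))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Made-up data: one vendor's model accounts for most anomalous pings.
all_pings = [("VendorA", "Model1")] * 50 + [("VendorB", "Model2")] * 50 + [("VendorC", "Model3")] * 5
anomalous_pings = [("VendorA", "Model1")] * 18 + [("VendorB", "Model2")] * 2
for (maker, model), lift in overrepresentation(all_pings, anomalous_pings):
    print(f"{maker} {model}: lift {lift:.2f}")
```

The min_count cutoff matters because small groups can produce very large lifts from just a handful of pings.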
10. Ping Reason
- Purpose: Determine the reason a ping was sent.
- Column Name: ping_info.reason
- Considerations:
  - Does the anomaly occur primarily for a specific reason?
  - Glean's built-in pings (e.g., baseline, metrics, events) each send different reasons depending on their schedule, so compare reasons within a single ping type.
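Because each ping type has its own set of reasons, a breakdown is easiest to read per ping type. The sketch below counts pings per (ping type, reason) and flags reasons missing from a caller-supplied list of expected reasons. The ping_type label is an assumption (in BigQuery each ping type typically lives in its own table, so the label would be added when combining rows), expected_reasons must be filled in from the Glean documentation for the pings under investigation, and the function name and sample values are placeholders, not real Glean reasons.

```python
from collections import Counter


def reason_breakdown(pings, expected_reasons):
    """Count pings per (ping_type, reason) and flag unexpected reasons.

    pings: iterable of (ping_type, reason) pairs; reason is ping_info.reason
        and may be None when a ping carries no reason.
    expected_reasons: dict mapping ping_type to the set of reasons expected
        for that ping (caller-supplied, hypothetical here).

    Returns (ping_type, reason, count, is_expected) tuples, most frequent first.
    """
    counts = Counter(pings)
    rows = [
        (ping_type, reason, count, reason in expected_reasons.get(ping_type, set()))
        for (ping_type, reason), count in counts.items()
    ]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


# Placeholder ping types and reasons, not real Glean values.
expected = {"ping_x": {"reason_a", "reason_b"}}
pings = [("ping_x", "reason_a")] * 5 + [("ping_x", "reason_c")] * 2 + [("ping_x", None)]
for ping_type, reason, count, ok in reason_breakdown(pings, expected):
    flag = "" if ok else "  <- unexpected"
    print(f"{ping_type} / {reason}: {count}{flag}")
```

Running the same breakdown on a baseline period and on the anomalous period shows whether one reason's volume changed, not just whether unexpected reasons appeared.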