Telemetry/Data Bug Investigation Recommendations

This document outlines several diagnostic categories and the insights they may offer when investigating unusual telemetry patterns or data anomalies.

1. Countries

  • Purpose: Identify geographical patterns that could explain anomalies.
  • Column Name: metadata.geo.country
  • Considerations:
    • Are there ongoing national holidays or similar events that could affect data?
    • Is the region known for bot activity or unusual behavior?

2. ISP (Internet Service Provider)

  • Purpose: Analyze data at a more granular level than countries to identify potential automation or bot activity.
  • Column Name: metadata.isp.name
  • Considerations:
    • Could the anomaly be traced back to a single ISP, potentially indicating automation?
    • Be mindful of the large number of ISPs; consider applying filters (e.g., HAVING clause) to exclude smaller ISPs.

3. Product Version / Build ID

  • Purpose: Check if issues began with a specific product version or build.
  • Column Names: client_info.app_display_version, client_info.app_build
  • Considerations:
    • Did the issue arise after a particular version update? If so, collaborate with the product team to identify changes.
    • Ensure that the build ID matches a known Mozilla build. If not, it could be a clone, fork, or side-load build.

4. Glean SDK Version

  • Purpose: Determine whether the issue is tied to a specific Glean SDK version.
  • Column Name: client_info.telemetry_sdk_build
  • Considerations:
    • Did the anomaly start after an update to Glean? Work with the Glean team to verify version changes.

5. Other Library Version Changes

  • Purpose: Identify possible regressions due to library updates.
  • Considerations:
    • Review updates to Application Services, Gecko, and other dependencies (e.g., Viaduct, rkv) that could affect telemetry collection.

6. OS/Platform SDK Version

  • Purpose: Check if Operating System or platform SDK changes are impacting data collection.
  • Column Names: client_info.os_version (Android only: client_info.android_sdk_version)
  • Considerations:
    • Have there been changes to platform lifecycle events or background task behaviors (e.g., 0-duration pings, or ping submission issues)?
    • Has the OS changed the behaviour of system APIs?

7. Time Differences: start/end_time vs. submission_timestamp

  • Purpose: Assess the delay between telemetry collection and submission.
  • Column Names: ping_info.parsed_start_time, ping_info.parsed_end_time, submission_timestamp
  • Considerations:
    • Are the recorded timestamps reasonable, both in terms of the ping time window and the delay from collection to submission?

8. Glean Errors

  • Purpose: Identify telemetry or network errors related to data collection.
  • Considerations:
    • Are there networking errors, ingestion issues, or other telemetry failures that could be related to the anomaly?

9. Hardware Details (Manufacturer/Version) (Mobile platforms only)

  • Purpose: Determine if the issue is hardware-specific.
  • Column Names: client_info.device_manufacturer, client_info.device_model
  • Considerations:
    • Does the anomaly occur primarily on older or newer hardware models?

10. Ping reason