Linkage Monitor Analytics
- Status: Draft Proposal, not implemented
- Authors: @elharo
- Contributors:
- Last updated: 2020-11-16
Measure usage of linkage monitor.
We'd like to have a good overview of:
- How many repositories and projects use the linkage monitor
- How many PRs it checks
- How many linkage errors it finds
We will create a new Google Analytics project. We'll use the Google Analytics Java client library to collect metrics about:
- Number of runs
- Linkage errors detected
- Number of repositories installed in
- Number of artifacts
All of this will be behind the --send-analytics flag, which is off by default. We'll update com.google.cloud.tools.dependencies.linkagemonitor.LinkageMonitor so that the script runs it like this:
java -jar ${JAR} --send-analytics com.google.cloud:libraries-bom
If the flag is present, the monitor pings GA when it runs. If the flag isn't present, it doesn't ping.
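As a minimal sketch of the opt-in behavior described above, the flag check could look like the following. The class and method names are hypothetical and are not taken from the actual LinkageMonitor code; only the flag name and the example coordinates come from this proposal.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of opt-in flag handling; not the real LinkageMonitor entry point. */
public class AnalyticsFlagSketch {
  static final String SEND_ANALYTICS_FLAG = "--send-analytics";

  /** Returns true only when the flag is explicitly present, so analytics stay off by default. */
  public static boolean analyticsEnabled(String[] args) {
    for (String arg : args) {
      if (SEND_ANALYTICS_FLAG.equals(arg)) {
        return true;
      }
    }
    return false;
  }

  /** Returns the non-flag arguments, for example the BOM coordinates. */
  public static List<String> positionalArguments(String[] args) {
    List<String> result = new ArrayList<>();
    for (String arg : args) {
      if (!arg.startsWith("--")) {
        result.add(arg);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    boolean sendAnalytics = analyticsEnabled(args);
    System.out.println("Coordinates: " + positionalArguments(args));
    System.out.println("Analytics ping: " + (sendAnalytics ? "enabled" : "disabled"));
  }
}
```

Because the check looks only for the literal flag, a script that does not pass it keeps today's behavior unchanged.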
Google Analytics
We hook into com.google.cloud.tools.dependencies.linkagemonitor.LinkageMonitor only. No other packages will include analytics code or depend on GA in any way.
In particular the Maven enforcer rule and the dependencies library will not have any dependencies on analytics.
We will collect:
- URL of the GitHub repository
- GitHub repository name
- GitHub organization
- Linkage monitor version
- Java version
- Maven version
- PR number and URL
- Linkage errors detected
- Amount of time the tool ran
We do not include any user data or personally identifiable information, as can be seen above.
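The fields listed above could be grouped into a single value object before they are handed to the analytics client. The sketch below is one possible shape, not a decided design: the class name, the key names, and the sample values in main are all hypothetical, and it does not call any Google Analytics API.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical value object for one linkage monitor run; all names are illustrative. */
public final class RunMetricsSketch {
  private final Map<String, String> fields = new LinkedHashMap<>();

  public RunMetricsSketch(
      String repositoryUrl,
      String repositoryName,
      String organization,
      String linkageMonitorVersion,
      String javaVersion,
      String mavenVersion,
      int pullRequestNumber,
      String pullRequestUrl,
      int linkageErrorCount,
      long durationMillis) {
    // Only repository and build-system facts are recorded; no user identity.
    fields.put("repositoryUrl", repositoryUrl);
    fields.put("repositoryName", repositoryName);
    fields.put("organization", organization);
    fields.put("linkageMonitorVersion", linkageMonitorVersion);
    fields.put("javaVersion", javaVersion);
    fields.put("mavenVersion", mavenVersion);
    fields.put("pullRequestNumber", Integer.toString(pullRequestNumber));
    fields.put("pullRequestUrl", pullRequestUrl);
    fields.put("linkageErrorCount", Integer.toString(linkageErrorCount));
    fields.put("durationMillis", Long.toString(durationMillis));
  }

  /** Returns the collected fields in insertion order as a read-only view. */
  public Map<String, String> asMap() {
    return Collections.unmodifiableMap(fields);
  }

  public static void main(String[] args) {
    // Placeholder values for illustration only.
    RunMetricsSketch metrics =
        new RunMetricsSketch(
            "https://github.com/googleapis/example-repo",
            "example-repo",
            "googleapis",
            "0.0.0",
            System.getProperty("java.version"),
            "3.6.3",
            1,
            "https://github.com/googleapis/example-repo/pull/1",
            2,
            1500L);
    metrics.asMap().forEach((key, value) -> System.out.println(key + " = " + value));
  }
}
```

Keeping the payload in one place makes it easy to audit that nothing beyond the listed fields is sent.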
Google Analytics pings are asynchronous and should not block our existing code.
Google Analytics handles much larger systems and traffic than this.
If Google Analytics goes down, metrics might be lost. However, the asynchronous nature of the client library means the linkage monitor will not fail to run.
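To illustrate the failure isolation described above, here is a rough sketch using only java.util.concurrent. It assumes the actual ping is wrapped in a Runnable; the real Google Analytics client call is deliberately left out, and the class and method names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: a background ping whose failure never propagates to the caller. */
public class AsyncPingSketch {

  /**
   * Runs the ping on the given executor. The returned future completes with true when the
   * ping succeeded and false when it threw, so callers never see an exception.
   */
  public static CompletableFuture<Boolean> sendAsync(Runnable ping, ExecutorService executor) {
    return CompletableFuture.supplyAsync(
            () -> {
              ping.run();
              return true;
            },
            executor)
        .exceptionally(error -> false);
  }

  public static void main(String[] args) throws Exception {
    // A daemon thread keeps a stuck ping from holding the JVM open.
    ExecutorService executor =
        Executors.newSingleThreadExecutor(
            runnable -> {
              Thread thread = new Thread(runnable, "analytics-ping");
              thread.setDaemon(true);
              return thread;
            });

    // Simulate an analytics outage with a ping that always throws.
    CompletableFuture<Boolean> result =
        sendAsync(
            () -> {
              throw new IllegalStateException("analytics backend unavailable");
            },
            executor);

    System.out.println("Linkage check continues while the ping is in flight");
    System.out.println("Ping delivered: " + result.get(5, TimeUnit.SECONDS));
    executor.shutdown();
  }
}
```

In this sketch a lost ping only yields a false result; whether to log it or ignore it entirely is a separate choice.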
We rely on Google Analytics to store and retrieve all data. In the worst case, this data is not critical and can be lost.
Same as GA.
We will need a client key for Google Analytics that is not published in the GitHub repository but is bundled into the jar file as part of the build process.
We collect information about open source repositories and build systems only. We do not collect any information about any people.
Furthermore, we allowlist the GitHub organizations we collect information from. Organizations include:
- GoogleCloudPlatform
- googleapis
- census-instrumentation
- grpc
We may further restrict this by repository; for instance, to allow collection of information from Apache Beam but not all Apache projects.
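The organization check described above could be sketched as follows. This assumes the repository is identified by an https://github.com/ORG/REPO URL; the class name and the per-repository set are hypothetical, and the apache/beam entry only illustrates the Apache Beam case mentioned above.

```java
import java.net.URI;
import java.util.Set;

/** Hypothetical sketch of the allowlist check; names and structure are illustrative. */
public class OrganizationAllowlistSketch {
  // Organizations named in this proposal.
  static final Set<String> ALLOWED_ORGANIZATIONS =
      Set.of("GoogleCloudPlatform", "googleapis", "census-instrumentation", "grpc");

  // Illustrative per-repository entries for organizations that are not allowed wholesale.
  static final Set<String> ALLOWED_REPOSITORIES = Set.of("apache/beam");

  /** Returns true if metrics may be collected for the given GitHub repository URL. */
  public static boolean isAllowed(String repositoryUrl) {
    URI uri;
    try {
      uri = URI.create(repositoryUrl);
    } catch (IllegalArgumentException e) {
      return false; // Malformed URL: collect nothing.
    }
    if (!"github.com".equalsIgnoreCase(uri.getHost())) {
      return false;
    }
    // A path of "/org/repo" splits into ["", "org", "repo"].
    String[] segments = uri.getPath().split("/");
    if (segments.length < 3) {
      return false;
    }
    String organization = segments[1];
    String repository = segments[2];
    // Exact, case-sensitive match in this sketch.
    return ALLOWED_ORGANIZATIONS.contains(organization)
        || ALLOWED_REPOSITORIES.contains(organization + "/" + repository);
  }

  public static void main(String[] args) {
    for (String url :
        new String[] {
          "https://github.com/grpc/grpc-java",
          "https://github.com/apache/beam",
          "https://github.com/apache/maven"
        }) {
      System.out.println(url + " allowed: " + isAllowed(url));
    }
  }
}
```

Defaulting to false on anything unexpected means an unrecognized repository never produces a ping.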
TBD
TBD
TBD