Linkage Monitor Analytics

Elliotte Rusty Harold edited this page Nov 16, 2020 · 13 revisions
  • Status: Draft Proposal, not implemented
  • Authors: @elharo
  • Contributors:
  • Last updated: 2020-11-16

Objective

Measure usage of linkage monitor.

Background

We'd like to have a good overview of:

  • How many repositories and projects use the linkage monitor
  • How many PRs it checks
  • How many linkage errors it finds

Overview

We will create a new Google Analytics project. We'll use the Google Analytics Java client library to collect metrics about:

  • Number of runs
  • Linkage errors detected
  • Number of repositories the linkage monitor is installed in
  • Number of artifacts

This will all be behind a flag which is off by default.
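
A minimal sketch of an off-by-default flag, assuming it is read from a JVM system property. The class name `AnalyticsFlag` and the property name `linkagemonitor.analytics.enabled` are hypothetical; the proposal does not specify how the flag is exposed.

```java
/** Hypothetical helper; the class and property names are assumptions, not part of the proposal. */
final class AnalyticsFlag {
  // Assumed property name. The design only states that the flag is off by default.
  static final String PROPERTY = "linkagemonitor.analytics.enabled";

  private AnalyticsFlag() {}

  /** Returns true only when the property is explicitly set to "true" (case-insensitive). */
  static boolean isEnabled() {
    // Boolean.getBoolean returns false when the property is missing or has any other value,
    // so analytics stay off unless someone opts in.
    return Boolean.getBoolean(PROPERTY);
  }
}
```

Under this assumption, running with `-Dlinkagemonitor.analytics.enabled=true` would turn collection on, and every other invocation would skip it.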

Infrastructure

Google Analytics

Detailed design

We hook into com.google.cloud.tools.dependencies.linkagemonitor.LinkageMonitor. No other packages will include analytics code or depend on GA in any way.

In particular the Maven enforcer rule and the dependencies library will not have any dependencies on analytics.

We will collect:

  • URL of the GitHub repository
  • GitHub repository name
  • GitHub organization
  • Linkage monitor version
  • Java version
  • Maven version
  • PR number and URL
  • Linkage errors detected
  • Amount of time the tool ran

We do not include any user data or personally identifiable information, as can be seen above.
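
The fields listed above could be carried in a single immutable value object before being handed to the analytics client. The sketch below is illustrative only: the class name, field names, and parameter keys are hypothetical, and it assumes "linkage errors detected" is reported as a count.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical holder for one linkage monitor run; every name here is illustrative. */
final class LinkageMonitorRunEvent {
  final String repositoryUrl;
  final String organization;
  final String repositoryName;
  final String linkageMonitorVersion;
  final String javaVersion;
  final String mavenVersion;
  final int pullRequestNumber;
  final String pullRequestUrl;
  final int linkageErrorCount; // assumption: errors are reported as a count
  final Duration runTime;

  LinkageMonitorRunEvent(
      String repositoryUrl,
      String organization,
      String repositoryName,
      String linkageMonitorVersion,
      String javaVersion,
      String mavenVersion,
      int pullRequestNumber,
      String pullRequestUrl,
      int linkageErrorCount,
      Duration runTime) {
    this.repositoryUrl = repositoryUrl;
    this.organization = organization;
    this.repositoryName = repositoryName;
    this.linkageMonitorVersion = linkageMonitorVersion;
    this.javaVersion = javaVersion;
    this.mavenVersion = mavenVersion;
    this.pullRequestNumber = pullRequestNumber;
    this.pullRequestUrl = pullRequestUrl;
    this.linkageErrorCount = linkageErrorCount;
    this.runTime = runTime;
  }

  /** Flattens the event into string parameters; the key names are assumptions. */
  Map<String, String> toParameters() {
    Map<String, String> parameters = new LinkedHashMap<>();
    parameters.put("repository_url", repositoryUrl);
    parameters.put("organization", organization);
    parameters.put("repository_name", repositoryName);
    parameters.put("linkage_monitor_version", linkageMonitorVersion);
    parameters.put("java_version", javaVersion);
    parameters.put("maven_version", mavenVersion);
    parameters.put("pr_number", Integer.toString(pullRequestNumber));
    parameters.put("pr_url", pullRequestUrl);
    parameters.put("linkage_error_count", Integer.toString(linkageErrorCount));
    parameters.put("run_time_millis", Long.toString(runTime.toMillis()));
    return parameters;
  }
}
```

Keeping the event as a closed set of fields makes it easy to review exactly what leaves the build machine.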

Caveats

Latency

Google Analytics pings are asynchronous and should not block our existing code.

Scalability

Google Analytics handles much larger systems and traffic than this.

Dependency considerations

If Google Analytics goes down, metrics might be lost. However, the asynchronous nature of the client library means the linkage monitor will not fail to run.
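
The non-blocking, best-effort behavior described above can be illustrated with a generic fire-and-forget wrapper. This is a sketch under stated assumptions, not the Google Analytics client API: `MetricsSender` is a hypothetical stand-in for whatever the client library exposes, and `AsyncMetricsReporter` is an invented name.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical transport; stands in for the real analytics client. */
interface MetricsSender {
  void send(String payload) throws Exception;
}

/** Hypothetical reporter that sends pings off the calling thread and drops failures. */
final class AsyncMetricsReporter {
  private final ExecutorService executor =
      Executors.newSingleThreadExecutor(
          runnable -> {
            Thread thread = new Thread(runnable, "linkage-monitor-analytics");
            thread.setDaemon(true); // never keep the JVM alive just to send metrics
            return thread;
          });
  private final MetricsSender sender;

  AsyncMetricsReporter(MetricsSender sender) {
    this.sender = sender;
  }

  /** Queues the payload and returns immediately. */
  void report(String payload) {
    executor.execute(
        () -> {
          try {
            sender.send(payload);
          } catch (Exception e) {
            // Metrics are best effort: a failed ping is dropped and never rethrown.
          }
        });
  }

  /** Waits up to the timeout for queued pings; returns false if they did not finish in time. */
  boolean shutdown(long timeout, TimeUnit unit) {
    executor.shutdown();
    try {
      return executor.awaitTermination(timeout, unit);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
      return false;
    }
  }
}
```

Because the sender runs on a daemon thread and exceptions are swallowed inside the task, an analytics outage costs at most the lost ping.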

Data integrity

We rely on Google Analytics to store and retrieve all data. This data is not critical, so in the worst case it can be lost.

SLA requirements

Same as GA.

Security considerations

We will need a client key for Google Analytics that is not published in the GitHub repository but is bundled into the jar file as part of the build process.

Privacy considerations

We collect information about open source repositories and build systems only. We do not collect any information about any people.

Furthermore, we whitelist the GitHub organizations we collect information from. Organizations include:

  • GoogleCloudPlatform
  • googleapis
  • google
  • census-instrumentation
  • grpc

We may further restrict this by repository; for instance, to allow collection of information from Apache Beam but not all Apache projects.
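
One way the organization and repository restrictions might be checked is sketched below. The class name `CollectionAllowlist`, the URL parsing, and the case-insensitive comparison are assumptions; the `apache/beam` entry only illustrates the per-repository option mentioned above and is not a decided list.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical allowlist check; names and matching rules are illustrative assumptions. */
final class CollectionAllowlist {
  // Organizations from the list above, lower-cased for case-insensitive matching (assumption).
  private static final Set<String> ORGANIZATIONS =
      Collections.unmodifiableSet(
          new HashSet<>(
              Arrays.asList(
                  "googlecloudplatform", "googleapis", "google", "census-instrumentation", "grpc")));

  // Illustrative per-repository entry in "organization/repository" form.
  private static final Set<String> REPOSITORIES =
      Collections.unmodifiableSet(new HashSet<>(Arrays.asList("apache/beam")));

  private CollectionAllowlist() {}

  /** Returns true when the GitHub repository URL belongs to an allowed organization or repository. */
  static boolean isAllowed(String repositoryUrl) {
    if (repositoryUrl == null) {
      return false;
    }
    URI uri;
    try {
      uri = URI.create(repositoryUrl);
    } catch (IllegalArgumentException e) {
      return false; // not a parseable URL, so nothing is collected
    }
    if (!"github.com".equalsIgnoreCase(uri.getHost()) || uri.getPath() == null) {
      return false;
    }
    // A path such as "/googleapis/java-storage" splits into ["", "googleapis", "java-storage"].
    String[] segments = uri.getPath().split("/");
    if (segments.length < 3) {
      return false;
    }
    String organization = segments[1].toLowerCase(Locale.ROOT);
    String repository = segments[2].toLowerCase(Locale.ROOT);
    return ORGANIZATIONS.contains(organization)
        || REPOSITORIES.contains(organization + "/" + repository);
  }
}
```

Checking the allowlist before any event is built means data from other organizations is never assembled, let alone sent.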

Testing plan

TBD

Work estimates

TBD

Launch plans

TBD

Rollback strategy