Incorporate Project health metrics from devstats #88

Open
halcyondude opened this issue Aug 1, 2022 · 3 comments
Labels
brainstorm idea incubation, not always actionable. data Data Model, GraphQL

Comments

@halcyondude
Collaborator

https://all.devstats.cncf.io/d/53/projects-health-table?orgId=1


@halcyondude halcyondude added data Data Model, GraphQL enhancement New feature or request labels Aug 1, 2022
@halcyondude halcyondude moved this to Triage in landscape-graph Aug 1, 2022
@alolita
Member

alolita commented Aug 2, 2022

Is this data available in JSON?

@halcyondude
Collaborator Author

halcyondude commented Aug 4, 2022

Caveat: I started peeling back the layers of devstats only minutes ago and am still assessing.

High Level: https://github.com/cncf/devstats#architecture
Detailed, Specific overview: https://github.com/cncf/devstats-helm#architecture

The best overview I've found so far of how it works, and of the inputs to its design, is:

https://github.com/cncf/devstats/blob/master/ARCHITECTURE.md

In a nutshell, they parse the GH Archive dumps to get the full event stream without pulling the ocean through a straw (GitHub API rate limiting), and they keep local git clones (in individual PVs, as part of devstats) for file info. This is somewhat similar to gitbase in its design.
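To make the "parse the archives instead of calling the API" idea concrete, here is a minimal sketch of reading one hourly GH Archive file and counting event types for a single repo. It is not how devstats does it (devstats is written in Go and loads into Postgres); the function names `archive_url` and `count_event_types` are hypothetical, and the sketch assumes the archive's newline-delimited JSON layout with `type` and `repo.name` fields.

```python
import gzip
import io
import json
from collections import Counter


def archive_url(year, month, day, hour):
    """Build the URL of one hourly GH Archive file (the hour is not zero-padded)."""
    return f"https://data.gharchive.org/{year:04d}-{month:02d}-{day:02d}-{hour}.json.gz"


def count_event_types(gz_bytes, repo_name=None):
    """Count events by type in one gzipped, newline-delimited JSON archive.

    If repo_name is given (e.g. "cncf/devstats"), only events for that
    repository are counted.
    """
    counts = Counter()
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            event = json.loads(line)
            if repo_name and event.get("repo", {}).get("name") != repo_name:
                continue
            counts[event.get("type", "unknown")] += 1
    return counts


if __name__ == "__main__":
    # Synthetic data standing in for a downloaded archive file.
    events = [
        {"type": "PushEvent", "repo": {"name": "cncf/devstats"}},
        {"type": "IssuesEvent", "repo": {"name": "cncf/devstats"}},
        {"type": "PushEvent", "repo": {"name": "other/repo"}},
    ]
    raw = "\n".join(json.dumps(e) for e in events).encode("utf-8")
    print(count_event_types(gzip.compress(raw), repo_name="cncf/devstats"))
```

Because each file covers one hour of all public GitHub activity, a full backfill is a bulk download rather than many rate-limited API calls, which is the trade-off described above.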

I think bulk loading of git commits/history into a graph will be more readily accomplished with gitbase's MySQL endpoint as an ETL source. However, devstats already does an amazing amount of aggregation and summarization today.


The data are available in a few different ways/layers: the raw data from https://www.gharchive.org, a REST API, database dumps, and Grafana dashboards. I'm not yet sure what else :)

REST API

api docs https://github.com/cncf/devstatscode/blob/master/API.md
endpoint https://devstats.cncf.io/api/v1
impl (in Go) https://github.com/cncf/devstatscode/blob/master/cmd/api/api.go

It's REST documented only in a markdown file (no OpenAPI spec), so something like https://github.com/ibm/openapi-to-graphql can't be used directly.
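On the "is this available in JSON?" question, the REST API appears to be the most direct route. Below is a hedged sketch of a small client. The endpoint URL is the one listed above; the request shape (a POST whose JSON body has `api` and `payload` keys) and the `Health` API name with a `project` parameter are my reading of API.md and should be treated as assumptions until checked against that doc. `build_request` and `call_api` are hypothetical helper names.

```python
import json
import urllib.request

# Endpoint taken from the list above.
DEVSTATS_API = "https://devstats.cncf.io/api/v1"


def build_request(api, payload, url=DEVSTATS_API):
    """Build (but do not send) a JSON POST request.

    ASSUMPTION: the body shape {"api": ..., "payload": {...}} is inferred
    from API.md; verify it there before relying on it.
    """
    body = json.dumps({"api": api, "payload": payload}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_api(api, payload, timeout=30):
    """Send the request and decode the JSON response."""
    request = build_request(api, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # ASSUMPTION: "Health" and the "project" key are illustrative values.
    print(call_api("Health", {"project": "kubernetes"}))
```

Keeping request construction separate from sending makes it easy to inspect the exact body before pointing it at the live service.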

Database Dumps

https://devstats.cncf.io/backups

Grafana

Dashboards use the PG (Postgres) data source and have queries... but reusing them is brittle: it would require running devstats, getting access to the underlying DB, or standing up a new DB with this data.

@halcyondude halcyondude added brainstorm idea incubation, not always actionable. and removed enhancement New feature or request labels Aug 4, 2022
Projects
Status: Triage

2 participants