feat: get repos from graphql #120
Conversation
context: codecov/engineering-team#971

Adds a function to get repositories from GitHub's GraphQL API. It uses the node_id to get the specified node from GitHub. This can be found in the INSTALLATION webhook that we will process. Along with it there will be the databaseId (service_id). Adding the `expected_owner` entry because when processing an installation webhook we do know what the expected owner is. Fetching the owner would require another request, but can be done (again by node_id). The idea is to use this function when syncing repos if we know which repos are covered by an installation.
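The description above can be sketched roughly as follows. This is a hypothetical illustration, not the PR's actual code: the query shape follows GitHub's documented `nodes(ids:)` lookup, but `build_repos_payload` and the exact field selection are assumptions.

```python
# Sketch only: fetching repositories by node_id via GitHub's GraphQL
# `nodes(ids:)` lookup. Names here are illustrative, not the PR's code.
from typing import List

REPOS_FROM_NODEIDS_QUERY = """
query GetReposFromNodeIds($node_ids: [ID!]!) {
  nodes(ids: $node_ids) {
    __typename
    ... on Repository {
      databaseId   # maps to service_id on our side
      name
      owner {
        id         # this is the owner's node_id, not the ownerid
        login
      }
    }
  }
}
"""


def build_repos_payload(node_ids: List[str]) -> dict:
    """Build the POST body for the GraphQL endpoint (api.github.com/graphql)."""
    return {
        "query": REPOS_FROM_NODEIDS_QUERY,
        "variables": {"node_ids": node_ids},
    }
```

The node_ids come straight from the INSTALLATION webhook payload, so no REST round-trip is needed to resolve them first.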
Codecov Report
@@ Coverage Diff @@
## main #120 +/- ##
==========================================
+ Coverage 89.52% 89.55% +0.03%
==========================================
Files 115 115
Lines 9392 9434 +42
Branches 1499 1506 +7
==========================================
+ Hits 8408 8449 +41
Misses 781 781
- Partials 203 204 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## main #120 +/- ##
==========================================
+ Coverage 91.54% 91.60% +0.05%
==========================================
Files 122 122
Lines 9413 9490 +77
Branches 1613 1621 +8
==========================================
+ Hits 8617 8693 +76
Misses 781 781
- Partials 15 16 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry.
@@ -47,11 +50,46 @@
        }
    }
}
"""
""",
REPOS_FROM_NODEIDS="""
does github let you fetch the whole list without pagination or will it paginate for you with some default page size?
codecov/engineering-team#139 has an example of pagination with github's graphql api if we need to add it
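For context on what that linked example looks like, cursor-based pagination on GitHub's GraphQL API applies to connection fields and their `pageInfo`, which (as discussed below) is not available on the `nodes` lookup. A generic, illustrative sketch of that pattern (field names follow GitHub's documented connection convention; the helper is hypothetical):

```python
# Illustrative cursor pagination over a connection field.
# This pattern works on connections like `repositories`, not on the
# top-level `nodes(ids:)` lookup.
from typing import Optional

PAGINATED_REPOS_QUERY = """
query OwnerRepos($login: String!, $cursor: String) {
  repositoryOwner(login: $login) {
    repositories(first: 100, after: $cursor) {
      pageInfo {
        hasNextPage
        endCursor
      }
      nodes {
        databaseId
        name
      }
    }
  }
}
"""


def next_variables(login: str, cursor: Optional[str] = None) -> dict:
    """Variables for one page; pass the previous page's endCursor to advance."""
    return {"login": login, "cursor": cursor}
```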
That's a very good question that I probably should have put in the PR description.

I spent quite some time trying to get pagination into this query. The docs indicate you can specify `first` and `last` and get `pageInfo` cursors. But in the explorer docs the `nodes` query doesn't include the inputs for `first` and `last` (I tried, it didn't work) AND the `Repository` object doesn't include `pageInfo` as a possible field to query. For that reason I don't think there's pagination? At least I couldn't find a way to paginate the request.
update: I decided to implement simple pagination from our side just in case.
...tegration/cassetes/test_github/TestGithubTestCase/test_get_repos_from_nodeids_generator.yaml
There's been a question about whether the nodes query has pagination. So we don't trip up unnecessarily, I decided to implement simple pagination ourselves. I made page_size 100 because GitHub docs indicate that's the max number of records you can get at once from endpoints that use `first` or `last` pagination. 100 repos is a lot of repos. I was also not sure if _all_ repos will actually belong to the same owner. Just in case, I decided to add owner lookup as well. Notice it does 1 extra request per unique owner that we encounter. We can use this info to upsert owners if needed.
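The client-side pagination described above can be sketched as a simple chunking generator (a hypothetical illustration of the approach, not the PR's actual implementation):

```python
# Sketch: client-side pagination by chunking node_ids before each request.
# The function name is illustrative; the 100-record cap comes from GitHub's
# documented limit for `first`/`last` pagination.
from typing import Iterable, Iterator, List

PAGE_SIZE = 100


def paginate_node_ids(
    node_ids: Iterable[str], page_size: int = PAGE_SIZE
) -> Iterator[List[str]]:
    """Yield node_ids in chunks so each GraphQL request stays within limits."""
    page: List[str] = []
    for node_id in node_ids:
        page.append(node_id)
        if len(page) == page_size:
            yield page
            page = []
    if page:  # emit the final partial page, if any
        yield page
```

Each yielded chunk becomes one `nodes(ids:)` request, so even an installation covering hundreds of repos stays within the documented limit.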
code change looks good to me, would be nice not to need an extra query if we can work that owner info into the first one
owner {
    # This ID is actually the node_id, not the ownerid
    id
    login
}
do we need a separate query for getting owner information or could we include databaseId here?
I would have included it if it was possible, but that is not a `User` or `Organization` object. It's some other node that only includes those 2 pieces of information.
RIP