-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug]: Content Discovery pagination support #2715
Comments
@eusebiu-constantin-petu-dbk can you take a look pls? |
@uvegla what would be a good upstream registry to test this against? We may want to include that in our ci tests as well. |
@rchincha The registry where I bumped into this issue is a private one of our own. I guess any large, public OCI registry could work. However, that way you are relying on a source outside of your control in the tests and depending on how often the tests run, you are bombarding it with requests, the larger the registry, the more. Also they dynamically change, so what sort of assertion would we do? Without knowing the complexity of it, my initial approach would be to bring up a Zot in the test env and push lets say 250 empty images into it under different names and tags and query that. This way, if you are testing the actual active - not on-demand - replication as well, you could then query the zot in test to see if the images are there that you want, cos you have a full and stable picture of what to expect. What do you think? |
Signed-off-by: Eusebiu Petu <[email protected]>
Signed-off-by: Eusebiu Petu <[email protected]>
Signed-off-by: Eusebiu Petu <[email protected]>
Signed-off-by: Eusebiu Petu <[email protected]>
I took a look and something is fishy. 🤔 The discovery is taking place following the links, but some images are still not getting active synched. Debugging into the code I think the cause is that list of repositories is constructed somehow wrong. The 1st 100 item in the list is the result of the 2nd page. Then item 101-200 is the 2nd page again. Pages after than look good, but then this way items in the 1st 100 repository is ignored. I will take a look bit further down into the code to see why. If I figure out something, I will update this thread. EDIT: Found the issue and suggested a solution. |
@uvegla thanks for the PR review. |
zot version
v2.1.1
Describe the bug
When setting up
onDemand: false
sync extension rules, the prefix matching is done only the 1st page result of the content discovery made towards the remote registry. With large registries this results in some images never getting synced by Zot.The pagination rules are defined by the specification here: https://github.com/opencontainers/distribution-spec/blob/v1.1.0/spec.md#content-discovery
The code that does the discovery is rooted here: https://github.com/project-zot/zot/blob/v2.1.1/pkg/extensions/sync/service.go#L221-L232
It makes a call to
<remote>/v2/_catalog
and only makes it once. Going further down the path, theLink
response header is not checked and simply the response body is returned, thus not getting a full picture of the registry.This could potentially be an issue for listing tags as well - as according to the docs, that supports pagination as well - but I did not check that use case.
To reproduce
Configuration
You have to try this with a large registry with more than a 100 repositories. Relevant part of the Zot JSON configuration:
Then simply run:
Assuming there are more than 100 images in the repository and
my-org/zzz-operator
is alphabetically beyond 100, the syncing will never occur for it, practically it is ignored.Manual reproduction
Expected behavior
In the
GetRepositories
logic theLink
response header should be respected and followed until it is empty. Potentially this might affect listing tags for repositories as well, according to the specification.Screenshots
No response
Additional context
No response
The text was updated successfully, but these errors were encountered: