Can we speed up the refresher stage by separating refresh and reload into 2 steps? #268

Open
ghost opened this issue Jul 20, 2023 · 3 comments

Comments

@ghost

ghost commented Jul 20, 2023

Currently, service_loop() runs:

  • refresh(), single-threaded
  • then reload, multi-threaded
  • then it sleeps for a minute, presumably to avoid hammering the registry API

The problem is that a long-running reload prevents new refresh data from coming in.
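A minimal Python sketch of the loop shape described above, to show why refresh stalls. Everything in it is hypothetical: `refresh`, `download`, `reload`, the `events` log and the timings are stand-ins rather than the project's actual code. Only the order of the steps (single-threaded refresh, multi-threaded reload, then a pause) is taken from this issue.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# (stage name, monotonic timestamp) log, kept only so the timing can be inspected.
events: list[tuple[str, float]] = []


def refresh() -> None:
    # Hypothetical stand-in: poll the registry API for changes (single-threaded).
    events.append(("refresh", time.monotonic()))


def download(doc_id: int) -> None:
    # Hypothetical stand-in for fetching one document during reload.
    time.sleep(0.05)


def reload(doc_ids: list[int], workers: int = 4) -> None:
    # Multi-threaded: fan the downloads out, but block until ALL of them finish.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(download, doc_ids))
    events.append(("reload done", time.monotonic()))


def service_loop(iterations: int, docs: list[int], pause: float = 60.0) -> None:
    # One loop, three sequential steps: the next refresh cannot start until
    # the whole reload has finished AND the pause has elapsed.
    for _ in range(iterations):
        refresh()
        reload(docs)
        time.sleep(pause)


if __name__ == "__main__":
    service_loop(iterations=2, docs=list(range(40)), pause=0.1)
    refreshes = [t for name, t in events if name == "refresh"]
    print(f"gap between refreshes: {refreshes[1] - refreshes[0]:.2f}s")
```

With 40 simulated downloads of 0.05 s spread over 4 workers and a 0.1 s pause, the second refresh starts roughly 0.6 s or more after the first: the refresh interval is the reload duration plus the pause, so it grows with the size of the reload.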

Can we separate these into two service loops and two stages?

I don't think one depends on the other, does it?
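One way the split could look, as a rough sketch: each stage gets its own loop and thread with its own pause, and a shared `threading.Event` stops both. The names `run_every` and `start_loops`, the stand-in stages and the pause values are all assumptions for illustration; the real service might just as well use separate processes or containers instead of threads.

```python
import threading
import time
from collections.abc import Callable

# Timestamps of each completed stage run, kept only so the behaviour can be inspected.
refresh_times: list[float] = []
reload_times: list[float] = []


def refresh() -> None:
    # Hypothetical stand-in: a quick poll of the registry API.
    refresh_times.append(time.monotonic())


def reload() -> None:
    # Hypothetical stand-in: a slow, long-running download pass.
    time.sleep(0.5)
    reload_times.append(time.monotonic())


def run_every(step: Callable[[], None], pause: float, stop: threading.Event) -> None:
    # Each stage loops on its own schedule. Event.wait is an interruptible sleep,
    # so setting `stop` ends the pause early instead of waiting it out.
    while not stop.is_set():
        step()
        stop.wait(pause)


def start_loops(refresh_pause: float, reload_pause: float) -> tuple[threading.Event, list[threading.Thread]]:
    stop = threading.Event()
    threads = [
        threading.Thread(target=run_every, args=(refresh, refresh_pause, stop), daemon=True),
        threading.Thread(target=run_every, args=(reload, reload_pause, stop), daemon=True),
    ]
    for thread in threads:
        thread.start()
    return stop, threads


if __name__ == "__main__":
    stop, threads = start_loops(refresh_pause=0.1, reload_pause=0.1)
    time.sleep(1.0)
    stop.set()
    for thread in threads:
        thread.join()  # a reload already in progress still runs to completion
    print(f"{len(refresh_times)} refreshes, {len(reload_times)} reloads")
```

Refresh now keeps polling while a slow reload is still in progress. The trade-off is that the two stages can touch the same document at the same time, which is the race condition raised in the replies.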

@akmiller01
Contributor

They could be separated, but I think we would also need to make sure there can't be a race condition where refresh picks up a document while reload is in the middle of downloading it.

@ghost
Author

ghost commented Jul 20, 2023

Isn't that already a concern, though: a race condition between refresh and any of the later stages (e.g. validate, solrize)?

@akmiller01
Contributor

akmiller01 commented Jul 20, 2023

Not to my knowledge. Most steps have a flag or timestamp that marks the end of processing, and each subsequent step waits for that flag or end timestamp before picking the file up. My worry is that reload can run for so long that a publisher could update a file while a reload is in progress. Refresh would then update the database with the new modified value, and reload would afterwards overwrite the record with its downloaded timestamp, even though it never fetched the newly modified version.
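A minimal sketch of one possible guard against that overwrite, using an optimistic compare-and-set in SQLite. The `document` table and its `id`/`modified`/`downloaded` columns are hypothetical, modelled on the values mentioned above rather than on the project's real schema. The idea: reload remembers the modified value it started downloading, and only stamps downloaded if modified is still the same when it finishes.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical schema: table and column names are illustrative assumptions.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE document (id TEXT PRIMARY KEY, modified TEXT, downloaded TEXT)")
    return conn


def record_modified(conn: sqlite3.Connection, doc_id: str, modified: str) -> None:
    # Refresh side: store the publisher's latest modified timestamp.
    conn.execute("UPDATE document SET modified = ? WHERE id = ?", (modified, doc_id))
    conn.commit()


def start_download(conn: sqlite3.Connection, doc_id: str) -> str:
    # Reload side: note which version (by modified timestamp) is being fetched.
    (modified,) = conn.execute("SELECT modified FROM document WHERE id = ?", (doc_id,)).fetchone()
    return modified


def finish_download(conn: sqlite3.Connection, doc_id: str, seen_modified: str, downloaded_at: str) -> bool:
    # Compare-and-set: the WHERE clause only matches if refresh has not recorded
    # a newer modified value since start_download() ran.
    cur = conn.execute(
        "UPDATE document SET downloaded = ? WHERE id = ? AND modified = ?",
        (downloaded_at, doc_id, seen_modified),
    )
    conn.commit()
    return cur.rowcount == 1


if __name__ == "__main__":
    conn = make_db()
    conn.execute("INSERT INTO document VALUES ('doc-1', '2023-07-20T10:00', NULL)")
    seen = start_download(conn, "doc-1")                # reload starts fetching this version
    record_modified(conn, "doc-1", "2023-07-20T10:05")  # publisher updates it mid-download
    print(finish_download(conn, "doc-1", seen, "2023-07-20T10:07"))  # False: downloaded stays NULL
```

If the update matches no row, refresh has seen a newer version in the meantime, so downloaded is left untouched and the document presumably stays eligible for the next reload. A per-row version counter would work the same way if modified timestamps could collide.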
