Introduce cron jobs to clean up the export cache (`data/cache/export/`) and temporary files/directories (`data/tmp/`)
#8804
base: develop
Walkthrough
The pull request introduces a comprehensive refactoring of the export and cache management system in the CVAT application. The changes span multiple files and focus on improving the modularity, error handling, and organization of export-related functionality. Key modifications include introducing new classes for file type management, restructuring export cache handling, adding periodic cleanup jobs for export caches, and enhancing the overall file management process across projects, tasks, and jobs.
/check
❌ Some checks failed
/check
✔️ All checks completed successfully
@zhiltsov-max, @SpecLad, okay, I've made several changes related to working with tmp dirs (including the export process):
Do you know why any value other than 1 day may be needed? Is there anything potentially useful in that directory?
The only concern I have about this is that we're still using [...]
No, I don't. I set this value to make sure I don't delete something we need too early.
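To make the retention question above concrete, here is a minimal sketch of an age-based cleanup rule. It is not the code from this PR: the helper names, the use of the modification time, and the default of 1 day are assumptions for illustration; only the setting name `TMP_FILE_OR_DIR_RETENTION_DAYS` comes from the PR description.

```python
from __future__ import annotations

import shutil
import time
from pathlib import Path

# Assumption: stand-in for the TMP_FILE_OR_DIR_RETENTION_DAYS setting.
TMP_FILE_OR_DIR_RETENTION_DAYS = 1


def is_expired(path: Path, retention_days: float, now: float | None = None) -> bool:
    """Return True if the entry was last modified more than retention_days ago."""
    now = time.time() if now is None else now
    age_seconds = now - path.lstat().st_mtime
    return age_seconds > retention_days * 24 * 60 * 60


def cleanup_tmp_dir(
    tmp_dir: Path, retention_days: float, now: float | None = None
) -> list[Path]:
    """Remove expired top-level entries of tmp_dir and return what was removed."""
    removed = []
    for entry in sorted(tmp_dir.iterdir()):
        if not is_expired(entry, retention_days, now):
            continue
        if entry.is_dir() and not entry.is_symlink():
            # Directories are removed recursively.
            shutil.rmtree(entry, ignore_errors=True)
        else:
            # Files and symlinks are unlinked; a concurrent removal is tolerated.
            entry.unlink(missing_ok=True)
        removed.append(entry)
    return removed
```

With a rule like this, a larger retention value only delays deletion, so it trades disk usage for a lower risk of removing a file that a long-running operation still needs.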
In version 2.25.0, CVAT changed the location where the export cache is stored.
To clean up the outdated cache, run the following command: `python manage.py exportcachecleanup`.
Enqueuing RQ jobs to backup a project/task: A filename argument was removed from the background function signature. Previously enqueued jobs will fail.
What do you think about adding a one-off migration script to avoid losing existing RQ jobs? Unlike cleanup jobs these are real user requests. It should probably just remove 1 parameter from the existing jobs.
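A rough sketch of what such a one-off migration could do, under stated assumptions: `QueuedJob` is a hypothetical stand-in for an enqueued RQ job (only its function name, positional args, and keyword args matter here), and the function name, the argument position, and the old argument count in the usage below are placeholders rather than values taken from this PR. Iterating over the real queues and persisting the modified jobs is deliberately left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class QueuedJob:
    """Hypothetical stand-in for an enqueued job; not the RQ Job class."""

    func_name: str
    args: tuple = ()
    kwargs: dict = field(default_factory=dict)


def drop_removed_argument(
    job: QueuedJob,
    target_funcs: set[str],
    position: int,
    old_arg_count: int,
    keyword: str = "filename",
) -> bool:
    """Drop the removed argument from a job enqueued with the old signature.

    Returns True if the job was changed. Checking old_arg_count keeps the
    operation idempotent for positional arguments: a job that was already
    migrated (or enqueued with the new signature) is left untouched.
    """
    if job.func_name not in target_funcs:
        return False
    if keyword in job.kwargs:
        job.kwargs = {k: v for k, v in job.kwargs.items() if k != keyword}
        return True
    if len(job.args) == old_arg_count and position < old_arg_count:
        job.args = job.args[:position] + job.args[position + 1 :]
        return True
    return False
```

For example, `drop_removed_argument(job, {"some.module.create_backup"}, position=1, old_arg_count=3)` would turn `(instance_id, "backup.zip", extra)` into `(instance_id, extra)` and do nothing on a second run.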
In that case, we'll have to make this PR dependent on #8898
@@ -542,7 +559,7 @@ def _export_task(self, zip_obj, target_dir=None):
         self._write_annotations(zip_obj, target_dir)
         self._write_annotation_guide(zip_obj, target_dir)

-    def export_to(self, file, target_dir=None):
+    def export_to(self, file: str | ZipFile, target_dir: str | None = None):
Not an error, but such annotations in inherited functions tend to become outdated quickly. Typically it's better to just use annotations from the original function instead of duplicating them everywhere.
# FUTURE-FIXME: there db_instance_id should be passed
db_instance: models.Project | models.Task,
Why not change this parameter if the function args are already changed in this PR?
progress = (i + 1) / objects_count
done = int(progress_bar_len * progress)
progress_bar = "#" * done + "-" * (progress_bar_len - done)
self.stdout.write(f"\rProgress: |{progress_bar}| {progress:.0%}", ending="")
This might not work on macOS and Windows. Why not use tqdm?
Ask Roman about it
@SpecLad, I saw you were saying it's not installed, but why not install it? Do you see any problems with that?
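If adding tqdm as a dependency is not wanted, one dependency-free option is to keep the manual bar but only use the carriage return when the output is an interactive terminal. The sketch below is an illustration of that idea, not the code from this PR; the helper names and the non-TTY fallback are assumptions.

```python
import sys
from typing import TextIO


def render_progress_bar(done_items: int, total_items: int, width: int = 30) -> str:
    """Build a text bar such as 'Progress: |#####-----| 50%'."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    # Clamp to [0, 1] so the bar never overflows its width.
    progress = min(max(done_items / total_items, 0.0), 1.0)
    filled = int(width * progress)
    bar = "#" * filled + "-" * (width - filled)
    return f"Progress: |{bar}| {progress:.0%}"


def report_progress(done_items: int, total_items: int, stream: TextIO = sys.stdout) -> None:
    """Redraw the bar in place on a terminal, or print one line per update otherwise."""
    line = render_progress_bar(done_items, total_items)
    if stream.isatty():
        # Interactive terminal: return to the start of the line and overwrite it.
        stream.write("\r" + line)
    else:
        # Redirected output (files, CI logs): avoid control characters.
        stream.write(line + "\n")
    stream.flush()
```

Separating the rendering from the writing also makes the bar easy to check in a unit test, which the inline version in the command does not allow.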
cvat/apps/dataset_manager/management/commands/exportcachecleanup.py
log_exception(logger)


def cleanup(thread_class_path: str) -> None:
python manage.py runperiodicjob cron_export_cache_cleanup
TypeError: cleanup() missing 1 required positional argument: 'thread_class_path'
seconds_left = rq_job.timeout - 60
AttributeError: 'NoneType' object has no attribute 'timeout'
Probably, default args should be passed.
The `runperiodicjob` command was added this week. This command should support passing args rather than setting the default value for the `cleanup` function.
`AttributeError: 'NoneType' object has no attribute 'timeout'` is raised because this command should be executed only from the worker process (by the current design). If you think that it will be useful to allow running this command not only by the worker process, please provide reasons.
It's a periodic job, and we have a command to run periodic jobs manually. Probably, it should support such execution or clarify why it doesn't.
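If manual execution were to be supported, one possible approach is to fall back to a default time budget when there is no current RQ job (RQ's `get_current_job()` returns `None` outside a worker, which would explain the `AttributeError` above). This is only a sketch of that option: the function name and the default timeout value are assumptions, and the 60-second margin mirrors the `rq_job.timeout - 60` expression from the traceback.

```python
from __future__ import annotations

# Assumption: a default budget used when the code runs outside an RQ worker.
DEFAULT_TIMEOUT_SECONDS = 180
# Mirrors the margin in `seconds_left = rq_job.timeout - 60` from the traceback.
SAFETY_MARGIN_SECONDS = 60


def seconds_left_for_cleanup(
    rq_job: object | None, default_timeout: int = DEFAULT_TIMEOUT_SECONDS
) -> int:
    """Return the time budget for the cleanup, with or without a current job."""
    # rq_job is None when the function is not called from a worker process.
    timeout = getattr(rq_job, "timeout", None)
    if timeout is None:
        timeout = default_timeout
    # Never return a negative budget.
    return max(timeout - SAFETY_MARGIN_SECONDS, 0)
```

The alternative discussed in the thread, failing with an explicit message when no worker context is available, would keep the current design and only make the error clearer.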
{{< tabpane lang="shell" >}}
{{< tab header="Docker" >}}
docker exec -it cvat_server python manage.py exportcachecleanup
{{< /tab >}}
{{< tab header="Kubernetes" >}}
cvat_backend_pod=$(kubectl get pods -l component=server -o 'jsonpath={.items[0].metadata.name}')
kubectl exec -it ${cvat_backend_pod} -- python manage.py exportcachecleanup
{{< /tab >}}
{{< tab header="Development" >}}
python manage.py exportcachecleanup
{{< /tab >}}
{{< /tabpane >}}
<!--lint disable no-undefined-references-->
{{< tabpane lang="shell" >}}
{{< tab header="Docker" >}}
docker exec -it cvat_server python manage.py exportcachecleanup
{{< /tab >}}
{{< tab header="Kubernetes" >}}
cvat_backend_pod=$(kubectl get pods -l component=server -o 'jsonpath={.items[0].metadata.name}')
kubectl exec -it ${cvat_backend_pod} -- python manage.py exportcachecleanup
{{< /tab >}}
{{< tab header="Development" >}}
python manage.py exportcachecleanup
{{< /tab >}}
{{< /tabpane >}}
<!--lint enable no-undefined-references-->
It looks like there is some bug in remarklint.
Motivation and context

Depends on #8721

PR introduces the following changes:
- [...] (`data/tmp/`) instead of `project|task|job/id/tmp/export_cache/`
- [...] `exportcachecleanup` management command to remove outdated `project|task|job/id/tmp/export_cache/` directories
- [...] `data/tmp/` directory. The new setting `TMP_FILE_OR_DIR_RETENTION_DAYS` is used to determine whether a file or directory should be removed

Notes:
- `data/project|task|job/id/tmp/` is still used during uploading annotations/datasets, but this should be fixed in a separate PR.

Breaking changes:
- `cvat.apps.dataset_manager.views.clear_export_cache` was moved and `cvat.apps.engine.backup._clear_export_cache` was deleted.

How has this been tested?