Fix automatic size hint on uploads #626

Merged (1 commit) on Feb 20, 2024

JustAnotherArchivist
Contributor

When uploading without a size hint (via headers or --size-hint), an x-archive-size-hint header is added automatically. Prior to this commit, however, its value was the size of the individual file on each file's PUT request. This made the hint effectively useless, because it did not give the S3 backend a size hint for the total item at item creation time.
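
The difference between a per-file hint and an item-wide hint can be sketched as follows. This is only an illustration of the idea, not the code from this commit: `total_size_hint` and `build_put_headers` are hypothetical names, and the sketch assumes all files are given as local paths.

```python
import os
import tempfile


def total_size_hint(paths):
    """Sum the sizes of all files in the upload (hypothetical helper)."""
    return sum(os.path.getsize(p) for p in paths)


def build_put_headers(paths, user_headers=None):
    """Return one headers dict per file (hypothetical helper).

    Every PUT request carries the same x-archive-size-hint, computed once
    over the whole upload, instead of each file's own size. A hint supplied
    by the caller is left untouched.
    """
    user_headers = dict(user_headers or {})
    hint = user_headers.get("x-archive-size-hint")
    if hint is None:
        hint = str(total_size_hint(paths))
    return [{**user_headers, "x-archive-size-hint": hint} for _ in paths]
```

With two files of 3 and 5 bytes, both requests would carry a hint of 8 under this sketch, so whichever request ends up creating the item already states the total.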

Notes on detailed changes to implement this:

  • Rename internetarchive.utils.recursive_file_count to recursive_file_count_and_size, adding a wrapper for backwards compatibility
  • Add support for paths (rather than only file-like objects) to internetarchive.utils.get_file_size
  • Add an internetarchive.utils.is_filelike_obj helper function
  • Fix a bug introduced by 62c8513 where total_files would never be None and so recursive_file_count was never called, possibly leading to incorrect derive queueing.
  • Add tests for the fixed behaviour
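
A rough sketch of how the helpers named in the list above might fit together. The function names come from the PR description, but the signatures and bodies here are assumptions made for illustration and may differ from the actual internetarchive.utils code.

```python
import os


def is_filelike_obj(obj):
    """Assumed heuristic: anything with a callable read() counts as file-like."""
    return callable(getattr(obj, "read", None))


def get_file_size(file_or_path):
    """Return the size in bytes of a path or a seekable file-like object."""
    if is_filelike_obj(file_or_path):
        pos = file_or_path.tell()
        file_or_path.seek(0, os.SEEK_END)
        size = file_or_path.tell()
        file_or_path.seek(pos, os.SEEK_SET)  # restore the caller's position
        return size
    return os.path.getsize(file_or_path)


def recursive_file_count_and_size(files):
    """Return (file count, total bytes), walking into any directories."""
    count, size = 0, 0
    for f in files:
        if not is_filelike_obj(f) and os.path.isdir(f):
            for root, _dirs, names in os.walk(f):
                for name in names:
                    count += 1
                    size += os.path.getsize(os.path.join(root, name))
        else:
            count += 1
            size += get_file_size(f)
    return count, size


def recursive_file_count(files):
    """Backwards-compatible wrapper that returns only the count."""
    return recursive_file_count_and_size(files)[0]
```

Keeping the old name as a thin wrapper means existing callers that only need the count keep working, while the upload path can take both values from a single traversal.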

jjjake merged commit d36093a into jjjake:master on Feb 20, 2024
12 of 13 checks passed
@jjjake
Owner

jjjake commented Feb 20, 2024

Looks great. Thanks again @JustAnotherArchivist.
