Item status check: error on maximum item size exceedance and test with specific identifier/access key #485

Open · wants to merge 1 commit into base: master
HISTORY.rst: 3 changes (3 additions, 0 deletions)
@@ -16,6 +16,9 @@ Unreleased
- advancedsearch.php endpoint now supports IAS3 authorization.
- ``ia upload`` now has a ``--keep-directories`` option to use the full local file paths as the
  remote name.
- ``ia upload <identifier> --status-check`` now checks whether the item exceeds the maximum item size.
  It also checks S3 with the specific identifier and access key rather than only checking whether S3
  is overloaded in general.

**Bugfixes**

internetarchive/cli/ia_upload.py: 14 changes (10 additions, 4 deletions)
@@ -80,6 +80,9 @@
is_valid_metadata_key, json, validate_s3_identifier)


MAX_ITEM_SIZE = 2**40 # 1 TiB


def _upload_files(item, files, upload_kwargs, prev_identifier=None, archive_session=None):
"""Helper function for calling :meth:`Item.upload`"""
responses = []
@@ -160,19 +163,22 @@ def main(argv, session):
        sys.exit(1)

    # Status check.
    if args['<identifier>']:
        item = session.get_item(args['<identifier>'])
Contributor:

Do we really want to get an item that could be 1TB or more before we do a status-check?

Contributor (Author):

Well, we need the Item object for both the size status check and the actual upload. While this structure means we needlessly fetch the item metadata when S3 is overloaded, it avoids more complicated conditions (e.g. first run the S3 overload check if --status-check is present, then fetch the item metadata if the identifier is present, then check the item size if both are present, then exit successfully if --status-check is present), which in my opinion leads to less readable code. The alternatives are two lines with get_item calls, which is just as ugly, or some sort of lazy evaluation, which is somewhat complicated to implement. So I found this to be the least awkward solution.
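For illustration, one possible shape of the "lazy evaluation" alternative mentioned here, sketched against the public internetarchive API. The helper name, the use of functools.lru_cache, and the identifier 'nasa' are assumptions for the sketch, not code from this PR.

import sys
from functools import lru_cache

from internetarchive import get_session

session = get_session()
identifier = 'nasa'  # illustrative identifier, not from the PR


@lru_cache(maxsize=None)
def lazy_item():
    # The metadata request only happens the first time the item is needed;
    # later calls reuse the cached Item object, so the upload path would not
    # fetch it a second time.
    return session.get_item(identifier)


if not session.s3_is_overloaded(identifier=identifier, access_key=session.access_key):
    # Metadata is fetched only on this branch, i.e. only when the S3
    # over-limit check has already passed.
    print(lazy_item().item_size, file=sys.stderr)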

    if args['--status-check']:
        if session.s3_is_overloaded():
        if session.s3_is_overloaded(identifier=args['<identifier>'], access_key=session.access_key):
            print(f'warning: {args["<identifier>"]} is over limit, and not accepting requests. '
                  'Expect 503 SlowDown errors.',
                  file=sys.stderr)
            sys.exit(1)
        elif item.item_size >= MAX_ITEM_SIZE:
Contributor:

Suggested change:
        elif item.item_size >= MAX_ITEM_SIZE:
        elif item.item_size > MAX_ITEM_SIZE:

Contributor (Author):

This would require some testing of whether IA's servers still accept any upload (including an empty file) when the item is exactly 1 TiB. That might be tricky, though, since I think the metadata files, which get modified after every upload, also count towards the item size.
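A trivial illustration of the boundary case being discussed, assuming item_size is a plain byte count (the value is made up):

MAX_ITEM_SIZE = 2**40  # 1 TiB

item_size = 2**40  # an item that is exactly 1 TiB

print(item_size >= MAX_ITEM_SIZE)  # True: the PR as written refuses further uploads
print(item_size > MAX_ITEM_SIZE)   # False: the suggested change would still allow them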

print(f'warning: {args["<identifier>"]} is exceeding the maximum item size '
'and not accepting uploads.', file=sys.stderr)
sys.exit(1)
else:
print(f'success: {args["<identifier>"]} is accepting requests.')
sys.exit()

elif args['<identifier>']:
item = session.get_item(args['<identifier>'])

    # Upload keyword arguments.
    if args['--size-hint']:
        args['--header']['x-archive-size-hint'] = args['--size-hint']
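Taken together, the changelog entry and the changes to main() above amount to roughly the following library-level flow. This is a sketch using the public internetarchive API (get_session, ArchiveSession.get_item, ArchiveSession.s3_is_overloaded, Item.item_size); the identifier 'nasa' and the exact messages are illustrative, not taken from the PR.

import sys

from internetarchive import get_session

MAX_ITEM_SIZE = 2**40  # 1 TiB, mirroring the constant added in ia_upload.py

session = get_session()
identifier = 'nasa'  # illustrative identifier
item = session.get_item(identifier)

# Per-identifier/per-access-key S3 check instead of the previous global
# "is S3 overloaded" check.
if session.s3_is_overloaded(identifier=identifier, access_key=session.access_key):
    print(f'warning: {identifier} is over limit, and not accepting requests.', file=sys.stderr)
    sys.exit(1)
# New check: refuse uploads once the item has reached the maximum item size.
elif item.item_size >= MAX_ITEM_SIZE:
    print(f'warning: {identifier} is exceeding the maximum item size.', file=sys.stderr)
    sys.exit(1)
else:
    print(f'success: {identifier} is accepting requests.')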
tests/cli/test_ia_upload.py: 17 changes (17 additions, 0 deletions)
@@ -42,6 +42,7 @@ def test_ia_upload_invalid_identifier(capsys, caplog):

def test_ia_upload_status_check(capsys):
    with IaRequestsMock() as rsps:
        rsps.add_metadata_mock('nasa')
        rsps.add(responses.GET, f'{PROTOCOL}//s3.us.archive.org',
                 body=STATUS_CHECK_RESPONSE,
                 content_type='application/json')
@@ -53,6 +54,7 @@ def test_ia_upload_status_check(capsys):
        j = json.loads(STATUS_CHECK_RESPONSE)
        j['over_limit'] = 1
        rsps.reset()
        rsps.add_metadata_mock('nasa')
        rsps.add(responses.GET, f'{PROTOCOL}//s3.us.archive.org',
                 body=json.dumps(j),
                 content_type='application/json')
@@ -62,6 +64,21 @@ def test_ia_upload_status_check(capsys):
        assert ('warning: nasa is over limit, and not accepting requests. '
                'Expect 503 SlowDown errors.') in err

        def fake_big_item(body):
            body = json.loads(body)
            body['item_size'] = 2**41  # 2 TiB
            return json.dumps(body)

        rsps.reset()
        rsps.add_metadata_mock('nasa', transform_body=fake_big_item)
        rsps.add(responses.GET, f'{PROTOCOL}//s3.us.archive.org',
                 body=STATUS_CHECK_RESPONSE,
                 content_type='application/json')

        ia_call(['ia', 'upload', 'nasa', '--status-check'], expected_exit_code=1)
        out, err = capsys.readouterr()
        assert 'warning: nasa is exceeding the maximum item size and not accepting uploads.' in err


def test_ia_upload_debug(capsys, tmpdir_ch, nasa_mocker):
    with open('test.txt', 'w') as fh: