Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Long-running clients see 0-byte files uploaded after 401 and automatic refresh of access token #344

Open
4 tasks done
erewok opened this issue Oct 21, 2024 · 4 comments
Open
4 tasks done
Assignees
Labels
bug Something isn't working

Comments

@erewok
Copy link

erewok commented Oct 21, 2024

Description of the Issue

We are seeing 0-byte files uploaded to Box when an access token has expired and a new token has been (automatically) issued. We did not see this on the old Python SDK.

Our code looks roughly like this (I typed it from an example, so may not be perfect, but should convey the idea):

def get_client(box_config_json_path: str) -> BoxClient:
    with open(box_config_json_path, "r") as fl:
        box_config_json_string = fl.read()

    jwt_config = JWTConfig.from_config_json_string(box_config_json_string)
    auth = BoxJWTAuth(config=jwt_config)
    client = BoxClient(auth=auth)
    return client

class OurBoxClient:
   def __init__(box_config_json_path: str):
    self.box_config_json_string = box_config_json_path
    self._client = get_client(self.box_config_json_path)


    def upload_file(self, box_folder_id: str, file_content_stream: BinaryIO, file_name: str) -> None:

        self._client.uploads.upload_file(
                UploadFileAttributes(
                    name=file_name,
                    parent=UploadFileAttributesParentField(id=box_folder_id),
                ),
                file_content_stream,
            )

Most of the time this works fine.

However, we sometimes see a pattern where when a token has expired and the client automatically renews its access token and when it attempts to upload again after renewal, it results in a 0-byte file. We have duplicated these files over to a cloud blob storage in order to confirm that these are not 0-byte files.

Here are some screenshots of our otel tracing spans, which show the flow where we're seeing an error:

First, our code to upload a file fires and the client receives a 401:

Screenshot 2024-10-21 at 9 16 26 AM

Next: the client automatically refreshes its token (receives a 200 OK response on a POST to "https://api.box.com/oauth2/token") and then it uploads the file and receives a 201 response:

Screenshot 2024-10-21 at 9 16 49 AM

Finally, this results in an empty file (but this same file is fine in our object storage):

Screenshot 2024-10-21 at 9 29 43 AM

To reiterate: this does not happen all the time. It's somehow related to an expired token and we tend to see it at the beginning of the day.

Steps to Reproduce

  1. Create a long-running client
  2. Upload a file and check file-size
  3. Wait for access token expiry
  4. Upload a file and check file-size

Expected Behavior

Access tokens are automatically renewed and files upload successfully.

Error Message, Including Stack Trace

There are no errors raised in our code. The otel tracing spans show client behavior, which is otherwise opaque to our application.

Screenshots

See above.

Versions Used

Python SDK: 1.5.1
Python: 3.12.5

@mwwoda
Copy link
Contributor

mwwoda commented Oct 22, 2024

Did you try to upload the same file again after uploading it with 0 bytes? Is the file uploaded correctly after doing this?

Can you reproduce it consistently? Some sample code that we could run and observe the issue could be useful to find the root cause.

What kind of stream are you using for the upload (e.g. file stream)?

I wasn't able to reproduce your issue so far, but it could be related to the potentially incorrect stream position when request is retried. In case of chunked uploads we reset the stream position to 0 when request is retried. I see that it's not done in case of multipart requests. I could try to introduce the fix for it. Hopefully it will solve this problem.

@erewok
Copy link
Author

erewok commented Oct 22, 2024

Did you try to upload the same file again after uploading it with 0 bytes? Is the file uploaded correctly after doing this?

This problem is found by our staff and they send in a resubmit request which this application picks up to reprocess the file. It is successfully uploaded on the second try.

Can you reproduce it consistently? Some sample code that we could run and observe the issue could be useful to find the root cause.

No, I can't reproduce it consistently. But it may be worth mentioning the following:

  • This code has been in production with few modifications since 2019,
  • The only recent change is moving from the old python box SDK to this new library, and
  • When we duplicated all files over to our cloud storage, we saw that when these 0-byte files appeared on Box, these same files were not 0-bytes on our cloud storage: only on Box did we have broken files.

What kind of stream are you using for the upload (e.g. file stream)?

The code I shared above is part of some complicated code that builds an excel spreadsheet, but it's ultimately called by the following (probably unsurprising) lines:

def upload_to_box(
    upload_filename: str,
    upload_type: FileUploadType,
    box_folder_id: str,
    local_file_path: Path | None = None,
) -> str:
    """Upload file to Box."""
    if not local_file_path or not local_file_path.exists():
        raise FileUploadError(
            "`local_file_path` required", upload_filename, upload_type
        )
    with local_file_path.open("rb") as file_stream:
        filename = box_client.upload_file(box_folder_id, file_stream, upload_filename)

@mwwoda
Copy link
Contributor

mwwoda commented Oct 22, 2024

Thanks for the sample. Looks just like normal file stream.

It is successfully uploaded on the second try.

So it looks like, it could be issue with how the retries for multipart requests are handled.

I submitted small fix for now - #348. It resets the stream position to 0 on retry which should help in some scenarios (we could think about setting it to the original position of the stream instead). Are you able to test it before we release new package version?

@erewok
Copy link
Author

erewok commented Oct 22, 2024

Thanks for that. I can test it, yeah, and I'd be happy to do so. I may have to leave it running for a few days to reproduce the conditions where the bug appeared if that's alright.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

7 participants