
Fix a bug in the _upload_file_part_concurrent method #910

Merged: 2 commits merged into fsspec:main on Nov 6, 2024

Conversation

@nils-braun (Contributor)

The _upload_file_part_concurrent method is used as part of the put_file function to upload a file in multiple parts (when the file is larger than a certain size limit).
The function reads the original file in chunks (50 MB by default) and schedules up to 10 upload calls per batch. It has two branches: if more than one chunk remains, it schedules them in parallel; if only one remains, it runs the upload directly.

This last branch has a bug: it uses the variable chunk, which is defined in an enclosing scope (the for loop before it), so by the time this branch runs it no longer holds the chunk that should be uploaded. This leads to wrong data on the remote location: if you upload a file whose size is, e.g., between 20 * 50 MB and 21 * 50 MB, it will always be truncated to exactly 20 * 50 MB on S3. This bug is fixed in this PR.
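
For illustration, here is a minimal sketch of the loop in question (paraphrased, not the verbatim s3fs code; `upload_parts` and `upload_chunk` are placeholder names):

```python
import asyncio

async def upload_parts(f, upload_chunk, *, chunksize=50 * 2**20, max_concurrency=10):
    """Sketch of the buggy batching loop in _upload_file_part_concurrent."""
    out = []
    while True:
        chunks = []
        for _ in range(max_concurrency):
            chunk = f.read(chunksize)  # `chunk` leaks out of this for loop
            if chunk:
                chunks.append(chunk)
        if not chunks:
            break
        if len(chunks) > 1:
            # more than one chunk left: schedule the whole batch in parallel
            out.extend(
                await asyncio.gather(
                    *(upload_chunk(c, len(out) + i) for i, c in enumerate(chunks, 1))
                )
            )
        else:
            # BUG: `chunk` holds the result of the *last* f.read() above,
            # which is b"" once the file is exhausted -- not chunks[0] --
            # so the final partial part is uploaded as empty data.
            out.append(await upload_chunk(chunk, len(out) + 1))
    return out
```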

@martindurant (Member)

Thanks for the fix. It should be easy to test this, right?
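
For instance, a check along the following lines reproduces the truncation against the sketch above (a hypothetical illustration, not the test that was actually added in this PR):

```python
import asyncio
import io

async def fake_upload(chunk, part_number):
    return len(chunk)  # record only how many bytes each "part" carries

async def main():
    chunksize = 1024
    # 10 full chunks plus one partial chunk: the partial one hits the single-chunk branch
    data = b"x" * (10 * chunksize + chunksize // 2)
    sizes = await upload_parts(io.BytesIO(data), fake_upload, chunksize=chunksize)
    assert sum(sizes) == len(data), f"truncated: {sum(sizes)} != {len(data)}"

asyncio.run(main())  # AssertionError with the buggy loop; passes once fixed
```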

@martindurant (Member) commented Nov 5, 2024

Maybe a simpler fix would be to run "in parallel" even for just one remaining chunk.

@nils-braun (Contributor, Author)

@martindurant - I added a test and simplified the code to use only a single branch ("in parallel" for both cases).
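
In terms of the sketch above, the single-branch version looks roughly like this (again paraphrased, not the verbatim patch):

```python
async def upload_parts_fixed(f, upload_chunk, *, chunksize=50 * 2**20, max_concurrency=10):
    """Single branch: asyncio.gather handles one remaining chunk as well as many."""
    out = []
    while True:
        chunks = []
        for _ in range(max_concurrency):
            chunk = f.read(chunksize)
            if chunk:
                chunks.append(chunk)
        if not chunks:
            break
        # always schedule "in parallel"; the stale `chunk` variable is never used
        out.extend(
            await asyncio.gather(
                *(upload_chunk(c, len(out) + i) for i, c in enumerate(chunks, 1))
            )
        )
    return out
```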

@martindurant (Member)

Perfect, thank you

@martindurant merged commit ff8e4fe into fsspec:main on Nov 6, 2024
25 checks passed