Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[api] Improve normal file upload design for public APIs #3878

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

Harshg999
Copy link
Collaborator

@Harshg999 Harshg999 commented Nov 5, 2024

What changes were proposed in this pull request?

  • The original file upload implementation is super old and directly uploads the file by checking the available upload handlers set in settings.py when the request reaches the first middleware.
  • This flow did not follow the current Hue filebrowser design of routing the FS calls via ProxyFS.
  • The design also had historical flaws of uploading the file even if request was meant for other file operation (if you send a file with /copy call, it will still upload the file and then try doing the copy operation).
  • There was no flexibility for upload pre-checks and error handling. Most of such stuff was shifted post file upload.

  • The new implementation helps in solving all the above problem along with cleaner code, improved design, upload on-demand, greater control and performance improvements.
  • The new implementation is only available in API form and does not affect or modifies the older implementation (both are available in parallel for the time being) because we are in the process of phasing out the old filebrowser.
  • Once the new filebrowser is upto speed, the old implementation will be removed.

How was this patch tested?

  • Manually

@Harshg999 Harshg999 changed the title [api] Refactor /upload API [WIP}[api] Refactor /upload API Nov 5, 2024
@Harshg999 Harshg999 changed the title [WIP}[api] Refactor /upload API [WIP][api] Refactor /upload API Nov 5, 2024
@Harshg999 Harshg999 changed the title [WIP][api] Refactor /upload API [api] Refactor /upload API Nov 13, 2024
Copy link

⚠️ No unit test files modified. Please ensure that changes are properly tested. ⚠️

Copy link

⚠️ No unit test files modified. Please ensure that changes are properly tested. ⚠️

@Harshg999 Harshg999 changed the title [api] Refactor /upload API [api] Improve normal file upload design for public APIs Nov 14, 2024
@Harshg999 Harshg999 self-assigned this Nov 14, 2024
Copy link
Collaborator

@bjornalm bjornalm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, I have a couple of thoughts without having looked to deep into the python code.

  1. I think I have raised this before, but why are we not taking the opportunity while building completely new APIs to implement the new Error handling? Both the UI and the API are being rebuilt so I doubt there will be a better opportunity.

https://docs.google.com/document/d/1PLaOEZnDHI0GiCqCzkRpnVnrenPE8LbGm1WlaJTn0cg/edit?usp=sharing

  1. Since this is a professional file service API, we should consider the difference between Base-2 and Base-10 when dealing with file sizes, e.g. how to calculate and how to communicate sizes in a correct and consistent manner. I don't know how big of an issue this is for our new filebrowser, but I think we should to look into it.

For instance, I notice that FILE_UPLOAD_CHUNK_SIZE is calculated as Base-2 and should be expressed as KiB, MiB, GiB etc but is communicated as base-10.

We should look into which files systems use Base-2 (like macOS Terminal and unix) and which use base-10 (like S3) and have a strategy for dealing with that. My fear is that in the context of big data these differences can have a large impact if we don't get it right.

  1. We should also add unit tests to this new API

# Check if the file type is restricted
_, file_type = os.path.splitext(uploaded_file.name)
if RESTRICT_FILE_EXTENSIONS.get() and file_type.lower() in [ext.lower() for ext in RESTRICT_FILE_EXTENSIONS.get()]:
return HttpResponse(f'File type "{file_type}" is not allowed. Please choose a file with a different extetypension.', status=400)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: spelling extetypension

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two questions

  • Is there room for a longer error description, e.g including while formats that are allowed?
  • are we exposing the allowed types via some setting or API so that we can prevent those files from being uplaaded in the UI in the first place?


if request.fs.isdir(dest_path) and posixpath.sep in uploaded_file.name:
raise Exception(f'Upload failed: {posixpath.sep} is not allowed in the filename {uploaded_file.name}.') # TODO: status code
if request.fs.exists(filepath):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ideally there should be an option to override an existing file here if the user has already been informed and made that choice. E.g. in the case of an updated version of the file but at same file path.

if request.META.get('upload_failed'):
raise Exception(request.META.get('upload_failed')) # TODO: Check error message and status code
# Check if the file size exceeds the maximum allowed size
max_size = MAX_FILE_SIZE_UPLOAD_LIMIT.get()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the same for all file systems?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants