Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Docsum file upload handling fails with concurrent http request with same file name #1279

Open
2 of 6 tasks
vrantala opened this issue Dec 20, 2024 · 2 comments
Open
2 of 6 tasks
Assignees
Labels
aitce bug Something isn't working Dev
Milestone

Comments

@vrantala
Copy link

Priority

P2-High

OS type

Ubuntu

Hardware type

Xeon-SPR

Installation method

  • Pull docker images from hub.docker.com
  • Build docker images from source

Deploy method

  • Docker compose
  • Docker
  • Kubernetes
  • Helm

Running nodes

Single Node

What's the version?

opea/docsum:latest sha256:42115ddbf5199060c94f66d654e9e2f128406e94b8dc0a3b3929bae93d4ba829
opea/llm-docsum-tgi:latest sha256:06c8095ccd0ca48de6588ee3bfbddb233d1e9e8cf4c4aeaa60e10eca4571fea4

Description

When sending http-request
Making request to /v1/docsum with files {'type': (None, 'text'), 'messages': (None, ''), 'files': ('pubmed_10.txt', <_io.BufferedReader name='/home/testrunner/users.vrantala/opea_benchmarking/data/pubmed_10.txt'>, 'text/plain'), 'max_tokens': (None, '1024'), 'language': (None, 'en'), 'stream': (None, 'true')}
to docsum service endpoint with Locust test-bench.
It starts giving "POST /v1/docsum HTTP/1.1" 500 Internal Server Error.

Reproduce steps

Send concurrent http-requests with same filename to the docsum service endpoint.
Http-request info:
Making request to /v1/docsum with files {'type': (None, 'text'), 'messages': (None, ''), 'files': ('pubmed_10.txt', <_io.BufferedReader name='/home/testrunner/users.vrantala/opea_benchmarking/data/pubmed_10.txt'>, 'text/plain'), 'max_tokens': (None, '1024'), 'language': (None, 'en'), 'stream': (None, 'true')}

Raw log

INFO:     10.245.0.1:57716 - "POST /v1/docsum HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 403, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/applications.py", line 113, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 187, in __call__
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 165, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.11/site-packages/prometheus_fastapi_instrumentator/middleware.py", line 174, in __call__
    raise exc
  File "/usr/local/lib/python3.11/site-packages/prometheus_fastapi_instrumentator/middleware.py", line 172, in __call__
    await self.app(scope, receive, send_wrapper)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 715, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 735, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 288, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 76, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 73, in app
    response = await f(request)
               ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 301, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 212, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/docsum.py", line 129, in handle_request
    os.remove(file_path)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pubmed_10.txt'
@vrantala vrantala added the bug Something isn't working label Dec 20, 2024
@eero-t
Copy link
Contributor

eero-t commented Dec 20, 2024

With multiple parallel requests, Python HTTP module calls code here, from multiple threads, in parallel:
https://github.com/opea-project/GenAIExamples/blob/main/DocSum/docsum.py#L146

Meaning that those threads overwrite and remove each other's temporary file.

That code has also other security / reliability bugs:

  • It's writing to a location (/tmp) that is shared with rest of the system (at least when program is not containerized)
  • It's using path/filename controlled by the (potential) attacker

There are few ways to fix that:

  • Use unique temp file names (as security guidelines require), or
  • Do not use extra temporary file, but parse the file object directly
    • => This could be noticeable perf optimization for large file uploads

Also, are (HTTP server component) limits set on how large and how many files can be upload in parallel?

I.e. could the (potential) attacker just send e.g. 1000x 1GB files to evict all OPEA services from a node[1], or OOM the node[2], because service exhausted /tmp space?


[1] in case /tmp is disk backed.
[2] iin case /tmp is in RAM.

@ZailiWang ZailiWang self-assigned this Dec 23, 2024
@joshuayao joshuayao added the Dev label Dec 24, 2024
@joshuayao joshuayao added this to the v1.2 milestone Dec 24, 2024
@joshuayao joshuayao added this to OPEA Dec 24, 2024
@joshuayao joshuayao moved this to In progress in OPEA Dec 24, 2024
@Spycsh Spycsh mentioned this issue Dec 24, 2024
7 tasks
@Spycsh
Copy link
Member

Spycsh commented Dec 24, 2024

#1286 should fix this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aitce bug Something isn't working Dev
Projects
Status: In progress
Development

No branches or pull requests

5 participants