
ERROR: Exception in ASGI application - fastapi.exceptions.HTTPException: 404: Session not found #9169

Closed
1 task done
skye0402 opened this issue Aug 22, 2024 · 13 comments
Labels
bug Something isn't working needs repro Awaiting full reproduction

Comments

@skye0402

Describe the bug

For some time now I've been getting the session error below. I can't say exactly which release it started with: I'm currently on 4.41.0, and as far as I know it wasn't happening with 4.22 (and possibly some later versions).
The Gradio app runs on Kubernetes behind an approuter. I can't reproduce the error on demand, but I've seen other issues with the same error (#9070, and the more closely related #6920). I already use sticky sessions, have about 20 concurrent users at peak, and run 4-5 instances of the app. It happens in maybe 5% of cases (it's hard to measure), but it didn't happen with older Gradio versions, and I never upgraded the approuter. Is there any way we can fix this? It's a nasty error, because a user can only work around it by opening an incognito window or clearing the cookie.
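This symptom fits a setup where session state lives only in the memory of the replica that created it. The sketch below models that assumption. It is a toy, not Gradio's implementation. `Replica`, `join`, and `stream` are hypothetical names that loosely mirror a queue join followed by the event stream, which raises the 404 in `sse_stream` in the log below.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy app instance; its session table exists only in this process's memory."""
    name: str
    sessions: set[str] = field(default_factory=set)

    def join(self, session_hash: str) -> int:
        # Register the session (loosely like a queue "join" request).
        self.sessions.add(session_hash)
        return 200

    def stream(self, session_hash: str) -> int:
        # The event stream only succeeds on the replica that knows the session.
        return 200 if session_hash in self.sessions else 404


pods = [Replica(f"pod-{i}") for i in range(4)]
pods[0].join("abc123")

same_pod = pods[0].stream("abc123")        # affinity held: same replica
other_pod = pods[1].stream("abc123")       # routed elsewhere: unknown session
restarted = Replica("pod-0")               # fresh process after a pod restart
after_restart = restarted.stream("abc123")  # memory was lost with the old process
print(same_pod, other_pod, after_restart)   # 200 404 404
```

Under this model, sticky sessions only help if every request for a session keeps reaching the same live process. Any reroute or restart produces the same 404 the users see.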

Have you searched existing issues? 🔎

  • I have searched and found no existing issues

Reproduction

import gradio as gr

Screenshot

No response

Logs

ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/starlette/responses.py", line 265, in __call__
    await wrap(partial(self.listen_for_disconnect, receive))
  File "/usr/local/lib/python3.12/site-packages/starlette/responses.py", line 261, in wrap
    await func()
  File "/usr/local/lib/python3.12/site-packages/starlette/responses.py", line 238, in listen_for_disconnect
    message = await receive()
              ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py", line 555, in receive
    await self.message_event.wait()
  File "/usr/local/lib/python3.12/asyncio/locks.py", line 212, in wait
    await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7fbaab652990

During handling of the above exception, another exception occurred:

  + Exception Group Traceback (most recent call last):
  |   File "/usr/local/lib/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
  |     result = await app(  # type: ignore[func-returns-value]
  |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |   File "/usr/local/lib/python3.12/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
  |     return await self.app(scope, receive, send)
  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |   File "/usr/local/lib/python3.12/site-packages/fastapi/applications.py", line 1054, in __call__
  |     await super().__call__(scope, receive, send)
  |   File "/usr/local/lib/python3.12/site-packages/starlette/applications.py", line 123, in __call__
  |     await self.middleware_stack(scope, receive, send)
  |   File "/usr/local/lib/python3.12/site-packages/starlette/middleware/errors.py", line 186, in __call__
  |     raise exc
  |   File "/usr/local/lib/python3.12/site-packages/starlette/middleware/errors.py", line 164, in __call__
  |     await self.app(scope, receive, _send)
  |   File "/usr/local/lib/python3.12/site-packages/gradio/route_utils.py", line 727, in __call__
  |     await self.app(scope, receive, send)
  |   File "/usr/local/lib/python3.12/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
  |     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  |   File "/usr/local/lib/python3.12/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
  |     raise exc
  |   File "/usr/local/lib/python3.12/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
  |     await app(scope, receive, sender)
  |   File "/usr/local/lib/python3.12/site-packages/starlette/routing.py", line 754, in __call__
  |     await self.middleware_stack(scope, receive, send)
  |   File "/usr/local/lib/python3.12/site-packages/starlette/routing.py", line 774, in app
  |     await route.handle(scope, receive, send)
  |   File "/usr/local/lib/python3.12/site-packages/starlette/routing.py", line 295, in handle
  |     await self.app(scope, receive, send)
  |   File "/usr/local/lib/python3.12/site-packages/starlette/routing.py", line 77, in app
  |     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  |   File "/usr/local/lib/python3.12/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
  |     raise exc
  |   File "/usr/local/lib/python3.12/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
  |     await app(scope, receive, sender)
  |   File "/usr/local/lib/python3.12/site-packages/starlette/routing.py", line 75, in app
  |     await response(scope, receive, send)
  |   File "/usr/local/lib/python3.12/site-packages/starlette/responses.py", line 258, in __call__
  |     async with anyio.create_task_group() as task_group:
  |   File "/usr/local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 680, in __aexit__
  |     raise BaseExceptionGroup(
  | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/usr/local/lib/python3.12/site-packages/starlette/responses.py", line 261, in wrap
    |     await func()
    |   File "/usr/local/lib/python3.12/site-packages/starlette/responses.py", line 250, in stream_response
    |     async for chunk in self.body_iterator:
    |   File "/usr/local/lib/python3.12/site-packages/gradio/routes.py", line 980, in sse_stream
    |     raise e
    |   File "/usr/local/lib/python3.12/site-packages/gradio/routes.py", line 915, in sse_stream
    |     raise HTTPException(
    | fastapi.exceptions.HTTPException: 404: Session not found.
    +------------------------------------

System Info

Gradio Environment Information:
------------------------------
Operating System: Linux
gradio version: 4.41.0
gradio_client version: 1.3.0

------------------------------------------------
gradio dependencies in your environment:

aiofiles: 23.2.1
anyio: 4.4.0
fastapi: 0.112.1
ffmpy: 0.4.0
gradio-client==1.3.0 is not installed.
httpx: 0.27.0
huggingface-hub: 0.24.5
importlib-resources: 6.4.2
jinja2: 3.1.4
markupsafe: 2.1.5
matplotlib: 3.9.2
numpy: 1.26.4
orjson: 3.10.7
packaging: 24.1
pandas: 2.2.2
pillow: 10.4.0
pydantic: 2.8.2
pydub: 0.25.1
python-multipart: 0.0.9
pyyaml: 6.0.2
ruff: 0.6.0
semantic-version: 2.10.0
tomlkit==0.12.0 is not installed.
typer: 0.12.3
typing-extensions: 4.12.2
urllib3: 2.2.2
uvicorn: 0.30.6
authlib; extra == 'oauth' is not installed.
itsdangerous; extra == 'oauth' is not installed.


gradio_client dependencies in your environment:

fsspec: 2024.6.1
httpx: 0.27.0
huggingface-hub: 0.24.5
packaging: 24.1
typing-extensions: 4.12.2
websockets: 12.0

Severity

I can work around it

@skye0402 skye0402 added the bug label Aug 22, 2024
@skye0402
Author

skye0402 commented Sep 5, 2024

Any chance someone could look into this? I've found that the likelihood of 404 errors increases as the pod ages (e.g. once it is more than a day old). It was definitely not happening with older Gradio versions.

@wesngoh

wesngoh commented Sep 10, 2024

The issue persists for me too. I'm running a Gradio app on multiple AWS EKS pods, and the 404 error shows up frequently, though not every time. I had to downgrade Gradio to gradio==3.50.2.

Please look into it.

@skye0402
Author

@w8jie It works without errors with, e.g., Gradio 4.1x; the bug was introduced at some point after that.

@abidlabs
Member

Apologies for the late response. We'll need a methodical repro in order for us to investigate this issue. Would either of you be able to provide one?

@abidlabs abidlabs added the needs repro label Sep 12, 2024
@skye0402
Author

skye0402 commented Sep 13, 2024

@abidlabs - I understand that. The thing is, this error doesn't happen all the time. A session works fine for a while, then the error occurs and the user gets a 404. If I open an incognito window I can work again, because that's a new session, but the session in the regular browser window is lost. Istio uses the session ID from the browser to route the request to the pod running the Gradio app that owns this session ID, but then the error log above appears.
So far I haven't been able to trigger it deliberately. I think it becomes more likely the "older" the instance running Gradio gets. When it happens I have two options: wait until the session expires, or restart the pod manually.
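This assumes the sticky sessions use Istio's consistent-hash load balancing keyed on the browser's session ID. Under that setup, each key maps onto a hash ring of pods. The sketch below is a generic ring-hash illustration, not Istio's or Envoy's implementation, and `HashRing` and the pod names are hypothetical. It shows one plausible mechanism for the symptom. When the set of pods changes, for example when a restarted pod comes back under a new identity, some existing sessions hash to a pod that has never seen them.

```python
import bisect
import hashlib


def _h(key: str) -> int:
    # Stable 64-bit hash; Python's built-in hash() is salted per process.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Minimal consistent-hash ring with virtual nodes per pod."""

    def __init__(self, pods: list[str], vnodes: int = 100) -> None:
        self._ring = sorted((_h(f"{pod}#{i}"), pod) for pod in pods for i in range(vnodes))
        self._keys = [h for h, _ in self._ring]

    def route(self, session_id: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self._keys, _h(session_id)) % len(self._ring)
        return self._ring[idx][1]


sessions = [f"session-{n}" for n in range(1000)]
before = HashRing(["pod-a", "pod-b", "pod-c", "pod-d"])
# pod-d is replaced by pod-e (e.g. a restarted pod with a new name/IP).
after = HashRing(["pod-a", "pod-b", "pod-c", "pod-e"])

moved = sum(before.route(s) != after.route(s) for s in sessions)
print(f"{moved} of {len(sessions)} sessions now route to a different pod")
```

Consistent hashing keeps most keys in place, but every session that moves lands on a pod with no in-memory record of it. Under the in-memory assumption, those sessions would surface as "Session not found".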

It's become a real problem: I'd say it happens in 10 to 20% of cases when a user wants to continue working. It's always the error above, and it seems Gradio forgets the session ID (maybe after Starlette raised the ASGI error?).

I can offer access to the instance for one of your developers if that's of any help and of course access to the source code.

@skye0402
Author

@abidlabs I took a chance and downgraded Starlette to 0.37.2 (which dates back to March this year) to see if that fixes the problem. The next Starlette release came out in July, which could be when the problems started. I'll update if that helped.

@skye0402
Author

@abidlabs - Downgrading Starlette to 0.37.2 didn't fix the error. I don't know which Starlette version shipped with Gradio in May/June, when Gradio didn't show the error. If it was an older version, I could try downgrading further.

@abidlabs
Member

Hi @skye0402, sorry, we've been very busy with the 5.0 release. Could you check whether this is still an issue with 5.0? We've changed significant parts of the codebase, so it's possible that this has been addressed.

Otherwise, we won't be able to proceed without a repro.

@abidlabs
Member

abidlabs commented Nov 5, 2024

Closing for now, can reopen with confirmation and reproduction

@abidlabs abidlabs closed this as not planned Nov 5, 2024
@imhaggarwal

imhaggarwal commented Nov 12, 2024

Hello @abidlabs,
I'm still facing this issue, even with 5.1.0.
I'm using IIS on Windows.

http://localhost/gradio_api/queue/data?session_hash=8jd18n3t166
This URL returns a 502 error, while
http://localhost/gradio_api/queue/data
(the URL without session_hash) does not produce any error.
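On this endpoint the session is carried in the `session_hash` query parameter, as in the URL above. That means every proxy layer (IIS here) has to pass the full request, query string included, through to the Gradio process that owns the session. The sketch below only shows how that parameter travels and what a backend would see if it were dropped. It does not diagnose the 502. The helper names and the "path-only" proxy rule are hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def queue_data_url(base: str, session_hash: str) -> str:
    # Event-stream URL in the shape shown in the comment above.
    return f"{base}/gradio_api/queue/data?{urlencode({'session_hash': session_hash})}"


def session_hash_of(url: str) -> Optional[str]:
    """Return the session_hash a backend would receive, or None if it is missing."""
    values = parse_qs(urlsplit(url).query).get("session_hash")
    return values[0] if values else None


url = queue_data_url("http://localhost", "8jd18n3t166")
print(url)  # http://localhost/gradio_api/queue/data?session_hash=8jd18n3t166

# A hypothetical proxy rule that forwards only the path drops the session entirely.
stripped = urlsplit(url)._replace(query="").geturl()
print(session_hash_of(url), session_hash_of(stripped))  # 8jd18n3t166 None
```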

@AusafG5

AusafG5 commented Nov 28, 2024

Closing for now, can reopen with confirmation and reproduction

@abidlabs
This bug is still present in the current release, 5.7.0. It has made almost all of our internal production apps unusable when deployed to Google Cloud Run services or exposed via local tunnelling/port forwarding; basically, they only work locally.
The following snippet (from the documentation) is basically the entire flow of the app:

from fastapi import FastAPI
import gradio as gr

app = FastAPI()

@app.get("/")
def read_main():
    return {"message": "This is your main app"}

io = gr.Interface(lambda x: "Hello, " + x + "!", "textbox", "textbox")
app = gr.mount_gradio_app(app, io, path="/gradio")

@abidlabs
Member

@AusafG5 can you help us with a minimal repro?

@AusafMo

AusafMo commented Nov 28, 2024

@AusafG5 can you help us with a minimal repro?

@abidlabs
I would love to, but I can't figure out what would constitute a minimal repro in this case.
The codebase basically makes 3-4 different API calls, does some server-side image processing, and returns the resulting image.
The whole job takes about 4 to 10 minutes.
The following is my attempt at a reproduction:

from fastapi import FastAPI
import gradio as gr
from PIL import Image
import os
import uuid
# import boto3  # the real app uploads to S3; upload_to_s3 below only fakes it

app = FastAPI()

def upload_to_s3(image_data, filename):
    # Stand-in for the real S3 upload: write to /tmp, clean up, return a fake URL.
    temp_path = f"/tmp/{filename}"
    with open(temp_path, "wb") as f:
        f.write(image_data)
    os.remove(temp_path)  # Cleanup
    return f"https://fake-s3-url.com/{filename}"

def process_image(input_image: Image.Image):
    # Initialise up front so the finally block never hits an unbound name
    # (previously output_file was undefined if an earlier step failed).
    temp_file = output_file = None
    try:
        # uuid-based names avoid collisions between concurrent requests.
        temp_file = f"temp_{uuid.uuid4().hex}.png"
        input_image.save(temp_file)
        processed = input_image.resize((512, 512))

        output_file = f"output_{uuid.uuid4().hex}.png"
        processed.save(output_file)

        with open(output_file, "rb") as f:
            output_url = upload_to_s3(f.read(), "result.png")

        return processed, output_url

    except Exception as e:
        raise Exception(f"Failed to process: {e}") from e
    finally:
        # Single cleanup path for both success and failure.
        for path in (temp_file, output_file):
            if path and os.path.exists(path):
                os.remove(path)

with gr.Blocks() as demo:
    input_img = gr.Image(type="pil")
    output_img = gr.Image(type="pil")
    output_url = gr.Textbox(label="Output URL")

    gr.Button("Process").click(
        fn=process_image,
        inputs=input_img,
        outputs=[output_img, output_url]
    )

app = gr.mount_gradio_app(app, demo, path="/")

The following is the stripped-down YAML:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: demo
spec:
  template:
    spec:
      containerConcurrency: 10
      timeoutSeconds: 3600
      containers:
      - name: demo-1
        image: image-url
        ports:
        - name: http1
          containerPort: 8080
        resources:
          limits:
            cpu: 2000m
            memory: 8Gi
        startupProbe:
          timeoutSeconds: 240
          periodSeconds: 240
          failureThreshold: 1
          tcpSocket:
            port: 8080
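Each of these jobs keeps a request (the result stream) open for 4-10 minutes. With `containerConcurrency: 10` above, one container can hold at most 10 such requests at once. Knative treats `containerConcurrency` as a hard limit on in-flight requests per container, so extra load is spread over more instances. The arithmetic below is a rough sketch under two labelled assumptions: each active user holds exactly one open request, and requests are routed uniformly at random with no session affinity. `min_instances` and `p_other_instance` are hypothetical helper names.

```python
import math


def min_instances(open_requests: int, container_concurrency: int = 10) -> int:
    # containerConcurrency caps in-flight requests per container, so at least
    # this many containers are needed (the autoscaler may start more).
    return math.ceil(open_requests / container_concurrency)


def p_other_instance(instances: int) -> float:
    # Assumed uniform, affinity-free routing: chance a follow-up request
    # lands on an instance other than the one holding the session.
    return (instances - 1) / instances


for users in (5, 15, 30):
    k = min_instances(users)
    print(users, k, round(p_other_instance(k), 2))
# 5 1 0.0
# 15 2 0.5
# 30 3 0.67
```

This suggests why such an app can work fine locally or under light load, where there is one instance. Once long-running jobs push the service past a single container, requests for a session can start missing the instance that owns it.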

Please feel free to ask for any specific details you think might be missing.
Thanks

6 participants