Exiting Chain EOF #85
Hi :). I assume you're using the default 2k context window on open-webui? Until today, my project used a much larger context window where possible (as in the case of command-r). I just pushed an update that adds a new settings window, which lets you adjust the context window. Please confirm whether this is what causes the increase in VRAM usage / decrease in offloaded layers.
If that's the case, I assume Ollama just ran out of memory on your system?
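For reference, the context window is just a per-request option on Ollama's /api/chat endpoint (the same endpoint the error in this issue points at), so its effect on VRAM usage and offloaded layers can be checked outside of LLocalSearch. A minimal Go sketch; the model name and the 8192 value are only examples, not what LLocalSearch actually sends:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Minimal request body for Ollama's /api/chat endpoint. num_ctx is the
// per-request context window; larger values need more VRAM for the KV cache,
// which can force layers off the GPU.
type chatRequest struct {
	Model    string         `json:"model"`
	Messages []chatMessage  `json:"messages"`
	Options  map[string]any `json:"options"`
	Stream   bool           `json:"stream"`
}

type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

func main() {
	body, _ := json.Marshal(chatRequest{
		Model:    "command-r", // example model
		Messages: []chatMessage{{Role: "user", Content: "hello"}},
		Options:  map[string]any{"num_ctx": 8192}, // example context window
		Stream:   false,
	})

	resp, err := http.Post("http://ollama:11434/api/chat", "application/json", bytes.NewReader(body))
	if err != nil {
		// An abruptly closed connection (e.g. the server dying mid-request)
		// surfaces here as an EOF error like the one reported below.
		fmt.Println("chat request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```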
Yes, it's certainly quicker when I lower the context window size, though it still seems to be breaking. It froze for maybe a minute while trying to pull info from the internet here:
After that went through, it got stuck in a loop. Here are the full logs:
I'm pretty sure it ran out of context. 2k tokens isn't much. You can see an estimate of the current context usage in the backend logs. I assume the format instructions aren't in the context anymore at this point, which results in the LLM ignoring the requested structure.
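As a quick way to sanity-check this, here is a rough sketch of the common ~4-characters-per-token heuristic. It is only an approximation (a real count needs the model's tokenizer), and the prompt strings are placeholders:

```go
package main

import "fmt"

// Rough token estimate using the ~4 characters per token heuristic.
// A real count would need the model's tokenizer; this is only a quick check.
func estimateTokens(text string) int {
	return len(text) / 4
}

func main() {
	systemPrompt := "... format instructions ..."                 // placeholder
	history := "... previous turns and scraped web content ..."   // placeholder

	used := estimateTokens(systemPrompt) + estimateTokens(history)
	const numCtx = 2048 // the default context window discussed above

	fmt.Printf("estimated tokens: %d / %d\n", used, numCtx)
	if used > numCtx {
		fmt.Println("older content (including the format instructions) will be truncated")
	}
}
```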
Closing in favor of #91.
Describe the bug
When submitting a question:
Exiting chain with error: Post "http://ollama:11434/api/chat": EOF
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The chain should complete without exiting with an error.
Screenshots
(Screenshot of the error attached to the original issue.)
Additional context
I think this may be some sort of timeout issue? I've only seen it happen with Command-R. I can use Command-R (18.8 GB) fine in Ollama's Web UI, with 39/41 layers offloaded to the GPU (3090, 24 GB). But when I use LLocalSearch, I only see 19/41 layers offloaded. Not sure whether that's related, but it confuses me, since Mixtral-8x7B (19 GB) loads all layers onto the GPU and has no issues with LLocalSearch.
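One plausible explanation for the different layer counts (assuming the larger context window discussed in the comments above) is the KV cache: its size grows linearly with num_ctx, leaving less VRAM for model layers. A rough sketch of the arithmetic; the layer/head/dimension numbers are illustrative placeholders, not Command-R's actual architecture:

```go
package main

import "fmt"

// Rough KV-cache size estimate: 2 (K and V) * layers * kvHeads * headDim
// * contextLength * bytesPerElement. The parameters in main are placeholders
// chosen for illustration only.
func kvCacheBytes(layers, kvHeads, headDim, contextLen, bytesPerElem int) int {
	return 2 * layers * kvHeads * headDim * contextLen * bytesPerElem
}

func main() {
	const (
		layers       = 40
		kvHeads      = 8
		headDim      = 128
		bytesPerElem = 2 // fp16
	)
	for _, ctx := range []int{2048, 16384} {
		gib := float64(kvCacheBytes(layers, kvHeads, headDim, ctx, bytesPerElem)) / (1 << 30)
		fmt.Printf("num_ctx=%d -> ~%.2f GiB of KV cache\n", ctx, gib)
	}
}
```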