
This model's maximum context length is 4097 tokens, however you requested 5586 tokens (5330 in your prompt; 256 for the completion). Please reduce your prompt; or completion length. #186

zhongchongan opened this issue Oct 24, 2023 · 3 comments

Comments

@zhongchongan

Helpful Answer:
2023-10-24 20:43:22 web | This model's maximum context length is 4097 tokens, however you requested 5586 tokens (5330 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.
2023-10-24 20:43:22 web | Traceback (most recent call last):
2023-10-24 20:43:22 web | File "/app/api/views/views_chat.py", line 45, in chat
2023-10-24 20:43:22 web | response_text = get_completion_response(vector_store=vector_store, initial_prompt=initial_prompt,mode=mode, sanitized_question=sanitized_question, session_id=session_id)
2023-10-24 20:43:22 web | File "/app/api/views/views_chat.py", line 85, in get_completion_response
2023-10-24 20:43:22 web | response = chain({"question": sanitized_question, "chat_history": chat_history}, return_only_outputs=True)
2023-10-24 20:43:22 web | File "/usr/local/lib/python3.9/site-packages/langchain/chains/base.py", line 258, in __call__
2023-10-24 20:43:22 web | raise e
2023-10-24 20:43:22 web | File "/usr/local/lib/python3.9/site-packages/langchain/chains/base.py", line 252, in __call__
2023-10-24 20:43:22 web | self._call(inputs, run_manager=run_manager)
2023-10-24 20:43:22 web | File "/usr/local/lib/python3.9/site-packages/langchain/chains/conversational_retrieval/base.py", line 142, in _call
2023-10-24 20:43:22 web | answer = self.combine_docs_chain.run(
2023-10-24 20:43:22 web | File "/usr/local/lib/python3.9/site-packages/langchain/chains/base.py", line 456, in run
2023-10-24 20:43:22 web | return self(kwargs, callbacks=callbacks, tags=tags, metadata=metadata)[
2023-10-24 20:43:22 web | File "/usr/local/lib/python3.9/site-packages/langchain/chains/base.py", line 258, in __call__
2023-10-24 20:43:22 web | raise e
2023-10-24 20:43:22 web | File "/usr/local/lib/python3.9/site-packages/langchain/chains/base.py", line 252, in __call__
2023-10-24 20:43:22 web | self._call(inputs, run_manager=run_manager)
2023-10-24 20:43:22 web | File "/usr/local/lib/python3.9/site-packages/langchain/chains/combine_documents/base.py", line 106, in _call

@codebanesr
Contributor

@zhongchongan We can use a different model to get around this, but I need more details on how to reproduce it.
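
A minimal sketch of what "a different model" could look like with LangChain's ChatOpenAI (the model name and parameters are illustrative, not this repository's configuration):

```python
from langchain.chat_models import ChatOpenAI

# Illustrative only: a model with a larger context window (16k tokens)
# avoids the 4,097-token cap that the traceback above runs into.
llm = ChatOpenAI(model_name="gpt-3.5-turbo-16k", temperature=0)
```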

@zhongchongan
Author

@codebanesr I use STORE=QDRANT. Looking at the log, the first part of the context retrieved for each question is related to what I asked, but the rest is unrelated to my question. Too much raw data is retrieved at once and passed to GPT-3.5, so the input tokens exceed the limit. How can I configure or optimize this to reduce the amount of data retrieved?
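
For reference, a minimal sketch of capping how many chunks the retriever pulls from Qdrant per question (assumes an existing LangChain vector store named vector_store; the value of k is illustrative):

```python
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain

# vector_store: the existing Qdrant-backed LangChain vector store (assumed).
# k caps how many chunks are stuffed into the prompt for each question.
retriever = vector_store.as_retriever(search_kwargs={"k": 2})

chain = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(temperature=0),
    retriever=retriever,
)
```

Fewer retrieved chunks means a smaller prompt, at the cost of possibly dropping relevant context.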

@davidsmithdevops
Contributor

davidsmithdevops commented Oct 25, 2023

@zhongchongan
Reducing the number of records retrieved from Qdrant or decreasing the size of the text segments during ingestion can both solve the issue.
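
A minimal sketch of the second option, shrinking the text segments at ingestion time (chunk sizes are illustrative, not the project's defaults):

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Smaller chunks mean each record retrieved from Qdrant costs fewer prompt tokens.
# raw_documents: the documents loaded during ingestion (assumed to exist).
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = splitter.split_documents(raw_documents)
```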
