
Streamlit cache leading to empty index #48

Open
tinamil opened this issue May 2, 2024 · 3 comments

tinamil commented May 2, 2024

I deployed with docker and uploaded a local file. When I tried to chat I got a blank response and the following error:
local-rag | 2024-05-02 12:28:54,674 - ollama - ERROR - Ollama chat stream error: 'HuggingFaceEmbedding' object has no attribute '_model'

However, it works normally when I upload a website instead.

I traced the problem to line 114 of utils/llama_index.py: the @st.cache_data(show_spinner=False) decorator on the create_index(_documents) function. One workaround is to comment out that line, after which local files work again. I believe create_index is being called while the documents are being uploaded, but before they have been saved to disk and read into memory, so the index is empty; Streamlit then caches that empty result instead of regenerating the document index when the query comes through.
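The suspected failure mode can be illustrated with a small, pure-Python mimic of st.cache_data (an assumption for illustration, not Streamlit's actual implementation): since the only parameter is underscore-prefixed and thus excluded from hashing, every call maps to the same cache key, so the first (empty) result is returned forever.

```python
import functools

# Mimic of st.cache_data's behavior for this function (assumption:
# plain Python stand-in, not Streamlit itself). Underscore-prefixed
# parameters are excluded from the cache key, leaving nothing to hash.
def cache_data_mimic(func):
    cache = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        key = ()  # _documents is excluded, so the key is always the same
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    return wrapper


@cache_data_mimic
def create_index(_documents):
    # Stand-in for the real index construction in utils/llama_index.py.
    return list(_documents)


# Index built before the uploaded files were saved and read -> empty.
print(create_index([]))                  # -> []
# Later call with real documents still hits the stale cache entry.
print(create_index(["doc-1", "doc-2"]))  # -> []
```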

@tinamil tinamil added the bug Something isn't working label May 2, 2024
@jonfairbanks jonfairbanks changed the title Unable to chat after uploading local file Streamlit cache leading to empty index May 3, 2024
jonfairbanks (Owner) commented:
Caching was added to speed up subsequent chat messages, since Streamlit essentially reruns the full app on each interaction, which would otherwise trigger embedding again. It certainly has edge cases and almost creates more problems than it solves.

This is sort of outlined in the Known Issues section.

I'll try to take a second look at this and see if there's a better option.
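One possible alternative (an assumption, not the fix this repo actually adopted) is to keep the built index in per-session state keyed by a fingerprint of the uploads, rebuilding only when the files change. A plain dict stands in for st.session_state below so the sketch runs outside a Streamlit session; `build_index` is a hypothetical stand-in for the real index construction.

```python
# A plain dict stands in for st.session_state (assumption: outside a
# Streamlit run; in the app this would be st.session_state itself).
session_state = {}


def build_index(documents):
    # Hypothetical stand-in for the real index construction.
    return sorted(documents)


def get_index(documents):
    # Fingerprint the uploads; in the real app this could be a tuple of
    # (filename, size) pairs. Rebuild only when the fingerprint changes.
    fingerprint = tuple(sorted(documents))
    if session_state.get("fingerprint") != fingerprint:
        session_state["index"] = build_index(documents)
        session_state["fingerprint"] = fingerprint
    return session_state["index"]


print(get_index([]))                  # -> []
print(get_index(["a.txt", "b.txt"]))  # files changed -> index rebuilt
```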

@jonfairbanks jonfairbanks self-assigned this May 3, 2024

tinamil commented May 3, 2024

I found some additional information. Streamlit's @st.cache_data excludes parameter names that begin with an underscore from being hashed: https://docs.streamlit.io/develop/concepts/architecture/caching#excluding-input-parameters

So, that is most likely why the create_index function is not detecting that the documents have changed.

I attempted to change line 115 to def create_index(documents): and line 134 to documents=documents, show_progress=True, i.e. I removed the underscore from the parameter. However, that caused a different error, which is likely why the underscore existed in the first place:

llama_index - ERROR - Error when creating Query Engine: Cannot hash argument 'documents' (of type builtins.list) in 'create_index'.
To address this, you can tell Streamlit not to hash this argument by adding a
leading underscore to the argument's name in the function signature:

@st.cache_data
def create_index(_documents, ...):
    ...
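A middle ground (sketched here under the same pure-Python mimic as above, since the document list itself is unhashable) is to keep `_documents` underscore-prefixed but add a hashable key, e.g. a `doc_key` built from the filenames, so the cache invalidates when the uploads change. Both `cache_excluding_underscore` and `doc_key` are hypothetical names for illustration.

```python
import functools

# Mimic (assumption: plain Python, not Streamlit) of caching that
# excludes the unhashable _documents argument from the key but uses a
# hashable doc_key alongside it to detect changed uploads.
def cache_excluding_underscore(func):
    cache = {}

    @functools.wraps(func)
    def wrapper(_documents, doc_key):
        if doc_key not in cache:
            cache[doc_key] = func(_documents, doc_key)
        return cache[doc_key]

    return wrapper


@cache_excluding_underscore
def create_index(_documents, doc_key):
    # Stand-in for the real index construction.
    return list(_documents)


# The empty upload state is cached under its own key...
print(create_index([], doc_key=()))                   # -> []
# ...while real documents produce a different key, forcing a rebuild.
print(create_index(["doc-1"], doc_key=("doc-1",)))    # -> ['doc-1']
```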

JoepdeJong (Contributor) commented:
I think this is fixed in #54
