
Streamlit cache leading to empty index #48

Open
tinamil opened this issue May 2, 2024 · 3 comments

tinamil commented May 2, 2024

I deployed with docker and uploaded a local file. When I tried to chat I got a blank response and the following error:
local-rag | 2024-05-02 12:28:54,674 - ollama - ERROR - Ollama chat stream error: 'HuggingFaceEmbedding' object has no attribute '_model'

However, it works normally when I upload a website instead.

I traced the problem to line 114 of utils/llama_index.py: the @st.cache_data(show_spinner=False) decorator on the create_index(_documents) function. One workaround is to comment out that line, after which local files work again. I believe create_index is being called while the documents are being uploaded, but before they have been saved to disk and read into memory, so the index is empty; Streamlit then caches that empty result instead of regenerating the document index when the query comes through.
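The suspected failure mode can be illustrated with a small, pure-Python mimic of st.cache_data (an assumption for illustration, not Streamlit's actual implementation): since the only parameter is underscore-prefixed and thus excluded from hashing, every call maps to the same cache key, so the first (empty) result is returned forever.

```python
import functools

# Mimic of st.cache_data's behavior for this function (assumption:
# plain Python stand-in, not Streamlit itself). Underscore-prefixed
# parameters are excluded from the cache key, leaving nothing to hash.
def cache_data_mimic(func):
    cache = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        key = ()  # _documents is excluded, so the key is always the same
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    return wrapper


@cache_data_mimic
def create_index(_documents):
    # Stand-in for the real index construction in utils/llama_index.py.
    return list(_documents)


# Index built before the uploaded files were saved and read -> empty.
print(create_index([]))                  # -> []
# Later call with real documents still hits the stale cache entry.
print(create_index(["doc-1", "doc-2"]))  # -> []
```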

@tinamil tinamil added the bug Something isn't working label May 2, 2024
@jonfairbanks jonfairbanks changed the title Unable to chat after uploading local file Streamlit cache leading to empty index May 3, 2024
jonfairbanks (Owner) commented:
Caching was added to speed up subsequent chat messages, since Streamlit essentially reruns the full app on each interaction, which would otherwise trigger embedding again. It certainly has edge cases and almost creates more problems than it solves.

This is sort of outlined in the Known Issues section.

I'll try to take a second look at this and see if there's a better option.
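One possible alternative (an assumption, not the fix this repo actually adopted) is to keep the built index in per-session state keyed by a fingerprint of the uploads, rebuilding only when the files change. A plain dict stands in for st.session_state below so the sketch runs outside a Streamlit session; `build_index` is a hypothetical stand-in for the real index construction.

```python
# A plain dict stands in for st.session_state (assumption: outside a
# Streamlit run; in the app this would be st.session_state itself).
session_state = {}


def build_index(documents):
    # Hypothetical stand-in for the real index construction.
    return sorted(documents)


def get_index(documents):
    # Fingerprint the uploads; in the real app this could be a tuple of
    # (filename, size) pairs. Rebuild only when the fingerprint changes.
    fingerprint = tuple(sorted(documents))
    if session_state.get("fingerprint") != fingerprint:
        session_state["index"] = build_index(documents)
        session_state["fingerprint"] = fingerprint
    return session_state["index"]


print(get_index([]))                  # -> []
print(get_index(["a.txt", "b.txt"]))  # files changed -> index rebuilt
```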

@jonfairbanks jonfairbanks self-assigned this May 3, 2024

tinamil commented May 3, 2024

I found some additional information. Streamlit's @st.cache_data excludes parameter names that begin with an underscore from being hashed: https://docs.streamlit.io/develop/concepts/architecture/caching#excluding-input-parameters

So, that is most likely why the create_index function is not detecting that the documents have changed.

I attempted to change line 115 to def create_index(documents): and line 134 to documents=documents, show_progress=True, i.e. I removed the underscore from the parameter. However, that caused a different error, which is likely why the underscore existed in the first place:

llama_index - ERROR - Error when creating Query Engine: Cannot hash argument 'documents' (of type builtins.list) in 'create_index'.
To address this, you can tell Streamlit not to hash this argument by adding a
leading underscore to the argument's name in the function signature:

@st.cache_data
def create_index(_documents, ...):
    ...
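A middle ground (sketched here under the same pure-Python mimic as above, since the document list itself is unhashable) is to keep `_documents` underscore-prefixed but add a hashable key, e.g. a `doc_key` built from the filenames, so the cache invalidates when the uploads change. Both `cache_excluding_underscore` and `doc_key` are hypothetical names for illustration.

```python
import functools

# Mimic (assumption: plain Python, not Streamlit) of caching that
# excludes the unhashable _documents argument from the key but uses a
# hashable doc_key alongside it to detect changed uploads.
def cache_excluding_underscore(func):
    cache = {}

    @functools.wraps(func)
    def wrapper(_documents, doc_key):
        if doc_key not in cache:
            cache[doc_key] = func(_documents, doc_key)
        return cache[doc_key]

    return wrapper


@cache_excluding_underscore
def create_index(_documents, doc_key):
    # Stand-in for the real index construction.
    return list(_documents)


# The empty upload state is cached under its own key...
print(create_index([], doc_key=()))                   # -> []
# ...while real documents produce a different key, forcing a rebuild.
print(create_index(["doc-1"], doc_key=("doc-1",)))    # -> ['doc-1']
```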

JoepdeJong (Contributor) commented:
I think this is fixed in #54
