
Storage container names don't match storage_name parameter - is this causing indexing to fail? #47

Closed
brian-mayer opened this issue Jul 3, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@brian-mayer

Describe the bug
I've deployed the infrastructure and it all seems to have deployed successfully. I am able to walk through the Jupyter Quickstart notebook and use the API to upload the recommended sample UTF-8 text documents. Indexing seems to start per the API message but stops at 6.25% or 12.5%. No indexes ever show up on the Azure AI Search instance.

To Reproduce
Steps to reproduce the behavior:

  1. Deploy accelerator solution
  2. Use Jupyter notebook Quickstart to walk through API calls
  • Upload sample UTF-8 files successfully into the blob container; however, the containers are named with random identifier strings (example: 345yu37291db2aa8ced66f43edw5f6n7) rather than the specified storage_name parameter

  • Try to start an indexing job using notebook API call

  • Indexing job initiates but fails - either at 6.25% or 12.5%

When the API is queried for status, the response looks like this:

```json
{
  "status_code": 200,
  "index_name": "wiki-articles-index",
  "storage_name": "wiki-articles-storage",
  "status": "failed",
  "percent_complete": 12.5,
  "progress": "2 out of 16 workflows completed successfully."
}
```

Expected behavior
I expect the index to be built so that I can query it.

Desktop (please complete the following information):

  • OS: macOS
  • Version: 14.4.1

Additional context
I've tried restarting the graphrag AKS containers and stripping the files being processed down to just one file. Nothing has changed the outcome: no apparent indexing ever happens. Is this related to the container names not matching the storage_name parameter input in the Jupyter Quickstart cell?

brian-mayer added the bug label on Jul 3, 2024
@jgbradley1
Collaborator

jgbradley1 commented Jul 5, 2024

Hello @brian-mayer! The storage_name will not match the actual name of the blob container. For a better security posture, we first sanitize the name provided by an API end user by computing a hash, and we use that hash as the actual blob container name. To be exact, the hash calculation from a user-provided storage_name string is done in this function.
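For illustration, here is a minimal sketch of that idea (the hash algorithm shown is an assumption for the example; the linked function is the source of truth):

```python
# Minimal sketch: the user-supplied storage_name is hashed, and the hex
# digest becomes the real blob container name. SHA-256 is an assumption
# here; check the linked function for the actual algorithm.
import hashlib

def sanitize_name(storage_name: str) -> str:
    return hashlib.sha256(storage_name.encode("utf-8")).hexdigest()

print(sanitize_name("wiki-articles-storage"))  # a fixed hex string, not the raw name
```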

@jgbradley1
Collaborator

To assist with debugging, there is one place you can look for additional logging. In the Azure Storage instance that gets deployed within the resource group at deployment time, there will be a blob container named reports. It holds a continuously running log of the FastAPI application, so if there are errors you might see them logged there. Also, within the blob container associated with the hash of the index_name you tried to build, there is a reports directory containing a log file for the indexing job. That file will contain all output from running the indexing job. If you ran the same indexing job multiple times, there will be a separate log file per attempt.
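If you want to pull those log files down programmatically rather than browsing the portal, something like this works (a sketch assuming the azure-storage-blob package, a connection string in your environment, and the hash-based container naming described above):

```python
# Sketch of fetching the two kinds of logs described above. The hash used
# for the container name is an assumption; see the sanitize function.
import hashlib
import os

from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"]
)

# 1. The continuously running FastAPI application log:
for blob in service.get_container_client("reports").list_blobs():
    print(blob.name)

# 2. Per-job indexing logs, stored under reports/ inside the container
#    whose name is the hash of the index_name:
container_name = hashlib.sha256(b"wiki-articles-index").hexdigest()
index_container = service.get_container_client(container_name)
for blob in index_container.list_blobs(name_starts_with="reports/"):
    log_text = index_container.download_blob(blob.name).readall().decode("utf-8")
    print(f"--- {blob.name} ---\n{log_text[:500]}")
```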

We are looking into hooking these logs up to App Insights so you don't have to hunt for these log files manually. The code to support App Insights is in the codebase, but it has not been fully re-tested after some recent changes we made, so we have not turned this form of logging back on by default.

We will look into it soon and try to get better logging enabled by default again.

@jgbradley1
Collaborator

I recently pushed a PR that hooks graphrag up to App Insights. If you're interested, check out the latest on the main branch.

With that PR, log messages are captured in App Insights, along with any errors that occur in calls to the API. Please report back if you encounter any further issues. Please note that step 2 of the pipeline (entity extraction) is responsible for a large portion of the overall indexing time (roughly 90%). Once that step is complete, the remaining steps finish fairly quickly.
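If you'd rather query those App Insights logs from code than from the portal, a sketch along these lines should work (the workspace ID and the AppTraces table name are assumptions for a workspace-based App Insights resource):

```python
# Sketch of querying recent trace logs with the azure-monitor-query SDK.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())
response = client.query_workspace(
    workspace_id="<log-analytics-workspace-id>",  # assumption: your workspace
    query="AppTraces | sort by TimeGenerated desc | take 50",
    timespan=timedelta(days=1),
)
for table in response.tables:
    for row in table.rows:
        print(row)
```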
