[BUG] - "Indexing failed at 12.5 %" #139
Comments
Any luck @doruit? Did you happen to try running again? When you kick off an indexing run, a kubernetes job is spun up (within about 5 mins). If you ran deploy.sh, you should be able to watch for the indexing job to appear and then follow its logs to monitor progress (example commands below). You'll possibly see some 503 and 429 errors, which is normal as the indexer runs out of tokens and has to wait for the rate limiter to let it back in. (There's ongoing work to clean this up.) But if for some reason your indexer dies, you would be able to see what happened when it did.
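A minimal sketch of those two steps, assuming the `graphrag` namespace that deploy.sh sets up; the job name placeholder is hypothetical and will differ per run:

```bash
# Wait for the indexing job to appear (Ctrl+C once you see it)
kubectl get jobs -n graphrag --watch

# Follow that job's logs to monitor progress
kubectl logs job/<indexing-job-name> -n graphrag -f
```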
@timothymeyers, just did a fresh deployment to rule out some possible causes. I've checked the storage account, and it seems the files are uploaded to a container with a random name, where I expected the name I declared in the notebook: file_directory = "testdata". Instead, the files are uploaded to a container with a number as its name (screenshot). Is this expected?
@doruit Please check the logs of your indexing pod and you will get an idea of what went wrong.
Hi @doruit - yes, this is the expected behavior. The names that you give are hashed to improve the overall security posture. Did you run into the same issues during indexing with your new deployment? Did you happen to try inspecting the index pod logs like I mentioned?
Hi @timothymeyers, earlier I saw in the indexing pod logs that the token limit was reached many times. Strange to me, as I'm using the following TPM settings (screenshot). That should be sufficient, right? I have also turned off dynamic quota allocation. When looking at the jobs monitor it says no jobs are running (screenshot), yet checking the job status from the notebook at the same time gives this (screenshot).
@doruit, could you please add the api_key property under each LLM node in the following file: pipeline-settings.yaml? (A rough sketch of what that could look like is below.)
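Purely illustrative sketch; the node names and placeholder value are assumptions, so match them to the actual structure of your pipeline-settings.yaml:

```yaml
# Illustrative only -- the surrounding structure must mirror your actual
# pipeline-settings.yaml; the point is an explicit api_key on each LLM node.
llm:
  type: azure_openai_chat
  api_key: <your-azure-openai-key>   # placeholder value
embeddings:
  llm:
    type: azure_openai_embedding
    api_key: <your-azure-openai-key>
```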
@rnpramasamyai, I've added the api_key property. After this I did a rerun of the Quickstart notebook to build a new index (screenshot). But now the indexing manager does not seem to instantiate an indexing job at all. Should I remove the graphrag namespace and run the deployment again?
@doruit Please run the deployment script again.
@doruit Please always check the pod's logs if indexing is not working, and post those logs.
I did a full deployment again, checked all parameters, and ran the notebook again from the start. After running the step "Build an Index" I get this message:
At the same time I'm watching the logs, waiting for the indexing job to come by, but I only get messages from the graphrag index manager every 5 minutes:
This is my parameters file:
Not sure where to look now, as the indexing job does not start at all anymore. What region, LLM model version, API version, etc. should I use as reference?
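One way to dig further, sketched under the assumption that the index manager runs as a Kubernetes CronJob in the `graphrag` namespace (the 5-minute log cadence suggests a schedule along the lines of `*/5 * * * *`):

```bash
# List the index-manager CronJob and its recent child jobs
kubectl get cronjobs -n graphrag
kubectl get jobs -n graphrag --sort-by=.metadata.creationTimestamp

# Inspect the most recent run for events or failure reasons
kubectl describe job <most-recent-job-name> -n graphrag
```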
@doruit Indexing will take time to complete.
@rnpramasamyai, I've waited for an hour, but it seems it will not start, nor can I find any clue where to look for errors. In the job log I only see this message every 5 minutes (screenshot). What else can I check or rule out?
@rnpramasamyai @timothymeyers
I checked whether quota or an Azure policy caused the issue in the CSP tenant/subscription; however, I could not find any log so far to rule everything out. There is only one policy that might impact the creation of VM/VMSS: it requires VMs to have managed disks, which they all have, so I guess that policy won't block anything. Another policy blocks creating classic resources. The good news, however, is that the deployment was successful with the alternative method.
Firstly: thank you for this repo, and thanks for trying to help us punters understand what you have written. I have the same issue of stopping at 2/16 workflows (12.5%). The pod log command does not seem to work; I tried with both names while the job was running:
graphrag-solution-accelerator-py3.10vscode@docker-desktop:/graphrag-accelerator$ kubectl logs job/graphrag-index-manager-28746945 -n graphrag -f
Can I suggest/request, as it may make everyone's job a little easier:
i.e. it is very difficult to see what is going on, to try to understand what is going wrong. Lastly, when you add a comment like "@doruit, could you please add the api_key property under each LLM node in the following file: pipeline-settings.yaml?", would you mind quickly telling the rest of us trying to follow along why you are suggesting that, so that we can also understand why it might fix the issue.
I still don't know what caused the process to get stuck. It was not due to Azure policy or the api_key in pipeline-settings.yaml. Perhaps the model and API version cause the issue: in other issue threads they mention that if the vector size is slightly different from what is expected, the indexing will fail. @timothymeyers, in deployment.md it looks like the API version is fixed to "2023-03-15-preview". Is that correct, or should the documentation instruct the developer to get the right API version from the deployed model (i.e. via the portal)?
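For what it's worth, one way to sanity-check the vector size your deployment actually returns under a given API version; a sketch using the openai Python client, where the endpoint and deployment names are placeholders:

```python
from openai import AzureOpenAI  # assumes the openai>=1.x client

client = AzureOpenAI(
    api_key="<your-key>",
    api_version="2023-03-15-preview",     # the version cited in deployment.md
    azure_endpoint="https://<your-resource>.openai.azure.com",
)

resp = client.embeddings.create(
    model="<your-embedding-deployment>",  # deployment name, not model name
    input="test",
)
# If this length differs from what the index expects (1536 for
# text-embedding-ada-002), indexing can fail as described above.
print(len(resp.data[0].embedding))
```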
Hi, I'm facing a similar issue. I'm running it on a dev container. Which API version did you end up using?
Describe the bug
Stuck at the indexing job. After this message:
I'm checking the status every now and then; after a while I get this:
{
'status_code': 200,
'index_name': 'index-2',
'storage_name': 'testdata1',
'status': 'failed',
'percent_complete': 12.5,
'progress': '2 out of 16 workflows completed successfully.',
}
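For context, that status comes from polling the accelerator's API from the notebook; a hypothetical sketch of such a check, where the endpoint path and header name are assumptions rather than something confirmed in this thread:

```python
import requests

# Hypothetical values -- adjust to your deployment
endpoint = "https://<your-apim-gateway>.azure-api.net"
headers = {"Ocp-Apim-Subscription-Key": "<your-subscription-key>"}

resp = requests.get(f"{endpoint}/index/status/index-2", headers=headers)
status = resp.json()
print(status["percent_complete"], status["progress"])
```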
To Reproduce
Steps to reproduce the behavior:
Expected behavior
I expect the indexing job to finish successfully.
Screenshots
n/a
Desktop (please complete the following information):
Additional context
n/a