vector indexing #24

Open
angelo337 opened this issue Jan 19, 2023 · 4 comments
Comments

@angelo337

Hi there,
I am not sure whether this issue is with the committer or the importer, but I will try to explain it as well as I can.

I have a bunch of documents to index for semantic search. I just finished creating all the components needed to include all the relevant information. When I try to include the vector of every document, I get this error:

Error adding field 'vector'='[0.007063671946525574, -0.032926689833402634, -0.04629294574260712, -0.008444702252745628]' , expected format:'[f1, f2, f3...fn]' e.g. [1.0, 3.4, 5.6]

However, I double-checked with the debugger on the Importer and the field is in the right format. Do you have any idea why it is throwing this error?
Thanks a lot,
Angelo

@essiembre
Contributor

Is this error from Solr? If in doubt, I suggest you try sending a document manually to Solr and see if you still get that error. If so, you may have to check with the Solr support group.
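One way to run the manual test suggested above is to post a JSON document straight to Solr's update handler, bypassing the crawler. The following is only a minimal sketch: the host, port, and collection name (`mycollection`) are placeholders, not values from this thread, and the field names are taken from the error message and assume a matching schema.

```python
import json
import urllib.request


def build_update_request(solr_url, collection, docs):
    """Build (without sending) a POST request for Solr's JSON update handler.

    solr_url and collection are placeholders for your own installation.
    """
    url = f"{solr_url.rstrip('/')}/solr/{collection}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example document using the vector from the error message.
doc = {
    "id": "manual-test-1",  # hypothetical id
    "vector": [
        0.007063671946525574,
        -0.032926689833402634,
        -0.04629294574260712,
        -0.008444702252745628,
    ],
}
request = build_update_request("http://localhost:8983", "mycollection", [doc])

# To actually send it (requires a running Solr), uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```

If Solr accepts this request but rejects the same document coming from the crawler, the difference is more likely in how the field value is transmitted than in the values themselves.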

If I had to guess, I would say maybe Solr is expecting float values and the values you are sending may have too many digits to be represented as such? Try reducing the decimal precision. Maybe the Importer TruncateTagger can be of assistance.
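To test the precision guess outside the crawler, the vector components can be rounded before sending a document manually. This is a rough sketch only; `reduce_precision` is a hypothetical helper, not part of the Importer, and the choice of 6 decimals is an arbitrary assumption. Rounding changes the embedding slightly, so it is meant for diagnosis rather than as a fix.

```python
def reduce_precision(vector, digits=6):
    """Return a copy of the vector with each component rounded to `digits` decimals."""
    return [round(float(value), digits) for value in vector]


# Vector taken from the error message in this issue.
vector = [
    0.007063671946525574,
    -0.032926689833402634,
    -0.04629294574260712,
    -0.008444702252745628,
]
rounded = reduce_precision(vector)
print(rounded)
```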

@angelo337
Author

Hi there,
It's a Solr error. However, it's not possible to reduce the decimal precision or the vector dimension, as both are defined in the Transformer model.
I am going to test this manually and let you know.
Best regards,
Angelo

@angelo337
Author

Hi there,
I just manually tested a JSON generated by Norconex, and then modified it so that it would work. This first one is from Norconex:
[ "upsert": { { "id": "https://www.client.com/", "metadata": { "summary": [ "Copyright 2021 - Todos los derechos reservados" ], "vector": [ 0.034525495022535324, -0.006657381076365709, 0.007999160327017307, 0.1010849773883819 ] } } } ]

I manually submitted this JSON and it works:
[ { "id": "https://www.client.com/", "summary": [ "Copyright 2021 - Todos los derechos reservados" ], "vector": [ 0.034525495022535324, -0.006657381076365709, 0.007999160327017307, 0.1010849773883819 ] } ]

Is it possible to alter the order in the way this committer behaves, please?

Thanks a lot,
Angelo
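The manual change between the two JSON samples above amounts to dropping the "upsert" wrapper and lifting the "metadata" fields up next to "id". A small sketch of that transformation follows, for reproducing the manual test only. It assumes each entry is an object with an "id" and a "metadata" object, which is an interpretation of the first sample (as pasted, that sample is not valid JSON); `flatten_document` is a hypothetical helper.

```python
def flatten_document(entry):
    """Merge the nested "metadata" fields with the top-level "id" into one flat dict.

    Assumes `entry` looks like {"id": ..., "metadata": {...}}.
    """
    flat = {"id": entry["id"]}
    for name, value in entry.get("metadata", {}).items():
        if name != "id":  # keep the top-level id if metadata also has one
            flat[name] = value
    return flat


# Entry shaped like the first sample in this comment.
nested = {
    "id": "https://www.client.com/",
    "metadata": {
        "summary": ["Copyright 2021 - Todos los derechos reservados"],
        "vector": [
            0.034525495022535324,
            -0.006657381076365709,
            0.007999160327017307,
            0.1010849773883819,
        ],
    },
}
flat = flatten_document(nested)
print(flat)
```

The result has the same shape as the second sample, with "id", "summary", and "vector" at the top level.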

@essiembre
Contributor

What do you mean by changing the order?

The JSON generated by the crawler is an internal format. When the data is sent to Solr, SolrJ is used, which sends it in binary form, in its own optimized way.
