use AISearch import and vectorize data wizard with azure-search-openai-demo #2275

Open
evan2k opened this issue Jan 13, 2025 · 1 comment


evan2k commented Jan 13, 2025

I was trying to use the Azure AI Search "Import and vectorize data" wizard (from the portal) with the app.
It seems that the wizard is quite opinionated about the names of the index fields it creates,
which makes the resulting index incompatible with the app unless you change the code to use
the same names. Am I missing something, or is my assumption correct?
Thank you.

@pamelafox
Collaborator

I just helped another team use the app with the integrated vectorization. These were the changes they had to make.

  1. Modify this code in approach.py to pull out the correct field names.
 
                documents.append(
                    Document(
                        id=document.get("id"),
                        content=document.get("content"),
                        embedding=document.get("embedding"),
                        image_embedding=document.get("imageEmbedding"),
                        category=document.get("category"),
                        sourcepage=document.get("sourcepage"),
                        sourcefile=document.get("sourcefile"),
                        oids=document.get("oids"),
                        groups=document.get("groups"),
                        captions=cast(List[QueryCaptionResult], document.get("@search.captions")),
                        score=document.get("@search.score"),
                        reranker_score=document.get("@search.reranker_score"),
                    )
                )

You probably want id to be "chunk_id", and also for sourcepage and sourcefile to be "chunk_id", as I've been told that field contains the filename. You need filenames for citations to work. The embedding field should be "text_vector", I think. The rest of the fields are optional and don't need updating.
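One way to keep that change in a single place is a small field-name map, roughly like the sketch below. This is only an illustration: `FIELD_MAP` and `to_document` are hypothetical names, the `Document` dataclass is a cut-down stand-in for the app's own class, and "chunk_id" / "text_vector" are the names guessed above, so check them against the index the wizard actually created.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping, Optional

# Hypothetical map: app attribute name -> field name in the wizard-created index.
# "chunk_id" and "text_vector" are assumptions taken from the comment above.
FIELD_MAP = {
    "id": "chunk_id",
    "sourcepage": "chunk_id",
    "sourcefile": "chunk_id",
    "embedding": "text_vector",
}


@dataclass
class Document:
    """Simplified stand-in for the app's Document class (fewer fields)."""

    id: Optional[str] = None
    content: Optional[str] = None
    embedding: Optional[List[float]] = None
    sourcepage: Optional[str] = None
    sourcefile: Optional[str] = None


def to_document(raw: Mapping[str, Any]) -> Document:
    """Build a Document from a raw search result, translating field names."""

    def field(name: str) -> Any:
        # Fall back to the app's original name when no override is listed.
        return raw.get(FIELD_MAP.get(name, name))

    return Document(
        id=field("id"),
        content=field("content"),
        embedding=field("embedding"),
        sourcepage=field("sourcepage"),
        sourcefile=field("sourcefile"),
    )
```

Fields that are not listed in the map (here `content`) are still read under the app's original names.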

  2. Change this line of code in approach.py to use "text_vector" (or whatever your embedding field is called):

                return VectorizedQuery(vector=query_vector, k_nearest_neighbors=50, fields="embedding")
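If you would rather not hard-code the field name, something like the following could work. It is a sketch, not part of the app: the `AZURE_SEARCH_EMBEDDING_FIELD` variable and the `vector_query_kwargs` helper are made up for illustration, and the "text_vector" default is the same guess as above.

```python
import os
from typing import Any, Dict, List

# Hypothetical setting; the app does not define this environment variable.
EMBEDDING_FIELD = os.environ.get("AZURE_SEARCH_EMBEDDING_FIELD", "text_vector")


def vector_query_kwargs(
    query_vector: List[float],
    k: int = 50,
    embedding_field: str = EMBEDDING_FIELD,
) -> Dict[str, Any]:
    """Return the keyword arguments used in the VectorizedQuery call above."""
    return {
        "vector": query_vector,
        "k_nearest_neighbors": k,
        "fields": embedding_field,
    }
```

In approach.py the return line would then become `return VectorizedQuery(**vector_query_kwargs(query_vector))`.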
