Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Es vector store basic implementation #21

Open
wants to merge 18 commits into
base: main
Choose a base branch
from
Open

Es vector store basic implementation #21

wants to merge 18 commits into from

Conversation

noamoss
Copy link
Collaborator

@noamoss noamoss commented Feb 7, 2025

closes #12
closes #13
closes #16

@noamoss noamoss changed the title Es vector store Es vector store basic implementation Feb 7, 2025
@noamoss noamoss requested a review from akariv February 7, 2025 07:28
botnim/sync.py Outdated
tool_resources = None
tools = None
print(f'Updating assistant: {config["name"]}')
# Load context, if necessary
if config.get('context'):
vs = VectorStoreOpenAI(config, config_dir, production, client)
if config.get('context') and replace_context: # Only runs if both conditions are true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is wrong - we need the tools and tools_resources even if we don't replace the context.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

# Generate embedding
response = self.openai_client.embeddings.create(
input=content,
model="text-embedding-ada-002",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you choose this older model?
'text-embedding-3-small' is cheaper and more advanced.

https://platform.openai.com/docs/guides/embeddings#embedding-models

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are very right, and actually it was already included in the configurations. Notice we might change this selection later. For now - fixed.

while len(file_streams) > 0:
current = file_streams[:32]

for filename, content_file, content_type in current:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for the double loop I think.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree. good catch.


def get_or_create_vector_store(self, context, context_name, replace_context):
ret = None # return value
vs_name = self.env_name(self.config['name']).lower().replace(' ', '_')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to include the context name as well here, as we would want each context to be in a separate index.

return 0

def update_tools(self, context_, vector_store):
if len(self.tools) == 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the tool added needs to be a function, not file-search.

Something like:

{
  "type": "function",
  "name": "search_common_knowledge",
  "description": "Semantic search the 'common-knowledge' vector store",
  "parameters": {
     "type": "object",
     "properties": {
       "query": {
          "type": "string",
          "description": "The query string to use for searching"
       }
     },
     "required": ["query"]
  }
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks. I did not think of that.

Copy link
Contributor

@akariv akariv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @noamoss overall pretty good - I left some small comments.

),
))

def update_tool_resources(self, context_, vector_store):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no internal vector store ids for elasticsearch

@noamoss noamoss requested a review from akariv February 9, 2025 23:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants