Contribute advanced Hybrid search example in OpenAI Cookbook (python) #7

Open
tylerhutcherson opened this issue May 2, 2023 · 5 comments
Labels
enhancement New feature or request python

Comments

@tylerhutcherson
Collaborator

The existing cookbook only scratches the surface:
https://github.com/openai/openai-cookbook/blob/main/examples/vector_databases/redis/getting-started-with-redis-and-openai.ipynb

Contribute a Python notebook that demonstrates complex Hybrid queries with Redis VSS and other search features (an ecommerce dataset might work nicely) including

  • Numeric range filters
  • Tag filters
  • Full text search "filters"
  • Client-side hybrid scoring combining both BM25 lexical and semantic search. This could be done with a pipeline that sends one Redis call to fetch both sets of search results (top K) and then merges them. Can we show a performance improvement with this technique over pure lexical or pure semantic search?
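As a rough illustration of the first three bullets, the sketch below builds a single RediSearch query string that applies a numeric range, a tag filter, and a full-text clause as a pre-filter for a KNN vector search. The field names (`price`, `category`, `title`, `embedding`), the parameter name `$vec`, and the helper `build_hybrid_query` are hypothetical and not taken from the cookbook; only the query syntax itself (`@field:[min max]`, `@field:{tag}`, `(filter)=>[KNN k @field $param AS alias]`) is RediSearch's.

```python
from typing import Optional, Sequence, Tuple


def build_hybrid_query(
    k: int,
    price_range: Optional[Tuple[float, float]] = None,
    tags: Optional[Sequence[str]] = None,
    text: Optional[str] = None,
    vector_field: str = "embedding",  # hypothetical VECTOR field name
) -> str:
    """Build a RediSearch query: optional filters followed by a KNN clause."""
    clauses = []
    if price_range is not None:
        low, high = price_range
        # Numeric range filter; only valid on a field indexed as NUMERIC.
        clauses.append(f"@price:[{low} {high}]")
    if tags:
        # Tag filter; several tags inside the braces are OR-ed together.
        # Tags containing spaces or punctuation would need escaping.
        clauses.append("@category:{" + " | ".join(tags) + "}")
    if text:
        # Full-text "filter" on a TEXT field.
        clauses.append(f"@title:({text})")

    knn = f"[KNN {k} @{vector_field} $vec AS vector_score]"
    if not clauses:
        return f"*=>{knn}"  # no pre-filter: plain vector search
    return f"({' '.join(clauses)})=>{knn}"


query = build_hybrid_query(
    5, price_range=(10, 50), tags=["shoes", "sandals"], text="running"
)
print(query)
```

The string would then be passed to `FT.SEARCH` together with the query vector bound to `$vec` and `DIALECT 2`.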
@tylerhutcherson tylerhutcherson added enhancement New feature or request python labels May 2, 2023
@tylerhutcherson tylerhutcherson changed the title Add advanced Hybrid search example in OpenAI Cookbook Contribute advanced Hybrid search example in OpenAI Cookbook (python) May 2, 2023
@michaelskyuan

Submitted PR

@tylerhutcherson
Collaborator Author

Initial review submitted from our end >>> openai/openai-cookbook#417

@michaelskyuan at some point we will also want to update this notebook to cover bullet point 4 above. This is a bit "green field" in the sense that we have not yet explicitly tried it. But it's theoretically possible to do true weighted hybrid search by using a Redis pipeline and "merging" the results from the two scoring algorithms (BM25 + KNN/CosineD). I sense this will be a somewhat heavier lift, and since it is not immediately pressing, I will spin it off into a separate issue that we can re-prioritize when the time is right, probably in the next month.
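A minimal sketch of what the client-side "merging" step could look like, assuming the two result sets have already been fetched (for example through one pipelined round trip) as dictionaries mapping document id to score. The function names, the min-max normalization, and the `alpha` weight are illustrative choices made here, not something that has been tried or agreed on in this thread.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so the two scoring scales become comparable."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - low) / (high - low) for doc, s in scores.items()}


def merge_hybrid(
    bm25: Dict[str, float],             # lexical scores, higher is better
    cosine_distance: Dict[str, float],  # KNN distances, lower is better
    alpha: float = 0.5,                 # weight of the semantic side (assumed)
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Weighted merge of lexical and semantic result sets."""
    lexical = min_max(bm25)
    # Turn distances into similarities before normalizing.
    semantic = min_max({doc: 1.0 - d for doc, d in cosine_distance.items()})
    combined = {
        doc: alpha * semantic.get(doc, 0.0) + (1 - alpha) * lexical.get(doc, 0.0)
        for doc in set(lexical) | set(semantic)
    }
    # Sort by score descending, breaking ties by document id for stability.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))[:top_k]


ranked = merge_hybrid({"a": 2.0, "b": 1.0}, {"b": 0.1, "c": 0.3})
print(ranked)
```

A document that appears in only one result set contributes 0 for the missing side, which is one of several possible conventions; rank-based fusion would be an alternative that avoids score normalization altogether.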

@michaelskyuan

I agree @tylerhutcherson. And I believe this topic deserves its own separate notebook with a denser text dataset.
Let's leave the OOTB Redis Hybrid search functionality in the current notebook and create a dedicated notebook that addresses normalization of lexical and semantic scoring, using a more appropriate dataset than an ecommerce one.

@Spartee
Contributor

Spartee commented May 26, 2023

@michaelskyuan This App/notebook was recently contributed by OpenAI. We could use some of this.

@ladrians

I am struggling to apply a hybrid filter similar to these examples. The difference is that I am using langchainjs 0.2.16 from TypeScript with the standard bindings, against RediSearch 2.6 in the cloud.

I have the following metadata associated with a chunk item:

{\\\"name\\\"\\:\\\"20030331\\\",\\\"description\\\"\\:\\\"2003/03/31\\\",\\\"year\\\"\\:2003,\\\"month\\\"\\:3,\\\"day\\\"\\:31,\\\"date\\\"\\:20030331,\\\"id\\\"\\:\\\"5251cc41\\-307d\\-4117\\-9b0f\\-9e408eb37011\\\",\\\"doc_id\\\"\\:\\\"5251cc41\\-307d\\-4117\\-9b0f\\-9e408eb37011\\\"}

I would like to filter on a range for the year, for example with something like the following:

@metadata:(\\\"year\\\"\\:[(2001 2004])

but it always returns 0 records. If I use an exact match or the negation, the query works. I started with:

filterQuery = `@metadata:(-\\\"year\\\"\\:2001)`;
itemList = await client.ft.search(chunk, filterQuery);
console.log(filterQuery, itemList.total);

OK, 3 items returned

@metadata:(-\"year\"\:2001) 3

Then I tried a range:

filterQuery = `@metadata:(\\\"year\\\"\\:\\[2003 2005\\])`;
itemList = await client.ft.search(chunk, filterQuery);
console.log(filterQuery, itemList.total);

This should return elements, but I get:

@metadata:(\"year\"\:\[2003 2005\]) 0

A variant very similar to the previous one:

filterQuery = `@metadata:(\\\"year\\\"\\:\\[\\(2003 2005\\])`;
itemList = await client.ft.search(chunk, filterQuery);
console.log(filterQuery, itemList.total);

This should return elements too, but I still get 0:

@metadata:(\"year\"\:\[\(2003 2005\]) 0

Trying other filters and combining them with AND (&) worked fine:

filterQuery = `@metadata:(\\\"year\\\"\\:2003&\\\"date\\\"\\:20030331)`;
itemList = await client.ft.search(chunk, filterQuery);
console.log(filterQuery, itemList.total);

OK, 1 item filtered

@metadata:(\"year\"\:2003&\"date\"\:20030331) 1

I got some ideas from the referenced notebook but still cannot make it work. Any ideas?

Thanks in advance!
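One possible explanation, not confirmed anywhere in this thread: RediSearch's range syntax `@field:[min max]` applies to fields indexed as NUMERIC. If `metadata` is indexed as a TEXT field holding an escaped JSON string (an assumption, though the working exact-match queries above point that way), an escaped `\[2003 2005\]` would be read as literal text rather than as a range. Under that assumption, a small integer range could be expanded into a union (`|`) of the exact-match terms that already work. The helper name `expand_range_as_text_terms` is hypothetical, and the sketch is in Python to match the notebook this issue asks for; the resulting string is what would be sent to `FT.SEARCH`.

```python
def expand_range_as_text_terms(field: str, key: str, start: int, end: int) -> str:
    """Emulate an inclusive integer range on a TEXT field with OR-ed exact terms."""
    if start > end:
        raise ValueError("start must not be greater than end")
    # Each term mirrors the escaped form that worked for exact matches,
    # e.g. \"year\"\:2003
    terms = [f'\\"{key}\\"\\:{value}' for value in range(start, end + 1)]
    return f"@{field}:({'|'.join(terms)})"


print(expand_range_as_text_terms("metadata", "year", 2003, 2005))
```

In a TypeScript template literal the backslashes would need doubling again, as in the snippets above. This only scales to narrow ranges; indexing `year` as its own NUMERIC field would allow a real `@year:[2003 2005]` filter instead.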

4 participants