This is a Cloudflare Pages project that lets you run a semantic search over a set of documents.
In other words, you can search by meaning, not just by keywords.
The nice thing about using Cloudflare is that it's fast and cheap. You can deploy this to a free Cloudflare Pages project and it will scale to millions of documents.
See below for examples of how to submit and search documents.
For large documents, we support a bulk submit endpoint that will automatically split your document into paragraphs and index each paragraph separately.
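To make the paragraph splitting concrete, here is a minimal sketch of what such a step could look like. The function name `splitIntoParagraphs` and the blank-line rule are assumptions for illustration; the endpoint's actual splitting logic may differ.

```typescript
// Hypothetical helper, not taken from this repo.
// Assumption: paragraphs are separated by one or more blank lines.
function splitIntoParagraphs(text: string): string[] {
  return text
    .split(/\n\s*\n/) // break wherever a blank (or whitespace-only) line occurs
    .map((paragraph) => paragraph.trim()) // drop surrounding whitespace
    .filter((paragraph) => paragraph.length > 0); // skip empty chunks
}
```

Each returned string would then be indexed as its own entry, which is why a search can match one paragraph of a large document rather than the whole thing.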
curl -X POST -H "Content-Type: application/json" -d '{"text": "cloudflare"}' "http://127.0.0.1:8788/api/submit"
curl -X POST -H "Content-Type: application/json" -d '{"text": "big document"}' "http://127.0.0.1:8788/api/bulk-submit"
curl "http://127.0.0.1:8788/api/search?query=cloudflare"
Which returns:
[
{
"id": "fad58cf0-78dc-4170-9ae3-38f5c62868e3",
"namespace": "default",
"text": "cloudflare",
"metadata": {},
"indexed_at": "2023-05-08T08:02:29.058Z",
"similarity": 0.9999999999999998
}
]
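The similarity value close to 1 for an exact text match is what a cosine similarity between two embedding vectors would produce. That metric is an assumption here, since this README does not name the one in use. A minimal sketch of the calculation:

```typescript
// Illustrative only: cosine similarity between two equal-length vectors.
// Assumption: the API's "similarity" field is computed along these lines.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i]!;
    const y = b[i]!;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  // A zero vector has no direction, so treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Identical vectors score approximately 1 (floating-point rounding can leave it slightly below), and unrelated directions score near 0.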
You can pass a namespace parameter to both the submit and search endpoints to namespace your queries.
curl -X POST -H "Content-Type: application/json" -d '{"text": "cloudflare", "namespace": "my-namespace"}' "http://127.0.0.1:8788/api/submit"
And then:
curl "http://127.0.0.1:8788/api/search?query=cloudflare&namespace=my-namespace"
A namespace could be a user ID, a document ID, or anything else you want to use to group your queries.
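If you call the search endpoint from code rather than curl, the query string can be built as in this sketch. The helper name `buildSearchUrl` is hypothetical, and the base URL is the local dev address used in the curl examples.

```typescript
// Local dev address from the curl examples; replace with your deployed URL.
const BASE_URL = "http://127.0.0.1:8788";

// Hypothetical helper: builds a search URL with an optional namespace.
function buildSearchUrl(query: string, namespace?: string): string {
  let url = `${BASE_URL}/api/search?query=${encodeURIComponent(query)}`;
  if (namespace !== undefined) {
    // Encode the namespace too, so user IDs with special characters are safe.
    url += `&namespace=${encodeURIComponent(namespace)}`;
  }
  return url;
}
```

For example, `buildSearchUrl("cloudflare", "my-namespace")` produces the same URL as the namespaced curl command above.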
You can also pass a key/value metadata object to the submit endpoint to store additional data alongside the submitted text.
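As a rough sketch, a submit request body carrying metadata could be assembled like this. The `metadata` field name mirrors the key shown in the search response; string-only values and the helper name `buildSubmitBody` are assumptions.

```typescript
// Shape of the JSON body sent to /api/submit, as far as this README shows.
// Assumption: metadata values are plain strings.
interface SubmitBody {
  text: string;
  namespace?: string;
  metadata?: Record<string, string>;
}

// Hypothetical helper: serializes a submit body, omitting unset fields.
function buildSubmitBody(
  text: string,
  namespace?: string,
  metadata?: Record<string, string>
): string {
  const body: SubmitBody = { text };
  if (namespace !== undefined) body.namespace = namespace;
  if (metadata !== undefined) body.metadata = metadata;
  return JSON.stringify(body);
}
```

The resulting string can be passed as the `-d` argument to curl or as the `body` of a POST request with a `Content-Type: application/json` header.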
Set up the env vars as described below and then run:
yarn
yarn dev
cloudflare-vector-search uses a Postgres database to store the vectors. We recommend using Neon.
Sign up for a free account and paste the schema from data/schema.sql into the SQL editor.
Clone this repo and then set up a Cloudflare Pages project in the Cloudflare dashboard.
We expect the following env vars:
DATABASE_URL: The URL to your Postgres database
OPENAI_API_KEY: Your OpenAI API key
AUTH_SECRET: A secret used to authenticate requests to the API (optional)
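A quick check for missing variables can save a confusing failure at request time. The sketch below is not part of this repo; `missingEnvVars` is a hypothetical helper that takes the environment as a plain object, so it works with whatever bindings object your runtime provides.

```typescript
// Variables this README lists as required; AUTH_SECRET is optional.
const REQUIRED_ENV_VARS = ["DATABASE_URL", "OPENAI_API_KEY"] as const;

type EnvVars = Record<string, string | undefined>;

// Hypothetical helper: returns the names of required variables
// that are unset or empty.
function missingEnvVars(env: EnvVars): string[] {
  return REQUIRED_ENV_VARS.filter((name) => !env[name]);
}
```

An empty result means both required values are present; anything else names what still needs to be configured.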