More elaborate function calls, e.g., batching openai embeddings #57
You don't include `create_statement('call', {'name': 'CALL'})`. I can test it out later and get back to you.
It can even be
@emehrkay Maybe I'm missing something here, but the
How would you set that up?
P.S. Whenever I add the "call", the parentheses are also missing from the resulting Cypher query.
P.S.2 The iterate has this special
P.S.3 How does YIELD work with multiple variables? (You can use YIELD(""), so solved.)
I think you can achieve that by extending `FuncRaw`, which doesn't bind its arguments:

```python
from pypher import Pypher, __
from pypher.builder import FuncRaw


class ApocIterate(FuncRaw):
    _CAPITALIZE = False
    _ALIASES = ['periodic_iterate', 'apoc_periodic_iterate']
    name = 'apoc.periodic.iterate'


class OpenAIEmbedding(FuncRaw):
    _CAPITALIZE = False
    _ALIASES = ['openai_embedding', 'apoc_ml_openai_embedding']
    name = 'apoc.ml.openai.embedding'


p = Pypher()
p.ApocIterate(
    __.MATCH.node("n", labels="Entity").RETURN.n,
    __.openai_embedding(__.n.property('category'))
)
```
Maybe a map would work here.
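If the map in question is the configuration argument of `apoc.periodic.iterate` (keys such as `batchMode`, `parallel`, `batchSize`, `params`), Cypher expects a map literal there. As a library-free illustration, here is a sketch that renders a Python dict as a Cypher map literal. It is not Pypher's API: `cypher_value` and `cypher_map` are hypothetical helpers, `$`-prefixed strings are assumed to be parameter references and are passed through unquoted, and the `batchSize` of 100 is an example value only.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


def cypher_value(value: Any) -> str:
    """Render a Python value as a Cypher literal (hypothetical helper, not Pypher API)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # test bool before int: bool subclasses int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        if value.startswith("$"):  # "$name" is passed through as a parameter reference
            return value
        # Escape backslashes first, then single quotes, and wrap in single quotes
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"
    if isinstance(value, Mapping):
        return cypher_map(value)  # nested dicts become nested map literals
    raise TypeError(f"unsupported value for a Cypher literal: {value!r}")


def cypher_map(entries: Mapping[str, Any]) -> str:
    """Render a mapping as a Cypher map literal, e.g. {batchSize: 100}.

    Keys are assumed to be valid, unquoted Cypher identifiers.
    """
    body = ", ".join(f"{key}: {cypher_value(val)}" for key, val in entries.items())
    return "{" + body + "}"


config = {
    "batchMode": "BATCH_SINGLE",
    "parallel": True,
    "batchSize": 100,
    "params": {"apiKey": "$apiKey", "model": "$model"},
}
print(cypher_map(config))
# {batchMode: 'BATCH_SINGLE', parallel: true, batchSize: 100, params: {apiKey: $apiKey, model: $model}}
```

Rendering Python booleans as Cypher `true`/`false` avoids having to spell them as strings, and keeping values such as the API key behind `$` parameters keeps them out of the query text.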
I'm getting slightly closer; I'm currently using f-strings to format the subqueries for
@emehrkay Hi Mark, a brief update, and thanks for the input earlier. I've landed on my own custom implementation of a `Stringify` function:

```python
from pypher import Pypher, __ as cypher
from pypher.builder import FuncRaw, create_function
from pypher.partial import Partial


class Stringify(FuncRaw):
    """Pypher Stringify function.

    Custom Pypher function to represent stringification of a Cypher query. This is relevant
    for operations such as `apoc.periodic.iterate`, which expects stringified Cypher queries
    as arguments.
    """

    def get_args(self):
        """Return the arguments, each wrapped in single quotes and comma-separated."""
        args = []
        for arg in self.args:
            # NOTE: Allows specifying multiple statements as a list
            if isinstance(arg, list):
                arg = " ".join([str(el) for el in arg])
            if isinstance(arg, (Pypher, Partial)):
                arg.parent = self.parent
            args.append(f"'{arg}'")
        return ", ".join(args)

    def __unicode__(self):
        """Render only the quoted arguments (no function name or parentheses)."""
        return self.get_args()


def batch_openai_embeddings(api_key, endpoint, attribute, model, features, batch_size):
    # `features` (node attributes to embed) and `batch_size` were undefined in the
    # original snippet, so they are taken as parameters here

    # Register functions
    create_function("iterate", {"name": "apoc.periodic.iterate"}, func_raw=True)
    create_function("openai_embedding", {"name": "apoc.ml.openai.embedding"}, func_raw=True)
    create_function("set_property", {"name": "apoc.create.setProperty"}, func_raw=True)

    # Build query
    p = Pypher()

    # https://neo4j.com/labs/apoc/4.1/overview/apoc.periodic/apoc.periodic.iterate/
    p.CALL.iterate(
        # Match query
        cypher.stringify(cypher.MATCH.node("p", labels="Entity").RETURN.p),
        # Query to execute per batch
        cypher.stringify(
            [
                cypher.CALL.openai_embedding(
                    f"[item in $_batch | {'+'.join(f'item.p.{attr}' for attr in features)}]",
                    "$apiKey",
                    "{endpoint: $endpoint, model: $model}",
                ).YIELD("index", "text", "embedding"),
                cypher.CALL.set_property("$_batch[index].p", "$attribute", "embedding").YIELD("node").RETURN("node"),
            ]
        ),
        cypher.map(
            batchMode="BATCH_SINGLE",
            parallel="true",
            batchSize=batch_size,
            concurrency=50,
            params=cypher.map(apiKey=api_key, endpoint=endpoint, attribute=attribute, model=model),
        ),
    ).YIELD("batch", "operations")
    return p
```

I still feel like there's some hardcoding going on, but I didn't feel like pushing it any further. Dropping the result here in case someone might benefit from it.
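One caveat with `Stringify` as written: it wraps each argument in single quotes without escaping, so an inner query that itself contains a single quote (for example, a string literal in a `WHERE` clause) would produce invalid Cypher. A library-free sketch of the escaping, plus a plain-string assembly of the `apoc.periodic.iterate` call, might look like this. `cypher_string`, `batch_text_expr`, and `periodic_iterate` are hypothetical helpers, not part of Pypher, and the attribute names `name` and `category` are illustrative only.

```python
from typing import Sequence


def cypher_string(query: str) -> str:
    """Quote a Cypher query as a single-quoted string literal, escaping as needed."""
    # Escape backslashes first so the quote escapes added next are not doubled
    escaped = query.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def batch_text_expr(features: Sequence[str], var: str = "p") -> str:
    """Build a list comprehension that concatenates node attributes per batch item."""
    joined = " + ".join(f"item.{var}.{attr}" for attr in features)
    return f"[item in $_batch | {joined}]"


def periodic_iterate(outer: str, inner: str, config: str) -> str:
    """Assemble a CALL apoc.periodic.iterate(...) statement as plain text."""
    return (
        f"CALL apoc.periodic.iterate({cypher_string(outer)}, "
        f"{cypher_string(inner)}, {config}) YIELD batch, operations"
    )


# Hypothetical per-batch query mirroring the Pypher version above
inner = (
    f"CALL apoc.ml.openai.embedding({batch_text_expr(['name', 'category'])}, $apiKey, "
    "{endpoint: $endpoint, model: $model}) YIELD index, text, embedding "
    "CALL apoc.create.setProperty($_batch[index].p, $attribute, embedding) "
    "YIELD node RETURN node"
)
print(periodic_iterate("MATCH (p:Entity) RETURN p", inner, "{batchMode: 'BATCH_SINGLE'}"))
```

This joins attributes with ` + ` rather than `+`; whitespace is insignificant in Cypher, so both forms concatenate the same values.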
Hello,
I'm trying to codify the following query in Pypher for readability, but I don't seem to get very far:
I was trying to create custom classes to represent `apoc.ml.openai.embedding` and `apoc.periodic.iterate`, but when I do that, the `CALL` keyword does not seem to show up in the query. Any recommendations?
Returns
Observations: