Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What is the canonical way to write queries to knowledge graph #40

Open
astroseger opened this issue Mar 13, 2024 · 44 comments
Open

What is the canonical way to write queries to knowledge graph #40

astroseger opened this issue Mar 13, 2024 · 44 comments
Labels
tutorial Tutorial question

Comments

@astroseger
Copy link

astroseger commented Mar 13, 2024

Let's we have the following "knowledge graph"

(parent Tom Bob)
(parent Pam Bob)
(parent Tom Liz)
(parent Bob Ann)
(parent Bob Pat)
(parent Pat Jim)
(female Pam)
(male Tom)
(male Bob)
(female Liz)
(female Pat)
(female Ann)
(male Jim)

What is the canonical way to write the following request to this knowledge graph:

  • Who is parents of $x
  • Who is mother of $x
  • Who is sister of $x
  • Who is aunt of $x
  • etc...

(it would be great if aunt is somehow be defined via sister and parent_of)

It is possible to write the following implementation inspired by Prolog (@mvpeterson ):

(= (_parent $x $y) (match &self (parent $x $y) $x))
(= (_male $x) (match &self (male $x) $x))
(= (_female $x) (match &self (female $x) $x))

(= (_different $x $y) 
        ( if (== $x $y) 
            Empty
            $x
        )
)
(= (_sister $x $y) 
        ( let* ( ($q (_female $x)) 
                ($r (_parent (_parent $z $q) $y))
             ) 
             (_different $q $y) 
        )
)

(= (_aunt $x $y) (_sister $x (_parent $q $y)))

!(_aunt $x Jim)
; It will return correct answer [Ann]

But this implementation is extremely slow.
So what would be the metta way here?
@Necr0x0Der @vsbogd

@astroseger
Copy link
Author

astroseger commented Mar 13, 2024

additional request: how to efficiently write the function "predecessor", which might be more difficult to wrap in a single match in comparison to "aunt".

The non-efficient, and rather slow implementation:

(parent Tom Bob)
(parent Pam Bob)
(parent Tom Liz)
(parent Bob Ann)
(parent Bob Pat)
(parent Pat Jim)
(parent Jim Lil)


(= (_parent $x $y) (match &self (parent $x $y) $x))
(= (_predecessor $x $z) (_parent $x $z))
(= (_predecessor $x $z) (_predecessor $x (_parent $y $z)))

!(_predecessor $x Lil)

@astroseger astroseger added the tutorial Tutorial question label Mar 13, 2024
@noskill
Copy link
Contributor

noskill commented Mar 13, 2024

This works much faster:

(= (_aunt $aunt $y) (match &self (, (parent $q $y)
                                 (female $aunt)
                                 (parent $p $q)
                                 (parent $p $aunt)) 
                                 
                                 (_different $aunt $q )
                                 )
                                 )

@astroseger
Copy link
Author

astroseger commented Mar 13, 2024

This works much faster:

(= (_aunt $aunt $y) (match &self (, (parent $q $y)
                                 (female $aunt)
                                 (parent $p $q)
                                 (parent $p $aunt)) 
                                 
                                 (_different $aunt $q )
                                 )
                                 )

Yes. The single match should be much faster. The remaining questions are as following:

  1. How to write _predecessor in this way?
  2. Is it possible to somehow define "aunt" via "sister" and "parent"? To have some sort of composition?
  3. How to most gracefully remove the unnecessary second argument for "aunt"?
  4. And last but not least. Is it a canonical and proposed way to write such queries? I think this question is for @Necr0x0Der and @vsbogd .

@noskill
Copy link
Contributor

noskill commented Mar 14, 2024

@astroseger what is _predecessor?

@mvpeterson
Copy link

mvpeterson commented Mar 14, 2024

  1. How to most gracefully remove the unnecessary second argument for "aunt"?

Why do you think it's unnecessary? How will you define who's aunt you're going to get without it?
Or you mean the first argument?

@astroseger
Copy link
Author

@astroseger what is _predecessor?

@noskill please see the second comment in this issue.

@astroseger
Copy link
Author

astroseger commented Mar 14, 2024

  1. How to most gracefully remove the unnecessary second argument for "aunt"?

Why do you think it's unnecessary? How will you define who's aunt you're going to get without it? Or you mean the first argument?

I meant that probably we should have one argument function get_aunt($x) which return the list of aunts for person $x.

@mvpeterson
Copy link

mvpeterson commented Mar 14, 2024

(= (_sister $x $y)
( let* ( ($q (_female $x))
($r (_parent (_parent $z $q) $y))
)
(_different $q $y)
)
)

The idea of using two arguments was to create a kind of relation between objects, i.e. a function should work in both ways:

;Who are Jim's aunts?
(_aunt $x Jim)

;Who are Ann's nephews?
(_aunt Ann $x)

Though it's not working in the current implementation

@astroseger
Copy link
Author

(= (_sister $x $y)
( let* ( ($q (_female $x))
($r (_parent (_parent $z $q) $y))
)
(_different $q $y)
)
)

The idea of using two arguments was to create a kind of relation between objects, i.e. a function should work in both ways:

;Who are Jim's aunts? (_aunt $x Jim)

;Who are Ann's nephews? (_aunt Ann $x)

Though it's not working in the current implementation

Yes, I understand. But I think it is slightly misleading. For example what do we expect for _aunt(Bill_a_real_man_not_a_female $x) ? I think it is better to have two one argument functions.

@mvpeterson
Copy link

mvpeterson commented Mar 14, 2024

_aunt(Bill_a_real_man_not_a_female $x)

This should not reduce, or return Empty atom

@astroseger
Copy link
Author

astroseger commented Mar 14, 2024

_aunt(Bill_a_real_man_not_a_female $x)

This should not reduce, or return Empty atom

I understand. But I think it is sligthly misleading. I propose to return to the main questions:

  1. How to write _predecessor in this way?
  2. Is it possible to somehow define "aunt" via "sister" and "parent"? To have some sort of composition?
  3. How to most gracefully remove the unnecessary second argument for "aunt"? (@mvpeterson thinks it is not required)
  4. And last but not least. Is it a canonical and proposed way to write such queries? I think this question is for Alexey and Vitaly.

@mvpeterson
Copy link

3. How to most gracefully remove the unnecessary second argument for "aunt"? (@mvpeterson thinks it is not required)

I would propose this

(= (_get_aunt $x) (match &self (, (parent $q $x)
(parent $z $q)
(parent $z $a)
(female $a)
)
(_different $a $q)
))

@vsbogd
Copy link
Contributor

vsbogd commented Mar 14, 2024

  1. How to most gracefully remove the unnecessary second argument for "aunt"? (@mvpeterson thinks it is not required)

(= (_aunt $y) ...)

@vsbogd
Copy link
Contributor

vsbogd commented Mar 14, 2024

  1. Is it possible to somehow define "aunt" via "sister" and "parent"? To have some sort of composition?

Nice idea! It is not possible at the moment. It can be emulated by introducing additional ...-query definitions and composing them in a final query. But it is not elegant and in this simple example looks excessive:

(= (parent-query $x $y) (parent $x $y))
(= (_parent $x $y) (match &self (parent-query $x $y) $x))
(= (_aunt $aunt $y) (match &self (, (parent-query $q $y) ...)

@vsbogd
Copy link
Contributor

vsbogd commented Mar 14, 2024

  1. How to write _predecessor in this way?

There was plan to add "union" into query language then predecessor would look like (union (_parent $y $z) (_predecessor $x (_parent $y $z)))

@vsbogd
Copy link
Contributor

vsbogd commented Mar 14, 2024

  1. And last but not least. Is it a canonical and proposed way to write such queries? I think this question is for Alexey and Vitaly.

In context of MeTTa "canonical" to me is a way to right down some knowledge in a way which allows converting it into any form which can be useful for the further inference. To me it looks like both implementations yours and @mvpeterson write the knowledge in a good enough way. Analyzing queries a bit simple than analyzing matches but difference is not critical to me.

@astroseger
Copy link
Author

  1. And last but not least. Is it a canonical and proposed way to write such queries? I think this question is for Alexey and Vitaly.

In context of MeTTa "canonical" to me is a way to right down some knowledge in a way which allows converting it into any form which can be useful for the further inference. To me it looks like both implementations yours and @mvpeterson write the knowledge in a good enough way. Analyzing queries a bit simple than analyzing matches but difference is not critical to me.

But the implementation with nested match (see first comment in this issue) is highly inefficient with the current metta interpreter. So I assume we should not put it into tutorial.

@astroseger
Copy link
Author

  1. How to write _predecessor in this way?

There was plan to add "union" into query language then predecessor would look like (union (_parent $y $z) (_predecessor $x (_parent $y $z)))

I'm not sure, but it seems it will have multiple matches as well, so it might have the same performance issue as the initial implementation (see the second comment to this issue).

In order to efficiently implement _predecessor we need some sort of graph walk. So very roughly speaking the complexity to go from person to his/her parents should be O(1), not O(N) as in implementation with multiple matches.

@noskill
Copy link
Contributor

noskill commented Mar 15, 2024

@astroseger

I rewritten the code into more compositional form, please check if it's good enough.
Works reasonably fast on a KB with 100 families (5 seconds on my computer)

(: and-seq (-> Atom Atom Bool))
(= (and-seq $first $second)
    (if $first $second False))

                    
                    
(= (_parent $x $y) (unify &self (parent $x $y) True False))
(= (_male $x) (unify &self (male $x) True False))
(= (_female $x) (unify &self (female $x) True False))
(= (_mother $x $y) (and-seq
                            (_parent $x $y)
                            (_female $x)))
(= (_father $x $y) (and-seq
                            (_parent $x $y)
                            (_male $x)))

(= (_sister $x $y) 
    (and-seq
            (_female $x)
            (and-seq 
                (_parent $p $y)
                (and-seq (_parent $p $x)
                         (not (eq $x $y))
                )
            )
    )
)



(= (same $X $X) True)
(= (eq $X $Y)
    (let $C (same $X $Y) (if (== $C True) True False)))

(= (_aunt $aunt $y)
    (and-seq 
        (_parent $p $y)
        (_sister $aunt $p))
)
                                 
   
                                 
                                 
!(let $r (_aunt $x Jim0) (if $r $x (empty)))

@noskill
Copy link
Contributor

noskill commented Mar 15, 2024

predecessor predicate

(= (_predecessor $x $z) (_parent $x $z))
(= (_predecessor $x $z) (and-seq (_parent $p $z)
                                 (_predecessor $x $p)))

@astroseger
Copy link
Author

@astroseger

I rewritten the code into more compositional form, please check if it's good enough. Works reasonably fast on a KB with 100 families (5 seconds on my computer)

(: and-seq (-> Atom Atom Bool))
(= (and-seq $first $second)
    (if $first $second False))

                    
                    
(= (_parent $x $y) (unify &self (parent $x $y) True False))
(= (_male $x) (unify &self (male $x) True False))
(= (_female $x) (unify &self (female $x) True False))
(= (_mother $x $y) (and-seq
                            (_parent $x $y)
                            (_female $x)))
(= (_father $x $y) (and-seq
                            (_parent $x $y)
                            (_male $x)))

(= (_sister $x $y) 
    (and-seq
            (_female $x)
            (and-seq 
                (_parent $p $y)
                (and-seq (_parent $p $x)
                         (not (eq $x $y))
                )
            )
    )
)



(= (same $X $X) True)
(= (eq $X $Y)
    (let $C (same $X $Y) (if (== $C True) True False)))

(= (_aunt $aunt $y)
    (and-seq 
        (_parent $p $y)
        (_sister $aunt $p))
)
                                 
   
                                 
                                 
!(let $r (_aunt $x Jim0) (if $r $x (empty)))

predecessor predicate

(= (_predecessor $x $z) (_parent $x $z))
(= (_predecessor $x $z) (and-seq (_parent $p $z)
                                 (_predecessor $x $p)))

This almost for sure will have the same performance issue as an initial code. Any implementation with multiply matches will have problem problems, if I understand correctly. @vsbogd am I right?

@vsbogd
Copy link
Contributor

vsbogd commented Mar 15, 2024

Any implementation with multiply matches will have problem problems, if I understand correctly. @vsbogd am I right?

I don't see any performance issue with Anatoly's code above. First query will find all parents, second fill find all parents of the parents. It is a graph traversal implementation.

@astroseger
Copy link
Author

astroseger commented Mar 15, 2024

@vsbogd @noskill about performance issues. I think there is some misunderstanding. I'm not sure exactly how the match is implemented, but in the worst case each match in the query has O(N) complexity. If it is true then the query with 4 matches will have O(N^4) complexity. It is difficult to verify just because execution time is way too long. Let me demonstrate my point with experiments.

Let's take the following base:

(parent Pam0 Bob0)
(parent Tom0 Liz0)
(parent Bob0 Ann0)
(parent Bob0 Pat0)
(parent Pat0 Jim0)
....
(parent Tom9 Bob9)
(parent Pam9 Bob9)
(parent Tom9 Liz9)
(parent Bob9 Ann9)
(parent Bob9 Pat9)
(parent Pat9 Jim9)
(female Pam9)
(male Tom9)
(male Bob9)
(female Liz9)
(female Pat9)
(female Ann9)
(male Jim9)

So we simply take the initial knowledge base 10 times.
We calculate the following expression

!(_aunt $x Jim0)

Let's calculate the execution time for three implementation of aunt function.

  1. Initial implementation from @mvpeterson (What is the canonical way to write queries to knowledge graph #40 (comment)) - 24.610s
  2. First Implementation from @noskill with one match (What is the canonical way to write queries to knowledge graph #40 (comment)) - 0.499s
  3. Second implementation from @noskill with compositionality but with mulitply matches (What is the canonical way to write queries to knowledge graph #40 (comment)) - 36.029s

I'm afraid for large knowledge graphs the time for "1" and "3" will grow as O(N^3) (because it contains 3 matches).

@vsbogd I think it also explains why there is a performance issue with _predecessor.
See also my comments in here, about what I think we need for efficient implementation of _predesessor #40 (comment)

@noskill
Copy link
Contributor

noskill commented Mar 15, 2024

@astroseger

16 families = 0,942
32 families = 0m1,801s
64 families = 3,545s
128 families = 7,583s
256 families = 15,412s

So 2x data = 2x more computations, it's O(1) complexity, no?

@astroseger
Copy link
Author

@astroseger

16 families = 0,942 32 families = 0m1,801s 64 families = 3,545s 128 families = 7,583s 256 families = 15,412s

So 2x data = 2x more computations, it's O(1) complexity, no?

What are you testing? From number it is your implementation with one match. I'm talking about problem with multiple matches.

@noskill
Copy link
Contributor

noskill commented Mar 15, 2024

I am testing on files generated with this code:

def generate_parents(N):
    for i in range(N):
        print(f"(parent Tom{i} Bob{i})")
        print(f"(parent Pam{i} Bob{i})")
        print(f"(parent Tom{i} Liz{i})")
        print(f"(parent Bob{i} Ann{i})")
        print(f"(parent Bob{i} Pat{i})")
        print(f"(parent Pat{i} Jim{i})")
        print(f"(female Pam{i})")
        print(f"(male Tom{i})")
        print(f"(male Bob{i})")
        print(f"(female Liz{i})")
        print(f"(female Pat{i})")
        print(f"(female Ann{i})")
        print(f"(male Jim{i})")
        
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('N', type=int, help='Size')
args = parser.parse_args()

N = args.N
generate_parents(N)
s = """


(: and-seq (-> Atom Atom Bool))
(= (and-seq $first $second)
    (if $first $second False))

                    
                    
(= (_parent $x $y) (unify &self (parent $x $y) True False))
(= (_male $x) (unify &self (male $x) True False))
(= (_female $x) (unify &self (female $x) True False))
(= (_mother $x $y) (and-seq
                            (_parent $x $y)
                            (_female $x)))
(= (_father $x $y) (and-seq
                            (_parent $x $y)
                            (_male $x)))

(= (_sister $x $y) 
    (and-seq
            (_female $x)
            (and-seq 
                (_parent $p $y)
                (and-seq (_parent $p $x)
                         (not (eq $x $y))
                )
            )
    )
)



(= (same $X $X) True)
(= (eq $X $Y)
    (let $C (same $X $Y) (if (== $C True) True False)))

(= (_aunt $aunt $y)
    (and-seq 
        (_parent $p $y)
        (_sister $aunt $p))
)
                                 
   
                                 
                                 
!(let $r (_aunt $x Jim0) (if $r $x (empty)))
"""

print(s)

@noskill
Copy link
Contributor

noskill commented Mar 15, 2024

I generated aunt2.metta ..aunt256.metta

python3 generate_parents.py 256 > aunt256.metta

and measured the execution time.

@astroseger
Copy link
Author

astroseger commented Mar 15, 2024

I am testing on files generated with this code:

def generate_parents(N):
    for i in range(N):
        print(f"(parent Tom{i} Bob{i})")
        print(f"(parent Pam{i} Bob{i})")
        print(f"(parent Tom{i} Liz{i})")
        print(f"(parent Bob{i} Ann{i})")
        print(f"(parent Bob{i} Pat{i})")
        print(f"(parent Pat{i} Jim{i})")
        print(f"(female Pam{i})")
        print(f"(male Tom{i})")
        print(f"(male Bob{i})")
        print(f"(female Liz{i})")
        print(f"(female Pat{i})")
        print(f"(female Ann{i})")
        print(f"(male Jim{i})")
        
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('N', type=int, help='Size')
args = parser.parse_args()

N = args.N
generate_parents(N)
s = """


(: and-seq (-> Atom Atom Bool))
(= (and-seq $first $second)
    (if $first $second False))

                    
                    
(= (_parent $x $y) (unify &self (parent $x $y) True False))
(= (_male $x) (unify &self (male $x) True False))
(= (_female $x) (unify &self (female $x) True False))
(= (_mother $x $y) (and-seq
                            (_parent $x $y)
                            (_female $x)))
(= (_father $x $y) (and-seq
                            (_parent $x $y)
                            (_male $x)))

(= (_sister $x $y) 
    (and-seq
            (_female $x)
            (and-seq 
                (_parent $p $y)
                (and-seq (_parent $p $x)
                         (not (eq $x $y))
                )
            )
    )
)



(= (same $X $X) True)
(= (eq $X $Y)
    (let $C (same $X $Y) (if (== $C True) True False)))

(= (_aunt $aunt $y)
    (and-seq 
        (_parent $p $y)
        (_sister $aunt $p))
)
                                 
   
                                 
                                 
!(let $r (_aunt $x Jim0) (if $r $x (empty)))
"""

print(s)

It seems you are using old rust implementation. It is true that old metta does not have this issue. I'm not sure how it manages to execute multiple matches in linear time, maybe because of caching (which causes some problems by the way). But what I'm talking about is the minimal metta, sorry for misunderstanding.

@astroseger
Copy link
Author

astroseger commented Mar 15, 2024

@noskill @vsbogd I agree that in principle the complexity of query with three nested matches might be only O(3*N) not O(N^3), for such a structure for a graph. But execution time tells another story.

It is execution times for the first implementation of aunt, for different number of families (numbers for size 10 were discussed here #40 (comment)).

1 6.863
2 11.313
3 15.766
4 20.521
10 52.280
20 118.611
40 304.049
80 1311.03
160 3407.684

You see that it is not quite O(N^3) but not linear for large N as well. And N=160 is not a huge size of graph.

I'm talking about minimal metta. For rust version numbers are much smaller.

@noskill
Copy link
Contributor

noskill commented Mar 15, 2024

i got with minima metta:

16 10
32 26
64 75
128 248
256 1025

@noskill
Copy link
Contributor

noskill commented Mar 15, 2024

I fit a parabola with these numbers 14.15686 - 0.2071869*x + 0.01622045 * x**2

with x=512 estimation is 4160 and real time 4142

@noskill
Copy link
Contributor

noskill commented Mar 15, 2024

Also metta took 8GB of RAM on 512 families

@astroseger
Copy link
Author

I fit a parabola with these numbers 14.15686 - 0.2071869*x + 0.01622045 * x**2

with x=512 estimation is 4160 and real time 4142

So good news that it is not O(N^3) :)

@astroseger
Copy link
Author

@vsbogd @noskill I have a very surprising result. Now I can say that I completely do not understand the reason why the “_aunt” (and "_sister" as well) query takes so long. I've tried to measure time for recursive _predecessor defined in #40 (comment) . And it is not quadratic, it is not linear it seems to be close to O(1) ! So the performance problem is not directly caused by the nested match.

I have the following metta file (example for N=10):

(parent g1_0 g2_0)
(parent g2_0 g3_0)
(parent g3_0 g4_0)
(parent g4_0 g5_0)
....
(parent g1_9 g2_9)
(parent g2_9 g3_9)
(parent g3_9 g4_9)
(parent g4_9 g5_9)

(= (_parent $x $y) (match &self (parent $x $y) $x))
(= (_predecessor $x $z) (_parent $x $z))
(= (_predecessor $x $z) (_predecessor $x (_parent $y $z)))

!(_predecessor $x g5_0)

The execution time for different N:

1  1.351
10 1.358
100 1.357
1000 1.562
10000 4.033
100000 33.215

If we calculate the time for simply reading metta file with N=100000 (by replacing the call of _predessesor with print) we will get 28.968...

Adam-Vandervorst added a commit to Adam-Vandervorst/metta-examples that referenced this issue Mar 15, 2024
@Adam-Vandervorst
Copy link
Collaborator

Adam-Vandervorst commented Mar 16, 2024

Some real ancestor graphs

I took the JSONs, converted them in a generic fashion to MeTTa files, used a MeTTa script to convert that into the above format, added @astroseger implementation, and added some simplified implementations:
metta-examples/aunt-kg

Note this is similar to my work in
metta-examples/traverse

This basic stuff also works in FormalMeTTa:
FormalMeTTa/aunt-kg

CZ2 is built for this kind of stuff, and it executes every query on every person in the popular royal92 ancestral graph in 100ms. It took some more effort to write the queries, but it is able to mimic custom graph databases with optimal traversals, and in theory, we can generate that from a match statement:
CZ2 spacewide ops code and complexity analysis
CZ2 benchmark results

@vsbogd
Copy link
Contributor

vsbogd commented Mar 18, 2024

I've tried to measure time for recursive _predecessor defined in #40 (comment) . And it is not quadratic, it is not linear it seems to be close to O(1)

The time of the match doesn't depend on the size of the base in this simplistic case when query result can be found using prefix. Thus it is expected that time of the _predecessor doesn't depend on a size of irrelevant data but depend on a number of predecessors in a family tree.

Complexity of the original example #40 (comment) is big because of complexity of the _sister:

(= (_sister $x $y) 
  (let* (
    ($q (_female $x))
    ($r (_parent (_parent $z $q) $y))
  ) (_different $q $y) ))

It is called with assigning specific value to $y. First condition in let is (_female $x) it means we are collecting all females from knowledge base and it gives us * N to complexity. Then (_parent $z $q) gives us * 2, and finally we checking all parents of all females against parents of the person under the question.

For example re-writing it in this way:

(= (_parent $x $y) (match &self (parent $x $y) $x))
(= (_child $x $y) (match &self (parent $x $y) $y))
(= (_female $x) (match &self (female $x) $x))
(= (_different $x $y) ( if (== $x $y) Empty $x))
(= (_sister $x $y)
  (let* (
    ($r (_parent $_ $y))
    ($q (_child $r $__))
    ($q (_female $q))
  ) (_different $q $y) ))

!(_aunt $x Jim0)

gives us complexity which depends only on a number of siblings of $y:

N=1 time=2.639s
N=10 time=2.634s

@vsbogd
Copy link
Contributor

vsbogd commented Mar 18, 2024

CZ2 benchmark results

@Adam-Vandervorst do you have benchmark results for the sergey_rodionov_formulation.metta on the databases of different sizes?

@vsbogd
Copy link
Contributor

vsbogd commented Mar 18, 2024

I really appreciate idea to represent such metta code as a composition of the queries and thus be able to optimize it on the atomspace level. In general to make querying code more effective one should prefer the order of queries which decreases amount of data to be processed after each query. E.g. _child | _female instead of _female | _child.

@Adam-Vandervorst
Copy link
Collaborator

@vsbogd I let it run for over a day, but then I canceled the HE benchmark of royal92.
On the Lord of the Rings graph, HE took 55 minutes to complete the basic_formulation, IIRC.
I was running these tests manually, but I will slightly change the tests so they're easier to do automatically.

Specifically, I was looping over all people and all queries, but I can rewrite it in a compound query that says (= (person) (match &self (male $x) $x)); (= (person) (match &self (female $x) $x)); and chain that into the different ancestor relations.

@vsbogd
Copy link
Contributor

vsbogd commented Mar 18, 2024

For example re-writing it in this way:

One more suggestion for the code in the description of the issue. Using unify one can use single predicate _parent to return both parents and children via arguments. For example _aunt version using unify:

(= (_parent $x $y) (unify &self (parent $x $y) $x Empty))
(= (_female $x) (unify &self (female $x) $x $Empty))
(= (_different $x $y) ( if (== $x $y) Empty $x))
(= (_sister $x $y) ( let* (($r (_parent $r $y)) ($r (_parent $r $q)) ($q (_female $q)) ) (_different $q $y) ))
(= (_aunt $x $y)  (_sister $x (_parent $q $y)))

!(_aunt $x Jim0)

@vsbogd
Copy link
Contributor

vsbogd commented Mar 18, 2024

@vsbogd I let it run for over a day, but then I canceled the HE benchmark of royal92.
On the Lord of the Rings graph, HE took 55 minutes to complete the basic_formulation, IIRC.

Thanks @Adam-Vandervorst , I asked because it is interesting if CZ2 can optimize complexity in case of ineffective query like _female filtered by _parent. I think the simplest way to find out is to run examples provided by @astroseger and see how time changes with N growing.

@noskill
Copy link
Contributor

noskill commented Mar 18, 2024

indeed moving (_female $x) to the last condition helps:

(= (_sister $x $y) 
    (and-seq
            ;  (_female $x)
            (and-seq 
                (_parent $p $y)
                (and-seq (_parent $p $x)
                         (not (eq $x $y))
                )
            )
            (_female $x)
    ))

@Adam-Vandervorst
Copy link
Collaborator

Working on it @vsbogd ! The intuition for why CZ2 is able to optimize (beyond just the complexity, also taking into account the effective occurrences) is that it can access the counts of prefix compressed occurrences. This is definitely not what differentiates it. In fact, a good optimization for HE would be to exploit the Trie index to provide the same knowledge. E.g. getting the total count of the star * in (female *) is "free".

@Adam-Vandervorst
Copy link
Collaborator

Adam-Vandervorst commented Mar 20, 2024

Three benchmarks are still running, but here are the results that are already in @vsbogd
https://github.com/Adam-Vandervorst/metta-examples/tree/main/aunt-kg

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tutorial Tutorial question
Projects
None yet
Development

No branches or pull requests

5 participants