-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
What is the canonical way to write queries to knowledge graph #40
Comments
additional request: how to efficiently write the function "predecessor", which might be more difficult to wrap in a single match in comparison to "aunt". The non-efficient, and rather slow implementation:
|
This works much faster:
|
Yes. The single match should be much faster. The remaining questions are as following:
|
@astroseger what is _predecessor? |
Why do you think it's unnecessary? How will you define who's aunt you're going to get without it? |
@noskill please see the second comment in this issue. |
I meant that probably we should have one argument function get_aunt($x) which return the list of aunts for person $x. |
The idea of using two arguments was to create a kind of relation between objects, i.e. a function should work in both ways: ;Who are Jim's aunts? ;Who are Ann's nephews? Though it's not working in the current implementation |
Yes, I understand. But I think it is slightly misleading. For example what do we expect for _aunt(Bill_a_real_man_not_a_female $x) ? I think it is better to have two one argument functions. |
This should not reduce, or return Empty atom |
I understand. But I think it is sligthly misleading. I propose to return to the main questions:
|
I would propose this (= (_get_aunt $x) (match &self (, (parent $q $x) |
|
Nice idea! It is not possible at the moment. It can be emulated by introducing additional
|
There was plan to add "union" into query language then predecessor would look like |
In context of MeTTa "canonical" to me is a way to right down some knowledge in a way which allows converting it into any form which can be useful for the further inference. To me it looks like both implementations yours and @mvpeterson write the knowledge in a good enough way. Analyzing queries a bit simple than analyzing |
But the implementation with nested match (see first comment in this issue) is highly inefficient with the current metta interpreter. So I assume we should not put it into tutorial. |
I'm not sure, but it seems it will have multiple matches as well, so it might have the same performance issue as the initial implementation (see the second comment to this issue). In order to efficiently implement _predecessor we need some sort of graph walk. So very roughly speaking the complexity to go from person to his/her parents should be O(1), not O(N) as in implementation with multiple matches. |
I rewritten the code into more compositional form, please check if it's good enough.
|
predecessor predicate
|
This almost for sure will have the same performance issue as an initial code. Any implementation with multiply matches will have problem problems, if I understand correctly. @vsbogd am I right? |
I don't see any performance issue with Anatoly's code above. First query will find all parents, second fill find all parents of the parents. It is a graph traversal implementation. |
@vsbogd @noskill about performance issues. I think there is some misunderstanding. I'm not sure exactly how the match is implemented, but in the worst case each match in the query has O(N) complexity. If it is true then the query with 4 matches will have O(N^4) complexity. It is difficult to verify just because execution time is way too long. Let me demonstrate my point with experiments. Let's take the following base:
So we simply take the initial knowledge base 10 times.
Let's calculate the execution time for three implementation of aunt function.
I'm afraid for large knowledge graphs the time for "1" and "3" will grow as O(N^3) (because it contains 3 matches). @vsbogd I think it also explains why there is a performance issue with _predecessor. |
16 families = 0,942 So 2x data = 2x more computations, it's O(1) complexity, no? |
What are you testing? From number it is your implementation with one match. I'm talking about problem with multiple matches. |
I am testing on files generated with this code:
|
I generated aunt2.metta ..aunt256.metta
and measured the execution time. |
It seems you are using old rust implementation. It is true that old metta does not have this issue. I'm not sure how it manages to execute multiple matches in linear time, maybe because of caching (which causes some problems by the way). But what I'm talking about is the minimal metta, sorry for misunderstanding. |
@noskill @vsbogd I agree that in principle the complexity of query with three nested matches might be only O(3*N) not O(N^3), for such a structure for a graph. But execution time tells another story. It is execution times for the first implementation of aunt, for different number of families (numbers for size 10 were discussed here #40 (comment)).
You see that it is not quite O(N^3) but not linear for large N as well. And N=160 is not a huge size of graph. I'm talking about minimal metta. For rust version numbers are much smaller. |
i got with minima metta: 16 10 |
I fit a parabola with these numbers 14.15686 - 0.2071869*x + 0.01622045 * x**2 with x=512 estimation is 4160 and real time 4142 |
Also metta took 8GB of RAM on 512 families |
So good news that it is not O(N^3) :) |
@vsbogd @noskill I have a very surprising result. Now I can say that I completely do not understand the reason why the “_aunt” (and "_sister" as well) query takes so long. I've tried to measure time for recursive _predecessor defined in #40 (comment) . And it is not quadratic, it is not linear it seems to be close to O(1) ! So the performance problem is not directly caused by the nested match. I have the following metta file (example for N=10):
The execution time for different N:
If we calculate the time for simply reading metta file with N=100000 (by replacing the call of _predessesor with print) we will get 28.968... |
I took the JSONs, converted them in a generic fashion to MeTTa files, used a MeTTa script to convert that into the above format, added @astroseger implementation, and added some simplified implementations: Note this is similar to my work in This basic stuff also works in FormalMeTTa: CZ2 is built for this kind of stuff, and it executes every query on every person in the popular royal92 ancestral graph in 100ms. It took some more effort to write the queries, but it is able to mimic custom graph databases with optimal traversals, and in theory, we can generate that from a match statement: |
The time of the Complexity of the original example #40 (comment) is big because of complexity of the
It is called with assigning specific value to For example re-writing it in this way:
gives us complexity which depends only on a number of siblings of
|
@Adam-Vandervorst do you have benchmark results for the sergey_rodionov_formulation.metta on the databases of different sizes? |
I really appreciate idea to represent such metta code as a composition of the queries and thus be able to optimize it on the atomspace level. In general to make querying code more effective one should prefer the order of queries which decreases amount of data to be processed after each query. E.g. |
@vsbogd I let it run for over a day, but then I canceled the HE benchmark of royal92. Specifically, I was looping over all people and all queries, but I can rewrite it in a compound query that says |
One more suggestion for the code in the description of the issue. Using
|
Thanks @Adam-Vandervorst , I asked because it is interesting if |
indeed moving (_female $x) to the last condition helps:
|
Working on it @vsbogd ! The intuition for why CZ2 is able to optimize (beyond just the complexity, also taking into account the effective occurrences) is that it can access the counts of prefix compressed occurrences. This is definitely not what differentiates it. In fact, a good optimization for HE would be to exploit the Trie index to provide the same knowledge. E.g. getting the total count of the star |
Three benchmarks are still running, but here are the results that are already in @vsbogd |
Let's we have the following "knowledge graph"
What is the canonical way to write the following request to this knowledge graph:
(it would be great if aunt is somehow be defined via sister and parent_of)
It is possible to write the following implementation inspired by Prolog (@mvpeterson ):
But this implementation is extremely slow.
So what would be the metta way here?
@Necr0x0Der @vsbogd
The text was updated successfully, but these errors were encountered: