LG dictionary is represented incorrectly (SQL bug #650) #664

Closed
linas opened this issue Feb 24, 2016 · 1 comment

linas commented Feb 24, 2016

The current design for representing LG disjuncts is incorrect, for several reasons explained below. The most urgent of these is that it results in bug #650, an SQL table overflow bug.

To see the issue, do this:

(use-modules (opencog) (opencog nlp lg-dict))
(cog-arity (gdr (gar (lg-get-dict-entry (WordNode "drink")))))

which reports a link whose outgoing set has 8210 members. It looks like this, if you print it out:

(SetLink
   (LgWordCset
      (WordNode "drink")
      (LgOr
         (LgConnector
            (LgConnectorNode "XXXGIVEN")
            (LgConnDirNode "+")
         )
         (LgConnector
            (LgConnectorNode "A")
            (LgConnDirNode "-")
            (LgConnMultiNode "@")
         )
         (LgAnd ... ; ... 8208 more entries ....

Besides the SQL bug, there are two other problems, one of which is extremely serious. First, imagine an agent that is trying to learn a language. This requires learning new disjuncts and adding them to the set, and possibly removing others. With the above representation, that requires deleting the 8210-element LgOr and replacing it with another one that has 8211 or 8209 elements, or whatever. Yuck!

Yucky, but there is also a show-stopper here: the above format makes it completely impossible to track disjunct usage statistics! For example, suppose we want to know how often the connector A- is used with "drink". Where can we store this count? Not in the A- connector, because it is used in many, many different words. And not in the LgWordCset because that doesn't tell you which disjunct is being used. Thus, the above representation must be replaced. I propose the below ...

   (LgDisjunct
      (WordNode "drink")
      (LgConnector
         (LgConnectorNode "XXXGIVEN")
         (LgConnDirNode "+") ))

   (LgDisjunct
      (WordNode "drink")
      (LgConnector
         (LgConnectorNode "A")
         (LgConnDirNode "-")
         (LgConnMultiNode "@") ))

   (LgDisjunct
      (WordNode "drink")
      (LgAnd ...))

; ... 8207 more entries ....

This avoids the huge outgoing set. Adding and removing disjuncts is trivial. And finally, there is a perfect place to store per-disjunct statistics: each LgDisjunct link pairs exactly one word with one disjunct. (Side comment: it also illustrates why a fast cog-fetch-incoming-set is important: to fetch the above from SQL, one would call (cog-fetch-incoming-set (WordNode "drink")) and get the 8210 entries back.)
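
To make the bookkeeping argument concrete, here is a rough sketch in plain Python. It is not AtomSpace code: the DisjunctStore class, its method names, and the tuple encoding of connectors are all made up for illustration. It only models the idea of keeping one record per (word, disjunct) pair, so that a count has somewhere to live and adding or removing a disjunct touches a single record.

```python
from collections import Counter

# Hypothetical encoding: a connector is (name, direction, multi),
# and a disjunct is a tuple of connectors.
XXXGIVEN_PLUS = ("XXXGIVEN", "+", "")
A_MINUS_MULTI = ("A", "-", "@")


class DisjunctStore:
    """One record per (word, disjunct) pair, each with its own usage count."""

    def __init__(self):
        self.counts = Counter()  # (word, disjunct) -> usage count
        self.by_word = {}        # word -> set of disjuncts (stands in for the incoming set)

    def add(self, word, disjunct):
        # Adding a disjunct creates one small record; nothing else is rewritten.
        self.by_word.setdefault(word, set()).add(disjunct)
        self.counts.setdefault((word, disjunct), 0)

    def remove(self, word, disjunct):
        # Removing a disjunct drops one record and its count.
        self.by_word.get(word, set()).discard(disjunct)
        self.counts.pop((word, disjunct), None)

    def observe(self, word, disjunct):
        # Record one use of this disjunct with this word.
        self.add(word, disjunct)
        self.counts[(word, disjunct)] += 1

    def disjuncts_of(self, word):
        # Analogue of fetching the incoming set of the word.
        return self.by_word.get(word, set())


store = DisjunctStore()
store.add("drink", (XXXGIVEN_PLUS,))
store.add("drink", (A_MINUS_MULTI,))
store.observe("drink", (A_MINUS_MULTI,))
store.observe("drink", (A_MINUS_MULTI,))
store.observe("eat", (A_MINUS_MULTI,))  # same connector, different word, separate count
```

The count for A- with "drink" is kept apart from the count for A- with "eat", which is exactly what the single big LgOr cannot express.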

Also, a different problem: I'm guessing that each and every use of the LgOr link type is probably wrong. First, it should have been called LgChoice, because it's not a logical-OR, it's a choice. Next, to be usable, the choices need to be in "disjunctive normal form" (DNF) and, philosophically speaking, the atomspace is a "DNF engine" -- every top-level link is a DNF of all other links. It's what the atomspace is all about. That's a more abstract argument as to why LgDisjunct is "correct", while LgOr is "wrong".
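
For anyone unsure what "in DNF" means for connector expressions, here is a minimal sketch in plain Python. It is not the Link Grammar library's own expansion code, and the nested-tuple encoding and the to_dnf name are assumptions made for this example. It ignores costs, optional connectors and multi-connectors, and only shows how a nested and/or expression flattens into a list of choices, each choice being an ordered sequence of connectors.

```python
from itertools import product


def to_dnf(expr):
    """Flatten a nested ("and"/"or", ...) expression into a list of disjuncts.

    A bare string is a single connector. Each returned disjunct is a tuple
    of connector strings, with the left-to-right order of "and" preserved.
    """
    if isinstance(expr, str):
        return [(expr,)]
    op, *args = expr
    expanded = [to_dnf(arg) for arg in args]
    if op == "or":
        # A choice: the alternatives of every branch, side by side.
        return [disjunct for alternatives in expanded for disjunct in alternatives]
    if op == "and":
        # A conjunction: pick one alternative per branch and concatenate them.
        return [sum(combo, ()) for combo in product(*expanded)]
    raise ValueError(f"unknown operator: {op!r}")


# Illustrative expression only, not taken from the real dictionary entry for "drink".
expression = ("and", ("or", "A-", "XXXGIVEN+"), "O+")
disjuncts = to_dnf(expression)
```

Here disjuncts comes out as two choices, ("A-", "O+") and ("XXXGIVEN+", "O+"), and each of those would become its own LgDisjunct link under the proposal.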

linas commented Feb 24, 2016

Oops, wrong component ... see instead bug opencog/opencog#2050
