Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to use SpaceServer to find near objects quickly #2206

Open
vsbogd opened this issue Jun 6, 2019 · 5 comments
Open

How to use SpaceServer to find near objects quickly #2206

vsbogd opened this issue Jun 6, 2019 · 5 comments

Comments

@vsbogd
Copy link
Contributor

vsbogd commented Jun 6, 2019

I have the following problem: there are many objects (few thousands) on the scene. It is required to find closest red object for each object on the scene.

After reading SpaceServer wiki, one could wrote straightforward request:

(Get
  (VariableList (Variable "item") (Variable "other"))
  (And
    (Member (Variable "item") (ConceptNode "scene"))
    (Member (Variable "other") (ConceptNode "scene"))
    (Not (Equal (Variable "item") (Variable "other")))
    (Evaluation (DefinedPredicate "is-red") (Variable "other"))
    (Evaluation (DefinedPredicate "is-near") (Variable "other"))
  )
)

But this query leads to brute-force search through all pairs of item and other and will work slowly for big number of objects.

@linas, what is your vision how this query could be written in effective way?

@vsbogd vsbogd added the question label Jun 6, 2019
@linas
Copy link
Member

linas commented Jul 4, 2019

Sorry for late reply; I just found this now. There are several possibilities, I guess.

One possibility would be to have the SpaceServer support a DefinedSchema that returns a sorted list of objects, from near to far. The search pattern would then use GlobNodes to bracket the tested item in the list: so: (List (Glob "$front") (Variable "$item")(Glob "rest")) (Evaluation (DefinedPredicate "is-red") (Variable "$item")) would apply the is-red predicate to each item in the list. There are at least three problems with this:

  1. Overhead from inserting results of the DefinedSchema into the atomspace, when that's not really needed.
  2. Glob search order is not guaranteed -- the query (List (Glob "$front") (Variable "$item")(Glob "rest")) will eventually explore all possible splits of the list, but its not guaranteed what order they are searched in.
  3. There's no particular way to halt the search after the first match is found.

One easy, but ugly solution is to alter InitiateSearchCB.cc to know about special sorted lists and search them in order. This is "easy" only because InitiateSearchCB.cc already controls the top-level search loop -- it figures out where to start the search, and then runs the very first loop starting at that starting point. This would solve 1,2,3 all at once, but is ... not very pretty.

A better, but harder, more complex, riskier solution is to solve each of these problems individually. The overhead of 1. could probably be avoided with the AtomSpaceLink, issue #1967 but that itself is a huge work item.

Issue 2. could be partly solved by introducing "greedy globs", so if (Glob "rest") is greedy, but (Glob "$front") is not, then (Glob "$front") would become as short as possible, and (Glob "rest") would be as long as possible. There was some work done on greedy globs, before it was realized that they are not needed.

Issue 3. There is currently an API to halt search after N items are found, but its .. hacky. The correct solution would be to implement #1502 but that is also a large complex change.

continued, next post ...

@linas
Copy link
Member

linas commented Jul 4, 2019

So, the above proposal obviously has a lot of problems. So here's another possibility...

Create a new link type NearestLink havng the "typical" syntax (Nearest (Concept "item") (Variable "other")) The pattern matcher is altered to handle this as a special link, similar to the 3 other special cases it handles. These special cases are: ChoiceLink-- explore all possible choices, but match only one. UnorderedLink -- explore all possible permutations of the unordered link, but pick only one. Glob -- explore all possible linear subsequences. -- All three of these special cases have a custom search procedure, consisting of a loop that iterates over the possibilities.

So, with NearestLink, there would again be a special loop, with this loop trying all possible choices, but doing so in order, iterating from nearest to farthest. To obtain this sorted list, it would need to talk to the SpaceServer. How might it talk to the SpaceServer, given that it lives in a different git repo? Well, through Values. Like so -- instead of Nearest we have SmallestLink:

(SmallestLink (Concept "item") (Predicate "key-of-sorted-other-items")(Variable "other"))

During the search, the pattern matcher calls item->Atom::getValue("key-of-sorted-list") which returns a ListValue of Atoms in some sorted order. The list is processed, in order, from first to last. The SpaceServer would need to implement this key -- so that, for any Atom in the SpaceServer, it always has a key called (Predicate "sorted-by-distance") and that getting the Value for that returns a sorted list, valid for that instant in time. The SpaceServer can do this: the octree can find that list in O(log N) time as opposed to point-clouds which need O(N^2) time. -- that's why octrees are used. (As usual, that sorted list is only generated when it is asked for. That is, it's generated on the fly when the code calls ListValue::value() to get that list.)

The nice thing about SmallestLink is that it can work for anything that has a concept of sorted order, and not just SpaceServer. The Key-value access system provides the generic API.

@linas
Copy link
Member

linas commented Jul 4, 2019

Note BTW that the AttentionBank has this same problem: it wants to search atoms in sorted order, according to "attentional focus" (AF), and then halt the search when atoms are below the AF boundary. So itt seems like SmallestLink can solve both the SpaceServer and the AttentionBank problem.

The SmallestLink has some problems. One is to halt pattern search after the first match is found. But this is problem 3. above, and we "already have" a way of fixing that. (Besides #1502 there are also other "nice" ways we could halt the pattern matcher, after enough solutions have been found, e.g. by inventing new predicates that act on the solution-set, as each new solution is found.)

Another problem with SmallestLink, as defined above, is that it expects to get the complete, full sorted list, even though it might not need to search that list. So a better design would be some kind of iterator, that says "get me next-closest item after this one". Since I already claim that Values provide a great way of passing around dynamic data, the question is "how can I do an iterator using values?" ... next post.

@linas
Copy link
Member

linas commented Jul 4, 2019

So, the question is "how can I do iterators in the atomspace?"

It would work like this: SpaceServer provides a special key (PredicateNode "distance-iterator") and when you call item->Atom::getValue("distance-iterator") you get back an IteratorValue, which, in C++ becomes class SpaceServerIteratorValue : public LinkValue {...}; This value is stateful: it remembers it's starting point (held in some private member) and its current location (another private member), and whenever you call iterator->value() it internally increments it's current location, and returns that.

The way you'd use it with the SmallestLink would be (SmallestLink (Concept "item") (PredicateNode "iterate-by-disance") (Variable "$next)) Inside of PatternMatchEngine.cc is something like this:

if (SMALLEST_LINK == h->getType()) {
    Handle start = h->getOutgoing (0);
    Handle key = h->getOutgoing(1);
    Handle var = h->getOutgoing(2);
    LinkValuePtr iter = start->getValue(key)
    while(1) {
           Handle next = iter->value();
            if (nullptr = next) break;
           explore_grounding (var, next);
    }
}

I think that's exactly enough info to allow the space server to iterate by distance. So:

  • We already know (a priori) that start is an Atom in the SpaceServer, so it is guaranteed to have a key "iterate by distance". That is, the SpaceServer called start->setValue(key, new DistanceIterator()) to set it up.
  • Actually the DistanceIterator constructor takes args: DistanceIterator(Handle start, SpaceServer*) so that it can cache everything it needs.
  • Because DistanceIterator already knows which space server to ask, and what it's last value was, it can iterate just fine.

If you are diligent with coding, you could even support patterns with (SmallestLink (Variable "item") (Variable "iter") (Variable "$next)) and have that work, and have patterns with more than one SmallestLink (maybe. ... I guess searching over two SmallestLinks would be ... tricky.) There are probably several details I missed, but I think the above can be made to work. For proof-of-concept/tutorial, RandomValue and the current implementation of the SpaceServerValue (that returns three floats for position).

@vsbogd
Copy link
Contributor Author

vsbogd commented Jul 5, 2019

Thanks @linas, a lot of information here, I need some time to read it carefully and answer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants