Account for `FILTER`s when considering greedy query planning (#1705)
Since #1442, QLever switches to greedy query planning for large connected components. A connected component is considered large when its number of connected subgraphs exceeds the threshold set by the runtime parameter `query-planning-budget`. So far, `FILTER`s were simply ignored when counting the subgraphs. However, `FILTER`s can add significant complexity to standard query planning because, for each subplan, our query planner considers either adding all applicable `FILTER`s to it or none of them. As a result, for certain queries with a medium-sized component but many `FILTER`s, the query planning complexity was underestimated, the query was not planned greedily, and standard query planning took very long. This is now fixed by replacing, for the purpose of query planning, each `FILTER` by a dummy `VALUES` clause that uses the set of distinct variables from the `FILTER`. A `FILTER` that has many variables in common with the triples of the query will then increase the subgraph count substantially. If multiple `FILTER`s have the same set of distinct variables, the dummy `VALUES` clause is added only once (because our query planner either adds all applicable `FILTER`s at a certain point or none of them). Note that this trick overestimates the true query planning complexity. That is, the worst that can happen now is that, with many `FILTER`s, we switch to greedy planning even though standard query planning would still have been feasible.
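The counting idea above can be illustrated with a small Python sketch. This is not QLever's implementation (QLever is written in C++); all function names are hypothetical. The brute-force enumeration is only for illustration. The sketch assumes that a "connected subgraph" means a connected subset of nodes, and that two nodes are adjacent when they share a variable. Each triple is reduced to its set of variables, and each distinct `FILTER` variable set becomes one dummy node, standing in for the dummy `VALUES` clause.

```python
from itertools import combinations

def dummy_nodes_for_filters(filter_vars):
    """Return one dummy node per distinct FILTER variable set (first-seen order)."""
    seen = []
    for variables in filter_vars:
        key = frozenset(variables)
        if key not in seen:  # FILTERs with the same variable set count only once
            seen.append(key)
    return seen

def is_connected(nodes):
    """Check connectivity; two nodes are adjacent if they share a variable (assumption)."""
    if not nodes:
        return False
    visited, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in range(len(nodes)):
            if j not in visited and nodes[i] & nodes[j]:
                visited.add(j)
                stack.append(j)
    return len(visited) == len(nodes)

def count_connected_subgraphs(nodes, budget):
    """Count connected node subsets by brute force, stopping once the budget is exceeded."""
    count = 0
    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            if is_connected(list(subset)):
                count += 1
                if count > budget:
                    return count
    return count

def use_greedy_planning(triple_vars, filter_vars, budget):
    """Decide on greedy planning; `budget` plays the role of `query-planning-budget`."""
    nodes = [frozenset(t) for t in triple_vars]
    nodes += dummy_nodes_for_filters(filter_vars)
    return count_connected_subgraphs(nodes, budget) > budget

# A chain of three triples, plus two FILTERs over the same variables ?a and ?d.
triples = [["?a", "?b"], ["?b", "?c"], ["?c", "?d"]]
filters = [["?a", "?d"], ["?d", "?a"]]
print(use_greedy_planning(triples, [], 10))       # chain alone: 6 subgraphs
print(use_greedy_planning(triples, filters, 10))  # with one dummy node: 13 subgraphs
```

In this toy example, the single dummy node closes the chain into a cycle and raises the count from 6 to 13, so a budget of 10 is exceeded only once the `FILTER`s are taken into account. The second `FILTER` adds nothing because it has the same variable set as the first.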