Better transitive path planning #1824

Open
wants to merge 26 commits into
base: master
26 commits
fc06556
Start basic query plan optimization
RobinTF Feb 14, 2025
de35715
Avoid stacking Sorts on top of each other
RobinTF Feb 19, 2025
5f1bf1c
Clone tree before double planning
RobinTF Feb 20, 2025
6d60f23
Add unit tests
RobinTF Feb 20, 2025
756c2c7
Fix column mapping issue
RobinTF Feb 21, 2025
6557abb
Separate concerns again
RobinTF Feb 21, 2025
1896f7f
Apply some PR comments
RobinTF Feb 21, 2025
e86c080
Expand tests to check multiple budgets
RobinTF Feb 21, 2025
9699068
Add comments and `JoinColumns` alias
RobinTF Feb 21, 2025
1b75e84
Extract to helper function
RobinTF Feb 21, 2025
764b15c
Fix typo
RobinTF Feb 21, 2025
aa01e0b
Fix sonarcloud issue
RobinTF Feb 21, 2025
2809c52
Merge remote-tracking branch 'ad-freiburg/master' into better-transit…
RobinTF Feb 26, 2025
0d20eb4
Make member const
RobinTF Feb 26, 2025
80b2972
Refactor code to be more readable
RobinTF Feb 26, 2025
1e1e570
Also move into join if variables are distinct
RobinTF Feb 26, 2025
7c2edf7
Return const references
RobinTF Feb 26, 2025
8ad0d88
Only apply optimization for transitive path
RobinTF Feb 26, 2025
a5f4fd9
Correct id merge
RobinTF Feb 26, 2025
a39ca0e
Only check `UNION` for children
RobinTF Feb 26, 2025
1285bae
Add mirrored test case to increase coverage
RobinTF Feb 26, 2025
944bff4
Merge remote-tracking branch 'ad-freiburg/master' into better-transit…
RobinTF Mar 3, 2025
4ab8289
Add alias
RobinTF Mar 5, 2025
b26adfa
Merge remote-tracking branch 'ad-freiburg/master' into better-transit…
RobinTF Mar 5, 2025
1824974
Address PR comments and add warning
RobinTF Mar 5, 2025
1b53c2f
Add another unit test for bound case
RobinTF Mar 5, 2025
8 changes: 8 additions & 0 deletions src/engine/QueryExecutionTree.cpp
@@ -164,6 +164,14 @@ std::shared_ptr<QueryExecutionTree> QueryExecutionTree::createSortedTree(
return qet;
}

// Unwrap sort to avoid stacking sorts on top of each other.
if (auto sort = std::dynamic_pointer_cast<Sort>(qet->getRootOperation())) {
AD_LOG_WARN << "Tried to re-sort a subtree that will already be sorted "
"with `Sort` with a different sort order. This is a bug."
<< std::endl;
Comment on lines +169 to +171
Member

This is not a bug per se; the reason is possibly a missed optimization in the query planner.

Member

So please rephrase this so that we don't get so many complaints from innocent users.

qet = sort->getSubtree();
}

QueryExecutionContext* qec = qet->getRootOperation()->getExecutionContext();
auto sort = std::make_shared<Sort>(qec, std::move(qet), sortColumns);
return std::make_shared<QueryExecutionTree>(qec, std::move(sort));
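
The unwrapping step in this hunk can be illustrated in isolation. The following is a minimal sketch, assuming a toy operation tree: `Node`, `Scan`, `SortNode`, and `createSorted` are hypothetical stand-ins and not QLever's `Operation`, `Sort`, or `createSortedTree`. It also omits the early return for trees that are already sorted on the requested columns.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for an operation tree, not the real QLever classes.
struct Node {
  virtual ~Node() = default;
};

struct Scan : Node {};

struct SortNode : Node {
  std::shared_ptr<Node> child;
  std::vector<std::size_t> sortColumns;
  SortNode(std::shared_ptr<Node> c, std::vector<std::size_t> cols)
      : child{std::move(c)}, sortColumns{std::move(cols)} {}
};

// Wrap `tree` in a sort on `cols`. If the root already is a sort, sort its
// child instead: only the outermost sort order is observable, so the inner
// sort would be wasted work.
std::shared_ptr<Node> createSorted(std::shared_ptr<Node> tree,
                                   std::vector<std::size_t> cols) {
  assert(tree != nullptr);
  if (auto sort = std::dynamic_pointer_cast<SortNode>(tree)) {
    tree = sort->child;  // Drop the redundant inner sort.
  }
  return std::make_shared<SortNode>(std::move(tree), std::move(cols));
}
```

Calling `createSorted` twice with different columns on the same scan yields a single `SortNode` whose child is the scan, rather than two nested sorts.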
245 changes: 183 additions & 62 deletions src/engine/QueryPlanner.cpp

Large diffs are not rendered by default.

39 changes: 26 additions & 13 deletions src/engine/QueryPlanner.h
@@ -32,6 +32,8 @@ class QueryPlanner {
std::optional<Variable> activeGraphVariable_;

public:
using JoinColumns = std::vector<std::array<ColumnIndex, 2>>;

explicit QueryPlanner(QueryExecutionContext* qec,
CancellationHandle cancellationHandle);

@@ -330,30 +332,42 @@ class QueryPlanner {
const vector<SubtreePlan>& b,
const TripleGraph& tg) const;

std::vector<QueryPlanner::SubtreePlan> createJoinCandidates(
// Create `SubtreePlan`s that join `a` and `b` together. The columns are
// computed automatically.
std::vector<SubtreePlan> createJoinCandidates(
const SubtreePlan& a, const SubtreePlan& b,
boost::optional<const TripleGraph&> tg) const;

// Create `SubtreePlan`s that join `a` and `b` together. The columns are
// configured by `jcs`.
std::vector<SubtreePlan> createJoinCandidates(const SubtreePlan& a,
const SubtreePlan& b,
const JoinColumns& jcs) const;

// Whenever a join is applied to a `Union`, add candidates that try applying
// join to the children of the union directly, which can be more efficient if
// one of the children has an optimized join, which can happen for
// `TransitivePath` for example.
std::vector<SubtreePlan> applyJoinDistributivelyToUnion(
const SubtreePlan& a, const SubtreePlan& b, const JoinColumns& jcs) const;

// Used internally by `createJoinCandidates`. If `a` or `b` is a transitive
// path operation and the other input can be bound to this transitive path
// (see `TransitivePath.cpp` for details), then returns that bound transitive
// path. Else returns `std::nullopt`
// path. Else returns `std::nullopt`.
static std::optional<SubtreePlan> createJoinWithTransitivePath(
SubtreePlan a, SubtreePlan b,
const std::vector<std::array<ColumnIndex, 2>>& jcs);
const SubtreePlan& a, const SubtreePlan& b, const JoinColumns& jcs);

// Used internally by `createJoinCandidates`. If `a` or `b` is a
// `HasPredicateScan` with a variable as a subject (`?x ql:has-predicate
// <VariableOrIri>`) and `a` and `b` can be joined on that subject variable,
// then returns a `HasPredicateScan` that takes the other input as a subtree.
// Else returns `std::nullopt`.
static std::optional<SubtreePlan> createJoinWithHasPredicateScan(
SubtreePlan a, SubtreePlan b,
const std::vector<std::array<ColumnIndex, 2>>& jcs);
const SubtreePlan& a, const SubtreePlan& b, const JoinColumns& jcs);

static std::optional<SubtreePlan> createJoinWithPathSearch(
const SubtreePlan& a, const SubtreePlan& b,
const std::vector<std::array<ColumnIndex, 2>>& jcs);
const SubtreePlan& a, const SubtreePlan& b, const JoinColumns& jcs);

// Helper that returns `true` for each of the subtree plans `a` and `b` iff
// the subtree plan is a spatial join and it is not yet fully constructed
@@ -364,9 +378,9 @@ class QueryPlanner {
// if one of the inputs is a spatial join which is compatible with the other
// input, then add that other input to the spatial join as a child instead of
// creating a normal join.
static std::optional<SubtreePlan> createSpatialJoin(
const SubtreePlan& a, const SubtreePlan& b,
const std::vector<std::array<ColumnIndex, 2>>& jcs);
static std::optional<SubtreePlan> createSpatialJoin(const SubtreePlan& a,
const SubtreePlan& b,
const JoinColumns& jcs);

vector<SubtreePlan> getOrderByRow(
const ParsedQuery& pq,
@@ -391,8 +405,7 @@ class QueryPlanner {
bool connected(const SubtreePlan& a, const SubtreePlan& b,
const TripleGraph& graph) const;

static std::vector<std::array<ColumnIndex, 2>> getJoinColumns(
const SubtreePlan& a, const SubtreePlan& b);
static JoinColumns getJoinColumns(const SubtreePlan& a, const SubtreePlan& b);

string getPruningKey(const SubtreePlan& plan,
const vector<ColumnIndex>& orderedOnColumns) const;
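
The comment on `applyJoinDistributivelyToUnion` relies on the fact that a join distributes over a union: joining `A UNION B` with `C` gives the same rows as the union of `A JOIN C` and `B JOIN C`. The following is a minimal sketch of that identity on toy relations, assuming bag semantics for the union and a single integer join key. `Relation`, `unionAll`, `joinOnKey`, and `joinDistributed` are hypothetical names for illustration and say nothing about how the planner implements the rewrite.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <utility>
#include <vector>

using Relation = std::vector<std::pair<int, int>>;      // (key, payload)
using Joined = std::vector<std::tuple<int, int, int>>;  // (key, left, right)

// UNION with bag semantics: concatenate the rows of both inputs.
Relation unionAll(const Relation& a, const Relation& b) {
  Relation result = a;
  result.insert(result.end(), b.begin(), b.end());
  return result;
}

// Nested-loop equi-join on the key column.
Joined joinOnKey(const Relation& left, const Relation& right) {
  Joined result;
  for (const auto& [leftKey, leftValue] : left) {
    for (const auto& [rightKey, rightValue] : right) {
      if (leftKey == rightKey) {
        result.emplace_back(leftKey, leftValue, rightValue);
      }
    }
  }
  return result;
}

// Join each union child with `c` separately, then union the results.
Joined joinDistributed(const Relation& a, const Relation& b,
                       const Relation& c) {
  Joined result = joinOnKey(a, c);
  Joined fromB = joinOnKey(b, c);
  result.insert(result.end(), fromB.begin(), fromB.end());
  return result;
}

// Sort the rows so that two results can be compared independent of order.
Joined sortedRows(Joined rows) {
  std::sort(rows.begin(), rows.end());
  return rows;
}
```

Both forms produce the same rows up to order. The distributed form can pay off when one child has a specialized join, which the header comment says can happen for `TransitivePath`.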
4 changes: 2 additions & 2 deletions src/engine/TransitivePathImpl.h
@@ -26,7 +26,7 @@ struct TableColumnWithVocab {
// See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103909 for more info.
TableColumnWithVocab(const IdTable* table, ColumnType column,
LocalVocab vocab)
: table_{table}, column_{std::move(column)}, vocab_{std::move(vocab)} {};
: table_{table}, column_{std::move(column)}, vocab_{std::move(vocab)} {}
};
}; // namespace detail

@@ -50,7 +50,7 @@ class TransitivePathImpl : public TransitivePathBase {
TransitivePathSide leftSide, TransitivePathSide rightSide,
size_t minDist, size_t maxDist)
: TransitivePathBase(qec, std::move(child), std::move(leftSide),
std::move(rightSide), minDist, maxDist){};
std::move(rightSide), minDist, maxDist) {}

/**
* @brief Compute the transitive hull with a bound side.
7 changes: 7 additions & 0 deletions src/engine/Union.cpp
@@ -239,6 +239,13 @@ std::vector<ColumnIndex> Union::computePermutation() const {
return permutation;
}

// _____________________________________________________________________________
std::optional<ColumnIndex> Union::getOriginalColumn(
bool leftChild, ColumnIndex unionColumn) const {
ColumnIndex column = _columnOrigins.at(unionColumn).at(!leftChild);
return column == NO_COLUMN ? std::nullopt : std::optional{column};
}

// _____________________________________________________________________________
IdTable Union::transformToCorrectColumnFormat(
IdTable idTable, const std::vector<ColumnIndex>& permutation) const {
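
The lookup performed by `Union::getOriginalColumn` can be tried out on its own. The following is a minimal sketch, assuming that each union column stores a pair of child column indices (left first, right second) and that a sentinel marks "not present in this child". The `ColumnOrigins` alias, the free function, and the concrete value of `NO_COLUMN` are assumptions for illustration; only the names `NO_COLUMN` and `getOriginalColumn` come from the diff.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <optional>
#include <vector>

using ColumnIndex = std::size_t;

// Assumed sentinel value; the real definition is not shown in the diff.
constexpr ColumnIndex NO_COLUMN = std::numeric_limits<ColumnIndex>::max();

// For every union column: {index in left child, index in right child}.
using ColumnOrigins = std::vector<std::array<ColumnIndex, 2>>;

// Return the child column that `unionColumn` maps to, or `std::nullopt` if
// the chosen child does not provide that column.
std::optional<ColumnIndex> getOriginalColumn(const ColumnOrigins& origins,
                                             bool leftChild,
                                             ColumnIndex unionColumn) {
  assert(unionColumn < origins.size());
  // Slot 0 is the left child, slot 1 the right child.
  ColumnIndex column = origins[unionColumn][leftChild ? 0 : 1];
  if (column == NO_COLUMN) {
    return std::nullopt;
  }
  return column;
}
```

Returning `std::optional` keeps the sentinel out of the calling code, which only has to check whether a value is present.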
16 changes: 16 additions & 0 deletions src/engine/Union.h
@@ -61,6 +61,22 @@ class Union : public Operation {
return {_subtrees[0].get(), _subtrees[1].get()};
}

// Provide access to the left child of this union.
const std::shared_ptr<QueryExecutionTree>& leftChild() const {
return _subtrees[0];
}

// Provide access to the right child of this union.
const std::shared_ptr<QueryExecutionTree>& rightChild() const {
return _subtrees[1];
}

// Return the original index of the column in the left or right child that the
// respective column of this union maps to. If the index does not map to the
// respective child, std::nullopt is returned.
std::optional<ColumnIndex> getOriginalColumn(bool leftChild,
ColumnIndex unionColumn) const;

private:
std::unique_ptr<Operation> cloneImpl() const override;

187 changes: 187 additions & 0 deletions test/QueryPlannerTest.cpp
@@ -3050,3 +3050,190 @@ TEST(QueryPlanner, UnconnectedComponentsInGraphClause) {
h::CartesianProductJoin(h::IndexScanFromStrings("?s1", "?p1", "?o1"),
h::IndexScanFromStrings("?s2", "?p2", "?o2")));
}

// _____________________________________________________________________________
TEST(QueryPlanner, testDistributiveJoinInUnion) {
auto* qec = ad_utility::testing::getQec();
TransitivePathSide left1{std::nullopt, 0,
Variable("?_QLever_internal_variable_qp_0"), 0};
TransitivePathSide left2{std::nullopt, 0,
Variable("?_QLever_internal_variable_qp_7"), 0};
TransitivePathSide right{std::nullopt, 1, Variable("?type"), 1};
std::string query =
"SELECT * WHERE {\n"
" <Q11629> <P279>/(<P279>*|<P31>*) | <P31>/(<P279>*|<P31>*) ?type .\n"
"}";

h::expectWithGivenBudgets(
std::move(query),
h::Union(
h::Union(
h::TransitivePath(
left1, right, 0, std::numeric_limits<size_t>::max(),
h::IndexScanFromStrings("<Q11629>", "<P279>",
"?_QLever_internal_variable_qp_0"),
h::IndexScanFromStrings("?_QLever_internal_variable_qp_2",
"<P279>",
"?_QLever_internal_variable_qp_3")),
h::TransitivePath(
left1, right, 0, std::numeric_limits<size_t>::max(),
h::IndexScanFromStrings("<Q11629>", "<P279>",
"?_QLever_internal_variable_qp_0"),
h::IndexScanFromStrings("?_QLever_internal_variable_qp_4",
"<P31>",
"?_QLever_internal_variable_qp_5"))),
h::Union(
h::TransitivePath(
left2, right, 0, std::numeric_limits<size_t>::max(),
h::IndexScanFromStrings("<Q11629>", "<P31>",
"?_QLever_internal_variable_qp_7"),
h::IndexScanFromStrings("?_QLever_internal_variable_qp_9",
"<P279>",
"?_QLever_internal_variable_qp_10")),
h::TransitivePath(
left2, right, 0, std::numeric_limits<size_t>::max(),
h::IndexScanFromStrings("<Q11629>", "<P31>",
"?_QLever_internal_variable_qp_7"),
h::IndexScanFromStrings(
"?_QLever_internal_variable_qp_11", "<P31>",
"?_QLever_internal_variable_qp_12")))),
qec, {4, 16, 64'000'000});

TransitivePathSide left3{std::nullopt, 0, Variable("?s"), 0};
TransitivePathSide right2{std::nullopt, 1, Variable("?y"), 1};

h::expectWithGivenBudgets(
"SELECT * WHERE { ?s <P31> ?o . { ?s <P279>+ ?y } UNION { VALUES ?x { 1 "
"} }}",
h::Union(
h::TransitivePath(left3, right2, 1,
std::numeric_limits<size_t>::max(),
h::IndexScanFromStrings("?s", "<P31>", "?o"),
h::IndexScanFromStrings(
"?_QLever_internal_variable_qp_0", "<P279>",
"?_QLever_internal_variable_qp_1")),
h::CartesianProductJoin(h::IndexScanFromStrings("?s", "<P31>", "?o"),
h::ValuesClause("VALUES (?x) { (1) }"))),
qec, {4, 16, 64'000'000});

h::expectWithGivenBudgets(
"SELECT * WHERE { { VALUES ?x { 1 } } UNION { ?s <P279>+ ?y } . "
"?s <P31> ?o }",
h::Union(
h::CartesianProductJoin(h::ValuesClause("VALUES (?x) { (1) }"),
h::IndexScanFromStrings("?s", "<P31>", "?o")),
h::TransitivePath(std::move(left3), std::move(right2), 1,
std::numeric_limits<size_t>::max(),
h::IndexScanFromStrings("?s", "<P31>", "?o"),
h::IndexScanFromStrings(
"?_QLever_internal_variable_qp_0", "<P279>",
"?_QLever_internal_variable_qp_1"))),
qec, {4, 16, 64'000'000});
}

// _____________________________________________________________________________
TEST(QueryPlanner, ensurePlanningIsSkippedWhenNoTransitivePathIsPresent) {
auto qp = makeQueryPlanner();
{
auto query = SparqlParser::parseQuery(
"SELECT * WHERE { ?x <P31> ?o ."
"{ VALUES ?x { 1 } } UNION { VALUES ?x { 1 } }}");
auto plans = qp.createExecutionTrees(query);
ASSERT_EQ(plans.size(), 1);
EXPECT_TRUE(
std::dynamic_pointer_cast<Join>(plans.at(0)._qet->getRootOperation()));
}
{
auto query = SparqlParser::parseQuery(
"SELECT * WHERE { ?x <P31> ?o . "
"{ { VALUES ?x { 1 } } UNION { VALUES ?x { 1 } } } "
"UNION "
"{ { VALUES ?x { 1 } } UNION { VALUES ?x { 1 } } } }");
auto plans = qp.createExecutionTrees(query);
ASSERT_EQ(plans.size(), 1);
EXPECT_TRUE(
std::dynamic_pointer_cast<Join>(plans.at(0)._qet->getRootOperation()));
}
}

// _____________________________________________________________________________
TEST(QueryPlanner, ensurePlanningIsSkippedWhenTransitivePathIsAlreadyBound) {
auto qp = makeQueryPlanner();
auto query = SparqlParser::parseQuery(
"SELECT * { { VALUES ?x { 1 } } UNION { ?s <P279>+ 1 } . ?s <P31> ?o }");
auto plans = qp.createExecutionTrees(query);
ASSERT_EQ(plans.size(), 1);
EXPECT_TRUE(
std::dynamic_pointer_cast<Join>(plans.at(0)._qet->getRootOperation()));
}

// _____________________________________________________________________________
TEST(QueryPlanner, testDistributiveJoinInUnionRecursive) {
auto* qec = ad_utility::testing::getQec(
"<a> <P279> <b> . <c> <P279> <d> . <e> <P279> <f> . <g> <P279> <h> ."
" <i> <P279> <j> . <a> <P31> <b> . <c> <P31> <d> . <e> <P31> <f> ."
" <g> <P31> <h> . <i> <P31> <j> .");
TransitivePathSide left1{std::nullopt, 2,
Variable("?_QLever_internal_variable_qp_0"), 0};
TransitivePathSide left2{std::nullopt, 0,
Variable("?_QLever_internal_variable_qp_4"), 0};
TransitivePathSide left3{std::nullopt, 0,
Variable("?_QLever_internal_variable_qp_13"), 0};
TransitivePathSide right1{std::nullopt, 1, Variable("?type"), 1};
TransitivePathSide right2{std::nullopt, 1,
Variable("?_QLever_internal_variable_qp_3"), 1};
TransitivePathSide right3{std::nullopt, 1,
Variable("?_QLever_internal_variable_qp_12"), 1};
std::string query =
"SELECT * WHERE {\n"
" <Q11629> "
" <P279>/((<P279>/(<P279>*|<P31>*))*|(<P31>/(<P279>*|<P31>*))*)"
" ?type .\n"
"}";

h::expectWithGivenBudgets(
std::move(query),
h::Union(h::TransitivePath(
left1, right1, 0, std::numeric_limits<size_t>::max(),
h::IndexScanFromStrings("<Q11629>", "<P279>",
"?_QLever_internal_variable_qp_0"),
h::Sort(h::Union(
h::TransitivePath(
left2, right2, 0, std::numeric_limits<size_t>::max(),
h::IndexScanFromStrings(
"?_QLever_internal_variable_qp_2", "<P279>",
"?_QLever_internal_variable_qp_4"),
h::IndexScanFromStrings(
"?_QLever_internal_variable_qp_6", "<P279>",
"?_QLever_internal_variable_qp_7")),
h::TransitivePath(
left2, right2, 0, std::numeric_limits<size_t>::max(),
h::IndexScanFromStrings(
"?_QLever_internal_variable_qp_2", "<P279>",
"?_QLever_internal_variable_qp_4"),
h::IndexScanFromStrings(
"?_QLever_internal_variable_qp_8", "<P31>",
"?_QLever_internal_variable_qp_9"))))),
h::TransitivePath(
left1, right1, 0, std::numeric_limits<size_t>::max(),
h::IndexScanFromStrings("<Q11629>", "<P279>",
"?_QLever_internal_variable_qp_0"),
h::Sort(h::Union(
h::TransitivePath(
left3, right3, 0, std::numeric_limits<size_t>::max(),
h::IndexScanFromStrings(
"?_QLever_internal_variable_qp_11", "<P31>",
"?_QLever_internal_variable_qp_13"),
h::IndexScanFromStrings(
"?_QLever_internal_variable_qp_15", "<P279>",
"?_QLever_internal_variable_qp_16")),
h::TransitivePath(
left3, right3, 0, std::numeric_limits<size_t>::max(),
h::IndexScanFromStrings(
"?_QLever_internal_variable_qp_11", "<P31>",
"?_QLever_internal_variable_qp_13"),
h::IndexScanFromStrings(
"?_QLever_internal_variable_qp_17", "<P31>",
"?_QLever_internal_variable_qp_18")))))),
qec, {4, 16, 64'000'000});
}
20 changes: 12 additions & 8 deletions test/QueryPlannerTestHelpers.h
@@ -482,6 +482,16 @@ void expectWithGivenBudget(std::string query, auto matcher,
EXPECT_THAT(qet, matcher);
}

// Same as `expectWithGivenBudget` but allows multiple budgets to be tested.
void expectWithGivenBudgets(std::string query, auto matcher,
std::optional<QueryExecutionContext*> optQec,
std::vector<size_t> queryPlanningBudgets,
source_location l = source_location::current()) {
for (size_t budget : queryPlanningBudgets) {
expectWithGivenBudget(query, matcher, optQec, budget, l);
}
}

// Same as `expectWithGivenBudget` above, but always use the greedy query
// planner.
void expectGreedy(std::string query, auto matcher,
@@ -505,13 +515,7 @@ void expectDynamicProgramming(
void expect(std::string query, auto matcher,
std::optional<QueryExecutionContext*> optQec = std::nullopt,
source_location l = source_location::current()) {
auto e = [&](size_t budget) {
expectWithGivenBudget(query, matcher, optQec, budget, l);
};
e(0);
e(1);
e(4);
e(16);
e(64'000'000);
expectWithGivenBudgets(std::move(query), std::move(matcher),
std::move(optQec), {0, 1, 4, 16, 64'000'000}, l);
}
} // namespace queryPlannerTestHelpers
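
The pattern behind `expectWithGivenBudgets` is to run the same check once per planning budget. The following is a minimal sketch of that loop without GoogleTest; `Check` and `checkWithBudgets` are hypothetical names and not part of the test helpers in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical callback type: receives the query and one planning budget.
using Check = std::function<void(const std::string&, std::size_t)>;

// Invoke `check` once for every budget. The query is passed by reference and
// never moved from, because it is needed again in the next iteration.
void checkWithBudgets(const std::string& query,
                      const std::vector<std::size_t>& budgets,
                      const Check& check) {
  assert(check != nullptr);
  for (std::size_t budget : budgets) {
    check(query, budget);
  }
}
```

A caller passes the list of budgets once, for example `{0, 1, 4, 16, 64'000'000}`, instead of repeating the call per budget as the removed lambda `e` did.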