Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support standard UNION/INTERSECT syntax #248

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

dinodork
Copy link

The SQL standard allows parentheses around all query expressions (SELECTs) in a UNION. The Bison parser cannot currently handle queries like (SELECT 1) UNION (SELECT 1), let alone ((SELECT 1) UNION (SELECT 1)).

@dey4ss
Copy link
Member

dey4ss commented Sep 3, 2024

Just approved to run the actions, I will review your PR in the next week. I noticed that the gcc-6 stage fails because manually cloning does not work with forked repositories, I will see if I can provide a fix for that.

@dinodork
Copy link
Author

dinodork commented Sep 3, 2024

Just approved to run the actions, I will review your PR in the next week. I noticed that the gcc-6 stage fails because manually cloning does not work with forked repositories, I will see if I can provide a fix for that.

Thanks a lot. Since you're on it, I removed to last commit where I was trying to fix that issue, not doing a very good job at it.

@dey4ss
Copy link
Member

dey4ss commented Sep 12, 2024

The CI stage should work now if you merge the lastest update to master.

@dinodork
Copy link
Author

The CI stage should work now if you merge the lastest update to master.

Thanks. All green now!

Copy link
Member

@dey4ss dey4ss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just did a quick pass with some easy-to-fix things, e.g., code style. I will need some more time for an in-depth review because the changes are quite heavy. I appreciate that you kept the naming and grammar closer to the SQL standard, which should make it easier to review.

@@ -70,7 +70,7 @@ $(SRCPARSER)/flex_lexer.o: $(SRCPARSER)/flex_lexer.cpp $(SRCPARSER)/bison_parser
$(CXX) $(LIB_CFLAGS) -c -o $@ $< -Wno-sign-compare -Wno-unneeded-internal-declaration -Wno-register

%.o: %.cpp $(PARSER_CPP) $(LIB_H)
$(CXX) $(LIB_CFLAGS) -c -o $@ $<
$(CXX) $(LIB_CFLAGS) $(CXXFLAGS) -c -o $@ $<
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use additonal flags in our base repo builds. Unless there's a reason to have them here, I'd ask you to remove them.

Copy link
Author

@dinodork dinodork Sep 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR: CFLAGS/CXXFLAGS is a Makefile convention, for example for a program

https://en.wikipedia.org/wiki/CFLAGS
https://www.gnu.org/software/make/manual/make.html#index-CFLAGS

#include <iostream>

int main() {
#ifdef FOO
  std::cout << "Foo\n";
#endif
}

and, using make's default rule:

$ CXXFLAGS=-DFOO make -B cflagtest && ./cflagtest 
g++ -DFOO    cflagtest.cpp   -o cflagtest
Foo

It's not that common to have hand-written Makefiles that call other hand-written Makefiles - you normally let a build system such as CMake or meson do that for you - but when you do, you need to make to sure you follow this convention and pass along CFLAGS/CXXFLAGS. You can of course choose your own names for them if you want, but that would be confusing IMHO.

@@ -137,7 +137,7 @@ test: $(TEST_BUILD)

$(TEST_BUILD): $(TEST_ALL) $(LIB_BUILD)
@mkdir -p $(BIN)/
$(CXX) $(TEST_CFLAGS) $(TEST_CPP) -o $(TEST_BUILD) -lsqlparser -lstdc++
$(CXX) $(CXXFLAGS) $(TEST_CFLAGS) $(TEST_CPP) -o $(TEST_BUILD) -lsqlparser -lstdc++
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(see above)

@@ -63,6 +63,8 @@
// %output "bison_parser.cpp"
// %defines "bison_parser.h"

%expect 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://www.gnu.org/software/bison/manual/html_node/Expect-Decl.html

In other words, bails out with an error if there's a shift/reduce conflict.

$$->setOperations->back()->resultLimit = $5;
$$->lockings = $6;
};
select_statement:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pick here: The indentation does not match the other code. Could ypu please fix that (e.g., by calling the format.sh script)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, the formatting script has a bug where it places a space after % in a %prec declaration (see last commit)

$$ = $2;
}

/*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

knit-picky: We either use C-style comments with an asterisk * on each line or C++-style single line comments.

Copy link
Author

@dinodork dinodork Sep 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*nit-pick ;)

Done.

Comment on lines 883 to 885
query_term:
query_primary
;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

knit-picky: we usually put short declarations on a single line. I also see no reason to have a non-terminal that only forwards to another one.

Comment on lines 887 to 892
subquery:
'(' query_expression ')'
{
$$ = $2;
}
;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We prefer the following style:

Suggested change
subquery:
'(' query_expression ')'
{
$$ = $2;
}
;
subquery: '(' query_expression ')' {
$$ = $2;
};

(or even a one-liner)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

((SELECT 1)) UNION ((SELECT 1));
(((SELECT 1)) UNION ((SELECT 1)));
((((SELECT 1)) UNION ((SELECT 1))));
# JOIN# JOIN
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# JOIN# JOIN
# JOIN

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, fixed.

@@ -18,7 +18,35 @@ SELECT * FROM t WHERE a = ? AND b = ?;
SELECT City.name, Product.category, SUM(price) FROM fact INNER JOIN City ON fact.city_id = City.id INNER JOIN Product ON fact.product_id = Product.id GROUP BY City.name, Product.category;
SELECT SUBSTR(a, 3, 5) FROM t;
SELECT * FROM t WHERE a = DATE '1996-12-31';
# JOIN
SELECT 1 UNION SELECT 1;
Copy link
Member

@dey4ss dey4ss Sep 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You added a lot of things that I would love to see tested more properly. Could you please add more complex examples with actual subqueries? I would also be interested in some edge cases that are not on the happy path, e.g., in queries-bad.sql. I also encourage you to think of a thorough test case in select_tests.cpp.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I actually broke union-in-subquery syntax. I added a new commit that addresses it.

Martin Hansson added 5 commits September 19, 2024 16:39
- Renamed most rules starting with select_ to the names used in the SQL
  standard, <query expression>, <query expression body>, <query term>, <query
  primary> and <subquery>, respectively.

- Added %expect 0

- The previous rule for UNION/INTERSECT syntax was right-recursive, and so the
  parse tree is right-deep. This patch only changes the rule, you don't want
  right-recursion in an LALR parser, and moreover it's not standard. It doesn't
  change the parse tree structure, so we need an extra wrinkle to build the
  parse tree top-down.

- Indentation follows the most common standard, found in MySQL and Postgres.

Future work: INTERSECT has higher priority than UNION in the standard, but these
rules give them the same priority.
refactoring, we now have to add only a single rule to support them
all. Quite surprising, there are not shift/reduce conflicts, which
are known to occur in both the MySQL and Postgres parsers for this
construction.
syntax that worked. A union without parentheses was actually
working. Now we're making the whole thing work. This introduces an
ambiguity, though, so we fix that with a %prec declaration.
@dinodork
Copy link
Author

dinodork commented Sep 19, 2024

@dey4ss Conflicts resolved, please feel free to start the CI workflow.

@dinodork
Copy link
Author

dinodork commented Oct 7, 2024

Bump.

@dinodork
Copy link
Author

Bump #2 @dey4ss

@Bouncner
Copy link
Contributor

Bouncner commented Nov 1, 2024

Hi @dinodork, we currently lack the time to review this PR properly. We would like to merge it, but not without proper reviewing and that might take some more time. Please bear with us.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants