-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat/sql parsing #157
base: main
Are you sure you want to change the base?
Feat/sql parsing #157
Conversation
Fixed pylint issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just working on something else and realized: why would you import parameterized? Why not use "@pytest.mark.parametrize" instead? That is one less dependency?
==> This test would fail because the backquotes are not removed. I solved this one as:
|
Also gives an error. |
And the following 2 do not fail correctly when they should:
|
Wow, this is fantastic @tools4origins . And thanks for the prompt review @svaningelgem . On the SQL sub package, I'll let you two discuss. I am currently very busy. For me the only concerns are changes to setup.py that I would like to understand why they are necessary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too me, everything looks good.
I have only two comments about dependencies. Once resolved, I'll wait for your ready signal to merge.
setup.py
Outdated
@@ -31,14 +36,15 @@ | |||
'backports.tempfile==1.0rc1', | |||
'cloudpickle>=0.1.0', | |||
'futures>=3.0.1', | |||
'pylint', | |||
'pylint~=2.7', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please revert
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added restrictive version requirements on all dependencies. I once again had several tests fail due to a change in the behavior of multiple dependencies (both isort and pylint). The tests did not behave the same on my laptop with an older version of isort/pylint and in the CI where the latest versions were downloaded.
Unless there are specific tests for multiple versions of a dependency, I don't think we should assume we support it.
setup.py
Outdated
'pylzma', | ||
'memory-profiler>=0.47', | ||
'pycodestyle', | ||
'pytest', | ||
'pytest-cov', | ||
'isort', | ||
'tornado>=4.3', | ||
'parameterized>=0.7.4', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the same question as @svaningelgem . Is it possible to avoid this extra dependency using pytest mechanisms?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked around and did not find something that would do what I was trying to achieve. It is very useful as there are a lot of cases to test for many SQL functions so I really appreciate the added value of the lib
This PR implements an ANTLR-based logic to parse SQL.
ANTLR is a parser generator used by Apache Spark. As such, we are able to use the exact spark SQL syntax.
SQL strings are converted into an abstract syntax tree (AST) by this project https://github.com/pysparkling/python-sql-parser.
These AST are then converted into Pysparkling object using the pysparkling/sql/ast/ast_to_python.py module, in particular via its entry points parse_xxx, e.g.
parse_sql
orparse_schema
.This PR only exposes SQL parsing on SQL schemas via a refactoring of
StructType.fromDDL
.It also contains part of the logic that will be used to handle other types of SQL statements.