Parse your SQL queries and represent their structure as a graph.
Currently, we implement the ability of representing how each of the tables in a set of SQL query scripts depend on each other.
MATCH p=(r:Reporting)-[:HAS_TABLE_DEPENDENCY]->()-[:HAS_TABLE_DEPENDENCY]->()
WHERE r.table_name='user_activity'
RETURN p
To run the code in here, ensure your system meets the following requirements:
- Unix-like operating system (macOS, Linux, ...) - though it might work on Windows;
- Python 3.8 or above; and
- Poetry installed.
direnv
installed, including shell hooks;
For quickstart set-up of the project, run the below in your shell/terminal:
# 1. read project-specific environment variables
direnv allow
# 2. activate virtual environment and install package dependencies
poetry shell
poetry install
# 3. check adherence to good standards on every commit
pre-commit install
To then extract the tables and their dependencies from the example SQL scripts in the sql/
directory, run the below in your shell/terminal. It will generate .csv
files of the tables and their dependencies. It will also generate .cypher
files to enable you to import the data into neo4j, after you have added the .csv
files to the database.
python sqlquerygraph.py -sd 'sql' -ed 'neo4j' -rd '<datasets, individually quoted and separated by commas, of tables in sql/ scripts>'
We use neo4j for this project to visualise the dependencies between tables. To install neo4j locally using Docker Compose, follow the below instructions:
-
Install and open Docker (if already installed, just open the program).
- For Mac OSX, install Docker and Docker Compose together here.
- For Linux, install Docker here and then follow these instructions to install docker-compose.
- For Windows, install Docker and Docker Compose together here.
-
Create a new file,
.secrets
, in the directory where thisREADME.md
file sits, and store the following in there. This allows you to set the password for your local neo4j instance without exposing it.export NEO4J_AUTH=neo4j/<your_password> export NEO4J_USERNAME=neo4j export NEO4J_PASSWORD=<your_password>
Then update your
.env
file to take in the new.secrets
file you created by entering the below in your shell/terminal:direnv allow
-
Build the Docker image and launch the container. Within this directory that has the
docker-compose.yml
file, run the below in your shell/terminal:docker-compose up
You will know when it's ready when you get the following message in your terminal:
app | [INFO wait] Host [neo4j:7687] is now available! app | [INFO wait] -------------------------------------------------------- app | [INFO wait] docker-compose-wait - Everything's fine, the application can now start! app | [INFO wait] --------------------------------------------------------
Then launch neo4j locally via opening your web browser and entering the following web address:
The username and password will those specified in your
.secrets
file. -
When you have finished playing with your local neo4j instance, remember to stop it running by executing the below in your shell/terminal:
docker-compose down # option if you want to delete data too docker-compose down --volumes
This builds on the excellent moz-sql-parser package.
With thanks also to the Google Cloud Public Dataset Program for which the SQL queries in this repo are based off the program's GitHub repos dataset.