Matěj Čermák
The goal of this thesis is to build a standalone platform to facilitate PL/SQL query analysis.
The core of the project is an ANTLR-based parser that can extract useful metadata (tables, columns, keywords used) from the queries.
The resulting objects are then indexed into Elasticsearch, which allows one to explore the data in Kibana using features like dashboards, search, filtering and much more.
The project can be integrated into your own solution; consult the Architecture section for more.
- Datastore analysis - Large amounts of data can be indexed into Elasticsearch (with proper cluster management), so one can load large datasets and explore the data in a structured way.
- Optimizing queries - When information about the execution time of a query is provided to the API, one can discover queries with poor performance and optimize or avoid them.
- Time-based analysis - Query objects in Elasticsearch carry time information as well, which can be valuable when tracking down the source of high database load or simply for visualizing the number of incoming queries over time. It is also useful for locating queries executed in a certain time period (see the sketch below).
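As an illustration of the time-based use case, such questions can also be asked directly against Elasticsearch. The sketch below is illustrative only: the index name `queries` and the `timestamp` field are assumptions, not names taken from the project, and the aggregation uses ES 7+ syntax.

```python
# Sketch: count indexed queries per hour with a date_histogram aggregation.
# The index name "queries" and the "timestamp" field are assumed, adjust to the real mapping.
import requests

ES_URL = "http://localhost:9200"

body = {
    "size": 0,
    "aggs": {
        "queries_over_time": {
            "date_histogram": {"field": "timestamp", "fixed_interval": "1h"}
        }
    },
}

resp = requests.post(f"{ES_URL}/queries/_search", json=body)
for bucket in resp.json()["aggregations"]["queries_over_time"]["buckets"]:
    print(bucket["key_as_string"], bucket["doc_count"])
```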
Some of the project's default visualizations & dashboards are loaded into Kibana automatically, and various others can be created easily.
Dashboards can be listed by clicking [Dashboards] in the left panel.
Discover allows you to explore your data in more depth; various searches can be made on the data in no time. Discover also offers filtering based on categorical columns or time. Try it by clicking [Discover] in the left panel.
Refer to the Elastic documentation here, which might help you customize your searches in Kibana. Here are some examples of useful searches.
employee
- basic full-text search
"FROM employee"
- search for a literal part of the query
tokens.metadata : "uses_SortBy"
- queries that use the SORT BY clause
tokens.columns : "employee_id"
- queries that use a given column
The project uses Docker Compose to run all the services needed. See the individual component descriptions if you want to use one of them in your own project.
- Docker installation with at least 3 GB of RAM - see the Docker docs.
cd ./src
docker-compose up -d
After all services start, they can be accessed at the addresses below (a quick readiness check is sketched after the list):
- API endpoint - localhost:8080
- Kibana - localhost:5601
- Elasticsearch - localhost:9200
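A minimal readiness check, assuming the default ports listed above, might look like this; it is only a convenience sketch, not part of the project.

```python
# Poll the locally running services until they answer (or give up after ~60 s each).
import time
import requests

SERVICES = {
    "API": "http://localhost:8080",
    "Kibana": "http://localhost:5601",
    "Elasticsearch": "http://localhost:9200",
}

for name, url in SERVICES.items():
    for _ in range(30):
        try:
            requests.get(url, timeout=2)
            print(f"{name} is up at {url}")
            break
        except requests.RequestException:
            time.sleep(2)
    else:
        print(f"{name} did not respond at {url}")
```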
Data input is left for you to implement for your particular use case; the project therefore provides an API endpoint to which raw queries can be sent. That means you can easily integrate it into your application. Feel free to run just the parser and integrate it with your own running ES cluster by starting only its service via Docker Compose, or use the included API Dockerfile. The API was built with the help of Swagger and its code-generation tools and runs on Flask. Documentation for the API can be found here.
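A minimal sketch of pushing a raw query to the API is shown below. The endpoint path (`/queries`) and the payload fields are hypothetical stand-ins; check the Swagger documentation of the running API for the real contract.

```python
# Hypothetical example: POST a raw query (with optional timing info) to the API.
import requests

API_URL = "http://localhost:8080"

payload = {
    "query": "SELECT employee_id FROM employee ORDER BY last_name",  # raw PL/SQL query
    "execution_time_ms": 42,                 # optional, enables the performance use case
    "timestamp": "2021-04-01T12:00:00Z",     # optional, enables time-based analysis
}

resp = requests.post(f"{API_URL}/queries", json=payload)   # "/queries" is an assumed path
resp.raise_for_status()
print(resp.status_code, resp.text)
```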
Elasticsearch is a distributed search engine based on Lucene, and together with Kibana it turned out to be a perfect tool for this purpose. I left ES with default settings and only a single node, but the Docker files can easily be extended to a cluster of nodes. Consult the sebp/elk documentation for more information.
The parser is based on the ANTLR PL/SQL project, which utilizes the PL/SQL grammar for ANTLR and builds parse trees from queries. The goal of this project was to process the resulting tree and extract only the valuable information. Since no grammar is 100% accurate (non-standard syntax, keywords unsupported as the query language evolves, combined queries, etc.), there were some pitfalls when processing the tree and a lot of logic had to be implemented. Some complex queries took a really long time to process and were blocking the pipeline; after confirming on the testing dataset my assumption that there is a correlation between query length and processing time, I decided to drop overly long queries.
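A rough sketch of such a length guard is shown below; the threshold value and the placeholder `parse()` are made up for the example, the real pipeline calls the ANTLR-based parser.

```python
# Illustrative length guard: skip overly long queries instead of letting them block the pipeline.
import time
from typing import Optional

MAX_QUERY_LENGTH = 10_000   # characters; an assumed value, tune against your own dataset


def parse(raw_query: str) -> dict:
    """Placeholder for the ANTLR-based metadata extraction."""
    return {"length": len(raw_query)}


def analyze(raw_query: str) -> Optional[dict]:
    if len(raw_query) > MAX_QUERY_LENGTH:
        return None                                    # dropped, nothing is indexed
    started = time.perf_counter()
    metadata = parse(raw_query)
    metadata["processing_time_s"] = time.perf_counter() - started
    return metadata
```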
After I made the first prototype, I started writing tests using pytest. I simply made up various queries and defined the expected metadata values for them. Later, once a sufficient number of queries was covered, I used larger datasets containing various queries to test the parser, checking the output manually.
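The test style described above looks roughly like the sketch below; the `parse_query` import path and the exact metadata keys are hypothetical stand-ins for the project's parser interface.

```python
# Hypothetical pytest example: made-up queries with their expected metadata.
import pytest

from plsql_analysis.parser import parse_query   # hypothetical import path


@pytest.mark.parametrize(
    "query, expected_tables, expected_columns",
    [
        ("SELECT employee_id FROM employee", ["employee"], ["employee_id"]),
        ("SELECT e.name FROM employee e ORDER BY e.name", ["employee"], ["name"]),
    ],
)
def test_extracted_metadata(query, expected_tables, expected_columns):
    metadata = parse_query(query)
    assert metadata["tables"] == expected_tables
    assert metadata["columns"] == expected_columns
```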
Finding a perfect dataset to test the parser, one that contains a variety of queries and is large enough for analysis purposes, was not an easy task. For that purpose I made a simple web crawler that can crawl all the query examples from the whole PL/SQL Oracle documentation and clean the data before sending it in bulk to the API for testing. For cleaning I used basic string manipulation methods (stripping, joining, removing) and then filtered the dataset using predefined keywords.
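The cleaning step can be pictured roughly as follows; the keyword list and the normalisation shown here are illustrative, not the crawler's actual rules.

```python
# Sketch of cleaning crawled snippets and keeping only those that look like queries.
KEYWORDS = ("SELECT", "INSERT", "UPDATE", "DELETE", "MERGE", "CREATE")


def clean(raw: str) -> str:
    """Strip whitespace and join a multi-line snippet into a single line."""
    return " ".join(line.strip() for line in raw.splitlines() if line.strip())


def looks_like_query(text: str) -> bool:
    """Keep only snippets that start with a known SQL keyword."""
    return text.upper().startswith(KEYWORDS)


examples = ["  SELECT *\n  FROM employee;  ", "Example 9-3 shows a loop."]
cleaned = [clean(e) for e in examples if looks_like_query(clean(e))]
print(cleaned)   # -> ['SELECT * FROM employee;']
```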
java -jar antlr-4.8-complete.jar -Dlanguage=Python3 -o antlr_py/ -visitor plsql.g4
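The generated Python sources land in antlr_py/ and can be used roughly as sketched below; the class names follow the plsql.g4 grammar name and `sql_script` is an assumed start rule, so check the generated files for the actual names.

```python
# Sketch: run the generated lexer/parser on a query and dump the parse tree.
from antlr4 import CommonTokenStream, InputStream

from antlr_py.plsqlLexer import plsqlLexer      # names follow the plsql.g4 grammar
from antlr_py.plsqlParser import plsqlParser

lexer = plsqlLexer(InputStream("SELECT employee_id FROM employee"))
parser = plsqlParser(CommonTokenStream(lexer))
tree = parser.sql_script()                      # assumed entry rule of the grammar
print(tree.toStringTree(recog=parser))          # LISP-style dump of the parse tree
```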