# MuSQLE
## Multi-Engine SQL Query Execution Over Spark SQL
# Instructions
## Installation
- Clone the project:

  ```bash
  git clone https://github.com/gsvic/MuSQLE.git
  ```
- Build it with Maven (the jar with dependencies is typically produced under the `target/` directory):

  ```bash
  mvn install
  ```
- Start a Spark shell, including MuSQLE's jar:

  ```bash
  spark-shell --jars MuSQLE-1.0-SNAPSHOT-jar-with-dependencies.jar
  ```
- Create a MuSQLEContext instance and run an example TPC-DS query:

  ```scala
  import gr.cslab.ece.ntua.musqle.MuSQLEContext
  import gr.cslab.ece.ntua.musqle.benchmarks.tpcds.FixedQueries

  val mc = new MuSQLEContext(spark)
  val q = FixedQueries.queries(0)._2
  val mq = mc.query(q)

  // See the execution plan
  mq.explain
  ```
This results in the following execution plan:
```
Join [result9, result5] on Set(1) , Engine: [SparkSQL], Cost: [2.6401074374777824], [result10]
Move [result3] from PostgreSQL to SparkSQL, Cost 0.07390589783733907 [result9]
*Scan MuSQLEScan: date_dim, Engine: [PostgreSQL], Cost: [0.07390589783733907], [result3]
Join [result4, result1] on Set(2) , Engine: [SparkSQL], Cost: [2.5662015396404434], [result5]
Move [result2] from PostgreSQL to SparkSQL, Cost 2.5662015396404434 [result4]
*Scan MuSQLEScan: store_sales, Engine: [PostgreSQL], Cost: [2.5662015396404434], [result2]
*Scan MuSQLEScan: item, Engine: [SparkSQL], Cost: [0.0], [result1]
```
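Each line of the plan shows an operator (`Scan`, `Move`, or `Join`), the engine selected to execute it, its estimated cost, and the identifier of the intermediate result it produces; `Move` operators denote data transfers between engines, here from PostgreSQL to SparkSQL.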
## Adding Tables to the Catalog
- Create a folder named `catalog` in the project's parent directory.
- A table is defined in a file with a `.mt` extension. Two example entries are provided:

  PostgreSQL table:

  ```json
  {"tableName":"date_dim","tablePath":"date_dim","engine":"postgres"}
  ```

  Parquet file in HDFS:

  ```json
  {"tableName":"item","tablePath":"hdfs://master:9000/tpcds/1/item","engine":"spark","format":"parquet"}
  ```
Alternatively, a table can be added by calling the `add` method of `MuSQLEContext`:

```scala
mc.add("customer", "hdfs://master:9000/tpcds1/customer", "spark", "parquet")
```
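Once added, the table can be queried through the same `MuSQLEContext`. A minimal sketch, assuming `query` accepts a plain SQL string, as with the TPC-DS query above:

```scala
// Assumption: mc.query takes a raw SQL string, like the TPC-DS query above.
val plan = mc.query("SELECT COUNT(*) FROM customer")

// Inspect which engines MuSQLE selected for each operator
plan.explain
```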