Squall Cluster Configs

Aleksandar Vitorovic edited this page Jan 9, 2015 · 9 revisions

This page explains the contents of the config file at squall-$VERSION/test/squall/confs/cluster/1G_hyracks:

DIP_DISTRIBUTED true
DIP_QUERY_NAME hyracks

DIP_TOPOLOGY_NAME_PREFIX username
DIP_DATA_ROOT /export/home/squalldata/tpchdb/
DIP_SQL_ROOT ../test/squall/sql_queries/
DIP_SCHEMA_PATH ../test/squall/schemas/tpch.txt

# DIP_DB_SIZE is in GBs
DIP_DB_SIZE 1

########################################
#DIP_OPTIMIZER_TYPE INDEX_SIMPLE
#DIP_MAX_SRC_PAR 1

#DIP_OPTIMIZER_TYPE INDEX_RULE_BUSHY
#DIP_MAX_SRC_PAR 1

#DIP_OPTIMIZER_TYPE NAME_MANUAL_PAR_LEFTY
#DIP_PLAN CUSTOMER:2,ORDERS:3:4

#DIP_OPTIMIZER_TYPE NAME_MANUAL_COST_LEFTY
#DIP_PLAN CUSTOMER,ORDERS
#DIP_TOTAL_SRC_PAR 20

#DIP_OPTIMIZER_TYPE NAME_RULE_LEFTY
#DIP_TOTAL_SRC_PAR 20

DIP_OPTIMIZER_TYPE NAME_COST_LEFTY
DIP_TOTAL_SRC_PAR 20

########################################

#below are unlikely to change
DIP_EXTENSION .tbl
DIP_READ_SPLIT_DELIMITER \|
DIP_GLOBAL_ADD_DELIMITER |
DIP_GLOBAL_SPLIT_DELIMITER \|

DIP_ACK_EVERY_TUPLE true
DIP_KILL_AT_THE_END true

# Storage manager parameters
# Storage directory for local runs
STORAGE_LOCAL_DIR /tmp/ramdisk
# Storage directory for cluster runs
STORAGE_DIP_DIR /export/home/squalldata/storage 
STORAGE_COLD_START true
MEMORY_SIZE_MB 4096

The config file 1G_hyracks is the same as 0_01G_hyracks_ncl, except:

  1. DIP_DISTRIBUTED is set to true.

  2. DIP_TOPOLOGY_NAME_PREFIX is an optional parameter. It distinguishes between different users who may be running the same query on the cluster at the same time.

  3. DIP_DATA_ROOT refers to a location on the cluster.

  4. There is no DIP_RESULT_ROOT, because in Cluster Mode the results are not automatically merged and compared against a file.

Thus, to change the database size, only DIP_DB_SIZE has to be changed, and to change the query, DIP_QUERY_NAME has to be modified. You can find more examples of config files in squall-$VERSION/test/squall/confs, or you can write new ones from scratch.
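
As an illustration of deriving a new config from an existing one, here is a minimal Python sketch that rewrites the values of selected keys and leaves comments and all other lines untouched. It assumes each active line has the form `KEY value` separated by whitespace and that lines starting with `#` are comments, as in the listing above. The helper name `override_settings` and the query name `tpch3` are hypothetical and only serve the example; this is not part of Squall.

```python
def override_settings(config_text, overrides):
    """Return config_text with the values of active (uncommented) keys replaced.

    Assumption: an active line is "KEY value"; lines starting with '#' are comments.
    """
    seen = set()
    out = []
    for line in config_text.splitlines():
        stripped = line.strip()
        parts = stripped.split(None, 1)
        # Blank lines, comments, and keys we were not asked to change pass through.
        if parts and not stripped.startswith("#") and parts[0] in overrides:
            out.append(f"{parts[0]} {overrides[parts[0]]}")
            seen.add(parts[0])
        else:
            out.append(line)
    missing = sorted(set(overrides) - seen)
    if missing:
        raise KeyError(f"keys not found as active lines: {missing}")
    return "\n".join(out) + "\n"


base = """\
DIP_DISTRIBUTED true
DIP_QUERY_NAME hyracks
# DIP_DB_SIZE is in GBs
DIP_DB_SIZE 1
"""

# Hypothetical change: a 10 GB database and a different query name.
new_config = override_settings(base, {"DIP_DB_SIZE": 10, "DIP_QUERY_NAME": "tpch3"})
print(new_config)
```

Raising an error for a key that never appears as an active line is a design choice of this sketch: it avoids silently producing a config that still uses the old database size or query.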

Keep in mind that in each config file you need to set DIP_DATA_ROOT. In addition, DIP_QUERY_NAME must correspond to a query from squall-$VERSION/test/squall/sql_queries/.
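
The two requirements above can be checked before submitting a run. The following Python sketch parses a config and reports what is missing. It assumes the `KEY value` line format with `#` comments seen in the listing; `parse_config` and `check_config` are hypothetical helper names, and the set of available query names is passed in by the caller, because how queries are named on disk under sql_queries/ is not specified here.

```python
def parse_config(text):
    """Parse "KEY value" lines into a dict, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        settings[parts[0]] = parts[1] if len(parts) > 1 else ""
    return settings


def check_config(settings, available_queries):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if not settings.get("DIP_DATA_ROOT"):
        problems.append("DIP_DATA_ROOT is not set")
    name = settings.get("DIP_QUERY_NAME")
    if not name:
        problems.append("DIP_QUERY_NAME is not set")
    elif name not in available_queries:
        problems.append(f"DIP_QUERY_NAME {name!r} has no matching query")
    return problems


sample = r"""
DIP_QUERY_NAME hyracks
DIP_DATA_ROOT /export/home/squalldata/tpchdb/
# DIP_DB_SIZE is in GBs
DIP_DB_SIZE 1
DIP_READ_SPLIT_DELIMITER \|
"""

settings = parse_config(sample)
print(check_config(settings, {"hyracks"}))  # prints []
```

Splitting each line only at the first run of whitespace keeps values such as `\|` intact.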

You can run Squall with a desired config file as follows:

cd squall-$VERSION/bin
./squall_cluster.sh $CONFIG_FILE_PATH

where $CONFIG_FILE_PATH is a relative or full path to a config file.

Because main memory is limited, you cannot run an arbitrarily large database with low component parallelism. For information on detecting this behavior, please consult Squall query plans vs Storm topologies, section "How to know we run out of memory?".