-
Notifications
You must be signed in to change notification settings - Fork 96
A high level overview
Squall is an online distributed query engine that supports complex query processing and achieves high throughput and low latency. It puts together state-of-the-art partitioning schemes, local query operators, and techniques for scalable online query processing. Squall is built on top of Storm. Similar to how Hive provides SQL syntax on top of Hadoop for doing batch processing, Squall executes SQL queries on top of Storm for doing online processing. Squall translates an SQL query into a query plan consisting of a DAG of Relational Algebra operators. More information on query optimization is available here. Each operator is implemented as a Storm component (Spout or Bolt) and executes on multiple nodes. The query plan is executed as a Storm topology. In addition to SQL interface, Squall also provides Functional, Interactive and Imperative interfaces.
Skew-resilient join operators. Existing open-source online systems provide distribution primitives (e.g., communication patterns over DAG of tasks, fault tolerance, elasticity etc.) and low-level performance optimizations (e.g., by using efficient communication libraries). Unfortunately, these systems provide only vanilla database operators, such as hash-based equi-joins (and general UDFs), which do not perform well in the case of skew. In contrast, Squall provides efficient join operators, that is, skew-resilient partitioning schemes and efficient local join operators. Each operator performs the best for a certain data distribution. We focus on operators with large state, with full-history or window semantics. Our operators scale very well (we tested it on clusters with up to 220 hardware threads).
SQL interface with resource-aware query optimizer. Squall provides a query optimizer that optimizes for latency, throughput and resource utilization. The optimizer decides on the optimal join order and the parallelism for each operator. This is very important, as an insufficient parallelism leads to overloading machines, which hurts both latency and throughput. Whereas, too big parallelism keeps machines idle, wasting the resources. The optimizer also performs optimizations such as pushing up selections and projections, and common subexpression elimination.
Squall architecture is illustrated below. An example query plan has selections (σ), projections (π), joins (⋈) and aggregations (Agg).
The basic processing unit is component. Each component contains multiple operators and executes in parallel on multiple nodes. A component is defined by the partitioning scheme on its input, and the chain of operators it executes.
Squall contains the following operators: Selection, Projection, Distinct, Aggregation and Join. All the operators works “on-the-fly”, as soon as a tuple arrives it is processed. The stateful operators (Distinct, Aggregation and Join) operators need to continuously maintain a state. The parallel nodes running the same operator work independently and do not communicate at all. This provides for scalability.
Squall offers multiple join operators. The HashJoin operator uses hash partitioning (tuples with the same key are processed within a single node), and performs local processing as follows. When a tuple arrives on a node, it is appended to the materialized state of the corresponding relation and joined with the that of the opposite relation. The materialized state is a hash table, where hash keys are relation's join keys. This algorithm is essentially a version of Symmetric Hash Join SHJ [1] executing simultaneously at multiple nodes. Our version of SHJ provides correct results for tuples seen so far (tuples which are fully propagated). This also permits extending the aggregation process to include evaluating approximate estimates with confidence bounds through sampling (e.g. Hash Ripple Join [2]).
However, the HashJoin supports only equi-joins on uniform data. To address this problem, Squall provides ThetaJoin operator, which supports arbitrary join conditions. It also works well for equi-joins on skewed data. The ThetaJoin operator is presented in our paper "Scalable and Adaptive Online Joins" [3]. It works as follows. The tuples are randomly distributed among the nodes, while respecting a certain mapping that ensures consistency, correctness and completeness of the results. Due to randomization, the operator balances the load over all nodes, independently of the data distribution. Thus, the ThetaJoin operator outperforms the HashJoin in the case of data skew. For more information on the ThetaJoin, please visit this page.
Squall also offers multi-way joins, which are very efficient when intermediate relations are large. We denote a multi-way join as HypercubeJoin. For more information, please consult our Whitepaper.
To reduce lookup time, Squall's local join implementations use indexes that are built on the fly, based on the join condition. For example: having two relations R(A,B,C)
and S(C,D,E)
and the following join condition:
R.C = S.C AND 3 * R.A < S.D
In such case, hash-indexes are created on R.C
and S.C
and B+ trees (or binary trees) indexes are created on R.A
and S.D
. Each new arriving tuple is stored in the corresponding index for future lookup and the indexes are updated continuously.
[1] W. Hong and M. Stonebraker. Optimization of Parallel Query Execution Plans in XPRS. Distributed and Parallel Databases, 1(1):9–32, 1993.
[2] P. J. Haas and J. M. Hellerstein, “Ripple joins for online aggregation” in Proceedings of the 1999 ACM SIGMOD international conference on Management of data, SIGMOD ’99, (New York, NY, USA), pp. 287–298, ACM, 1999.
[3] M. Elseidy, A. Elguindy and A. Vitorovic. "Scalable and Adaptive Online Joins", in VLDB 2014.