avitorovic edited this page May 8, 2012 · 24 revisions

Squall aims to support online analytics expressed in SQL. “Online” means that the final result is constantly updated as new tuples arrive in the system. At each step, the system holds an eventually correct query result for the tuples seen so far.

The Squall engine is meant to support, when ready, three kinds of query processing problems:

  1. Data stream processing: Data arrives in streaming fashion. The state of the system consists of part of the recently seen data (say, a sliding or tumbling window over the stream), and queries are evaluated over the stream using window semantics.

  2. Incremental query evaluation: We materialize a view (expressed as a query) of a database. Whenever an update to the database arrives, we want to quickly refresh the materialized view. The challenge here is to avoid recomputing the view from scratch every time an update arrives.

  3. Online aggregation: There is a large, conceptually static database or data warehouse. We want to evaluate a query on it and see a continuously improving approximation of the query result, ideally with error bounds, while the computation is still executing. Typically, the data would be read out of the database and fed into the Storm topology in random order, which allows us to compute error bounds using statistical machinery. Conceptually this is done by sampling; in practice, for performance reasons, we start from a database whose entries have been randomly reshuffled offline, so that it can be scanned efficiently at query processing time.
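The online aggregation case can be illustrated with a minimal Python sketch. It is not Squall code: the class `OnlineMean` and the function `online_avg` are hypothetical names, the query is assumed to be a plain `AVG` over one column, and the error bound is an approximate 95% confidence interval based on the normal approximation, ignoring the finite-population correction. The input list is shuffled up front to stand in for the offline reshuffle described above.

```python
import math
import random


class OnlineMean:
    """Running estimate of AVG(x) with an approximate confidence interval."""

    def __init__(self):
        self.n = 0        # number of tuples seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # sum of squared deviations from the running mean

    def add(self, x):
        # Welford's update: numerically stable running mean and variance.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def estimate(self, z=1.96):
        """Return (mean, half_width); half_width is inf with fewer than 2 tuples."""
        if self.n < 2:
            return self.mean, math.inf
        variance = self.m2 / (self.n - 1)
        return self.mean, z * math.sqrt(variance / self.n)


def online_avg(values, report_every):
    """Scan values in the given (assumed random) order, yielding
    (tuples_seen, mean, half_width) every report_every tuples."""
    agg = OnlineMean()
    for seen, x in enumerate(values, start=1):
        agg.add(x)
        if seen % report_every == 0:
            mean, half_width = agg.estimate()
            yield seen, mean, half_width


if __name__ == "__main__":
    random.seed(0)
    table = [random.gauss(100.0, 15.0) for _ in range(10_000)]
    random.shuffle(table)  # stands in for the offline random reshuffle
    for seen, mean, half_width in online_avg(table, 2_500):
        print(f"after {seen} tuples: AVG ~ {mean:.2f} +/- {half_width:.2f}")
```

The interval is only meaningful if the scan order is random with respect to the aggregated column, which is why the reshuffle matters.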

Currently, we do not provide approximation on top of the query engine, which differs from the approach in the Online Aggregation paper. DBO, the state-of-the-art online aggregation paper, does all the processing on a single node. Incremental processing techniques can be applied, but we do not start from a result computed for a previous database state and then update it as tuples arrive; we have to build the result from the beginning. Compared to stream processing, we still need to implement deletion and expiration of tuples.
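To make tuple expiration concrete, here is a minimal Python sketch of a time-based sliding-window `SUM` that is maintained incrementally: an arriving tuple adds to the aggregate, and an expired tuple is treated as a deletion that subtracts from it. This is an illustration only, not how Squall implements or plans to implement expiration. The class name `SlidingWindowSum` is hypothetical, and the sketch assumes timestamps arrive in non-decreasing order.

```python
from collections import deque


class SlidingWindowSum:
    """Incrementally maintained SUM over the last window_size time units."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.tuples = deque()  # (timestamp, value) pairs in arrival order
        self.total = 0

    def expire(self, now):
        # A tuple expires once its timestamp is at or before now - window_size.
        # Expiration is applied as a deletion: subtract its contribution.
        while self.tuples and self.tuples[0][0] <= now - self.window_size:
            _, old_value = self.tuples.popleft()
            self.total -= old_value

    def insert(self, timestamp, value):
        """Add a tuple and return the up-to-date window sum."""
        self.expire(timestamp)
        self.tuples.append((timestamp, value))
        self.total += value
        return self.total
```

Each arrival costs time proportional to the number of tuples that expire, rather than a rescan of the whole window.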