Squall is an online query processing engine built on top of Storm. Just as Hive provides SQL syntax on top of Hadoop for batch processing, Squall executes SQL queries on top of Storm for online processing. Squall supports a wide class of SQL analytics, ranging from simple aggregations to advanced UDF join predicates and adaptive load rebalancing. It is actively developed by several contributors from the EPFL DATA lab. Squall is under continuous development; it currently supports the following:
- SQL (Select-Project-Join) query processing over continuous streams of data.
- Full-history stateful computation (e.g., for approximate query processing via Online Aggregation).
- Time-based window semantics for infinite data streams, e.g., sliding, tumbling, and landmark windows.
- Theta joins: arbitrarily complex join predicates, including inequality, band, and arbitrary UDF join predicates. This gives more comprehensive support and flexibility for data analytics. Demand for this is real: Hive, for example, plans to support theta joins in response to user requests.
- Usability: Squall exposes three programming interfaces: a SQL interface that translates a SQL query directly into a running topology, a functional interface that leverages Scala's syntactic sugar, and an imperative interface that offers additional control over topology design.
- Out-of-core processing: HDFS connectivity. Squall operates efficiently under limited memory through disk-based data structures and indexes.
- Throughput of millions of tuples per second and latencies in milliseconds, measured on a 16-machine cluster. Scales to large clusters.
- Guarantees: at-least-once or at-most-once semantics. Exactly-once semantics is not supported yet, but it is planned.
- Elasticity: scales out according to the load.
- Dashboard: support for real-time visualizations (work in progress).
- Continuous load balancing and adaptation to data skew.
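The three window types listed above differ in which tuples each window covers. The sketch below is a generic illustration in Java, not Squall's API: the class and method names (`WindowAssign`, `tumbling`, `sliding`, `inLandmark`) are hypothetical, and timestamps are assumed to be non-negative `long` values in a single time unit.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not Squall's API): maps a tuple timestamp to the windows that contain it. */
final class WindowAssign {
    private WindowAssign() {}

    /** Tumbling: back-to-back windows [start, start + size); each tuple falls into exactly one. */
    static long tumbling(long ts, long size) {
        return ts - (ts % size);
    }

    /**
     * Sliding: windows of length size that start every slide time units (slide <= size),
     * so a tuple can fall into several overlapping windows. Returns their start times,
     * latest first, skipping windows that would start before time 0.
     */
    static List<Long> sliding(long ts, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long latest = ts - (ts % slide); // last window start at or before ts
        for (long start = latest; start > ts - size && start >= 0; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    /** Landmark: the window spans [landmark, now] and keeps growing until the landmark is reset. */
    static boolean inLandmark(long ts, long landmark, long now) {
        return ts >= landmark && ts <= now;
    }
}
```

For example, with a window size of 10 and a slide of 5, a tuple at time 12 belongs to the sliding windows starting at 10 and 5 (covering [10, 20) and [5, 15)), but only to the tumbling window starting at 10.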
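A theta join accepts any boolean predicate over a pair of tuples, not just key equality. The following minimal nested-loop sketch shows only the predicate semantics on in-memory lists; it is not Squall's distributed implementation, and the names `ThetaJoin`, `Pair`, `join`, and `band` are hypothetical. It uses a Java record, so it assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

/** Hypothetical nested-loop theta join: emits every (l, r) pair that satisfies an arbitrary predicate. */
final class ThetaJoin {
    private ThetaJoin() {}

    /** One joined output pair. */
    record Pair<L, R>(L left, R right) {}

    static <L, R> List<Pair<L, R>> join(List<L> left, List<R> right, BiPredicate<L, R> theta) {
        List<Pair<L, R>> out = new ArrayList<>();
        for (L l : left) {
            for (R r : right) {
                if (theta.test(l, r)) { // any predicate: inequality, band, or a UDF
                    out.add(new Pair<>(l, r));
                }
            }
        }
        return out;
    }

    /** Band predicate: the two values differ by at most width. */
    static BiPredicate<Integer, Integer> band(int width) {
        return (a, b) -> Math.abs(a - b) <= width;
    }
}
```

Joining `[1, 5, 9]` with `[4, 10]` under `band(1)` yields the pairs (5, 4) and (9, 10), while the inequality predicate `(a, b) -> a < b` yields four pairs. Because a general predicate cannot be evaluated by hash partitioning on a join key, every left tuple may need to be compared with every right tuple, which is why distributing these comparisons across machines matters for theta joins.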
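Online aggregation, mentioned above as a use of full-history stateful computation, reports a running estimate of an aggregate together with its precision while tuples are still arriving. The sketch below is a generic illustration, not Squall's code: the class `OnlineMean` is hypothetical, it maintains the mean and variance incrementally with Welford's algorithm, and its approximate 95% confidence half-width relies on the central limit theorem, which assumes tuples arrive in effectively random order.

```java
/** Hypothetical running estimator: mean with an approximate 95% confidence half-width. */
final class OnlineMean {
    private long n;
    private double mean;
    private double m2; // running sum of squared deviations from the mean (Welford)

    /** Folds one new value into the running state in O(1) time and memory. */
    void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    long count() {
        return n;
    }

    double estimate() {
        return mean;
    }

    /** CLT-based half-width 1.96 * s / sqrt(n); NaN until at least two values are seen. */
    double halfWidth95() {
        if (n < 2) {
            return Double.NaN;
        }
        double sampleVariance = m2 / (n - 1);
        return 1.96 * Math.sqrt(sampleVariance / n);
    }
}
```

After adding the values 2, 4, 4, 4, 5, 5, 7, 9, the estimate is 5.0 with a half-width of about 1.48. The half-width shrinks roughly as 1/sqrt(n), so a user can stop the query early once the estimate is precise enough.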
- White paper
- Rationale
- [A High-Level Overview](https://github.com/epfldata/squall/wiki/A-high-level-overview)
- [Quick Start: Local Mode](https://github.com/epfldata/squall/wiki/Quick-Start:-Local-Mode)
- [Quick Start: Cluster Mode](https://github.com/epfldata/squall/wiki/Quick-Start:-Cluster-Mode)
- Query optimization
- Using the Squall REPL
- Imperative Squall interface
- Squall Query Plans to Storm Topologies
- Theta-Join Usage & Implementation
- Squall Configurations: Local Mode Configurations and Cluster Mode Configurations
- Programming Guide
- Code Recompilation
- [Supported Features](https://github.com/epfldata/squall/wiki/Supported-features)
- Stream Query Applications
- Troubleshooting
- Extension Ideas
We'd love your help in making Squall better. If you're interested, please send us your suggestions and get your name on the Contributors list. Here is a list of some of our current interests. All questions and suggestions are welcome.