The code is laid out following the different parts and components.
Provides an upper-level, client-facing interface for the core components, and manages the scheduling around the MemTable, Reprojector, flushes, and column store. Handles different types of streaming data ingestion.
Clients such as Spark and the CLI implement source actors that extend the RowSource trait. A RowSource communicates with the NodeCoordinatorActor (one per node) to establish streaming ingestion and send rows of data. It retries rows for which it does not receive an Ack, making ingestion at-least-once, and has built-in backpressure so that it does not send too many unacked rows. The NodeCoordinatorActor in turn creates a DatasetCoordinatorActor for each (dataset, version) pair to manage memtable state, backpressure, and flushing of the memtables to the column store.
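The ack/retry bookkeeping behind at-least-once ingestion can be sketched as follows. This is an illustrative model only: the class and method names (`UnackedTracker`, `canSend`, `toRetry`) are hypothetical, not FiloDB's actual RowSource API.

```scala
import scala.collection.mutable

// Hypothetical sketch of a RowSource's unacked-row tracking.
// maxOutstanding caps the rows in flight (backpressure).
class UnackedTracker(maxOutstanding: Int) {
  private val unacked = mutable.LinkedHashMap.empty[Long, String]

  // Backpressure: refuse new sends once too many rows await an Ack.
  def canSend: Boolean = unacked.size < maxOutstanding

  def send(seqNo: Long, row: String): Boolean =
    if (canSend) { unacked(seqNo) = row; true } else false

  // An Ack for seqNo confirms every row up to and including it.
  def ack(seqNo: Long): Unit =
    unacked.keys.filter(_ <= seqNo).toList.foreach(unacked.remove)

  // Rows still unacked are re-sent, giving at-least-once delivery.
  def toRetry: Seq[(Long, String)] = unacked.toSeq
}
```

Retrying anything left in `toRetry` means a row may be written twice (if the Ack was lost rather than the row), which is the at-least-once trade-off described above.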
These components form the core part of FiloDB and are portable across data stores.
Ingested rows flow from the DatasetCoordinatorActor into a MemTable, of which there is currently only one implementation, the FiloMemTable. MemTables hold enough rows that they can be chunked efficiently. Based on scheduling and policies set by the DatasetCoordinatorActor, MemTables are flushed to the column store by the Reprojector: the rows in the MemTable form Segments (see Segment.scala), which are appended to the ColumnStore.
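The reprojection step above amounts to grouping buffered rows into per-segment batches. The sketch below models that grouping only; `Row` and `toSegments` are illustrative names, not the actual Reprojector or Segment API.

```scala
// Hypothetical row: a partition key, a segment key, and some data.
case class Row(partition: String, segmentKey: Int, data: String)

// Group memtable rows by (partition, segment key) so each group can be
// chunked and appended to the column store as one segment's worth of data.
def toSegments(memtable: Seq[Row]): Map[(String, Int), Seq[Row]] =
  memtable.groupBy(r => (r.partition, r.segmentKey))
```

Batching rows into segments before writing is what lets the column store chunk and compress many rows at once instead of writing row by row.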
The core module has an InMemoryColumnStore, a full ColumnStore implementation used both for testing and for low-latency in-memory Spark queries.
On the read side, the ColumnStoreScanner contains APIs for reading out segments and rows using various ScanMethods: there are methods for single-partition queries, queries that span multiple partitions using custom filtering functions, and so on. Helper functions in KeyFilter help compose functions for filtered scanning.
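The dispatch between scan methods can be sketched as a sealed trait hierarchy. This is a simplified model of the idea, not the real ScanMethod types, and `matchingPartitions` is a hypothetical helper.

```scala
// Hedged sketch: each scan method selects which partitions a read touches.
sealed trait ScanMethod
case class SinglePartitionScan(partition: String) extends ScanMethod
case class FilteredPartitionScan(filter: String => Boolean) extends ScanMethod

// Resolve a scan method to the set of matching partitions.
def matchingPartitions(all: Seq[String], method: ScanMethod): Seq[String] =
  method match {
    case SinglePartitionScan(p)    => all.filter(_ == p)
    case FilteredPartitionScan(fn) => all.filter(fn)
  }
```

A sealed trait keeps the set of scan strategies closed, so the compiler checks that every reader handles every method, which is the usual reason to encode such a dispatch this way in Scala.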
All ColumnStore and MetaStore APIs are Scala Future-based to take maximum advantage of CPU and nonblocking behavior.
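Future-based APIs compose without blocking threads, for example via a for-comprehension that chains a metadata lookup with a store read. The method names and return types below are placeholders, not the actual MetaStore/ColumnStore signatures.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-ins for async MetaStore/ColumnStore calls.
def getDataset(name: String): Future[String] = Future(s"dataset:$name")
def readSegments(dataset: String): Future[Int] = Future(dataset.length)

// The lookups chain without blocking; no thread waits between the steps.
val rowCount: Future[Int] =
  for {
    ds   <- getDataset("gdelt")
    segs <- readSegments(ds)
  } yield segs
```

Only a caller that truly needs a synchronous result (such as a test) would `Await` on the composed future; internal code keeps composing instead.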
FiloDB datasets consist of one or more projections, each of which contains columns. The MetaStore defines an API for concurrent reads/writes/updates on dataset, projection, and column metadata. Each Column has a ColumnType, which has a KeyType. KeyType is a fundamental type class defining serialization and extraction for each type of column/key. Most of FiloDB depends heavily on RichProjection, which contains the partition, row, and segment key columns and their KeyTypes.
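The type-class shape of KeyType can be sketched minimally as below. This mirrors the concept (per-type serialization chosen implicitly) rather than the actual trait in FiloDB; the `toBytes`/`fromBytes` names are illustrative.

```scala
// A minimal type class: each key type knows its own (de)serialization.
trait KeyType[K] {
  def toBytes(k: K): Array[Byte]
  def fromBytes(b: Array[Byte]): K
}

object KeyType {
  implicit val stringKeyType: KeyType[String] = new KeyType[String] {
    def toBytes(k: String): Array[Byte] = k.getBytes("UTF-8")
    def fromBytes(b: Array[Byte]): String = new String(b, "UTF-8")
  }
  implicit val intKeyType: KeyType[Int] = new KeyType[Int] {
    def toBytes(k: Int): Array[Byte] =
      java.nio.ByteBuffer.allocate(4).putInt(k).array()
    def fromBytes(b: Array[Byte]): Int =
      java.nio.ByteBuffer.wrap(b).getInt
  }
}

// Generic code works for any key type with an instance in scope.
def roundTrip[K](k: K)(implicit kt: KeyType[K]): K =
  kt.fromBytes(kt.toBytes(k))
```

The type-class encoding is what lets RichProjection handle partition, row, and segment keys of different column types through one generic interface.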
An implementation of ColumnStore and MetaStore for Apache Cassandra.
Contains the Spark input source for ingesting and querying data from FiloDB.