The code is laid out following the different parts and components.
Provides an upper-level, client-facing interface for the core components, and manages the scheduling around the MemTable, Reprojector, flushes, and column store. Handles different types of streaming data ingestion.
Clients such as Spark and the CLI implement source actors that extend the RowSource trait. A RowSource communicates with the NodeCoordinatorActor (one per node) to establish streaming ingestion and send rows of data. It retries rows for which it does not receive an Ack, making ingestion at-least-once, and has built-in backpressure so that it does not send too many unacked rows. The NodeCoordinatorActor in turn creates a DatasetCoordinatorActor for each (dataset, version) pair to manage memtable state, backpressure, and flushing of the memtables to the column store.
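The ack/retry bookkeeping behind at-least-once ingestion can be sketched as follows. This is an illustrative model only: the class and method names (`UnackedTracker`, `canSend`, `toRetry`) are hypothetical, not FiloDB's actual RowSource API.

```scala
import scala.collection.mutable

// Hypothetical sketch of a RowSource's unacked-row tracking.
// maxOutstanding caps the rows in flight (backpressure).
class UnackedTracker(maxOutstanding: Int) {
  private val unacked = mutable.LinkedHashMap.empty[Long, String]

  // Backpressure: refuse new sends once too many rows await an Ack.
  def canSend: Boolean = unacked.size < maxOutstanding

  def send(seqNo: Long, row: String): Boolean =
    if (canSend) { unacked(seqNo) = row; true } else false

  // An Ack for seqNo confirms every row up to and including it.
  def ack(seqNo: Long): Unit =
    unacked.keys.filter(_ <= seqNo).toList.foreach(unacked.remove)

  // Rows still unacked are re-sent, giving at-least-once delivery.
  def toRetry: Seq[(Long, String)] = unacked.toSeq
}
```

Retrying anything left in `toRetry` means a row may be written twice (if the Ack was lost rather than the row), which is the at-least-once trade-off described above.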
These components form the core part of FiloDB and are portable across data stores.
Ingested rows flow from the DatasetCoordinatorActor into a MemTable, of which there is currently only one implementation, the FiloMemTable. MemTables hold enough rows that they can be chunked efficiently. Based on scheduling and policies set by the DatasetCoordinatorActor, MemTables are flushed to the column store by the Reprojector: the rows in the MemTable form Segments (see Segment.scala), which are appended to the ColumnStore.
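The reprojection step above amounts to grouping buffered rows into per-segment batches. The sketch below models that grouping only; `Row` and `toSegments` are illustrative names, not the actual Reprojector or Segment API.

```scala
// Hypothetical row: a partition key, a segment key, and some data.
case class Row(partition: String, segmentKey: Int, data: String)

// Group memtable rows by (partition, segment key) so each group can be
// chunked and appended to the column store as one segment's worth of data.
def toSegments(memtable: Seq[Row]): Map[(String, Int), Seq[Row]] =
  memtable.groupBy(r => (r.partition, r.segmentKey))
```

Batching rows into segments before writing is what lets the column store chunk and compress many rows at once instead of writing row by row.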
The core module has an InMemoryColumnStore, a full ColumnStore implementation used both for testing and for low-latency in-memory Spark queries.
On the read side, the ColumnStoreScanner contains APIs for reading out segments and rows using various ScanMethods: there are methods for single-partition queries, queries that span multiple partitions using custom filtering functions, and so on. Helper functions in KeyFilter help compose functions for filtered scanning.
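The dispatch between scan methods can be sketched as a sealed trait hierarchy. This is a simplified model of the idea, not the real ScanMethod types, and `matchingPartitions` is a hypothetical helper.

```scala
// Hedged sketch: each scan method selects which partitions a read touches.
sealed trait ScanMethod
case class SinglePartitionScan(partition: String) extends ScanMethod
case class FilteredPartitionScan(filter: String => Boolean) extends ScanMethod

// Resolve a scan method to the set of matching partitions.
def matchingPartitions(all: Seq[String], method: ScanMethod): Seq[String] =
  method match {
    case SinglePartitionScan(p)    => all.filter(_ == p)
    case FilteredPartitionScan(fn) => all.filter(fn)
  }
```

A sealed trait keeps the set of scan strategies closed, so the compiler checks that every reader handles every method, which is the usual reason to encode such a dispatch this way in Scala.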
All ColumnStore and MetaStore APIs are Scala Future-based to take maximum advantage of CPU and nonblocking behavior.
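Future-based APIs compose without blocking threads, for example via a for-comprehension that chains a metadata lookup with a store read. The method names and return types below are placeholders, not the actual MetaStore/ColumnStore signatures.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-ins for async MetaStore/ColumnStore calls.
def getDataset(name: String): Future[String] = Future(s"dataset:$name")
def readSegments(dataset: String): Future[Int] = Future(dataset.length)

// The lookups chain without blocking; no thread waits between the steps.
val rowCount: Future[Int] =
  for {
    ds   <- getDataset("gdelt")
    segs <- readSegments(ds)
  } yield segs
```

Only a caller that truly needs a synchronous result (such as a test) would `Await` on the composed future; internal code keeps composing instead.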
FiloDB datasets consist of one or more projections, each of which contains columns. The MetaStore defines an API for concurrent reads/writes/updates on dataset, projection, and column metadata. Each Column has a ColumnType, which has a KeyType. KeyType is a fundamental type class defining serialization and extraction for each type of column/key. Most of FiloDB depends heavily on RichProjection, which contains the partition, row, and segment key columns and their KeyTypes.
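The type-class shape of KeyType can be sketched minimally as below. This mirrors the concept (per-type serialization chosen implicitly) rather than the actual trait in FiloDB; the `toBytes`/`fromBytes` names are illustrative.

```scala
// A minimal type class: each key type knows its own (de)serialization.
trait KeyType[K] {
  def toBytes(k: K): Array[Byte]
  def fromBytes(b: Array[Byte]): K
}

object KeyType {
  implicit val stringKeyType: KeyType[String] = new KeyType[String] {
    def toBytes(k: String): Array[Byte] = k.getBytes("UTF-8")
    def fromBytes(b: Array[Byte]): String = new String(b, "UTF-8")
  }
  implicit val intKeyType: KeyType[Int] = new KeyType[Int] {
    def toBytes(k: Int): Array[Byte] =
      java.nio.ByteBuffer.allocate(4).putInt(k).array()
    def fromBytes(b: Array[Byte]): Int =
      java.nio.ByteBuffer.wrap(b).getInt
  }
}

// Generic code works for any key type with an instance in scope.
def roundTrip[K](k: K)(implicit kt: KeyType[K]): K =
  kt.fromBytes(kt.toBytes(k))
```

The type-class encoding is what lets RichProjection handle partition, row, and segment keys of different column types through one generic interface.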
An implementation of ColumnStore and MetaStore for Apache Cassandra.
Contains the Spark input source for ingesting and querying data from FiloDB.