Home
For the design motivation of ToplingDB SidePlugin, please refer to Motivation To Solution.
To migrate existing RocksDB code to ToplingDB, please refer to Using ToplingDB from Scratch.
The ToplingDB SidePlugin configuration system defines configuration items in json/yaml format, and includes all meta-objects of ToplingDB/RocksDB into this configuration system. Overall, the ToplingDB configuration system achieves the following goals:
- Cover all configuration requirements of ToplingDB/RocksDB
- Dynamic, open and decoupled plugin solution
- User code can use third-party modules (such as ToplingZipTable) without modification
- New plugins can be written without introducing irrelevant dependencies, avoiding hard-coded branching such as: if (is BlockBasedTable) ... else if (is PlainTable) ...
- Visualization: Display the internal state of the engine through Web Service (off-site documentation)
- Monitoring: export engine metrics to Prometheus through Web Service, and then use Grafana to visualize them
- Modify configuration online using REST API through Web Service
- Simplify multi-language bindings (only the conf object needs to be bound)
The root configuration objects of ToplingDB/RocksDB are DBOptions and ColumnFamilyOptions (CFOptions for short). In addition, the Options object is a combination of DBOptions and ColumnFamilyOptions (it inherits from both).
DBOptions and CFOptions contain secondary configuration objects, and some secondary objects in turn contain tertiary configuration objects. In json, each of these objects is defined as a sub-object of a first-level json object named after its C++ base class. In addition, json has several special first-level objects (http, setenv, databases, open). A json object can refer to other json objects, and these references are converted into reference relationships between the corresponding C++ objects.
DBOptions and CFOptions also support template: a DBOptions/CFOptions object can name another such object as its template, in which case the new object is copied from the template and then modified.
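The copy-then-modify behaviour of a template can be pictured with a minimal Python sketch. It models options as plain dictionaries; the `template` key spelling, the helper `apply_template`, and the sample values are assumptions made for illustration, not the actual ToplingDB implementation.

```python
import copy

def apply_template(defined, spec):
    """Build an options dict from `spec`, starting from its template if any.

    `defined` maps names to already-defined option dicts. The "template" key
    is an assumed spelling; the named object is deep-copied, then overridden.
    """
    spec = dict(spec)                      # do not mutate the caller's dict
    name = spec.pop("template", None)
    if name is None:
        return spec
    key = name.strip("${}")                # accept "${name}", "$name" or "name"
    result = copy.deepcopy(defined[key])   # copy the template ...
    result.update(spec)                    # ... then apply the overrides
    return result

cf_options = {"default": {"write_buffer_size": "128M", "max_write_buffer_number": 4}}
custom = apply_template(cf_options, {"template": "$default", "write_buffer_size": "64M"})
```

Here `custom` keeps `max_write_buffer_number` from the template and overrides only `write_buffer_size`, while the template object itself is left unchanged.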
{
"http": {
"document_root": "/path/to/dbname",
"listening_ports": "8081"
},
"setenv": {
"DictZipBlobStore_zipThreads": 8,
"StrSimpleEnvNameNotOverwrite": "StringValue",
"IntSimpleEnvNameNotOverwrite": 16384,
"OverwriteThisEnv": { "overwrite": true,
"value": "overwrite is default to false, can be manually set to true"
}
},
"permissions": { "web_compact": true },
"Cache": {
"lru_cache": {
"class": "LRUCache",
"params": {
"capacity": "4G", "num_shard_bits": -1, "high_pri_pool_ratio": 0.5,
"strict_capacity_limit": false, "use_adaptive_mutex": false,
"metadata_charge_policy": "kFullChargeCacheMetadata"
}
}
},
"WriteBufferManager" : {
"wbm": {
"class": "Default",
"params": {
"//comment": "share mem budget with cache object ${lru_cache}",
"buffer_size": "512M", "cache": "${lru_cache}"
}
}
},
"Statistics": { "stat": "default" },
"TableFactory": {
"bb": {
"class": "BlockBasedTable",
"params": { "block_cache": "${lru_cache}" }
},
"fast": {
"class": "SingleFastTable",
"params": { "indexType": "MainPatricia" }
},
"zip": {
"class": "ToplingZipTable",
"params": {
"localTempDir": "/dev/shm/tmp",
"sampleRatio": 0.01, "entropyAlgo": "kNoEntropy"
}
},
"dispatch" : {
"class": "DispatcherTable",
"params": {
"default": "fast",
"readers": { "SingleFastTable": "fast", "ToplingZipTable": "zip", "BlockBased": "bb" },
"level_writers": ["fast", "fast", "fast", "zip", "zip", "zip", "zip"]
}
}
},
"CFOptions": {
"default": {
"max_write_buffer_number": 4, "write_buffer_size": "128M",
"target_file_size_base": "16M", "target_file_size_multiplier": 2,
"table_factory": "dispatch", "ttl": 0
}
},
"databases": {
"db1": {
"method": "DB::Open",
"params": {
"options": {
"write_buffer_manager": "${wbm}",
"create_if_missing": true, "table_factory": "dispatch"
}
}
},
"db_mcf": {
"method": "DB::Open",
"params": {
"db_options": {
"create_if_missing": true,
"create_missing_column_families": true,
"write_buffer_manager": "${wbm}",
"allow_mmap_reads": true
},
"column_families": {
"default": "$default",
"custom_cf" : {
"max_write_buffer_number": 4,
"target_file_size_base": "16M",
"target_file_size_multiplier": 2,
"table_factory": "dispatch", "ttl": 0
}
},
"path": "'dbname' passed to Open. If not defined, use 'db_mcf' here"
}
}
},
"open": "db_mcf"
}
In this example, the first json sub-object is:
"http": {
"document_root": "/", "listening_ports": "8081"
}
This http object configures the HTTP web server used for web presentation. For the complete list of http parameters, please refer to the CivetWeb UserManual.
"setenv": {
"DictZipBlobStore_zipThreads" : 8
}
Each sub-object of setenv defines an environment variable.
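As a rough illustration of how a setenv object might be applied, the following Python sketch treats a plain value as "set only if not already defined" and an object with `"overwrite": true` as "always set", following the comments in the example above. The function name `apply_setenv` and the use of a plain dict in place of the process environment are assumptions of this sketch.

```python
def apply_setenv(setenv, environ):
    """Apply a `setenv` object to `environ` (a dict-like such as os.environ)."""
    for name, spec in setenv.items():
        if isinstance(spec, dict):
            # Full form: {"overwrite": bool, "value": ...}; overwrite defaults to false
            value, overwrite = spec["value"], spec.get("overwrite", False)
        else:
            # Simple form: a bare string or number, never overwrites
            value, overwrite = spec, False
        if overwrite or name not in environ:
            environ[name] = str(value)
    return environ

env = {"OverwriteThisEnv": "old", "IntSimpleEnvNameNotOverwrite": "1"}
apply_setenv({
    "DictZipBlobStore_zipThreads": 8,
    "IntSimpleEnvNameNotOverwrite": 16384,
    "OverwriteThisEnv": {"overwrite": True, "value": "new"},
}, env)
```

After the call, the pre-existing `IntSimpleEnvNameNotOverwrite` keeps its old value, while `OverwriteThisEnv` is replaced.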
Each sub-object of permissions defines a permission.
Multiple database objects can be defined under databases, and database objects are divided into two categories:
- DB containing only the default ColumnFamily
- DB with multiple ColumnFamily (DB_MultiCF)
These two types of databases are distinguished by whether they contain the child object column_families. Even if a database actually has only one ColumnFamily, it is still a DB_MultiCF if it defines that ColumnFamily in the sub-object column_families.
The database object is opened by the function specified by method. Just as the method is overloaded in the C++ code, it is also overloaded in json: the same method name has one overload for DB and one for DB_MultiCF.
Although multiple databases can be defined in json, in many cases only one of them is opened. When the OpenDB API is called without a database name, the open object specifies which database to open. When OpenDB is called with a db name, the open object is ignored.
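The selection rules above can be summarized in a small Python sketch. It only models the decision described in the text (which database is chosen, and whether it counts as DB or DB_MultiCF); the function `select_database` is hypothetical and is not part of the ToplingDB API.

```python
def select_database(conf, dbname=None):
    """Return (name, kind) of the database that would be opened."""
    # Without an explicit name, fall back to the first-level "open" object
    name = dbname if dbname is not None else conf["open"]
    params = conf["databases"][name]["params"]
    # The presence of "column_families" alone decides the kind
    kind = "DB_MultiCF" if "column_families" in params else "DB"
    return name, kind

conf = {
    "databases": {
        "db1": {"method": "DB::Open", "params": {"options": {}}},
        "db_mcf": {
            "method": "DB::Open",
            "params": {"db_options": {}, "column_families": {"default": "$default"}},
        },
    },
    "open": "db_mcf",
}
```

With this configuration, `select_database(conf)` picks `db_mcf` as a DB_MultiCF, and `select_database(conf, "db1")` ignores `open` and picks `db1` as a plain DB.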
Among the first-level objects of json, except for the above four special objects, the others are general objects. The name of each first-level general object is the class name of the base class of such objects in ToplingDB/RocksDB, for example "Cache", "Statistics" and "TableFactory" in the example. Each of these first-level objects acts as a container, and each of its sub-objects defines a real C++ object. Each such "container" is equivalent to a namespace, and objects in different namespaces can have the same name.
The C++ object corresponding to a json object is described by its class name and parameters, expressed by "class" and "params" respectively. Careful readers will notice that the json object named "stat" is just the string "default". This is a simplification: a class without parameters can be defined directly by the string of its class name (here "default" is the registered class name of stat, and the corresponding C++ class is StatisticsImpl). Of course, such an object can also be defined by a complete, regular json object containing "class" and "params".
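The relationship between the shorthand and the full form can be sketched as a small normalization step. This Python function is illustrative only; `normalize` is a hypothetical helper and it applies to definitions placed directly under a container such as "Statistics".

```python
def normalize(definition):
    """Expand the string shorthand into the full {"class", "params"} form."""
    if isinstance(definition, str):
        # "default" is shorthand for {"class": "default", "params": {}}
        return {"class": definition, "params": {}}
    return {"class": definition["class"], "params": definition.get("params", {})}

stat = normalize("default")
bb = normalize({"class": "BlockBasedTable", "params": {"block_cache": "${lru_cache}"}})
```

Both spellings end up in the same shape, so the rest of a loader would only need to handle the full form.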
DBOptions and CFOptions are special general objects, because their "class" is determined, so "class" and "params" are omitted, and the members in "params" are directly promoted to the outer layer.
In C++, one object refers to another through a pointer. In json, this is done through object names. The formal and complete way of writing an object reference is "${varname}", and the simplified forms are "$varname" or "varname". The bare form "varname" may be ambiguous, because a json string may also express a "class_name". Our processing strategy is: first check whether the string is the name of a defined object; if it is, it is processed as "varname", otherwise it is processed as "class_name".
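The lookup order described above can be written down as a short Python sketch. The function `resolve` and its return convention are assumptions for illustration; raising an error for an explicit `$` reference to an undefined object is also this sketch's choice, since the text does not say what happens in that case.

```python
import re

# Matches the explicit reference forms "${varname}" and "$varname"
_REF = re.compile(r"^\$\{(\w+)\}$|^\$(\w+)$")

def resolve(value, defined):
    """Classify a JSON string as an object reference or a class name.

    `defined` is the set of object names in the relevant namespace.
    Returns ("varname", name) or ("class_name", name).
    """
    m = _REF.match(value)
    if m:
        name = m.group(1) or m.group(2)
        if name not in defined:
            raise KeyError(f"undefined object: {name}")  # assumed behaviour
        return ("varname", name)
    # Bare string: a defined object wins, otherwise treat it as a class name
    if value in defined:
        return ("varname", value)
    return ("class_name", value)
```

For the example configuration, `"table_factory": "dispatch"` resolves to the object named dispatch because that name is defined, whereas a bare string that names no defined object is treated as a class name.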
In addition to defining named objects and then referencing them by name, we can also define nested objects, as in the example:
"custom_cf" : {
"max_write_buffer_number": 4,
"target_file_size_base": "16M",
"target_file_size_multiplier": 2,
"table_factory": "dispatch", "ttl": 0
}
"custom_cf" could be defined as a reference to a CFOptions object, but here it is more convenient and concise to define it as an inline object.
There is no ttl member in CFOptions, but we define ttl for it in json, because the "method" of database can be specified as many other functions besides "DB::Open":
"DB::OpenForReadOnly" // Equivalent to defining "read_only": true in params
"DBWithTTL::Open" // Need CFOptions::ttl
"TransactionDB::Open"
"OptimisticTransactionDB::Open"
"BlobDB::Open"
Users can also extend and define their own Open, for example: MyCustomDB::Open.
"dispatch" : {
"class": "DispatcherTable",
"params": {
"default": "fast",
"readers": {"SingleFastTable": "fast", "ToplingZipTable": "zip"},
"level_writers": ["fast", "fast", "fast", "zip", "zip", "zip", "zip"]
}
}
As the name implies, DispatcherTable dispatches to the actual Table (SST) implementations. For users, the most important parameter is level_writers: each level uses the Table at the corresponding position.
default is used as a fallback when level < 0 (level is a member of TableBuilderOptions), or when the level_writers entry fails to create a builder.
readers defines the mapping from class_name to varname. Internally, Tables are loaded through DispatcherTable::NewTableReader, and as a dispatcher it naturally knows which kind of Table is being loaded, because Table kinds are distinguished by TableMagicNumber. The magic number is statically determined at compile time, but TableFactory objects are created at runtime, and each concrete TableFactory class can have multiple objects (with different params), so readers must specify which TableFactory object is used to load Tables of each TableFactory class.
In this DispatcherTable definition, L0~L2 use fast, and L3~L6 use zip.
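The writer selection can be sketched in a few lines of Python. This is a simplified model, not the DispatcherTable implementation: `pick_writer` is a hypothetical name, the fallback for a failed builder creation is not modelled, and falling back to default for levels beyond the end of level_writers is an assumption of this sketch.

```python
def pick_writer(level, level_writers, default):
    """Choose the TableFactory object name used to build an SST at `level`."""
    if level < 0 or level >= len(level_writers):
        # level < 0 is from the text; level >= len(...) is this sketch's assumption
        return default
    return level_writers[level]

level_writers = ["fast", "fast", "fast", "zip", "zip", "zip", "zip"]
chosen = [pick_writer(lv, level_writers, "fast") for lv in range(-1, 7)]
```

Here `chosen` covers level -1 (the fallback case) followed by L0 through L6, matching the description above: fast for L0~L2 and zip for L3~L6.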
ToplingDB measures wamp (write amplification) by decompressed data size. DispatcherTable checks the conditions below to decide whether to approve or reject a compaction. A similar mechanism applies to trivial move: a trivial move is rejected if its conditions are not met.
config name | type | default | description |
---|---|---|---|
measure_builder_stats | bool | false | measure and show on html |
always_compact_max_bytes | uint | 64 MiB | always compact when input size is smaller than this value |
auto_compaction_max_wamp | float | 1e9 | max wamp for auto compaction |
mark_for_compaction_max_wamp | float | 1e9 | max wamp for compactions triggered by NeedCompact |
allow_trivial_move | bool | true | if true, continue to check for more conditions |
trivial_move_always_max_output_level | int | 0 | allow trivial move when output level <= this value |
trivial_move_max_file_size_multiplier | float | 4.0 | allow trivial move when the average input file size times this multiplier is smaller than the target file size of the output level |
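One literal reading of the three trivial move parameters in the table is sketched below in Python. The function `allow_trivial_move`, its arguments, and the order in which the conditions are combined are assumptions of this sketch; the actual checks in DispatcherTable may differ.

```python
def allow_trivial_move(output_level, avg_input_file_size, target_file_size, conf):
    """Decide whether a trivial move is approved, per the table's descriptions."""
    if not conf.get("allow_trivial_move", True):
        return False                      # trivial move disabled outright
    if output_level <= conf.get("trivial_move_always_max_output_level", 0):
        return True                       # shallow output levels are always allowed
    multiplier = conf.get("trivial_move_max_file_size_multiplier", 4.0)
    # Allowed only when the scaled average input size is below the level's target
    return avg_input_file_size * multiplier < target_file_size

MiB = 1 << 20
small = allow_trivial_move(3, 4 * MiB, 64 * MiB, {})    # 16 MiB < 64 MiB
large = allow_trivial_move(3, 32 * MiB, 64 * MiB, {})   # 128 MiB >= 64 MiB
```

With the default values from the table, `small` is approved and `large` is rejected, in which case a regular compaction would be needed instead.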
Users who are familiar with Kubernetes may prefer Yaml. As a configuration file format, Yaml is more readable, and the ToplingDB configuration system also supports it.
The Enterprise Edition includes ToplingZipTable, an SST based on searchable in-memory compression algorithms. Sharing a distributed Compact cluster among multiple instances can reduce costs and increase efficiency through economies of scale.
The Community Edition does not include ToplingZipTable; otherwise the two editions are identical.
Json | Yaml | explanation |
---|---|---|
etcd_dcompaction.json | yaml | with Distributed Compaction |
lcompact_community.json | yaml | without Distributed Compaction |
db_bench_community.yaml | yaml | db_bench testing, without json |
db_bench_enterprise.yaml | yaml | db_bench testing, without json |
todis-community.json | yaml | todis Community Edition |
todis-enterprise.json | yaml | todis Enterprise Edition |
mytopling.json | no yaml yet | MyTopling Enterprise Edition |
mytopling-2nd.json | no yaml yet | MyTopling Enterprise Edition, shared secondary node |
mytopling-community.json | no yaml yet | MyTopling Community Edition |
kvtopling-community.json | no yaml yet | kvrocks ToplingDB Community Edition |
kvtopling-community-2nd.json | no yaml yet | kvrocks ToplingDB Community Edition, shared secondary node |