Cassandra Tuning
Some of the things we use to store 100M metrics on our cluster.
First, you'll need Cassandra 3.11, and if you have a large number of metrics, you may want to add this patch: https://github.com/criteo-forks/cassandra/commit/1e54aae96dff438cb80cc4ce258ac1dc706236db
It's a good idea to create two separate clusters, one for metadata and one for data: the metadata cluster relies on indexes (SASI or Lucene), and it becomes slower as you add nodes.
We use the following settings:
- max_heap_size: 24G (https://issues.apache.org/jira/browse/CASSANDRA-8150)
- G1 GC with biased_locking disabled, max_gc_pause_millis set to 500, max_parallel_gc_threads set to the number of hardware threads on the machine, and max_conc_gc_threads set to 5
- -Dcassandra.allow_unsafe_aggressive_sstable_expiration=true
- -Dio.netty.eventLoopThreads=12 to reduce the CPU used by the event loop by batching calls to epoll()
- -Dcassandra.netty_flush_delay_nanoseconds=0 (same reason)
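As a rough sketch, the settings above can be turned into a JVM flag list. The mapping from the setting names to HotSpot flags (for example max_gc_pause_millis to -XX:MaxGCPauseMillis) is an assumption, as is the helper name `g1_jvm_flags`. The thread count is read from the machine with `os.cpu_count()`.

```python
import os


def g1_jvm_flags(heap="24G", pause_ms=500, conc_threads=5, parallel_threads=None):
    """Build the JVM flags for the settings listed above.

    Hypothetical helper: the setting-to-flag mapping is an assumption.
    """
    if parallel_threads is None:
        # "number of threads of the machine"; fall back to 1 if unknown.
        parallel_threads = os.cpu_count() or 1
    return [
        f"-Xmx{heap}",                                # max_heap_size
        "-XX:+UseG1GC",                               # G1 GC
        "-XX:-UseBiasedLocking",                      # biased locking disabled
        f"-XX:MaxGCPauseMillis={pause_ms}",           # max_gc_pause_millis
        f"-XX:ParallelGCThreads={parallel_threads}",  # max_parallel_gc_threads
        f"-XX:ConcGCThreads={conc_threads}",          # max_conc_gc_threads
        "-Dcassandra.allow_unsafe_aggressive_sstable_expiration=true",
        "-Dio.netty.eventLoopThreads=12",
        "-Dcassandra.netty_flush_delay_nanoseconds=0",
    ]


if __name__ == "__main__":
    # One flag per line, e.g. to paste into a JVM options file.
    print("\n".join(g1_jvm_flags()))
```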
In cassandra.yaml:
# Do more writes.
concurrent_writes: 128
native_transport_max_threads: 128
# We have a single 'logical' disk, but more physical disks.
memtable_flush_writers: 8
# Make sure we don't write to disk too often. This drastically
# reduces the number of compactions, as we don't write tens of SSTables per minute.
memtable_cleanup_threshold: 0.2
# We do not require compression for our workload.
internode_compression: none
# Restore default.
inter_dc_tcp_nodelay: false
# Increase the coalescing window to reduce the number of packets.
otc_coalescing_window_us: 10000
# Try to reduce GC pressure.
# http://www.datastax.com/dev/blog/off-heap-memtables-in-cassandra-2-1
memtable_allocation_type: 'offheap_objects'
# Try to make good use of the cache.
file_cache_size_in_mb: 4000
disk_optimization_strategy: spinning
# TODO: try this
# buffer_pool_use_heap_if_exhausted: false
# Make hints faster
max_hints_delivery_threads: 4
# The speed is divided by the number of nodes internally, so try to bump it
# a little bit. We currently know that 4000 works but 40000 is too much.
# This should really be made more dynamic.
# It makes decommissioning easier (because hints are stored during
# decommission). Don't set it too high, or it will break the service.
hinted_handoff_throttle_in_kb: 4000
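To give a feel for two of the values above, here is a small arithmetic sketch. It assumes the behaviour described in the stock cassandra.yaml comments for 3.x: the hint throttle is divided by the number of other nodes in the cluster, and memtable_cleanup_threshold defaults to 1 / (memtable_flush_writers + 1) when unset. The function names and the 41-node cluster size are hypothetical, chosen only for illustration.

```python
def hint_rate_kb_per_thread(throttle_in_kb, cluster_nodes):
    """Effective hint delivery rate per delivery thread, in KB/s.

    Assumes the configured throttle is divided by the number of
    other nodes in the cluster (at least 1).
    """
    peers = max(1, cluster_nodes - 1)
    return throttle_in_kb / peers


def default_cleanup_threshold(flush_writers):
    """Value memtable_cleanup_threshold takes when it is left unset."""
    return 1.0 / (flush_writers + 1)


if __name__ == "__main__":
    # Hypothetical 41-node cluster with the settings from this page.
    rate = hint_rate_kb_per_thread(4000, cluster_nodes=41)
    print(f"hint rate per delivery thread: {rate} KB/s")
    # With 8 flush writers the default threshold would be lower than 0.2,
    # so the explicit 0.2 means larger, less frequent flushes.
    print(f"default cleanup threshold: {default_cleanup_threshold(8):.3f}")
```

This illustrates why the throttle has to grow with the cluster: at a fixed hinted_handoff_throttle_in_kb, each added node lowers the per-thread delivery rate.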
For the biggraphite keyspace, we set durable_writes = false and instead set memtable_flush_period_in_ms = 900000 (15 minutes) on the tables. This puts us at risk of losing some data if multiple nodes disappear at the same time, but it greatly improves write performance.
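A minimal sketch of the CQL statements this corresponds to, generated from Python so they can be reviewed before being pasted into cqlsh. The helper name and the table names `example_table_a` and `example_table_b` are placeholders, not the real BigGraphite table names. Substitute the tables that actually exist in your keyspace.

```python
def durability_statements(keyspace, tables, flush_period_ms=900_000):
    """Return CQL that disables the commit log for a keyspace and
    sets a periodic memtable flush on each of its tables."""
    statements = [f"ALTER KEYSPACE {keyspace} WITH durable_writes = false;"]
    for table in tables:
        statements.append(
            f"ALTER TABLE {keyspace}.{table} "
            f"WITH memtable_flush_period_in_ms = {flush_period_ms};"
        )
    return statements


if __name__ == "__main__":
    # Placeholder table names, for illustration only.
    for stmt in durability_statements(
        "biggraphite", ["example_table_a", "example_table_b"]
    ):
        print(stmt)
```

With durable_writes disabled, anything still in a memtable is lost if a node dies, so the flush period bounds how much data a single node can lose.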