Experiment with compression and other ways to save space #129
Current compression is
The compression ratio also seems to be 0.5 for most of the other tables.
We started to design such a system, but on HBase. We used GZip compression and achieved a 0.8 compression ratio (we store two quality fields, which might explain the difference). About the README, I have a question:
I know HBase much better than Cassandra, but they should be pretty similar: the row key is written again and again for each column, not one row key for many columns. But I just checked a bit, and it seems that you can use COMPACT STORAGE when creating the table to save space, as you said (I don't see it in your CREATE TABLE statement). Let me know if I missed something :-)
I believe most of this has been fixed since C* 3.0 (see http://www.datastax.com/2015/12/storage-engine-30 for details). I would have to check exactly how many bytes per point we currently use, but the last time I checked it was better than Whisper, which was our requirement at the time. Double-delta encoding might help here, because it could drastically reduce the number of points we store (we could, for example, skip up to 5 points if the double delta stays the same). Currently our main limitation is really the number of mutations per second and the load this puts on Cassandra (currently limited to 70k/s per node).
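The double-delta skipping idea above can be sketched as follows. This is a hypothetical helper, not the project's actual encoder; it assumes "the double delta stays the same" means the second difference is zero, i.e. the point is exactly what linear extrapolation of the previous delta predicts, and it forces a stored point after at most `max_skip` consecutive skips so the run length stays bounded:

```python
def compress_double_delta(values, max_skip=5):
    """Keep only points whose delta changes, plus a forced point after
    `max_skip` consecutive skips. Returns (index, value) pairs; skipped
    points are reconstructible by extrapolating the last stored delta."""
    if len(values) <= 2:
        return list(enumerate(values))
    kept = [(0, values[0]), (1, values[1])]
    prev_delta = values[1] - values[0]
    skipped = 0
    for i in range(2, len(values)):
        delta = values[i] - values[i - 1]
        if delta == prev_delta and skipped < max_skip:
            skipped += 1  # double delta is zero: point is predictable, skip it
        else:
            kept.append((i, values[i]))
            skipped = 0
        prev_delta = delta
    return kept
```

For a slowly ramping sensor, a linear run collapses to its two endpoints plus the first point where the slope changes, which is where the "drastically fewer points" claim comes from.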
Thanks for the feedback! We didn't know about Whisper before; we want to keep the same accuracy even 20 years in the past. We don't use delta encoding, but our pre-processing sends data only when there is a significant change in the values. It seems HBase is also around the same number of mutations/s.
Does this mean that you interpolate results if there is nothing in the database? How do you deal with a client that would like to see the last 2-3 minutes of data?
Depending on the sensor system, we either repeat the same value or do a linear interpolation (as data historians do). For now we don't have a real-time workflow. Our industrial site buffers data and sends it a few times per hour as a file.
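The two read-side strategies mentioned above (repeat the last value, or interpolate linearly between stored points) can be sketched like this; the function name and `mode` parameter are illustrative, not from the project:

```python
import bisect

def read_value(points, t, mode="previous"):
    """Reconstruct a value at time t from stored, time-sorted
    (time, value) points. 'previous' repeats the last stored value
    (sample-and-hold); 'linear' interpolates between neighbours."""
    times = [p[0] for p in points]
    i = bisect.bisect_right(times, t) - 1  # last stored point at or before t
    if i < 0:
        raise ValueError("t precedes the first stored point")
    t0, v0 = points[i]
    if mode == "previous" or t == t0 or i + 1 >= len(points):
        return v0
    t1, v1 = points[i + 1]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
```

Combined with a deadband writer, 'previous' exactly reproduces the raw series for step-like sensors, while 'linear' is the better fit for continuously varying signals.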
OK, that's slightly easier in that case. Thanks for the details!