Authors: Nicola Prezza, Davide Cenzato
This software computes or approximates the substring complexity measure delta.
The tool delta computes the exact measure and also outputs related statistics. The measure is described in:
Tomasz Kociumaka, Gonzalo Navarro, Nicola Prezza, Toward a Definitive Compressibility Measure for Repetitive Sequences. IEEE Trans. Inf. Theory 69(4): 2074-2092 (2023)
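For orientation, the paper above defines delta of a string as the maximum over k of d_k / k, where d_k is the number of distinct substrings of length k. The following is a minimal, naive sketch of that definition only. It is not the algorithm used by the delta tool: it takes quadratic time and space, and the function name substring_complexity and the choice to work on Python strings are assumptions made for illustration.

```python
def substring_complexity(text: str) -> float:
    """Naive delta: max over k of d_k / k, with d_k = number of distinct length-k substrings.

    Illustrative only; quadratic in len(text), so suitable for tiny inputs.
    """
    n = len(text)
    best = 0.0
    for k in range(1, n + 1):
        # Collect every distinct substring of length k.
        distinct = {text[i:i + k] for i in range(n - k + 1)}
        best = max(best, len(distinct) / k)
    return best


if __name__ == "__main__":
    for sample in ["aaaa", "abab", "abcd"]:
        print(sample, substring_complexity(sample))
```

A highly repetitive string such as "aaaa" gets a small value, while a string with many distinct short substrings gets a larger one.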
The tool delta-stream computes an approximation of delta on a stream, using sublinear working space. It is described in:
Ruben Becker, Matteo Canton, Davide Cenzato, Sung-Hwan Kim, Bojana Kodric, and Nicola Prezza. Sketching and Streaming for Dictionary Compression. Proceedings of the Data Compression Conference 2024.
To clone the repository, run:
git clone --recursive https://github.com/regindex/substring-complexity
You need the SDSL library installed on your system (https://github.com/simongog/sdsl-lite).
We use cmake to generate the Makefile. Create a build folder in the main folder:
mkdir build
run cmake:
cd build; cmake ..
and compile:
make
To compute the exact measures using linear working space, run
delta file
To compute an approximation of delta on a stream using sublinear working space and store the sketch, run
delta-stream -s -o "output_path" < file
or
some_command_generating_output | delta-stream -s -o "output_path"
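The same invocation can be driven from a script. The sketch below only wraps the command line shown above (the -s and -o flags) using Python's standard subprocess module. The helper names sketch_command and store_sketch are hypothetical, and it assumes the delta-stream binary is reachable on PATH or passed explicitly via the binary argument.

```python
import subprocess


def sketch_command(output_path: str, binary: str = "delta-stream") -> list:
    """Build the argument list for: delta-stream -s -o output_path."""
    return [binary, "-s", "-o", output_path]


def store_sketch(data: bytes, output_path: str, binary: str = "delta-stream"):
    """Feed data to delta-stream on standard input and store the sketch.

    Assumes the binary exists; raises CalledProcessError on a non-zero exit.
    """
    return subprocess.run(sketch_command(output_path, binary), input=data, check=True)
```

For example, store_sketch(open("file", "rb").read(), "output_path") mirrors the first command above, although it reads the whole input into memory instead of streaming it.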
To compute an approximation of delta given a stored sketch, run
delta-stream -d "sketch_path"
To merge two sketches and store the result, run
delta-stream -m "sketch_path_1" "sketch_path_2" -o "merged_sketch_path"
To compute the approximation of the delta-based normalized compression distance (NCD) of two sketches, run
delta-stream -c "sketch_path_1" "sketch_path_2"
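To illustrate what a delta-based NCD measures, the sketch below plugs an exact, naive delta into the standard normalized compression distance formula, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), with delta in place of the compressor C. That delta-stream evaluates exactly this expression on its sketches is an assumption. The function names are hypothetical, and the code works on small in-memory strings rather than on sketches.

```python
def naive_delta(text: str) -> float:
    """Naive delta: max over k of (number of distinct length-k substrings) / k."""
    n = len(text)
    best = 0.0
    for k in range(1, n + 1):
        distinct = {text[i:i + k] for i in range(n - k + 1)}
        best = max(best, len(distinct) / k)
    return best


def delta_ncd(x: str, y: str) -> float:
    """Standard NCD formula with delta used as the compressibility measure."""
    dx, dy, dxy = naive_delta(x), naive_delta(y), naive_delta(x + y)
    largest = max(dx, dy)
    if largest == 0.0:
        raise ValueError("NCD is undefined when both inputs are empty")
    return (dxy - min(dx, dy)) / largest


if __name__ == "__main__":
    print(delta_ncd("abab", "abab"))  # identical inputs
    print(delta_ncd("aaaa", "bbbb"))  # inputs sharing no substrings
```

Values near 0 indicate that the two inputs share most of their substrings, and values near 1 indicate that they share few.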