add cutmark concept #199

Open
wants to merge 18 commits into
base: main

poettering
Member

Please review after #196, since this PR incorporates it.

Let's copy in a new version from systemd
…chunks

This is low-hanging optimization fruit: when we encode a stream, pass
the chunk compression/storing to a pool of worker threads.

This doesn't speed up encoding as much as I hoped, but still:

The Firefox 63 sources (2.2G) are encoded in 1m16,902s instead of
1m24,281s on my 4 CPU system, i.e. roughly a 9% speed-up.
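
The worker-pool idea above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `job_t`, `pool_t`, `toy_digest()` and `process_chunks()` are hypothetical names, and a cheap FNV-1a digest stands in for the real per-chunk compression/storing work. Workers pull the next chunk index from a shared atomic counter until no chunks are left.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-chunk job: input bytes plus a slot for the result. */
typedef struct {
    const uint8_t *data;
    size_t len;
    uint32_t digest;            /* filled in by whichever worker takes the job */
} job_t;

/* Hypothetical shared pool state: the job array and the next unclaimed index. */
typedef struct {
    job_t *jobs;
    size_t n_jobs;
    atomic_size_t next;
} pool_t;

/* Stand-in for the expensive per-chunk work (assumption: FNV-1a, not compression). */
static uint32_t toy_digest(const uint8_t *p, size_t n) {
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Each worker claims job indices atomically until the array is exhausted. */
static void *worker(void *arg) {
    pool_t *pool = arg;
    for (;;) {
        size_t i = atomic_fetch_add(&pool->next, 1);
        if (i >= pool->n_jobs)
            return NULL;
        pool->jobs[i].digest = toy_digest(pool->jobs[i].data, pool->jobs[i].len);
    }
}

/* Process all jobs with up to n_threads workers (at most 16 in this sketch).
 * Returns 0 on success, -1 if no worker thread could be started. */
static int process_chunks(job_t *jobs, size_t n_jobs, unsigned n_threads) {
    pthread_t threads[16];
    pool_t pool = { .jobs = jobs, .n_jobs = n_jobs };
    unsigned started = 0;

    assert(n_threads > 0 && n_threads <= 16);
    atomic_init(&pool.next, 0);

    for (; started < n_threads; started++)
        if (pthread_create(&threads[started], NULL, worker, &pool) != 0)
            break;

    /* Any worker that did start drains the whole queue before exiting. */
    for (unsigned t = 0; t < started; t++)
        pthread_join(threads[t], NULL);

    return started > 0 ? 0 : -1;
}
```

Because every job writes only to its own `digest` slot, the workers need no locking beyond the atomic counter. A real encoder would additionally have to keep the chunks in stream order when writing the index.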

Let's optimize chunking a bit: let's optionally accept a list of
"cutmarks", which are special bit sequences (up to 64 bits) that indicate
particularly good chunking cut points. This can be used to optimize
chunking in data streams whose semantics we partly know.

The intention is that the object markers of .caidx files are set to be
cutmarks, so that we cut between objects rather than at entirely
arbitrary positions.

This commit only adds logic to find these cutpoints, based on a list of
defined cutpoints; the code calling into the chunker does not make use
of it yet.
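
The cutmark search described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from this PR: `cutmark_t`, `cutmark_scanner_t` and `cutmark_scan()` are hypothetical names, and the representation of a cutmark as a 64-bit value plus a mask is assumed. The scanner keeps the last eight bytes in a shift register, so a marker that straddles two input buffers is still found.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cutmark: the bits of the window selected by `mask`
 * must equal `value`. A 4-byte marker uses mask 0xFFFFFFFF. */
typedef struct {
    uint64_t value;
    uint64_t mask;
} cutmark_t;

/* Hypothetical scanner state: the most recent 8 bytes of the stream,
 * newest byte in the low 8 bits. Initialize to zero. */
typedef struct {
    uint64_t window;
} cutmark_scanner_t;

/* Feed `len` bytes into the scanner. Returns the offset just past the
 * first byte that completes any cutmark, or (size_t) -1 if none matched.
 * State persists across calls, so markers may span buffer boundaries. */
static size_t cutmark_scan(cutmark_scanner_t *s,
                           const cutmark_t *marks, size_t n_marks,
                           const uint8_t *data, size_t len) {
    assert(s && data);

    for (size_t i = 0; i < len; i++) {
        s->window = (s->window << 8) | data[i];

        for (size_t m = 0; m < n_marks; m++)
            if ((s->window & marks[m].mask) == marks[m].value)
                return i + 1;
    }

    return (size_t) -1;
}
```

Note that the returned offset points at the end of the marker. To cut in front of an object that starts with the marker, a caller would have to back up by the marker length. Since the window starts out as zero, a cutmark whose masked value is zero would match immediately in this sketch.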

This allows us to update the internal state of the chunker with some
data, without necessarily determining where to break.
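
To illustrate the difference between updating state and looking for a break, here is a minimal sketch. It is not the chunker from this PR: `toy_chunker_t`, `toy_chunker_feed()` and `toy_chunker_scan()` are hypothetical names, and a simple rolling byte sum over a 4-byte window stands in for the real rolling hash. The point is only that feeding a prefix without asking for a break leaves the state exactly as if the prefix had been scanned normally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WINDOW 4                /* hypothetical window size, for illustration only */

typedef struct {
    uint8_t window[WINDOW];     /* ring buffer holding the last WINDOW bytes */
    size_t pos;                 /* slot that the next byte overwrites */
    uint32_t sum;               /* rolling sum of the bytes in the window */
} toy_chunker_t;

/* Roll one byte into the window and drop the oldest one from the sum. */
static void toy_roll(toy_chunker_t *c, uint8_t byte) {
    c->sum += byte;
    c->sum -= c->window[c->pos];
    c->window[c->pos] = byte;
    c->pos = (c->pos + 1) % WINDOW;
}

/* Update the internal state only; no break point is determined. */
static void toy_chunker_feed(toy_chunker_t *c, const uint8_t *data, size_t len) {
    assert(c && data);
    for (size_t i = 0; i < len; i++)
        toy_roll(c, data[i]);
}

/* Update the state and return the offset just past the first byte where
 * sum % divisor == divisor - 1, or (size_t) -1 if there is no such byte. */
static size_t toy_chunker_scan(toy_chunker_t *c, const uint8_t *data,
                               size_t len, uint32_t divisor) {
    assert(c && data && divisor > 0);
    for (size_t i = 0; i < len; i++) {
        toy_roll(c, data[i]);
        if (c->sum % divisor == divisor - 1)
            return i + 1;
    }
    return (size_t) -1;
}
```

With the bytes 1..8 and a divisor of 11, scanning the whole buffer breaks after 4 bytes. Feeding the first 3 bytes with `toy_chunker_feed()` and then scanning the remainder breaks after 1 more byte, which is the same position.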

This way, a caidx file is enough to configure the seeder appropriately.
We can derive all necessary parameters from the caidx automatically
and tune the seeder so that the chunking matches again.

The test checks whether the seed is fully reliable and sufficient as a
chunk store: it packs up a tree, then removes the chunk store but retains
the original tree to use as a seed in place of the chunk store.