Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Enable banding for computing k-mer abundance distributions #1767

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

standage
Copy link
Member

Banding stuff has been pretty well battle tested in kevlar, but hasn't made it to the khmer CLI yet. This requires not only adding arguments to support banding, but integrating *table objects into the tooling originally designed only for *graph objects.

  • Is it mergeable?
  • make test Did it pass the tests?
  • make clean diff-cover If it introduces new functionality in
    scripts/ is it tested?
  • make format diff_pylint_report cppcheck doc pydocstyle Is it well
    formatted?
  • Did it change the command-line interface? Only backwards-compatible
    additions are allowed without a major version increment. Changing file
    formats also requires a major version number increment.
  • For substantial changes or changes to the command-line interface, is it
    documented in CHANGELOG.md? See keepachangelog
    for more details.
  • Was a spellchecker run on the source code and documentation after
    changes were made?
  • Do the changes respect streaming IO? (Are they
    tested for streaming IO?)

@standage
Copy link
Member Author

This depends on consume_seqfile_banding_with_reads_parser from #1753.

@standage
Copy link
Member Author

Some initial benchmarking this weekend: ran abund-dist-single.py with & without banding (--banding 8 1) and with & without threading (--threads 4). Banding does provide a runtime performance improvement, but it appears that the improvement doesn't stack with improvements from threading.

@standage
Copy link
Member Author

Hmm. The cyclic hash is not only faster than the murmur hash (we already knew that), but its speed benefits seem to stack better with threading. The commands I used:

time scripts/abundance-dist-single.py --hash-function cyclic --max-memory-usage 25M --ksize 25 ~/Desktop/kevlar/scratch/nano-trio/proband-reads.fq.gz nobanding-nothreads.abunddist
time scripts/abundance-dist-single.py --hash-function cyclic --banding 8 1 --max-memory-usage 25M --ksize 25 ~/Desktop/kevlar/scratch/nano-trio/proband-reads.fq.gz banding-nothreads.abunddist
time scripts/abundance-dist-single.py --hash-function cyclic --threads 4 --max-memory-usage 25M --ksize 25 ~/Desktop/kevlar/scratch/nano-trio/proband-reads.fq.gz nobanding-threads.abunddist
time scripts/abundance-dist-single.py --hash-function cyclic --threads 4 --banding 8 1 --max-memory-usage 25M --ksize 25 ~/Desktop/kevlar/scratch/nano-trio/proband-reads.fq.gz banding-threads.abunddist

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant