fastmultigather returns matches only when --scaled is equal to scaled of the query #398
Comments
ok, dug into this a little bit. For a sketch against itself, with the original sketch at a scaled of 1000, you can't downsample like so:
In addition (and, arguably, slightly less annoying), automatic downsampling doesn't work:
So I think I can reproduce the problem.
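For context on what "downsampling" should do here: in a FracMinHash sketch, a given scaled value keeps only hashes below a cutoff of roughly 2**64 / scaled, so a sketch can move to an equal or larger scaled value by dropping hashes, but never to a smaller one. The snippet below is a simplified, hypothetical model (the names max_hash_for and downsample are illustrative, not sourmash or branchwater functions, and sourmash's exact cutoff rounding may differ). Under that model, downsampling a scaled=1000 sketch to a larger scaled value should always succeed.

```python
from __future__ import annotations

# Simplified FracMinHash model for illustration only; this is not the
# sourmash or branchwater API, and sourmash's exact cutoff rounding may differ.
MAX_HASH = 2**64


def max_hash_for(scaled: int) -> int:
    """Exclusive hash cutoff kept by a sketch at this scaled value."""
    return MAX_HASH // scaled


def downsample(hashes: set[int], from_scaled: int, to_scaled: int) -> set[int]:
    """Return the hashes a sketch at `to_scaled` would retain."""
    if to_scaled < from_scaled:
        # Hashes above the original cutoff were never stored, so a
        # sketch can only move to an equal or larger scaled value.
        raise ValueError(
            f"cannot upsample from scaled={from_scaled} to scaled={to_scaled}"
        )
    cutoff = max_hash_for(to_scaled)
    return {h for h in hashes if h < cutoff}


# Example: all three hashes are below the scaled=1000 cutoff; only 1 and
# 2**40 are below the scaled=10000 cutoff, so 2**53 is dropped.
sketch_1000 = {1, 2**40, 2**53}
print(downsample(sketch_1000, 1000, 10_000))
```

Calling downsample(sketch_1000, 1000, 10) raises ValueError, which is the one direction that is genuinely impossible.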
I think I've tracked down the problem: this code in the branchwater plugin, sourmash_plugin_branchwater/src/utils.rs line 722 (at commit 560dc3d), calls this code in sourmash core, but naively it seems to me like it should be doing the right thing?
It looks like I was wrong: adding some debugging prints,
So confusing 😆. It turns out the issue is that the query collection cannot be loaded with scaled=1000, which is set as the default. One possible solution is to set no default and determine scaled from the first sketch; other solutions exist :). Fixing over in #488.
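The proposed fix (no hard-coded default; take scaled from the first sketch) could look roughly like the following. This is a hypothetical sketch, not the code in #488: resolve_scaled is an invented name, and the extra check that the chosen value is not smaller than any sketch's own scaled is an assumption based on the fact that sketches can only be downsampled.

```python
from __future__ import annotations

from typing import Optional, Sequence


def resolve_scaled(
    collection_scaleds: Sequence[int], requested: Optional[int] = None
) -> int:
    """Choose the scaled value at which to load a query collection.

    Hypothetical helper, not branchwater code: with no --scaled given, use the
    first sketch's scaled instead of a hard-coded default such as 1000.
    """
    if not collection_scaleds:
        raise ValueError("cannot pick a scaled value for an empty collection")
    scaled = collection_scaleds[0] if requested is None else requested
    # Sketches can only be downsampled, so the chosen value must be at
    # least as large as every sketch's own scaled.
    too_coarse = sorted({s for s in collection_scaleds if s > scaled})
    if too_coarse:
        raise ValueError(f"scaled={scaled} is below sketch scaled values {too_coarse}")
    return scaled


print(resolve_scaled([2, 2, 2]))       # no --scaled: uses the first sketch, prints 2
print(resolve_scaled([2, 2, 2], 100))  # --scaled 100: downsampling is fine, prints 100
```

One alternative, also only a suggestion, is to use max(collection_scaleds) rather than the first sketch, which would avoid an error for collections that mix scaled values.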
Hey, I am having some issues with fastmultigather and the scaled values of signatures:
Everything is in:
/group/ctbrowngrp2/scratch/annie/2023-sourmash-viruses
I created protein sketches with the custom_sketch.py script that @ctb wrote. Sorry, I cannot find your original repo, but here is the script.
There is one sketch containing all proteins of each organism, in this case a virus.
I created sketches at 3 different ksizes (7, 10, 12), all at scaled=2. I then concatenated all 3 into one zip file:
/group/ctbrowngrp2/scratch/annie/2023-sourmash-viruses/results/signatures/240703_RefSeq.proteins.zip
When I query this zip file against the ICTV database using fastmultigather, I only get matches when scaled is 2. I also want to use scaled=10 and scaled=100. I thought I could use queries sketched at a lower scaled value and that they would automatically be downsampled to the scaled value I ask for.
It only works with --scaled 2. With --scaled 10 or 100, the output is:
This happens for all signatures within the zip.
I then created protein sketches with custom_sketch.py at scaled=10 and scaled=100, and used those as queries. If I do it that way, I do get matches from fastmultigather.
Is this something that is not yet enabled for fastmultigather?
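The expectation in this report rests on a property of FracMinHash: sketching at scaled=2 and then downsampling to scaled=10 or 100 should keep exactly the hashes that sketching directly at scaled=10 or 100 would keep (assuming the same ksize and hash function). That is why the directly built scaled=10 and scaled=100 queries and a downsampled scaled=2 query should, in principle, find the same matches. The check below assumes the simplified cutoff model (hash < 2**64 // scaled) and uses random integers as stand-ins for real k-mer hashes; fracminhash and downsample_agrees_with_direct are illustrative names, and the code does not exercise fastmultigather itself.

```python
import random

# Simplified FracMinHash model for illustration; not the sourmash API.
MAX_HASH = 2**64


def fracminhash(hashes: set, scaled: int) -> set:
    """Keep only the hashes below the cutoff for this scaled value."""
    cutoff = MAX_HASH // scaled
    return {h for h in hashes if h < cutoff}


def downsample_agrees_with_direct(all_hashes: set, low: int, high: int) -> bool:
    """Check that sketching at `low` then downsampling to `high` gives the
    same hashes as sketching at `high` directly."""
    sketch_low = fracminhash(all_hashes, low)
    return fracminhash(sketch_low, high) == fracminhash(all_hashes, high)


# Random stand-in for the hashed k-mers of one organism (not real data).
rng = random.Random(0)
hashes = {rng.getrandbits(64) for _ in range(10_000)}
for target in (10, 100):
    # A scaled=2 sketch keeps roughly half of all hashes; downsampling it to
    # `target` keeps about 2/target of those.
    print(target, downsample_agrees_with_direct(hashes, 2, target))  # prints "10 True", then "100 True"
```

Because the scaled=10 and scaled=100 cutoffs are smaller than the scaled=2 cutoff, filtering twice is the same as filtering once at the larger scaled value.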