Errors if genome-grist run on Marine metagenomes #241
I am getting the same error: grist cannot download a specific genome, in my case GCF_006715245.1. When I checked the status of the genome on the [NCBI FTP site](https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/006/715/245/GCF_006715245.1_ASM671524v1/), the genome was missing! There I found a message saying that the assembly status is suppressed, so it makes sense that the download fails. My suggestion is to add a line to handle that error; I will try to put together some code. This is the relevant section in the Snakefile:
My solution:
to:
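To check a genome's status by hand, as described above, it helps to know how the NCBI FTP directory for an accession is laid out. Judging from the link, the nine-digit part of `GCF_006715245.1` is split into three-digit chunks. Below is a minimal Python sketch that assumes this layout holds for all GCA_/GCF_ accessions. The function name `ncbi_ftp_dir` is hypothetical and is not part of genome-grist.

```python
import re

# Assumed layout, inferred from the link above:
#   GCF_006715245.1 -> https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/006/715/245/
# The versioned folder inside it (e.g. GCF_006715245.1_ASM671524v1/) also contains
# the assembly name, which cannot be derived from the accession alone.
ACCESSION_RE = re.compile(r"^(GC[AF])_(\d{3})(\d{3})(\d{3})\.\d+$")
FTP_BASE = "https://ftp.ncbi.nlm.nih.gov/genomes/all"


def ncbi_ftp_dir(accession: str) -> str:
    """Return the NCBI FTP parent directory for a GCA_/GCF_ accession (hypothetical helper)."""
    match = ACCESSION_RE.match(accession)
    if match is None:
        raise ValueError(f"not a GCA_/GCF_ accession: {accession!r}")
    prefix, first, second, third = match.groups()
    return f"{FTP_BASE}/{prefix}/{first}/{second}/{third}/"


if __name__ == "__main__":
    # Prints the parent directory of the suppressed assembly discussed above.
    print(ncbi_ftp_dir("GCF_006715245.1"))
```

Opening that directory in a browser should show whether the versioned assembly folder is still present.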
Another genome missing:
In this one, it's hard to know what the error is because it occurred above the copy/paste - can you try rerunning with the target
I think I fixed this one in #242, which is now released in v0.9.1! So if you
Thanks @carden24! I have some ideas here - I don't want to just ignore the missing genomes... more in a bit.
Here's one way I'm thinking of supporting "missing" genomes: I like the idea of requiring that they be added manually (or at least that manual acknowledgement be made). A different or additional approach would be to suggest downloading them, or a replacement, manually and making them part of a private database.
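One way to implement the "manual acknowledgement" idea is to collect the downloads that fail and stop the run unless every failed accession is also listed in the config. The sketch below assumes that design. The names `check_missing_genomes` and `UnacknowledgedMissingGenomes` are hypothetical; only the `skip_genomes` config key comes from this thread, where it is used in later comments.

```python
from typing import Iterable, Set


class UnacknowledgedMissingGenomes(Exception):
    """Raised when genomes failed to download but were not listed in skip_genomes."""


def check_missing_genomes(failed: Iterable[str], skip_genomes: Iterable[str]) -> Set[str]:
    """Return the acknowledged missing genomes, or raise if any are unacknowledged.

    `failed` holds accessions whose download failed (e.g. suppressed assemblies);
    `skip_genomes` is the list the user placed in the config.
    """
    failed_set = set(failed)
    acknowledged = set(skip_genomes)
    unacknowledged = failed_set - acknowledged
    if unacknowledged:
        # Format the offenders as a YAML list so they can be pasted into the config.
        listing = "\n".join(f"- {ident}" for ident in sorted(unacknowledged))
        raise UnacknowledgedMissingGenomes(
            "These genomes could not be downloaded; add them under skip_genomes "
            "in the config to continue without them:\n" + listing
        )
    return failed_set & acknowledged
```

Printing the unacknowledged accessions as a ready-to-paste list keeps the acknowledgement step manual while keeping it cheap.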
#255 is maturing. I'd be interested in your thoughts, @carden24 @jeanzzhao.
I spent some time trying to get the data from other sources. I could not get it from GenBank or GOLD, but it is available from the JGI portal. I assume that this will not be the case for the other genomes, so I am thinking that an alternative is to get another closely related genome, based perhaps on ANI or some other measure of genome similarity.
Usually the genome has been removed for a good reason. I would probably use GTDB or NCBI taxonomy to find another genome from the same species.
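Looking up same-species replacements from a taxonomy table could be sketched as follows. This assumes a two-column, tab-separated file of accession and semicolon-separated lineage. GTDB's taxonomy tables use this layout, with `RS_`/`GB_` prefixes on accessions, but verify this against the file you actually use. Both function names are hypothetical, and the example rows in the test are made up.

```python
import csv
import io
from typing import Dict, List


def load_species(tsv_text: str) -> Dict[str, str]:
    """Map accession -> species name from 'accession<TAB>lineage' lines (assumed format)."""
    species: Dict[str, str] = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        # Strip GTDB-style database prefixes, if present.
        accession = row[0].removeprefix("RS_").removeprefix("GB_")
        ranks = [rank.strip() for rank in row[1].split(";")]
        species_ranks = [rank for rank in ranks if rank.startswith("s__")]
        # Ignore empty species assignments such as a bare "s__".
        if species_ranks and species_ranks[0] != "s__":
            species[accession] = species_ranks[0][len("s__"):]
    return species


def same_species_candidates(missing: str, species: Dict[str, str]) -> List[str]:
    """Return other accessions assigned to the same species as `missing`, sorted."""
    target = species.get(missing)
    if target is None:
        return []
    return sorted(acc for acc, name in species.items() if name == target and acc != missing)
```

Choosing among the candidates, for example by ANI as suggested above, is left out here.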
Yes, totally agree. The criteria for removal from NCBI can vary, and there is no way to know them programmatically.
I've just released genome-grist v0.9.2. This includes
You can use something like:
skip_genomes:
- GCF_000020205.1
to give it a try.
I upgraded grist to 0.9.2 and ran it again, but Snakemake is failing because it expects the genome to have been downloaded, as required by the rule's output. I used the skip_genomes option in the config file, and it was read successfully, but the run cannot handle the missing output. `[Tue Dec 6 08:40:45 2022] samples: ['Mock_T0_3_S3', 'Mock_T0_2_S2', 'Mock_T0_1_S1'] RuleException:`
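The failure above is what you would expect if a skipped accession is still part of some rule's expected files. Snakemake works backward from the requested target files, so any path derived from a skipped genome still has to be produced. Below is a minimal sketch of filtering the identifiers before target paths are built. The path pattern and the function name `genome_targets` are hypothetical and do not reflect genome-grist's actual layout.

```python
from typing import Iterable, List


def genome_targets(idents: Iterable[str], skip_genomes: Iterable[str], outdir: str) -> List[str]:
    """Build per-genome target paths, leaving out any accession listed in skip_genomes."""
    skip = set(skip_genomes)
    kept = [ident for ident in idents if ident not in skip]
    # Hypothetical layout: one downloaded genome file per accession.
    return [f"{outdir}/genomes/{ident}_genomic.fna.gz" for ident in kept]
```

The same filter has to be applied wherever per-genome paths are generated. Missing it in even one rule is enough to reintroduce the error.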
Hi @carden24, just to confirm, did you add it to the config like this?
skip_genomes:
- GCF_006715245.1
Yes, in the config file. But I think I needed to clean up some files before rerunning.
@carden24, could you remind me which files you cleaned? Thanks!
I removed the genbank_cache folder, the gather one, and the sig one too. I'm not sure if all of them were required.
Hmm, that's interesting 😓 The problem should be downstream of those, although removing them will certainly force recalculation of everything downstream! @jeanzzhao, wait a bit and I'll see if I can figure out something more precise!
Merged #259 and released genome-grist v0.9.3. Please give it a try:
(You shouldn't need to remove or edit any files to get this to work, @jeanzzhao.)
'/home/zyzhao/assloss/grist/marine44/.snakemake/log/2022-12-08T082206.462631.snakemake.log'
I am getting an error at the make_gather_notebook_wc step. I ran it with a single sample. `Error in rule make_gather_notebook_wc:` ... `Removing output files of failed job make_gather_notebook_wc since they might be corrupted:` This is the folder structure of the grist folder: grist
Hi Jean, I took a look at
Hi Titus,
- I realized that I did not have `rs207` in the folder when I changed `conf.yml` to `rs207`.
- I downloaded it with `curl -L https://osf.io/w4bcm/download -o gtdb-rs207.genomic-reps.k31.sbt.zip`.
- I re-ran it (sbatch job #58926209); it failed after ~9h with a different error, in rule `make_combined_info_csv_wc`. See this for details:
  https://hackmd.io/DOWP1qUzTCqdihYOSyp5Zg?view#12922
-Jean
On Tue, Dec 13, 2022 at 7:46 PM C. Titus Brown ***@***.***> wrote:
`pip install -U genome-grist` (v0.9.3), did not remove any previous files, sbatch job #58672657 failed
Hi Jean, I took a look at ~assloss/grist/marine44/ and tried running one
of your samples as below - so far it's working. I wonder if you "just" need
to add more skip_genomes? It's annoying to figure out, I know... I'll seek
additional solutions!
samples:
- SRR5915428
outdir: outputs.jean/
sourmash_databases:
- gtdb-rs207.genomic.k31.zip
skip_genomes:
- GCF_000472605.1
- GCF_000504225.1
I am still having issues with this error. I think there are still some rules that need a check to ignore genomes that cannot be downloaded. These rules correctly ignore the missing genome specified in the YAML config:
The first rule that raises an error is extract_leftover_reads_wc. I checked its code, and it seems to use the gather_csv file as input, but the Python script substract_gather.py does not check for the flagged genomes.
These other rules also use that CSV as input. A possible solution would be to pass the list of flagged genomes (IGNORE_IDENTS) as an argument to the Python script, and to drop them when it loads the list of genomes from the CSV (line 29):
I don't know enough about Python notebooks to suggest a solution there.
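The suggestion above is to pass the flagged genomes (IGNORE_IDENTS) to the script that loads genomes from the gather CSV. It could look roughly like the sketch below, which is not genome-grist's code. The `--ignore-idents` flag and both function names are hypothetical. The CSV is assumed to have a `name` column whose first whitespace-separated token is the accession.

```python
import argparse
import csv
from typing import Iterable, List, Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Load genome idents from a gather CSV.")
    parser.add_argument("gather_csv")
    # Hypothetical flag: accessions flagged as undownloadable (e.g. via skip_genomes).
    parser.add_argument("--ignore-idents", nargs="*", default=[])
    return parser.parse_args(argv)


def load_gather_idents(lines: Iterable[str], ignore_idents: Iterable[str]) -> List[str]:
    """Return accessions from gather CSV lines in order, skipping ignored ones.

    Assumes a 'name' column such as 'GCF_000020205.1 Some organism name'.
    """
    ignore = set(ignore_idents)
    idents = []
    for row in csv.DictReader(lines):
        ident = row["name"].split()[0]
        if ident in ignore:
            continue  # flagged in the config; drop it before any downstream use
        idents.append(ident)
    return idents


if __name__ == "__main__":
    args = parse_args()
    with open(args.gather_csv, newline="") as fp:
        for ident in load_gather_idents(fp, args.ignore_idents):
            print(ident)
```

Filtering at load time means every downstream step that relies on this list sees the same reduced set of genomes.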
There were a few errors during a genome-grist run on Marine metagenomes (see `less ~/assloss/grist/marine21/jobs/grist.j56313129.err`), in rules samtools_count_wc, bam_to_depth_wc, bam_to_fastq_wc, download_matching_genome_wc, and make_mapping_notebook_wc.