Cannot write fas.0, unitesetstofasta step died #91

Open
katharinedickson opened this issue Aug 21, 2024 · 9 comments

Comments

@katharinedickson

Expected Behavior

FASTA and GFF output from easy-predict.

Current Behavior

When running easy-predict, I get the following notification:

Could not open /groups/diamond/projects/animal/rumen/RuReacBro20203/analysis/RuReacBro2023_Euk/annotation/metaeuk/RuReacBro_20230708_Cow1_RF_metabat.143.fa/dbCAN2/dbCAN2.fas.0 for writing!
Error: unitesetstofasta step died

Steps to Reproduce (for bugs)

Please make sure to execute the reproduction steps with newly recreated and empty tmp folders.
Reran easy-predict with one database.

MetaEuk Output (for bugs)

Please make sure to also post the complete output of MetaEuk. You can use gist.github.com for large output.
https://gist.github.com/katharinedickson/ad20b5d52183462be4bc8836651610a7

Context

Providing context helps us come up with a solution and improve our documentation for the future.
unsure what additional context is needed

Your Environment

Include as many relevant details about the environment you experienced the bug in.

  • Git commit used (The string after "MetaEuk Version:" when you execute MetaEuk without any parameters):
  • Which MetaEuk version was used (Statically-compiled, self-compiled, Homebrew, etc.):
  • For self-compiled and Homebrew: Compiler and Cmake versions used and their invocation:
  • Server specifications (especially CPU support for AVX2/SSE and amount of system memory):
  • Operating system and version:

MetaEuk version v7.bba0d80 was downloaded and installed from conda. I am using it on a high-performance UNIX-based university computing cluster; I am not sure of the OS/version.

@milot-mirdita
Member

Looks like something is wrong with the target database:

Target database size: 1 type: Aminoacid

Seems like it contains only a single entry.
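
A quick way to confirm the entry count before running a prediction is to look at the database's index file. The sketch below assumes the usual MMseqs2 layout, where a database named `dbCAN2` has a companion `dbCAN2.index` file with one line per entry; the function name `count_db_entries` is hypothetical and only for illustration.

```shell
#!/bin/sh
# Count the entries of an MMseqs2/MetaEuk database.
# Assumption: "<db>.index" exists and holds one line per database entry.
count_db_entries() {
    db="$1"
    if [ ! -f "${db}.index" ]; then
        echo "no index file found at ${db}.index" >&2
        return 1
    fi
    # Reading from stdin makes wc print only the number;
    # tr removes the padding that some wc implementations add.
    wc -l < "${db}.index" | tr -d ' '
}
```

Calling `count_db_entries "${OUTFOLDER}/ref/dbCAN2"` on the database from this report would be expected to print 1, matching the "Target database size: 1" line in the log.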

@katharinedickson
Author

That would do it, yeah - the target DB in question is supposed to be dbCAN2, so that should be picking up many more entries. The command I'm running to attempt to create that is mmseqs createdb ${DBFOLDER}/dbcan2/r8/hmm/dbCAN-HMMdb-V8.txt.hmm ${OUTFOLDER}/ref/dbCAN2, with both DBFOLDER and OUTFOLDER defined in two lines of code above. Is this correct? The folder where the dbCAN2 database is installed contains dbCAN-HMMdb-V8.txt.hmm as well as four additional binary files:

  • dbCAN-HMMdb-V8.txt.hmm.h3f
  • dbCAN-HMMdb-V8.txt.hmm.h3i
  • dbCAN-HMMdb-V8.txt.hmm.h3m
  • dbCAN-HMMdb-V8.txt.hmm.h3p

I made the assumption that simply citing dbCAN-HMMdb-V8.txt.hmm would account for these files as well, as I've previously done with invoking other databases with other programs.

If I need to call the dbCAN database in another way to make it into a Metaeuk-formatted DB, how do I do so?

@milot-mirdita
Member

Several processing steps are required to get the dbCAN2 MSAs into a format that metaeuk/mmseqs can read.

We do ship dbCAN2 (v9) within the MMseqs2 databases downloader though. This is a hidden workflow within MetaEuk, but should still work.

metaeuk databases dbCAN2 dbcandb tmp

This will download and set up the database for you; you can then pass the dbcandb path (or whatever you name it) to metaeuk instead.

@katharinedickson
Author

katharinedickson commented Aug 23, 2024

Thanks for giving me the heads up about dbCAN2's MSAs. Is this the case also for Pfam and the NCBI nr database? I would also like to annotate with Kofam if possible, but I notice MMseqs2 does not ship this database with its downloader.

@katharinedickson
Author

katharinedickson commented Aug 26, 2024

I tried again after downloading the dbCAN2 and Pfam databases via MMseqs2. The same problem occurred: dbCAN2.fas.0 could not be opened for writing and the unitesetstofasta step died. The databases now contain an appropriate number of entries: the target database has 445013, and the query databases vary in size. The gist for one of my bins can be found at https://gist.github.com/katharinedickson/640e0759f39aa4f50211cdbee98b744f

@elileka
Member

elileka commented Aug 27, 2024

Hi,

We are a bit puzzled about this issue. Could it be some permissions issue?

I tried running similar tasks with the same targetDB, using two different contig files and the runs ended successfully.

For the first run I used these toy contigs, which give empty results (but no errors).

For the second run, I downloaded a FASTA file of the NC_000913 record from GenBank. This run ends with some matches.

Perhaps you could place these contig files in your directory and see if you can get it to run? If this fails it strengthens the suspicion that this is due to a permissions problem. If it succeeds, and if you used the same paths, then maybe a closer look at your contigs file is needed.

Here are the steps:

download metaeuk's latest static:

wget https://mmseqs.com/metaeuk/metaeuk-linux-avx2.tar.gz

extract:

tar -xzf metaeuk-linux-avx2.tar.gz

download the dbCAN2 database:

metaeuk/bin/metaeuk databases dbCAN2 dbCAN2_db tmp

run against the toy contigs:

metaeuk/bin/metaeuk easy-predict contigs.fna dbCAN2_db toy_contigs_res toy_tmp

run against the NC_000913 records:

metaeuk/bin/metaeuk easy-predict sequence.fas dbCAN2_db NC_000913_res NC_000913_tmp
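
The permissions hypothesis can also be tested directly, without running MetaEuk at all. The sketch below is not part of MetaEuk; `check_output_prefix` is a hypothetical helper that takes the same output prefix you would pass to easy-predict and checks that its parent directory exists and that a file can be created next to it. A missing parent directory prevents a file from being opened for writing just as missing write permission does, so the helper reports both cases.

```shell
#!/bin/sh
# Check that files can be created for a given output prefix.
# Hypothetical helper: it only probes the filesystem, it does not call metaeuk.
check_output_prefix() {
    prefix="$1"
    dir=$(dirname "$prefix")
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir" >&2
        return 1
    fi
    probe="${prefix}.write_test.$$"
    # The subshell keeps a failed redirection from aborting the calling shell.
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        echo "writable: $dir"
    else
        echo "not writable: $dir" >&2
        return 1
    fi
}
```

For example, `check_output_prefix NC_000913_res` tests the current directory, and passing the full output prefix from the failing run tests the location named in the error message.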

@katharinedickson
Author

katharinedickson commented Aug 27, 2024

Hi elileka,

I am unfortunately not permitted to install metaeuk from source on my cluster - they prefer we install as a conda environment.

I ran the test with the NC_000913 records with my current install, and it succeeded.

@elileka
Member

elileka commented Aug 28, 2024

Hi,

Can you repeat the command that succeeded (exactly, working in the same directory, providing the same paths, etc.) but this time overwrite the sequence.fas file with your contigs file? I.e., keeping everything of the successful run the same except for the content of the contigs file.

If this succeeds then the original issue was some kind of path naming / permissions issue.

If this fails, then I'd be happy if you could try to cut down your contigs file until you get a subset of contigs that reproduces the problem and that you can send to us (halve the contigs in your file several times, each time continuing with the half that causes the error).

Thank you for your patience,
Eli
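
The halving step described above could be scripted roughly as follows. This is a minimal sketch, not a MetaEuk tool: `split_fasta_halves` and the `.half1.fa` / `.half2.fa` suffixes are hypothetical names, and the input is assumed to be a plain FASTA file in which every record starts with a `>` header line.

```shell
#!/bin/sh
# Split a FASTA file into two files holding the first and second half
# of its records (the first half gets the extra record if the count is odd).
split_fasta_halves() {
    input="$1"
    out="$2"
    total=$(grep -c '^>' "$input")
    half=$(( (total + 1) / 2 ))
    # Create both outputs up front so each exists even if it stays empty.
    : > "${out}.half1.fa"
    : > "${out}.half2.fa"
    awk -v half="$half" -v a="${out}.half1.fa" -v b="${out}.half2.fa" '
        /^>/ { n++ }                       # a header line starts a new record
        { print > (n <= half ? a : b) }    # route each line by record number
    ' "$input"
}
```

After `split_fasta_halves contigs.fna contigs`, rerun easy-predict on `contigs.half1.fa` and `contigs.half2.fa` separately and repeat the split on whichever half still fails.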
