Error in make_nt_freq_mat.pl, caused by empty stop_*.seq #6

Open
hirnc opened this issue Mar 18, 2022 · 1 comment

Comments

@hirnc

hirnc commented Mar 18, 2022

Hello, I am trying to run prothint.py and gmes_petap.pl on a fungal genome.

The commands I ran were:

prothint.py genome.fa protein.faa --workdir prothint
gmes_petap.pl --EP prothint/prothint.gff --evidence prothint/evidence.gff --seq genome.fa --soft_mask 1000 --verbose

prothint.py finished successfully. Then gmes_petap.pl terminated with the following message:

error, no valid sequences were found
error on call: /path/gmes_linux_64/make_nt_freq_mat.pl --cfg /workdir/run.cfg --section stop_TAA   --format TERM_TAA

The last part of gmes.log is:

/path/gmes_linux_64/gmes_petap.pl : [Fri Mar 18 14:46:11 2022] /path/gmes_linux_64/parse_ET.pl --section EP_C --cfg  /workdir/run.cfg  --v
/path/gmes_linux_64/gmes_petap.pl : [Fri Mar 18 14:46:11 2022] /path/gmes_linux_64/make_nt_freq_mat.pl --cfg /workdir/run.cfg --section start_ATG  --format INI
/path/gmes_linux_64/gmes_petap.pl : [Fri Mar 18 14:46:11 2022] /path/gmes_linux_64/make_nt_freq_mat.pl --cfg /workdir/run.cfg --section stop_TAA   --format TERM_TAA
/path/gmes_linux_64/gmes_petap.pl : [Fri Mar 18 14:46:11 2022] error

It seems the error happens in Training_E_anchored_C() in make_nt_freq_mat.pl, when running CountFromFile() with run/EP_C_1/stop_taa.seq as the input.

run/EP_C_1/stop_taa.seq exists but is empty. stop_tag.seq and stop_tga.seq are also empty.

What does an empty stop_*.seq file mean, and how can I avoid this problem?
Any suggestions are greatly appreciated!
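
For anyone hitting the same message, a quick way to see which training files came out empty is sketched below. This is only an illustration: `list_empty_seq` is a hypothetical helper name, and the default directory `run/EP_C_1` is simply the path quoted in the report above, so it may differ in other runs.

```shell
# Hypothetical helper: list zero-byte *.seq files under a GeneMark run directory.
# The default path mirrors the one quoted in the report; adjust it as needed.
list_empty_seq() {
    find "${1:-run/EP_C_1}" -type f -name '*.seq' -empty | sort
}

# Only scan when the directory is actually present in the current working directory.
if [ -d run/EP_C_1 ]; then
    list_empty_seq run/EP_C_1
fi
```

If all three stop_*.seq files are listed, the training step had no stop-codon sequences to count from, which matches the "no valid sequences were found" error.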

@tomasbruna
Contributor

Hi @hirnc,

Have you run GeneMark in --ES mode (without proteins), and did that run complete without errors? The error you are seeing is usually caused by poor coverage by the supporting proteins, which can happen when there are too few input proteins or when they are too remote from the target species.

I also noticed that you are not using GeneMark's --fungus flag. Please try a run with this flag; it may also resolve the problem.

Sorry for the late reply,
Tomas
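
The two suggestions above could be tried roughly as follows. This is a sketch under assumptions, not a verified recipe: the file names and options are copied from the original report, the `run` wrapper and the `DRY_RUN` variable are hypothetical conveniences that only print the commands by default, and each run is assumed to be started in its own working directory so the outputs do not overwrite each other.

```shell
# Hypothetical dry-run wrapper: prints each command unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# 1) Sanity check: self-training only (--ES), without any protein evidence.
run gmes_petap.pl --ES --seq genome.fa --soft_mask 1000 --verbose

# 2) The original protein-supported run, with --fungus added.
run gmes_petap.pl --EP prothint/prothint.gff --evidence prothint/evidence.gff \
    --seq genome.fa --soft_mask 1000 --fungus --verbose
```

If the --ES run succeeds but the --EP run still fails with --fungus, that would point toward the protein set rather than the genome as the likely cause.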
