AA sequences ending with * have missing values for molecular_weight and hydrophobicity columns and inconsistent values on the CDS_stop_codon_found column in Ampcombi_summary.tsv #449

Closed
amizeranschi opened this issue Feb 2, 2025 · 6 comments
Labels
bug Something isn't working

Comments

@amizeranschi

While testing the AMP screening with Macrel, I noticed that the Ampcombi_summary.tsv file contains several AA sequences ending with an asterisk. I assume the asterisk represents a stop codon, as discussed recently in bcgsc/AMPlify#17.

All the sequences ending with an asterisk in Ampcombi_summary.tsv appear to have missing values in the molecular_weight and hydrophobicity columns. In addition, the CDS_stop_codon_found column seems to have inconsistent values for these sequences: some of them report a stop codon, while others do not.

This is the command that I ran:

nextflow run nf-core/funcscan -r add_interproscan_to_amp -profile docker,test_full --run_amp_screening --amp_skip_ampir --amp_skip_amplify --run_bgc_screening false --run_arg_screening false --amp_ampcombi_parsetables_cutoff 0.4 --save_db --outdir test-funcscan

And attached is the Ampcombi_summary.tsv file:

Ampcombi_summary.tsv.txt
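
The pattern described above could be checked against a summary table with a short script. This is only a minimal sketch: the columns molecular_weight, hydrophobicity and CDS_stop_codon_found are named in this report, but the sequence column name (`aa_sequence`) and the set of strings treated as "missing" are assumptions and may need adjusting to the actual file.

```python
import csv
import io

# Assumption: these strings count as a "missing" value in the table.
MISSING = {"", "NA", "NaN", "nan"}


def find_asterisk_rows(tsv_text, seq_col="aa_sequence"):
    """Return one summary dict per row whose sequence ends with '*'.

    seq_col is a hypothetical column name; the other column names
    are the ones mentioned in this issue.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    flagged = []
    for row in reader:
        seq = (row.get(seq_col) or "").strip()
        if not seq.endswith("*"):
            continue
        weight = (row.get("molecular_weight") or "").strip()
        hydro = (row.get("hydrophobicity") or "").strip()
        flagged.append({
            "sequence": seq,
            "missing_weight": weight in MISSING,
            "missing_hydrophobicity": hydro in MISSING,
            "stop_codon_found": (row.get("CDS_stop_codon_found") or "").strip(),
        })
    return flagged
```

Reading the file with `open(path, encoding="utf-8").read()` and passing the text to `find_asterisk_rows` would list every affected row together with its CDS_stop_codon_found value, which makes the inconsistency easy to spot.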

amizeranschi added the bug label Feb 2, 2025
@jfy133
Member

jfy133 commented Feb 5, 2025

I believe this is being addressed in #447. Is that correct, @jasmezz?

@jasmezz
Collaborator

jasmezz commented Feb 5, 2025

It's solved for Pyrodigal in that PR, yes. Maybe I should append a string filtering step for the other annotation tools to get rid of the asterisk, because writing it cannot be deactivated in Bakta, Prokka, or Prodigal.
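
A string filtering step of this kind might look like the sketch below. It is not the code from the PR, just an illustration under stated assumptions: the function name `strip_trailing_stop` is hypothetical, the input is assumed to be protein FASTA text, and only asterisks at the end of each record are removed (internal ones are left untouched).

```python
def strip_trailing_stop(fasta_text):
    """Remove trailing '*' characters from every record in FASTA text.

    Sequences are re-emitted on a single line per record. Lines that
    appear before the first header are ignored.
    """
    records = []
    header, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Flush the previous record before starting a new one.
            if header is not None:
                records.append((header, "".join(chunks).rstrip("*")))
            header, chunks = line, []
        else:
            chunks.append(line)
    # Flush the final record.
    if header is not None:
        records.append((header, "".join(chunks).rstrip("*")))
    return "".join(f"{h}\n{s}\n" for h, s in records)
```

Joining the sequence lines before stripping avoids touching an asterisk that happens to sit at the end of a wrapped line in the middle of a record.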

@amizeranschi
Author

In order to prevent issues with Ampcombi's report generation, it might be a good idea to suppress or eliminate stop codons from AA sequences whenever the pipeline is run with --run_amp_screening, regardless of which gene annotation and AMP screening tools are enabled.

@jasmezz
Collaborator

jasmezz commented Feb 5, 2025

Agreed. I'll test whether it also influences the other screening workflows, but this check and the * removal should probably happen at the pipeline level (not only for AMPcombi).

@jfy133
Member

jfy133 commented Feb 11, 2025

Can this be closed, @jasmezz?

@jasmezz
Collaborator

jasmezz commented Feb 11, 2025

Yes! @amizeranschi and I tested it today and confirmed that everything works as expected.

jasmezz closed this as completed Feb 11, 2025