Add agat sp compare two annotations #127

Open
wants to merge 17 commits into
base: main
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -23,6 +23,7 @@
- `agat/agat_convert_sp_gff2gtf`: convert any GTF/GFF file into a proper GTF file (PR #76).
- `agat/agat_convert_bed2gff`: convert bed file to gff format (PR #97).
- `agat/agat_convert_embl2gff`: convert an EMBL file into GFF format (PR #99).
- `agat/agat_sp_compare_two_annotations`: compare two annotations of the same assembly (PR #127).
- `agat/agat_convert_sp_gff2tsv`: convert gtf/gff file into tabulated file (PR #102).
- `agat/agat_convert_sp_gxf2gxf`: fixes and/or standardizes any GTF/GFF file into full sorted GTF/GFF file (PR #103).

97 changes: 97 additions & 0 deletions src/agat/agat_sp_compare_two_annotations/config.vsh.yaml
@@ -0,0 +1,97 @@
name: agat_sp_compare_two_annotations
namespace: agat
description: |
  The script aims to compare two annotations of the same assembly. It
  provides information about split/fusion of genes between the two
  annotations. The most common cases are:

  * 1 => 0 (gene unique to file1)
  * 0 => 1 (gene unique to file2)
  * 1 => 1 (1 gene from file1 overlaps only 1 gene from file2)
  * 1 => <many> (1 gene from file1 overlaps <many> genes from file2)
    => split case (with file1 as reference)
  * <many> => 1 (<many> genes from file1 overlap only 1 gene from file2)
    => fusion case (with file1 as reference)

  Then you can get more complex cases:

  * <many> => <many> (<many> genes from file1 overlap <many> genes from file2)

  The script outputs a folder containing a report of the number of
  different cases, as well as a file per case type listing, one per line,
  the gene feature IDs involved in each case.
keywords: [gene annotations, GFF]
links:
  homepage: https://github.com/NBISweden/AGAT
  documentation: https://agat.readthedocs.io/en/latest/tools/agat_sp_compare_two_annotations.html
  issue_tracker: https://github.com/NBISweden/AGAT/issues
  repository: https://github.com/NBISweden/AGAT
references:
  doi: 10.5281/zenodo.3552717
license: GPL-3.0
requirements:
  - commands: [agat]
authors:
  - __merge__: /src/_authors/leila_paquay.yaml
    roles: [ author, maintainer ]
argument_groups:
  - name: Inputs
    arguments:
      - name: --gff1
        description: Input GTF/GFF file1.
        type: file
        required: true
        direction: input
        example: input1.gff
      - name: --gff2
        description: Input GTF/GFF file2.
        type: file
        required: true
        direction: input
        example: input2.gff
  - name: Outputs
    arguments:
      - name: --output_dir
        alternatives: [-o, --out, --output]
        description: |
          Output folder. It contains a report that summarizes the type and number of cases, as well as a file per case type containing one case per line with the list of gene feature IDs (or IDs of other level1 feature types) from file1 then file2, separated by a |.
        type: file
        direction: output
        required: true
        example: output_folder
  - name: Arguments
    arguments:
      - name: --debug
        alternatives: [-d]
        description: Debug option, makes it easier to follow what is going on for debugging purposes.
        type: boolean_true
      - name: --verbose
        alternatives: [-v]
        description: Verbose option, makes it easier to follow what is going on.
        type: boolean_true
      - name: --config
        alternatives: [-c]
        description: |
          AGAT config file. By default AGAT takes the original agat_config.yaml shipped with AGAT. The `--config` option gives you the possibility to use your own AGAT config file (located elsewhere or named differently).
        type: file
        required: false
        example: custom_config.yaml
resources:
  - type: bash_script
    path: script.sh
test_resources:
  - type: bash_script
    path: test.sh
  - type: file
    path: test_data
engines:
  - type: docker
    image: quay.io/biocontainers/agat:1.4.0--pl5321hdfd78af_0
    setup:
      - type: docker
        run: |
          agat --version | sed 's/AGAT\s\(.*\)/agat: "\1"/' > /var/software_versions.txt
runners:
  - type: executable
  - type: nextflow
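
Review note: the case types listed in the component description can be illustrated with a small shell function. This is only a sketch of the naming scheme, not AGAT's implementation. The function name `classify_case` and its two count arguments are hypothetical, and how AGAT actually groups overlapping genes is not shown in this PR.

```sh
# classify_case N1 N2  (hypothetical helper, for illustration only)
# N1 = number of genes from file1 in one overlap group
# N2 = number of genes from file2 in the same group
classify_case() {
  n1=$1
  n2=$2
  if [ "$n1" -eq 1 ] && [ "$n2" -eq 0 ]; then
    echo "1 => 0 (unique to file1)"
  elif [ "$n1" -eq 0 ] && [ "$n2" -eq 1 ]; then
    echo "0 => 1 (unique to file2)"
  elif [ "$n1" -eq 1 ] && [ "$n2" -eq 1 ]; then
    echo "1 => 1"
  elif [ "$n1" -eq 1 ] && [ "$n2" -gt 1 ]; then
    echo "1 => <many> (split)"
  elif [ "$n1" -gt 1 ] && [ "$n2" -eq 1 ]; then
    echo "<many> => 1 (fusion)"
  elif [ "$n1" -gt 1 ] && [ "$n2" -gt 1 ]; then
    echo "<many> => <many>"
  else
    # Combinations the description does not list, e.g. 0 => 0
    echo "unclassified"
  fi
}

classify_case 1 3   # split case, with file1 as reference
classify_case 2 1   # fusion case, with file1 as reference
```

Swapping the two inputs turns a split into a fusion and vice versa, which is presumably why the test script below runs the same pair of files in both orders.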
86 changes: 86 additions & 0 deletions src/agat/agat_sp_compare_two_annotations/help.txt
@@ -0,0 +1,86 @@
```sh
agat_sp_compare_two_annotations.pl --help
```

------------------------------------------------------------------------------
| Another GFF Analysis Toolkit (AGAT) - Version: v1.4.0 |
| https://github.com/NBISweden/AGAT |
| National Bioinformatics Infrastructure Sweden (NBIS) - www.nbis.se |
------------------------------------------------------------------------------


Name:
agat_sp_compare_two_annotations.pl

Description:
The script aims to compare two annotations of the same assembly. It
provides information about split/fusion of genes between the two
annotations. The most common cases are:
1 => 0 (gene unique to file1)
0 => 1 (gene unique to file2)
1 => 1 (1 gene from file1 overlaps only 1 gene from file2)
1 => <many> (1 gene from file1 overlaps <many> genes from file2)
=> split case (with file1 as reference)
<many> => 1 (<many> genes from file1 overlap only 1 gene from file2)
=> fusion case (with file1 as reference)

Then you can get more complex cases:
<many> => <many> (<many> genes from file1 overlap <many> genes from file2)

The script outputs a folder containing a report of the number of
different cases, as well as a file per case type listing, one per line,
the gene feature IDs involved in each case.

Usage:
agat_sp_compare_two_annotations.pl -gff1 infile.gff [ -o outfile ]
agat_sp_compare_two_annotations.pl --help

Options:
-gff1 Input GTF/GFF file1.

-gff2 Input GTF/GFF file2.

-o , --output or --out
Output folder. It contains a report that summarizes the type and
number of cases, as well as a file per case type containing one
case per line with the list of gene feature IDs (or IDs of other
level1 feature types) from file1 then file2, separated by a |.

--debug or -d
Debug option, makes it easier to follow what is going on for
debugging purposes.

--verbose or -v
Verbose option, makes it easier to follow what is going on.

-c or --config
String - Input agat config file. By default AGAT takes as input
agat_config.yaml file from the working directory if any,
otherwise it takes the original agat_config.yaml shipped with
AGAT. To get the agat_config.yaml locally type: "agat config
--expose". The --config option gives you the possibility to use
your own AGAT config file (located elsewhere or named
differently).

-h or --help
Display this helpful text.

Feedback:
Did you find a bug?:
Do not hesitate to report bugs to help us keep track of the bugs and
their resolution. Please use the GitHub issue tracking system available
at this address:

https://github.com/NBISweden/AGAT/issues

Ensure that the bug was not already reported by searching under Issues.
If you're unable to find an (open) issue addressing the problem, open a new one.
Try as much as possible to include in the issue when relevant:
- a clear description,
- as much relevant information as possible,
- the command used,
- a data sample,
- an explanation of the expected behaviour that is not occurring.

Do you want to contribute?:
You are very welcome, visit this address for the Contributing
guidelines:
https://github.com/NBISweden/AGAT/blob/master/CONTRIBUTING.md
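
Review note: the help text says each per-case file holds one case per line, with the IDs from file1 and then file2 separated by a `|`. A minimal sketch of reading such a line follows. It assumes exactly one `|` per line and makes no claim about how multiple IDs on one side are delimited, since the help text does not say. The function name `split_case_line` and the sample IDs are hypothetical.

```sh
# split_case_line LINE  (hypothetical helper, for illustration only)
# Prints the file1 side and the file2 side of one case line.
# Assumes a single "|" separates the two sides.
split_case_line() {
  line=$1
  file1_side=${line%%"|"*}   # everything before the first "|"
  file2_side=${line#*"|"}    # everything after the first "|"
  printf 'file1: %s\nfile2: %s\n' "$file1_side" "$file2_side"
}

split_case_line "geneA|geneB"   # made-up IDs
```

Any whitespace around the `|` would be kept as part of each side, so real output may need trimming.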
28 changes: 28 additions & 0 deletions src/agat/agat_sp_compare_two_annotations/script.sh
@@ -0,0 +1,28 @@
#!/bin/bash

set -eo pipefail

## VIASH START
## VIASH END

# unset flags
[[ "$par_debug" == "false" ]] && unset par_debug
[[ "$par_verbose" == "false" ]] && unset par_verbose

# Debug statement to check if par_output_dir is set
echo "par_output_dir is set to: ${par_output_dir}"

# Ensure par_output_dir is set
if [ -z "$par_output_dir" ]; then
  echo "Error: par_output_dir is not set."
  exit 1
fi

# run agat_sp_compare_two_annotations.pl
agat_sp_compare_two_annotations.pl \
  -gff1 "$par_gff1" \
  -gff2 "$par_gff2" \
  ${par_output_dir:+-o "${par_output_dir}"} \
  ${par_debug:+--debug} \
  ${par_verbose:+--verbose} \
  ${par_config:+--config "${par_config}"}
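
Review note: the script relies on the `${var:+word}` expansion to pass optional flags only when the corresponding variable is set and non-empty. The sketch below shows how the shell splits those expansions into arguments. `show_args` is a hypothetical stand-in for the AGAT command, and the variable values are made up.

```sh
# Print each argument in brackets, one per line, so word boundaries are visible.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

par_output_dir="my output"   # made-up path containing a space
par_debug="true"
unset par_verbose            # mimics the script's "unset flags" step

# Same expansion pattern as in script.sh
show_args \
  ${par_output_dir:+-o "${par_output_dir}"} \
  ${par_debug:+--debug} \
  ${par_verbose:+--verbose}
```

This prints `[-o]`, `[my output]` and `[--debug]` on separate lines. The inner double quotes keep the path as a single argument even though the outer expansion is unquoted, and the unset `par_verbose` contributes no argument at all.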
78 changes: 78 additions & 0 deletions src/agat/agat_sp_compare_two_annotations/test.sh
@@ -0,0 +1,78 @@
#!/bin/bash

set -eo pipefail

## VIASH START
## VIASH END

test_dir="${meta_resources_dir}/test_data"

# create temporary directory and clean up on exit
TMPDIR=$(mktemp -d "$meta_temp_dir/$meta_functionality_name-XXXXXX")
function clean_up {
  [[ -d "$TMPDIR" ]] && rm -rf "$TMPDIR"
}
trap clean_up EXIT

echo "> Run $meta_name with test data: identical files"
"$meta_executable" \
  --gff1 "$test_dir/1.gff" \
  --gff2 "$test_dir/1.gff" \
  --output_dir "$TMPDIR/output"

echo ">> Checking output"
[ ! -f "$TMPDIR/output/report.txt" ] && echo "Output file report.txt does not exist" && exit 1

echo ">> Check if output is empty"
[ ! -s "$TMPDIR/output/report.txt" ] && echo "Output file report.txt is empty" && exit 1

echo ">> Check if output matches expected output"
# Test diff directly: under `set -e`, a failing diff aborts the script before a separate `$?` check can run.
if ! diff <(tail -n +2 "$TMPDIR/output/report.txt") <(tail -n +2 "$test_dir/agat_sp_compare_two_annotations_1.txt"); then
  echo "Output file report.txt does not match expected output"
  exit 1
fi

rm -rf "$TMPDIR/output"

echo "> Run $meta_name with test data: different files"
"$meta_executable" \
  --gff1 "$test_dir/file1.gff" \
  --gff2 "$test_dir/file2.gff" \
  --output_dir "$TMPDIR/output"

echo ">> Checking output"
[ ! -f "$TMPDIR/output/report.txt" ] && echo "Output file report.txt does not exist" && exit 1

echo ">> Check if output is empty"
[ ! -s "$TMPDIR/output/report.txt" ] && echo "Output file report.txt is empty" && exit 1

echo ">> Check if output matches expected output"
# Test diff directly: under `set -e`, a failing diff aborts the script before a separate `$?` check can run.
if ! diff <(tail -n +2 "$TMPDIR/output/report.txt") <(tail -n +2 "$test_dir/agat_sp_compare_two_annotations_2.txt"); then
  echo "Output file report.txt does not match expected output"
  exit 1
fi

rm -rf "$TMPDIR/output"

echo "> Run $meta_name with test data: different files (swapped order)"
"$meta_executable" \
  --gff1 "$test_dir/file2.gff" \
  --gff2 "$test_dir/file1.gff" \
  --output_dir "$TMPDIR/output"

echo ">> Checking output"
[ ! -f "$TMPDIR/output/report.txt" ] && echo "Output file report.txt does not exist" && exit 1

echo ">> Check if output is empty"
[ ! -s "$TMPDIR/output/report.txt" ] && echo "Output file report.txt is empty" && exit 1

echo ">> Check if output matches expected output"
# Test diff directly: under `set -e`, a failing diff aborts the script before a separate `$?` check can run.
if ! diff <(tail -n +2 "$TMPDIR/output/report.txt") <(tail -n +2 "$test_dir/agat_sp_compare_two_annotations_3.txt"); then
  echo "Output file report.txt does not match expected output"
  exit 1
fi

echo "> Test successful"
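
Review note: the three test runs repeat the same existence, non-empty and content checks. One possible way to factor them out is sketched below. The helper name `check_report` is hypothetical and not part of this PR, and the demonstration files are made up so the snippet can run without AGAT.

```sh
# check_report ACTUAL EXPECTED  (hypothetical helper, for illustration only)
# Returns 0 when ACTUAL exists, is non-empty, and matches EXPECTED
# from the second line onward (the first line is skipped, as in test.sh).
check_report() {
  actual=$1
  expected=$2
  [ -f "$actual" ] || { echo "Output file $actual does not exist"; return 1; }
  [ -s "$actual" ] || { echo "Output file $actual is empty"; return 1; }
  if [ "$(tail -n +2 "$actual")" != "$(tail -n +2 "$expected")" ]; then
    echo "Output file $actual does not match expected output"
    return 1
  fi
  echo "OK"
}

# Self-contained demonstration on throwaway files with made-up contents.
demo_dir=$(mktemp -d)
printf 'header A\nbody\n' > "$demo_dir/report.txt"
printf 'header B\nbody\n' > "$demo_dir/expected.txt"
check_report "$demo_dir/report.txt" "$demo_dir/expected.txt"
rm -rf "$demo_dir"
```

In the test script each stanza could then reduce to one call such as `check_report "$TMPDIR/output/report.txt" "$test_dir/agat_sp_compare_two_annotations_1.txt" || exit 1`. Unlike `diff`, the string comparison does not show which lines differ, so keeping `diff` inside the helper may be preferable for debugging.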