Title | Galaxy |
---|---|
Training dataset: | PRJEB43037 - In August 2020, an outbreak of West Nile virus affected 71 people with meningoencephalitis in Andalusia and 6 more in Extremadura (south-west Spain), causing a total of eight deaths. The virus belonged to lineage 1 and was relatively similar to viruses from previous outbreaks in the Mediterranean region. Here, we present a detailed analysis of the outbreak, including an extensive phylogenetic study. This is one of the outbreak samples. |
Questions: |
|
Objectives: |
|
Estimated time: | 1h |
After variant calling, we want to know how important each variant is for the viral genome. To interpret the variants, we need to know which gene each one falls in and what effect it has.
- Experiment info: PRJEB43037, WGS, Illumina MiSeq, paired-end
- Fastq R1: ERR5310322_1, URL: ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR531/002/ERR5310322/ERR5310322_1.fastq.gz
- Fastq R2: ERR5310322_2, URL: ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR531/002/ERR5310322/ERR5310322_2.fastq.gz
- Reference genome NC_009942.1: FASTA and GFF
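The read URLs above follow the public directory layout that ENA uses for run accessions. A minimal Python sketch that rebuilds them from the accession alone, assuming that layout (an extra zero-padded directory for accessions with more than six digits); the function name `ena_fastq_urls` is hypothetical and only meant for illustration:

```python
def ena_fastq_urls(run_accession: str, paired: bool = True) -> list:
    """Build ENA FTP URLs for the FASTQ files of a sequencing run.

    Assumes ENA's public directory convention:
    vol1/fastq/<first 6 chars>/[<extra dir>/]<accession>/<accession>_1.fastq.gz
    """
    acc = run_accession.strip().upper()
    digits = acc[3:]  # numeric part after the ERR/SRR/DRR prefix
    parts = ["ftp://ftp.sra.ebi.ac.uk/vol1/fastq", acc[:6]]
    if len(digits) > 6:
        # Accessions with more than 6 digits get an extra directory built from
        # the digits beyond the sixth, left-padded with zeros to 3 characters.
        parts.append(digits[6:].zfill(3))
    parts.append(acc)
    base = "/".join(parts)
    if paired:
        return [f"{base}/{acc}_1.fastq.gz", f"{base}/{acc}_2.fastq.gz"]
    return [f"{base}/{acc}.fastq.gz"]


for url in ena_fastq_urls("ERR5310322"):
    print(url)
```

For ERR5310322 this yields the same two paired-end URLs listed above.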
- Click the + icon at the top of the history panel and create a new history named "mapping 101 tutorial".
Import the following files into the history by URL, following the same upload instructions as before:
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR531/002/ERR5310322/ERR5310322_1.fastq.gz
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR531/002/ERR5310322/ERR5310322_2.fastq.gz
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/875/385/GCF_000875385.1_ViralProj30293/GCF_000875385.1_ViralProj30293_genomic.fna.gz
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/875/385/GCF_000875385.1_ViralProj30293/GCF_000875385.1_ViralProj30293_genomic.gff.gz
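If you prefer to script the history creation and upload instead of using the web interface, the sketch below shows one possible approach with the BioBlend client library. It assumes BioBlend's `histories.create_history` and `tools.put_url` calls; the helper names `plan_uploads` and `upload_all` and the suffix-to-datatype mapping are hypothetical, so check the datatypes Galaxy actually assigns after upload.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# URLs from the list above.
URLS = [
    "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR531/002/ERR5310322/ERR5310322_1.fastq.gz",
    "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR531/002/ERR5310322/ERR5310322_2.fastq.gz",
    "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/875/385/GCF_000875385.1_ViralProj30293/GCF_000875385.1_ViralProj30293_genomic.fna.gz",
    "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/875/385/GCF_000875385.1_ViralProj30293/GCF_000875385.1_ViralProj30293_genomic.gff.gz",
]

# Assumed mapping from file suffix to Galaxy datatype (hypothetical choice).
SUFFIX_TO_DATATYPE = [
    (".fastq.gz", "fastqsanger.gz"),
    (".fna.gz", "fasta"),
    (".gff.gz", "gff3"),
]


def plan_uploads(urls):
    """Return (url, dataset_name, datatype) tuples for each URL."""
    plan = []
    for url in urls:
        name = PurePosixPath(urlparse(url).path).name
        for suffix, datatype in SUFFIX_TO_DATATYPE:
            if name.endswith(suffix):
                plan.append((url, name, datatype))
                break
        else:
            raise ValueError(f"no datatype rule for {name}")
    return plan


def upload_all(gi, history_name, urls):
    """Create a history and paste each URL into Galaxy's upload tool.

    `gi` is expected to be a bioblend.galaxy.GalaxyInstance.
    """
    history = gi.histories.create_history(name=history_name)
    for url, name, datatype in plan_uploads(urls):
        gi.tools.put_url(url, history["id"], file_type=datatype, file_name=name)
    return history["id"]
```

With BioBlend installed, `gi = GalaxyInstance(url="https://usegalaxy.eu", key=API_KEY)` (from `bioblend.galaxy`) followed by `upload_all(gi, "mapping 101 tutorial", URLS)` would run it; `API_KEY` stands for your own Galaxy API key. Passing `gi` in as a parameter keeps the planning logic testable without a server.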
Follow instructions here
Follow instructions here
Follow instructions here
- Search for snpeff build in the tool search box.
- Select SnpEff build: database from Genbank or GFF record.
- Click the version icon (three boxes) and select version 4.3+T.galaxy6.
- Name of the database: WestNile.
- Input annotations are in: GFF
- GFF dataset to build database from: NC_009942.1 gff
- Choose the source for the reference genome > History
- Genome in FASTA format > NC_009942.1 fasta.
- Click Run tool.
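Outside Galaxy, SnpEff builds a custom database from a directory layout plus a config entry, which is roughly what the Galaxy wrapper does for you. A sketch of that layout, assuming SnpEff's convention of `data/<genome>/sequences.fa` and `data/<genome>/genes.gff` with a `<genome>.genome` line in `snpEff.config`; the helper `prepare_snpeff_genome` is hypothetical, expects uncompressed inputs, and only returns the build command rather than running it:

```python
import shutil
from pathlib import Path


def prepare_snpeff_genome(work_dir, genome_id, fasta_path, gff_path):
    """Lay out a custom snpEff genome and return the build command.

    Hypothetical helper mirroring snpEff's custom-genome layout:
    data/<genome>/sequences.fa, data/<genome>/genes.gff and a config entry.
    Input files must be uncompressed.
    """
    work = Path(work_dir)
    genome_dir = work / "data" / genome_id
    genome_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(fasta_path, genome_dir / "sequences.fa")
    shutil.copyfile(gff_path, genome_dir / "genes.gff")

    config = work / "snpEff.config"
    # On first use, point snpEff at our data directory explicitly.
    lines = [] if config.exists() else [f"data.dir = {(work / 'data').resolve()}/"]
    lines.append(f"{genome_id}.genome : {genome_id}")
    with config.open("a") as handle:
        handle.write("\n".join(lines) + "\n")

    # Not executed here: run this once snpEff is installed and on PATH.
    return ["snpEff", "build", "-gff3", "-v", "-c", str(config), genome_id]
```

Called with the genome name WestNile and the NC_009942.1 FASTA and GFF files, it mirrors the "Name of the database" and input choices made in the form above.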
- Search for snpeff eff in the tool search box.
- Select SnpEff eff: annotate variants.
- Click the version icon (three boxes) and select version 4.3+T.galaxy2.
- Sequence changes (SNPs, MNPs, InDels): ivar vcf file
- Create CSV report, useful for downstream analysis (-csvStats): Yes.
- Genome source: Custom snpEff database in your history.
- SnpEff4.3 Genome Data > Snpeff build output.
- Click Run tool and wait for the job to finish.
- Click the 👁️ icon on the SnpEff HTML output and check the results.
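SnpEff writes its predictions into the ANN entry of the VCF INFO column: one comma-separated annotation per affected feature, each made of pipe-separated sub-fields in the fixed order given by the ANN format specification. A minimal Python sketch of how such an entry could be read outside Galaxy; the function name `parse_ann` and the example INFO string are hypothetical, not taken from this dataset's output:

```python
# Sub-field order of each ANN annotation, per the SnpEff ANN format specification.
ANN_FIELDS = [
    "Allele", "Annotation", "Annotation_Impact", "Gene_Name", "Gene_ID",
    "Feature_Type", "Feature_ID", "Transcript_BioType", "Rank",
    "HGVS.c", "HGVS.p", "cDNA.pos/cDNA.length", "CDS.pos/CDS.length",
    "AA.pos/AA.length", "Distance", "ERRORS/WARNINGS/INFO",
]


def parse_ann(info):
    """Split the ANN entry of a VCF INFO column into one dict per annotation."""
    for item in info.split(";"):
        if item.startswith("ANN="):
            ann_value = item[len("ANN="):]
            break
    else:
        return []  # variant was not annotated
    records = []
    for annotation in ann_value.split(","):
        values = annotation.split("|")
        # Pad so that missing trailing sub-fields become empty strings.
        values += [""] * (len(ANN_FIELDS) - len(values))
        records.append(dict(zip(ANN_FIELDS, values)))
    return records


# Hypothetical INFO value, for illustration only.
example = ("DP=100;ANN=T|missense_variant|MODERATE|polyprotein|GENE_X|transcript|"
           "TX1|protein_coding|1/1|c.100C>T|p.Arg34Cys|100/10000|100/10000|34/3400||")
for rec in parse_ann(example):
    print(rec["Annotation"], rec["HGVS.c"], rec["HGVS.p"])
```

Note that the Annotation sub-field can itself combine several effect terms joined with `&`.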
- Search for SnpSift ExtractFields in the tool search box.
- Variant input file in VCF format: snpeff eff vcf output.
- Fields to extract:
CHROM POS ID REF ALT FILTER ANN[*].EFFECT ANN[*].GENE ANN[*].FEATURE ANN[*].HGVS_C ANN[*].HGVS_P
- One effect per line: Yes.
- Click Execute and wait.
- Click the 👁️ icon on the SnpSift output and check the results.
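The ExtractFields output is a tab-separated table, so it is easy to summarize further. A minimal Python sketch that tallies effect terms, assuming a header line (a leading `#` is tolerated) that echoes the requested field names and one annotation per row, as produced with "One effect per line: Yes"; the function name `count_effects` and the example rows are hypothetical:

```python
from collections import Counter


def count_effects(tsv_text, effect_column="ANN[*].EFFECT"):
    """Tally effect terms in a SnpSift ExtractFields table.

    Assumes a tab-separated table whose first non-empty line is the header
    and one annotation per row ("One effect per line: Yes").
    """
    lines = [line for line in tsv_text.splitlines() if line.strip()]
    header = lines[0].lstrip("#").split("\t")
    col = header.index(effect_column)
    counts = Counter()
    for line in lines[1:]:
        fields = line.split("\t")
        if col >= len(fields):
            continue  # skip truncated rows
        # A single annotation can combine several terms with '&'.
        for effect in fields[col].split("&"):
            if effect:
                counts[effect] += 1
    return counts


# Hypothetical rows, for illustration only.
table = (
    "#CHROM\tPOS\tANN[*].EFFECT\n"
    "NC_009942.1\t100\tmissense_variant\n"
    "NC_009942.1\t200\tsynonymous_variant\n"
    "NC_009942.1\t300\tmissense_variant&splice_region_variant\n"
)
print(count_effects(table).most_common())
```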
Galaxy history for this exercise: https://usegalaxy.eu/u/smonzon/h/variant-calling-101-tutorial