diff --git a/index.html b/index.html
index 28f8f5a..b864337 100644
--- a/index.html
+++ b/index.html
@@ -19,11 +19,11 @@
@@ -45,168 +45,7 @@
WGS assemblies screened in this project were obtained from the NCBI Genome resource.
DIGS was performed using the DIGS tool, an open software framework available here.
-
-
- Directory Structure
-
- The DIGS-for-EVEs repository is organized to categorize EVE loci based on host species groups, virus subdivisions, and catalog version, as follows:
-
- DIGS-for-EVEs/
- └── eve/
- └── animals/
- └── vertebrates/
- └── nonretroviral/
- └── version-1.0/
- ├── input/
- └── output/
-
-
-
-
- Subdirectories
-
-
- eve/
- Contains versioned catalogs of EVE loci.
-
- animals/
- Subdivision based on host species group.
-
- vertebrates/
- Further subdivision of the host group.
-
- nonretroviral/
- Subdivision by virus group, non-retroviral viruses in this case.
-
- version-1.0/
- Version of the catalog for this host & virus subdivision.
-
- input/
- Contains the input files for the in silico genome screening process that generated the catalog.
-
- output/
- Contains the results and summary of the genome screen.
-
- Detailed Contents
-
-
-
-
- input/ Directory
-
-
- - Virus polypeptide probe sequences used for screening (FASTA format).
- - Reference protein sequence library used for classifying hits recovered by screening (FASTA format).
- - Details of the WGS assemblies screened in this project (assembly files are not included due to their large sizes).
- - Control file used with the DIGS tool to implement systematic in silico genome screening.
-
-
-
-
- output/ Directory
-
-
- - Tables exported from screening databases (includes the digs_results table with nucleotide sequences of EVE loci).
- - Summary statistics describing screening results.
- - A catalog of endogenous viral element loci identified within this host group.
-
-
-
-
-
-
- Standardised Nomenclature for EVE loci
-
- In DIGS-for-EVEs we have applied a systematic approach to the naming of non-retroviral EVEs, following a previously developed convention. Each element is assigned a unique identifier (ID) constructed from three components, separated by hyphens:
- e.g. EBLG-Carbovirus.2-Boreoeutheria
-
- - The first component identifies the type of EVE (EBLG). Please see below for a glossary of EVE types.
- - The second component, a combination of two distinct subcomponents separated by a period, defines:
-
- - (i) The name of the taxonomic group from which the EVE derives (Carbovirus).
- - (ii) A numeric ID (2) that uniquely identifies the insertion within the EVE category and taxonomic group to which it has been assigned. Orthologous copies in different species share the same number.
-
-
- - The third component of the ID specifies the host species or species group in which the EVE occurs (magnorder Boreoeutheria). For EVEs only known to occur in a single species, the Latin binomial species name is given. Where EVEs are shared across multiple species, we provide a taxonomic group name to capture that range of species.
-
- This systematic naming approach facilitates clear identification and comparison of EVEs across different species and research contexts.
-
- Please note the following:
-
- - EVEs were assigned to virus taxonomic groups as accurately as possible based on phylogenetic/genomic analysis. For EVEs that could not be confidently assigned to a subgroup, the lowest taxonomic rank possible for the EVE type is given (i.e. family).
- - We grouped sets of orthologous EVEs using shared numeric IDs. However, some orthologous relationships might have been missed, and some EVEs may have been incorrectly grouped as orthologs when they are actually distinct, paralogous loci. The 'digs_results' table includes information on how well each locus matched its assigned ortholog group via BLAST, providing a way to assess the confidence in these orthology designations.
- - When EVEs occur in a single species, the corresponding Latin binomial species name is provided. When EVEs occur as orthologs in multiple species, we provide the taxonomic name of the species group. If the species set corresponds to an unranked clade, we use the name of the closest named group at a lower rank and add the abbreviation 'UR' (unranked) to indicate that no named clade perfectly captures the range of species in which the EVE is found.
- - Although the naming convention used here was originally developed for ERVs, we have not yet applied it to ERV loci recovered via DIGS. Given the vast number of ERV loci present in vertebrate genomes, this will inevitably pose more significant challenges and require a longer-term effort compared to non-retroviral EVEs.
-
-
-
-
- Glossary of EVE Types:
-
-
-
-
- DNA viruses & Retroviruses
-
-
- - ECV: Endogenous Circovirus-like Element
- - EPV: Endogenous Parvovirus-like Element
- - ERV: Endogenous Retrovirus
- - eHBV: Endogenous Hepatitis B Virus
- - ciHV: Chromosomally-Integrated Herpesvirus
-
-
-
-
- RNA viruses
-
-
- - EBLG: Endogenous Borna-like Glycoprotein
- - EBLL: Endogenous Borna-like L Protein
- - EBLN: Endogenous Borna-like Nucleoprotein
- - EBLM: Endogenous Borna-like Matrix protein
- - EFLH: Endogenous Filo-like VP30
- - EFLN: Endogenous Filo-like Nucleoprotein
- - EFLL: Endogenous Filo-like L Protein
- - EFLP: Endogenous Filo-like Phosphoprotein
- - ECLL: Endogenous Chu-like L Protein
- - ECLN: Endogenous Chu-like Nucleoprotein
- - ECLM: Endogenous Chu-like Matrix protein
- - EPLL: Endogenous Paramyxo-like L Protein
- - EPLN: Endogenous Paramyxo-like Nucleoprotein
- - EPLH: Endogenous Paramyxo-like Hemagglutinin (HA)-Neuraminidase Protein
- - EFL: Endogenous Flavivirus-like Element
-
+ For more information, please see the Project Documentation.