Add script template and modules
robjgiff committed Oct 15, 2024
1 parent ae39458 commit 6761599
Showing 8 changed files with 1,986 additions and 166 deletions.
171 changes: 5 additions & 166 deletions index.html
@@ -19,11 +19,11 @@
<section class="page-header">
<h1 class="project-name">DIGS for EVEs</h1>
<h2 class="project-tagline">Database-integrated genome screening (DIGS) for endogenous viral elements (EVEs)</h2>
<a target="_blank" href="https://github.com/giffordlabcvr/DIGS-for-EVEs" class="btn">GitHub</a>
<a target="_blank" href="https://github.com/giffordlabcvr/DIGS-for-EVEs/wiki" class="btn">User Guide</a>
<a href="https://github.com/giffordlabcvr/DIGS-for-EVEs" class="btn">GitHub</a>
<a href="https://github.com/giffordlabcvr/DIGS-tool/zipball/master" class="btn">Download</a>

<a href="https://twitter.com/DIGS_for_EVEs" class="btn">Twitter</a>
<a href="https://giffordlabcvr.github.io/DIGS-tool/" class="btn">DIGS Tool</a>
<a target="_blank" href="https://twitter.com/DIGS_for_EVEs" class="btn">Twitter</a>
<a target="_blank" href="https://giffordlabcvr.github.io/DIGS-tool/" class="btn">DIGS Tool</a>
<!--<a href="./eve.txt" class="btn">EVE catalog</a>-->
</section>

@@ -45,168 +45,7 @@ <h3>
<p>WGS assemblies screened in this project were obtained from the <a href="https://www.ncbi.nlm.nih.gov/genome/"><strong>NCBI Genome</strong></a> resource.</p>
<p>DIGS was performed using the DIGS tool, an open software framework available <a href="https://giffordlabcvr.github.io/DIGS-tool/"><strong>here</strong></a>.</p>

<h3>
<a id="header-directory-structure" class="anchor" href="#header-directory-structure" aria-hidden="true">
<span class="octicon octicon-link"></span>
</a>
<strong>Directory Structure</strong>
</h3>
<p>The DIGS-for-EVEs repository is organized to categorize EVE loci based on host species groups, virus subdivisions, and catalog version, as follows:</p>
<pre>
DIGS-for-EVEs/
└── eve/
└── animals/
└── vertebrates/
└── nonretroviral/
└── version-1.0/
├── input/
└── output/
</pre>
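<p>As a rough illustration of the layout above, the path to a catalog's <code>input/</code> and <code>output/</code> directories can be assembled from the host group, host subdivision, virus subdivision, and version. This is only a sketch: the function name <code>catalog_dir</code> and its parameters are hypothetical and are not part of the repository or the DIGS tool.</p>

```python
from pathlib import PurePosixPath


def catalog_dir(host_group, host_subgroup, virus_group, version,
                root="DIGS-for-EVEs"):
    """Build the path to one versioned catalog directory.

    Hypothetical helper: it only mirrors the directory tree shown above
    and does not check that the directory exists on disk.
    """
    return PurePosixPath(
        root, "eve", host_group, host_subgroup, virus_group,
        f"version-{version}",
    )


base = catalog_dir("animals", "vertebrates", "nonretroviral", "1.0")
input_dir = base / "input"    # probes, reference library, control file
output_dir = base / "output"  # exported tables, summaries, EVE catalog
print(input_dir)
print(output_dir)
```

<p>Keeping the version as the last path component before <code>input/</code> and <code>output/</code> means that a later catalog release for the same host and virus subdivision would sit alongside the existing one.</p>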

<h3>
<a id="header-subdirectories" class="anchor" href="#header-subdirectories" aria-hidden="true">
<span class="octicon octicon-link"></span>
</a>
<strong>Subdirectories</strong>
</h3>
<ul>
<li><code>eve/</code><br>
Contains versioned catalogs of EVE loci.
<ul>
<li><code>animals/</code><br>
Subdivision based on host species group.
<ul>
<li><code>vertebrates/</code><br>
Further subdivision of the host group.
<ul>
<li><code>nonretroviral/</code><br>
Subdivision by virus group, non-retroviral viruses in this case.
<ul>
<li><code>version-1.0/</code><br>
Version of the catalog for this host &amp; virus subdivision.
<ul>
<li><code>input/</code><br>
Contains files used as input for the in silico genome screening process used to generate the catalog.
</li>
<li><code>output/</code><br>
Contains the results and summary of the genome screen.
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>

<h3>
<a id="header-detailed-contents" class="anchor" href="#header-detailed-contents" aria-hidden="true">
<span class="octicon octicon-link"></span>
</a>
<strong>Detailed Contents</strong>
</h3>

<h3>
<a id="header-input-directory" class="anchor" href="#header-input-directory" aria-hidden="true">
<span class="octicon octicon-link"></span>
</a>
<strong><code>input/</code> Directory</strong>
</h3>
<ul>
<li>Virus polypeptide probe sequences used for screening (FASTA format).</li>
<li>Reference protein sequence library used for classifying hits recovered by screening (FASTA format).</li>
<li>Details of the WGS assemblies screened in this project (assembly files are not included due to their large sizes).</li>
<li>Control file used with the <a href="https://giffordlabcvr.github.io/DIGS-tool/">DIGS tool</a> to implement systematic in silico genome screening.</li>
</ul>

<h3>
<a id="header-output-directory" class="anchor" href="#header-output-directory" aria-hidden="true">
<span class="octicon octicon-link"></span>
</a>
<strong><code>output/</code> Directory</strong>
</h3>
<ul>
<li>Tables exported from screening databases (includes <code>digs_results</code> table with nucleotide sequences of EVE loci).</li>
<li>Summary statistics describing screening results.</li>
<li>A catalog of endogenous viral element loci identified within this host group.</li>
</ul>

<h3>
<a id="header-standardised-nomenclature" class="anchor" href="#header-standardised-nomenclature" aria-hidden="true">
<span class="octicon octicon-link"></span>
</a>
<strong>Standardised Nomenclature for EVE loci</strong>
</h3>
<p>In DIGS-for-EVEs we name non-retroviral EVEs systematically, following a <a href="https://doi.org/10.1186/s12977-018-0442-1">previously developed</a> convention. Each element is assigned a unique identifier (ID) constructed from three components separated by hyphens:</p>
<p>e.g. <strong>EBLG-Carbovirus.2-Boreoeutheria</strong></p>
<ul>
<li><strong>The first component</strong> identifies the type of EVE (EBLG). Please see below for a glossary of EVE types.</li>
<li><strong>The second component</strong>, a combination of two distinct subcomponents separated by a period, defines:
<ul>
<li>(i) The name of the taxonomic group from which the EVE derives (Carbovirus).</li>
<li>(ii) A numeric ID (2) that uniquely identifies the insertion within the EVE category and taxonomic group to which it has been assigned. Orthologous copies in different species share the same number.</li>
</ul>
</li>
<li><strong>The third component</strong> of the ID specifies the host species or species group in which the EVE occurs (magnorder Boreoeutheria). For EVEs only known to occur in a single species, the Latin binomial species name is given. Where EVEs are shared across multiple species, we provide a taxonomic group name to capture that range of species.</li>
</ul>
<p>This systematic naming approach facilitates clear identification and comparison of EVEs across different species and research contexts.</p>
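<p>The three-component structure described above can be sketched as a small parser. This is a minimal example, not code from the project: the names <code>EveId</code> and <code>parse_eve_id</code> are hypothetical, and it assumes that the EVE type and the virus group name contain no hyphens, so that only the host component may include them.</p>

```python
from typing import NamedTuple


class EveId(NamedTuple):
    eve_type: str     # first component, e.g. "EBLG"
    virus_group: str  # second component, part before the period
    numeric_id: int   # second component, part after the period
    host: str         # third component: species or species group


def parse_eve_id(identifier: str) -> EveId:
    """Split an EVE locus ID into its components.

    Assumption (not stated in the source): the first two components
    contain no hyphens, so at most two splits are performed and any
    further hyphens are kept as part of the host name.
    """
    parts = identifier.split("-", 2)
    if len(parts) != 3:
        raise ValueError(f"expected three hyphen-separated components: {identifier!r}")
    eve_type, second, host = parts

    # The second component is "<taxonomic group>.<numeric ID>".
    virus_group, sep, number = second.rpartition(".")
    if not (sep and virus_group and number.isdecimal()):
        raise ValueError(f"expected '<group>.<number>' in second component: {second!r}")

    return EveId(eve_type, virus_group, int(number), host)


print(parse_eve_id("EBLG-Carbovirus.2-Boreoeutheria"))
```

<p>Because orthologous copies in different species share the same number, grouping parsed IDs by <code>(eve_type, virus_group, numeric_id)</code> would collect the loci assigned to one ortholog group.</p>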

<p><strong>Please note the following:</strong></p>
<ol>
<li>EVEs were assigned to virus taxonomic groups as accurately as possible based on phylogenetic/genomic analysis. For EVEs that could not be confidently assigned to a subgroup, the lowest taxonomic rank possible for the EVE type is given (i.e. family).</li>
<li>We grouped sets of orthologous EVEs using shared numeric IDs. However, some orthologous relationships might have been missed, and some EVEs may have been incorrectly grouped as orthologs when they are actually distinct, paralogous loci. The 'digs_results' table includes information on how well each locus matched its assigned ortholog group via BLAST, providing a way to assess the confidence in these orthology designations.</li>
<li>When EVEs occur in a single species, the corresponding Latin binomial species name is provided. When EVEs occur as orthologs in multiple species, we provide the taxonomic name of the species group. If the species set corresponds to an unranked clade, we use the name of the closest named group at a lower rank and add the abbreviation 'UR' (unranked) to indicate that no named clade perfectly captures the range of species in which the EVE is found.</li>
<li>Although the naming convention used here was originally developed for ERVs, we have not yet applied it to ERV loci recovered via DIGS. Given the vast number of ERV loci present in vertebrate genomes, this will inevitably pose more significant challenges and require a longer-term effort compared to non-retroviral EVEs.</li>
</ol>

<h3>
<a id="header-glossary-eve-types" class="anchor" href="#header-glossary-eve-types" aria-hidden="true">
<span class="octicon octicon-link"></span>
</a>
<strong>Glossary of EVE Types:</strong>
</h3>

<h3>
<a id="header-dna-viruses" class="anchor" href="#header-dna-viruses" aria-hidden="true">
<span class="octicon octicon-link"></span>
</a>
<strong>DNA viruses &amp; Retroviruses</strong>
</h3>
<ul>
<li><strong>ECV</strong>: Endogenous Circovirus-like Element</li>
<li><strong>EPV</strong>: Endogenous Parvovirus-like Element</li>
<li><strong>ERV</strong>: Endogenous Retrovirus</li>
<li><strong>eHBV</strong>: Endogenous Hepatitis B Virus</li>
<li><strong>ciHV</strong>: Chromosomally-Integrated Herpesvirus</li>
</ul>

<h3>
<a id="header-rna-viruses" class="anchor" href="#header-rna-viruses" aria-hidden="true">
<span class="octicon octicon-link"></span>
</a>
<strong>RNA viruses</strong>
</h3>
<ul>
<li><strong>EBLG</strong>: Endogenous Borna-like Glycoprotein</li>
<li><strong>EBLL</strong>: Endogenous Borna-like L Protein</li>
<li><strong>EBLN</strong>: Endogenous Borna-like Nucleoprotein</li>
<li><strong>EBLM</strong>: Endogenous Borna-like Matrix protein</li>
<li><strong>EFLH</strong>: Endogenous Filo-like VP30</li>
<li><strong>EFLN</strong>: Endogenous Filo-like Nucleoprotein</li>
<li><strong>EFLL</strong>: Endogenous Filo-like L Protein</li>
<li><strong>EFLP</strong>: Endogenous Filo-like Phosphoprotein</li>
<li><strong>ECLL</strong>: Endogenous Chu-like L Protein</li>
<li><strong>ECLN</strong>: Endogenous Chu-like Nucleoprotein</li>
<li><strong>ECLM</strong>: Endogenous Chu-like Matrix protein</li>
<li><strong>EPLL</strong>: Endogenous Paramyxo-like L Protein</li>
<li><strong>EPLN</strong>: Endogenous Paramyxo-like Nucleoprotein</li>
<li><strong>EPLH</strong>: Endogenous Paramyxo-like Hemagglutinin (HA)-Neuraminidase Protein</li>
<li><strong>EFL</strong>: Endogenous Flavivirus-like Element</li>
</ul>
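<p>For scripted work with locus IDs, the glossary can be held as a simple lookup table keyed by the EVE type code. The sketch below copies the abbreviations listed above; the names <code>EVE_TYPES</code> and <code>describe_eve_type</code> are hypothetical, and the lookup is deliberately case-sensitive because some codes (<code>eHBV</code>, <code>ciHV</code>) are mixed-case.</p>

```python
# EVE type codes and their expansions, as listed in the glossary.
EVE_TYPES = {
    # DNA viruses and retroviruses
    "ECV": "Endogenous Circovirus-like Element",
    "EPV": "Endogenous Parvovirus-like Element",
    "ERV": "Endogenous Retrovirus",
    "eHBV": "Endogenous Hepatitis B Virus",
    "ciHV": "Chromosomally-Integrated Herpesvirus",
    # RNA viruses
    "EBLG": "Endogenous Borna-like Glycoprotein",
    "EBLL": "Endogenous Borna-like L Protein",
    "EBLN": "Endogenous Borna-like Nucleoprotein",
    "EBLM": "Endogenous Borna-like Matrix protein",
    "EFLH": "Endogenous Filo-like VP30",
    "EFLN": "Endogenous Filo-like Nucleoprotein",
    "EFLL": "Endogenous Filo-like L Protein",
    "EFLP": "Endogenous Filo-like Phosphoprotein",
    "ECLL": "Endogenous Chu-like L Protein",
    "ECLN": "Endogenous Chu-like Nucleoprotein",
    "ECLM": "Endogenous Chu-like Matrix protein",
    "EPLL": "Endogenous Paramyxo-like L Protein",
    "EPLN": "Endogenous Paramyxo-like Nucleoprotein",
    "EPLH": "Endogenous Paramyxo-like Hemagglutinin (HA)-Neuraminidase Protein",
    "EFL": "Endogenous Flavivirus-like Element",
}


def describe_eve_type(code):
    """Return the glossary expansion for a type code, or None if unknown."""
    return EVE_TYPES.get(code)


print(describe_eve_type("EBLG"))
```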
<p>For more information please see the <a href="https://github.com/giffordlabcvr/DIGS-for-EVEs/wiki"><strong>Project Documentation</strong></a>.</p>

<h3>
<a id="header-references" class="anchor" href="#header-references" aria-hidden="true">