diff --git a/index copy.html b/index copy.html deleted file mode 100644 index f4cef8ab..00000000 --- a/index copy.html +++ /dev/null @@ -1,386 +0,0 @@ - - - - - - - - Flavivirus-GLUE by giffordlabcvr - - - - - - - - - - - -
- - -

- Comparative genomic analysis of flavivirids using GLUE -

-
- -

- This is Flavivirid-GLUE, a - GLUE - project for the - flavivirids - (family Flaviviridae). -

- -

- The Flaviviridae comprise enveloped, positive-strand RNA viruses, - many of which pose serious risks to human health on a global scale. - Arthropod-borne flaviviruses such as Zika virus (ZIKV), - Dengue virus (DENV), and - yellow fever virus (YFV) - are the causative agents of large-scale outbreaks that result - in millions of human infections every year, while the bloodborne hepatitis C virus - (HCV) - is a major cause of chronic liver disease. -

- -
- -

Flaviviruses

- - -
- -

- - Projected urbanisation in 2027 (from The Economist magazine). - Urbanisation is often associated with the emergence and spread of mosquito-borne diseases - by creating favourable conditions for the survival of mosquito vector species. - Genome data can directly inform efforts to control diseases caused by mosquito-borne flaviviruses. - -

- -
- -
- - -

- Since the emergence of the SARS-CoV-2 pandemic, many have become familiar with - the use of virus genome data to track the spread and evolution of pathogenic viruses - - e.g. via tools such as Nextstrain. - However, it is less widely appreciated that the same kinds of data sets and comparative genomic approaches - can also be used to explore the structural and functional basis of virus adaptations. -

- -

- The GLUE software framework - provides an extensible platform for implementing computational genomic - analysis of viruses in an efficient, standardised and reproducible way. - GLUE projects can not only incorporate all of the data items typically used in - comparative genomic analysis - (e.g. sequences, alignments, genome feature annotations) but can also represent the complex - semantic links between these data items via a relational database. - This 'poises' sequences and associated data for application in computational - analysis, minimising the requirement for labour-intensive pre-processing of datasets. -

- -

- GLUE projects are equally suited for carrying out exploratory work - (e.g. using virus genome data to investigate structural and functional properties of viruses) - as they are for implementing operational procedures (e.g. producing - standardised reports - in a public or animal health setting). -

- - -

- Hosting of GLUE projects in an online version control system (e.g. GitHub) provides - a mechanism for their stable, collaborative development, as shown below. -

- -

GitHub illustration

- - -
-

- What is a GLUE project? -

-
- -

- GLUE is an open, integrated - software toolkit that provides functionality for storage and interpretation of - sequence data. It supports the development of “projects” containing the data items - required for comparative genomic analysis - (e.g. sequences, multiple sequence alignments, genome feature annotations, - and other sequence-associated data). -

- - -
-

GLUE framework figure

-
- - -

- Projects are loaded into the GLUE "engine", creating a relational database - that represents the semantic relationships between data items. - This provides a robust foundation for the implementation of systematic - comparative analyses and the development of sequence-based resources. - The database schema can be extended to accommodate the idiosyncrasies of different projects. - GLUE provides a scripting layer (based on JavaScript) - for developing custom analysis tools. -

- - - - - -
-

GLUE resources: server deployment illustration

-
- - - -

- Some examples of 'sequence-based resources' built for viruses using GLUE include: - -

- -
- - -

- -

- -

- - - -
-

- What does building the Flavivirid-GLUE project offer? -

-
- - - -

- - Flavivirid-GLUE contains aligned, annotated reference genome sequences for all - flavivirid species and - endogenous viral elements (EVEs) derived from flavivirids. - - It offers a number of advantages for performing comparative sequence - analysis of flavivirids: - - -

    - -
  1. Reproducibility. - For many reasons, bioinformatics analyses are notoriously difficult to reproduce. - The GLUE framework supports the implementation of fully reproducible - comparative genomics through the introduction of data standards and the use - of a relational database to capture the semantic links between data items. -
  2. Reusable data objects and analysis logic. - For many - if not most - comparative genomic analyses, data preparation is nine - tenths of the battle. The GLUE framework has been designed to ensure that - work spent preparing high-value data items such as multiple sequence alignments - need only be performed once. Hosting of GLUE projects in an online version control - system such as GitHub allows for collaborative management of important data items - and community testing of hypotheses. -
  3. Validation. - Building GLUE projects entails mapping the semantic links between data items - (e.g. sequences, tabular data, multiple sequence alignments). - This process provides an opportunity - for cross-validation, and thereby enforces a high level of data integrity. -
  4. Standardisation of the genomic coordinate space. GLUE - projects allow all sequences to utilise the coordinate space of a chosen - reference sequence. Contingencies associated with insertions and deletions - (indels) are handled in a systematic way. -
  5. Predefined, fully annotated reference sequences: - This project includes fully annotated reference sequences for major lineages - within the Flaviviridae family. -
  6. Alignment trees: - GLUE allows linking of alignments constructed at distinct taxonomic levels - via an "alignment tree" - data structure. In the alignment tree, each alignment - is constrained to a standard reference sequence, thus all multiple sequence - alignments are linked to one another via a standardised coordinate system. -
- - -

- - - -
-

- GLUE project -

-
- - - -

- - On computers with GLUE installed, - the Flavivirid-GLUE project can be instantiated by navigating to the project folder, - initiating GLUE, and issuing the following command in the GLUE shell: - -

-  Mode path: /
-  GLUE> run file buildCompleteProject.glue
- -

- - -
-

- Contributors -

-
- -

- Robert J. Gifford (robert.gifford@glasgow.ac.uk) -

-

- Rhys Parry -

-

- Connor Bamford -

-

- William Marciel de Souza -

- - -
-

- Related Publications -

-
- - - -

- - - Bamford CGG, de Souza WM, Parry R and RJ Gifford - (2021) -
- Comparative analysis of genome-encoded viral sequences reveals the evolutionary history of the Flaviviridae. -
- [preprint] -
-
- - Singer JB, Thomson EC, McLauchlan J, Hughes J, and RJ Gifford - (2018) -
- GLUE: A flexible software system for virus sequence data. -
- BMC Bioinformatics - [view] -
-
- - Zhu H, Dennis T, Hughes J, and RJ Gifford - (2018) -
- Database-integrated genome screening (DIGS): exploring genomes heuristically using sequence similarity search tools and a relational database. - [preprint] -
-
- - -

- - - -

- License -

-
- -

- This project is licensed under the GNU Affero General Public License v. 3.0. -

- -
-
- - - - - -
- - - - - diff --git a/index.html b/index.html index e1d26e09..b5017739 100644 --- a/index.html +++ b/index.html @@ -21,6 +21,7 @@

Flavivirid-GLUE

Resources for comparative genomic analysis of flavivirids (family Flaviviridae)

+ User Guide News Download @@ -77,6 +78,12 @@


+

+ + Flavivirid-GLUE, built within the GLUE software framework, provides a comprehensive resource for the comparative analysis of flavivirid genomes. This platform integrates sequences, alignments, and genome feature annotations into a cohesive, relational database, allowing for standardized and reproducible genomic analysis. Flavivirid-GLUE minimizes the need for manual data handling, ensuring efficient workflows and facilitating the exploration of genetic diversity and functional genomics in a collaborative, extensible environment. + +

+

License diff --git a/website/assets/images/efv-nomenclature.png b/website/assets/images/efv-nomenclature.png deleted file mode 100644 index d5ee5cb9..00000000 Binary files a/website/assets/images/efv-nomenclature.png and /dev/null differ diff --git a/website/assets/images/eve-header2.png b/website/assets/images/eve-header2.png deleted file mode 100644 index 579fe72c..00000000 Binary files a/website/assets/images/eve-header2.png and /dev/null differ diff --git a/website/assets/images/eve-midlevel1.png b/website/assets/images/eve-midlevel1.png deleted file mode 100644 index 554d2cb8..00000000 Binary files a/website/assets/images/eve-midlevel1.png and /dev/null differ diff --git a/website/assets/images/flavi-header.jpg b/website/assets/images/flavi-header.jpg deleted file mode 100644 index 85436408..00000000 Binary files a/website/assets/images/flavi-header.jpg and /dev/null differ diff --git a/website/assets/images/flavi-msa-tree.jpg b/website/assets/images/flavi-msa-tree.jpg deleted file mode 100644 index 130dc67a..00000000 Binary files a/website/assets/images/flavi-msa-tree.jpg and /dev/null differ diff --git a/website/assets/images/github-hosting.jpg b/website/assets/images/github-hosting.jpg deleted file mode 100644 index b68e1dff..00000000 Binary files a/website/assets/images/github-hosting.jpg and /dev/null differ diff --git a/website/assets/images/glue-framework.jpg b/website/assets/images/glue-framework.jpg deleted file mode 100644 index abe7d264..00000000 Binary files a/website/assets/images/glue-framework.jpg and /dev/null differ diff --git a/website/assets/images/glue-servers.png b/website/assets/images/glue-servers.png deleted file mode 100644 index d70869d5..00000000 Binary files a/website/assets/images/glue-servers.png and /dev/null differ diff --git a/website/assets/images/hcv-genome.png b/website/assets/images/hcv-genome.png deleted file mode 100644 index ebf9e3fc..00000000 Binary files a/website/assets/images/hcv-genome.png and /dev/null differ diff 
--git a/website/assets/images/jingmenhosts.png b/website/assets/images/jingmenhosts.png deleted file mode 100644 index 4475693e..00000000 Binary files a/website/assets/images/jingmenhosts.png and /dev/null differ diff --git a/website/html/paleoviruses-old.html b/website/html/paleoviruses-old.html deleted file mode 100644 index c0372e15..00000000 --- a/website/html/paleoviruses-old.html +++ /dev/null @@ -1,343 +0,0 @@ - - - - - - - - Flavivirus-GLUE by giffordlabcvr - - - - - - - - - - - -
- -

- Endogenous flavivirid (EFV) data -

-
- -

- We have used GLUE to organise the 'genomic fossil record' of flavivirids. - This page provides a description of Flavivirid-GLUE's paleovirus component, and - quick links to specific data items. -

- - -

- Please note: links to files on GitHub are mainly designed to indicate where - these files are located within the repository. - To investigate files (e.g. tree files) in the appropriate software context we recommend - downloading the entire repository - and browsing locally. - -

- -
-

- How were the EFV data generated? -

-
- - -

- - EFV sequences were recovered from whole genome sequence (WGS) assemblies - via database-integrated genome screening (DIGS) using the - DIGS tool. -

- - - -

- All data pertaining to this screen are included in this repository. -

- - - - - - - -
-

- Nomenclature for EFVs -

-
- - -

- We have applied a systematic approach to naming EFVs, following a convention - developed for endogenous retroviruses. - Each element was assigned a unique identifier (ID) constructed from a defined - set of components. -

- - -

EFV Nomenclature

- - - - -

- The first component is the classifier ‘EFV’ (endogenous flavivirid). -

- -

- The second component is a composite of two distinct subcomponents separated by a period: - - (i) the name of the EFV group; - (ii) a numeric ID that uniquely identifies the insertion. - The numeric ID is an integer that identifies a unique insertion locus that arose as a - consequence of an initial germline infection. Thus, orthologous copies in different - species are given the same number. -

- -

- The third component of the ID defines the set of host species in which the ortholog occurs. -
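The three-part ID described above can be sketched in Python. This is only an illustration: the hyphen used to join the components, the `EfvId` class, the `parse_efv_id` helper, and the example group and host names are all assumptions. The text itself only specifies the 'EFV' classifier and the period between the group name and the numeric ID.

```python
from dataclasses import dataclass

CLASSIFIER = "EFV"  # first component: endogenous flavivirid


@dataclass(frozen=True)
class EfvId:
    """Hypothetical container for the components of an EFV identifier."""
    group: str     # name of the EFV group
    locus: int     # integer identifying a unique insertion locus
    host_set: str  # set of host species in which the ortholog occurs

    def format(self, sep: str = "-") -> str:
        # Assumption: the three components are joined with a hyphen.
        return f"{CLASSIFIER}{sep}{self.group}.{self.locus}{sep}{self.host_set}"


def parse_efv_id(text: str, sep: str = "-") -> EfvId:
    """Split an ID string back into its components."""
    classifier, middle, host_set = text.split(sep, 2)
    if classifier != CLASSIFIER:
        raise ValueError(f"unexpected classifier: {classifier!r}")
    # The period separates the group name from the numeric locus ID.
    group, locus = middle.rsplit(".", 1)
    return EfvId(group=group, locus=int(locus), host_set=host_set)


# Hypothetical example values, not taken from the project data.
example = EfvId(group="GroupA", locus=1, host_set="HostCladeX")
assert parse_efv_id(example.format()) == example
```

Because orthologous copies share the same numeric ID, two records with equal `group` and `locus` but different `host_set` values would represent the same insertion observed in different host sets. Note that this simple parser would not cope with a group name that itself contains the separator.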

- - - - -
-

- Paleovirus-specific schema extensions -

-
- - -

- The paleovirus component of Flavivirid-GLUE extends GLUE's - core schema - to allow the capture of EFV-specific data. - These schema extensions are defined in - this file - and comprise two additional tables: 'locus_data' and 'refcon_data'. - Both tables are linked to the main 'sequence' table via the 'sequenceID' field. -

- - -

- The 'locus_data' table contains EFV locus information: e.g. species, assembly, scaffold, location coordinates. -

- -

- The 'refcon_data' table contains summary information for individual - EFV insertions. It refers to the reference sequences constructed to represent - each insertion, which reflect our best efforts to reconstruct progenitor virus - sequences as they might have looked when they initially integrated into - the germline of ancestral species. -
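The relationship between the 'sequence' table and the two extension tables can be illustrated with a small relational sketch. This is not GLUE's schema-extension mechanism or its actual database: SQLite is used here purely as a stand-in, and every column other than 'sequenceID', species, assembly and scaffold (for example `start_pos`, `end_pos`, `summary`) is a hypothetical placeholder.

```python
import sqlite3

# In-memory database used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sequence (
    sequenceID TEXT PRIMARY KEY
);
CREATE TABLE locus_data (
    sequenceID TEXT REFERENCES sequence(sequenceID),
    species    TEXT,
    assembly   TEXT,
    scaffold   TEXT,
    start_pos  INTEGER,  -- hypothetical column name
    end_pos    INTEGER   -- hypothetical column name
);
CREATE TABLE refcon_data (
    sequenceID TEXT REFERENCES sequence(sequenceID),
    summary    TEXT      -- hypothetical placeholder column
);
""")

# Hypothetical example rows, not taken from the project data.
conn.execute("INSERT INTO sequence VALUES (?)", ("example-seq-1",))
conn.execute(
    "INSERT INTO locus_data VALUES (?, ?, ?, ?, ?, ?)",
    ("example-seq-1", "Example species", "asm1", "scaffold_12", 100, 950),
)

# Join locus information back to the main sequence table via sequenceID.
rows = conn.execute("""
    SELECT s.sequenceID, l.species, l.scaffold
    FROM sequence AS s
    JOIN locus_data AS l ON l.sequenceID = s.sequenceID
""").fetchall()
print(rows)
```

Keeping locus details and reference-construct summaries in separate tables, both keyed on 'sequenceID', means either can be queried without touching the other.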

- - - - -
-

- Raw EFV sequences and data -

-
- - -

Species with endogenous flaviviruses

- - -
-

- Some of the species in which we identified novel endogenous flaviviral elements (EFVs). - Left to right: - freshwater jellyfish (Craspedacusta sowerbyi), - long-horned beetle (Anoplophora glabripennis), - tadpole shrimp (Lepidurus arcticus), - tube-eye fish (Stylephorus chordatus). -

-
- -
- - -

- Raw FASTA files for EFVs recovered via - database-integrated genome screening (DIGS) - are here. -

- - - -

- Sequence-associated data in tabular format are here. - The tabular files contain information about the genomic locations of EFVs. -

- - - -
- -

- EFV reference sequences and data -

- -
- - -

- We constructed reference sequences for EFVs using alignments of EFV - sequences derived from the same initial germline colonisation event - i.e. - orthologous elements in distinct species, and paralogous - elements that have arisen via intragenomic duplication of EFV sequences. -

- - - -

- EFV consensus/reference FASTA is - here. -

- - - -

- Tabular formatted metadata for EFV reference sequences is - here. -

- - - -
- -

- Phylogenetic trees -

-
- - -

- - We used GLUE to implement an automated process for deriving midpoint rooted, - annotated trees from the alignments included in our project. -

- - -

- Trees were constructed at distinct taxonomic levels: - -

    -
  1. Major lineage-level phylogenies
  2. Minor lineage-level phylogenies
  3. Genus-level phylogenies
  4. Subgenus-level phylogenies
- -

- - - -
-

- Related Publications -

-
- - -

- - Bamford CGG, de Souza WM, Parry R and RJ Gifford - (2021) -
- Comparative analysis of genome-encoded viral sequences reveals the evolutionary history of the Flaviviridae. -
- preprint [view] -
-
- - Singer JB, Thomson EC, McLauchlan J, Hughes J, and RJ Gifford - (2018) -
- GLUE: A flexible software system for virus sequence data. -
- BMC Bioinformatics - [view] -
-
- - Zhu H, Dennis T, Hughes J, and RJ Gifford - (2018) -
- Database-integrated genome screening (DIGS): exploring genomes heuristically using sequence similarity search tools and a relational database. - [preprint] -
-
- - Gifford RJ, Blomberg B, Coffin JM, Fan H, Heidmann T, Mayer J, Stoye J, Tristem M, and WE Johnson - (2018) -
- Nomenclature for endogenous retrovirus (ERV) loci. -
- Retrovirology - [view] -
-
- -

- - - - -
- - - - - - - diff --git a/website/html/viruses-old.html b/website/html/viruses-old.html deleted file mode 100644 index c390de69..00000000 --- a/website/html/viruses-old.html +++ /dev/null @@ -1,549 +0,0 @@ - - - - - - - - Flavivirid-GLUE by giffordlabcvr - - - - - - - - - - - -
- - -

- Virus data included in Flavivirid-GLUE -

-
- - -

- This page provides background information on the virus-associated - data items included in the project. - Information about endogenous flaviviral elements (EFVs) - can be found here. -

- - -

- - Please note: links to files on GitHub are mainly designed to indicate where - these files are located within the repository. - To investigate files (e.g. tree files) in the appropriate software context we recommend - downloading the entire repository - and browsing locally. - -

- -

- - Those specifically interested in hepatitis C virus (HCV) - may want to investigate - HCV-GLUE - and - NCBI-HCV-GLUE. - - These GLUE projects were developed specifically for HCV and incorporate - a graphical user interface (GUI) - that allows users to browse the underlying GLUE database - via 'point-and-click' methods. - -

- -

- - The MRC-University of Glasgow Centre for Virus Research hosts an instance of the - GUI version of HCV-GLUE. - -

- -
-

- Flavivirid genome features -

-
- - -

- - Currently, four flavivirid genera are recognised: Pegivirus, Pestivirus, Hepacivirus and Flavivirus. - These genera contain viruses that have monopartite genomes ~10 kilobases (Kb) in length that encode - one or more large polyproteins that are co- and post-translationally cleaved to generate mature virus - proteins. The structural proteins of the virion - capsid (C), premembrane (prM) and envelope (E) - are - encoded toward the 5’ end of the genome, while genes encoding non-structural (NS) proteins are located - further downstream. - -

- - -

HCV genome

- -
-

- A schematic representation of the hepatitis C virus (HCV) genome. C=capsid; E=envelope; NS=non-structural; Kb=kilobases -

- -
- -
- - - -

- - A diverse variety of novel ‘flavivirid-like’ virus species have been described over recent years. - Most of these newly identified viruses have yet to be incorporated into official taxonomy. - They exhibit a much greater range of variation in genome structure than is found among - representatives of officially recognised flavivirid genera, with genome sizes ranging up - to 20Kb, and one novel group – the Jingmenviruses – comprises viruses with genomes - that are segmented rather than monopartite. - -

- - -

- We defined a standard set of genome features - for flavivirids, reflecting current knowledge, and - incorporated this information into Flavivirid-GLUE. -

- - - -
- - -

- Flavivirus sequences and sequence-associated data -

-
- - -

- The sequence data in this project are organised into multiple distinct sources. - Each source contains data in either GenBank XML or plain FASTA format. - The type of data is indicated by the name of the source (all GenBank XML sources - contain 'ncbi' in the name). -

- - -

- GenBank XML files are imported into this project directly from - NCBI GenBank - using - an appropriately configured - version of GLUE's GenBank importer module. - The core Flavivirid-GLUE project contains a single NCBI-derived source - - ncbi-refseqs - - that contains 'master reference' genome sequences for each flavivirid species included in this project. -

- - - -

- - Where possible, we prefer to use sequences obtained via GenBank since it - represents the principal source of published nucleotide sequence data. - However, FASTA sources can also be used in GLUE, - making it straightforward to expand private instances of this GLUE project with - unpublished sequences. -

- - -

- GenBank sequences are uniquely identified within GLUE projects by their - GenBank accession numbers. - Sequences included in this project are linked to - auxiliary data in tabular format. -

-

- Flavivirid sequence-associated data recorded in Flavivirid-GLUE are as follows: -

- - -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Parameter | Type | Definition
full_name | VARCHAR | Full name of the virus this sequence is derived from
name | VARCHAR | Abbreviated name of the virus this sequence is derived from
subfamily | VARCHAR | Taxonomy - virus subfamily (proposed)
supergenus | VARCHAR | Taxonomy - virus supergenus (proposed)
genus | VARCHAR | Taxonomy - virus genus
subgenus | VARCHAR | Taxonomy - virus subgenus
clade | VARCHAR | Taxonomy - virus clade
segment | INTEGER | Genome segment (Jingmenviruses only)
host_group | VARCHAR | Taxonomic group of host species
vector_group | VARCHAR | Taxonomic group of vector species
reservoir_group | VARCHAR | Taxonomic group of reservoir species
isolate_name | VARCHAR | Name of the virus isolate this sequence is derived from
isolation_host | VARCHAR | Species (Latin binomial) virus was isolated from
length | INTEGER | Length of the sequence
pubmed_id | INTEGER | PubMed ID of manuscript associated with sequence
gb_create_date | GenBank | GenBank creation date of the sequence
gb_update_date | VARCHAR | Date of most recent GenBank update
country | VARCHAR | Country where virus was isolated
place_sampled | VARCHAR | Location of sampling (state, region, or city)
collection_year | INTEGER | Year virus was isolated
collection_month | VARCHAR | Month virus was isolated
collection_month_day | VARCHAR | Day of month virus was isolated
- -
-

- - - These data are recorded in GLUE's underlying relational database. GLUE's - core database schema - was extended to include these fields, as defined in this - schema extensions file. - -

- - - -
-

- Flaviviridae reference genomes - fully annotated sequences -

-
- -

- - We defined 'master' reference sequences to represent recognised flavivirid genera/subgenera, as follows: - -
-

- Genus Flavivirus -

- - - - -

- Other Flaviviridae genera -

- - - - - -

- - -

- - We explicitly defined the locations of genome features - on master reference sequences using GLUE commands - (see here). - -

- - - -
-

- Multiple sequence alignments - maps of sequence homology -

-
- -

- Multiple sequence alignments (MSAs) are the basic currency of comparative genomic analysis. - MSAs constructed in this study are linked together using - GLUE's constrained MSA tree data structure. -

- - -

- A 'constrained MSA' is an alignment in which the coordinate space is defined by - a selected reference sequence. Where alignment members contain insertions relative - to the reference sequence, the inserted sequences are recorded and stored - (i.e. sequence data is never deleted). -
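The idea of a reference-constrained alignment can be sketched as follows. This is a simplified illustration, not GLUE's implementation: the `constrain_to_reference` function, its inputs (two gapped strings from a pairwise alignment) and the choice of keying insertions by the preceding reference position are all assumptions made for the example.

```python
def constrain_to_reference(ref_aln: str, member_aln: str, gap: str = "-"):
    """Project an aligned member sequence onto reference coordinates.

    Returns the member sequence with exactly one symbol per reference
    position, plus a dict of insertions keyed by the 1-based reference
    position they follow (0 means before the first reference base).
    """
    if len(ref_aln) != len(member_aln):
        raise ValueError("aligned strings must have equal length")

    constrained = []  # one symbol per reference position
    insertions = {}   # reference position -> inserted bases
    ref_pos = 0
    for ref_base, member_base in zip(ref_aln, member_aln):
        if ref_base == gap:
            # Column absent from the reference: store it rather than drop it.
            if member_base != gap:
                insertions[ref_pos] = insertions.get(ref_pos, "") + member_base
        else:
            ref_pos += 1
            constrained.append(member_base)
    return "".join(constrained), insertions


# Hypothetical toy alignment: the member has a 2-base insertion after
# reference position 3 and a deletion at reference position 5.
seq, ins = constrain_to_reference("ATG--CCA", "ATGTTC-A")
print(seq, ins)
```

Because every member is expressed in the reference's coordinates, a position number means the same thing across all members of the alignment, while the recorded insertions ensure that no sequence data is lost.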

- - -

- - GLUE projects have the option of using a data structure called an alignment tree - to link constrained MSAs representing different taxonomic levels, - and we've used this approach in Flavivirid-GLUE. -

- - -
- -

Alignment tree concept

- - -
-

- The schematic figure above shows the 'alignment tree' data structure - currently implemented in Flavivirid-GLUE. - For the highest taxonomic levels (i.e. at the root) we aligned only the most - conserved regions of the genome, whereas for the lower - taxonomic levels (i.e. within and below genus level) we aligned complete coding - sequences. - - We used an alignment tree data structure to link these alignments, - via a set of common reference sequences. - The root alignment contains reference sequences for major clades, - whereas all children of the - root inherit at least one reference from their immediate parent. - Thus, all alignments are linked to one another via our chosen set of - master reference sequences. -

- -
- - -
- - - -

- - - Alignments in the project include: - - -

    -
  1. A ‘root’ - alignment (i.e. family-level) constructed to represent homology between the two largest subgroupings in the Flaviviridae. - -
  2. ‘major-lineage’ - alignments constructed to represent proposed homologies between representative members - of major Flaviviridae lineages.
  3. ‘minor-lineage’ - alignments constructed to represent proposed homologies between representative members - of 'minor' Flaviviridae lineages.
  4. ‘genus-level’ - alignments constructed to represent proposed homologies between the genomes of - representative members of specific flavivirid genera.
  5. ‘subgenus-level’ - alignments constructed to represent proposed homologies between the genomes of - representative members of specific flavivirid subgenera.
- -

- - - -
-

- Phylogenetic trees - reconstructed evolutionary relationships -

-
- - -

- We used GLUE to implement an automated process for deriving midpoint rooted, - annotated trees from the alignments included in our project. -

- - -

- Trees were constructed at distinct taxonomic levels: - -

    -
  1. ‘major-lineage’ - phylogenetic trees showing reconstructed evolutionary relationships between - representative members of major flavivirid lineages.
  2. ‘minor-lineage’ - phylogenetic trees showing reconstructed evolutionary relationships between - representative members of minor flavivirid lineages.
  3. ‘genus-level’ - phylogenetic trees showing reconstructed evolutionary relationships between - representative members of specific flavivirid genera.
  4. ‘subgenus-level’ - phylogenetic trees showing reconstructed evolutionary relationships between - representative members of specific flavivirid subgenera.
- -

- - - - - - - -
- - - - - - - -