Comparative genomic analysis of flavivirids using GLUE

This is Flavivirid-GLUE, a GLUE project for the flavivirids (family Flaviviridae).

The Flaviviridae comprise enveloped, positive-strand RNA viruses, many of which pose serious risks to human health on a global scale. Arthropod-borne flaviviruses such as Zika virus (ZIKV), dengue virus (DENV), and yellow fever virus (YFV) are the causative agents of large-scale outbreaks that result in millions of human infections every year, while the bloodborne hepatitis C virus (HCV) is a major cause of chronic liver disease.

Flaviviruses

Projected urbanisation in 2027 (from The Economist magazine). Urbanisation is often associated with the emergence and spread of mosquito-borne diseases because it creates favourable conditions for the survival of mosquito vector species. Genome data can directly inform efforts to control diseases caused by mosquito-borne flaviviruses.

Since the emergence of the SARS-CoV-2 pandemic, many have become familiar with the use of virus genome data to track the spread and evolution of pathogenic viruses (e.g. via tools such as Nextstrain). It is less widely appreciated that the same kinds of data sets and comparative genomic approaches can also be used to explore the structural and functional basis of virus adaptations.

The GLUE software framework provides an extensible platform for implementing computational genomic analysis of viruses in an efficient, standardised and reproducible way. GLUE projects can not only incorporate all of the data items typically used in comparative genomic analysis (e.g. sequences, alignments, genome feature annotations) but can also represent the complex semantic links between these data items via a relational database. This 'poises' sequences and associated data for application in computational analysis, minimising the requirement for labour-intensive pre-processing of datasets.

GLUE projects are as well suited to exploratory work (e.g. using virus genome data to investigate structural and functional properties of viruses) as they are to operational procedures (e.g. producing standardised reports in a public or animal health setting).

Hosting of GLUE projects in an online version control system (e.g. GitHub) provides a mechanism for their stable, collaborative development, as shown below.

GitHub illustration

What is a GLUE project?

GLUE is an open, integrated software toolkit that provides functionality for storage and interpretation of sequence data. It supports the development of "projects" containing the data items required for comparative genomic analysis (e.g. sequences, multiple sequence alignments, genome feature annotations, and other sequence-associated data).

GLUE framework figure

Projects are loaded into the GLUE "engine", creating a relational database that represents the semantic relationships between data items. This provides a robust foundation for the implementation of systematic comparative analyses and the development of sequence-based resources. The database schema can be extended to accommodate the idiosyncrasies of different projects. GLUE provides a scripting layer (based on JavaScript) for developing custom analysis tools.
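To make the idea of "semantic relationships between data items" concrete, the following is a minimal sketch in Python using the standard `sqlite3` module. It is not GLUE's actual schema or API: the table names, column names, and sequence identifiers (`REF_A`, `SEQ_1`, `SEQ_2`, `AL_A`) are all hypothetical and chosen only for illustration. It shows how linking sequences to alignments in a relational database lets a single query recover which sequences belong to which alignment.

```python
import sqlite3


def build_toy_project_db():
    """Create an in-memory database linking sequences to alignments.

    All table, column, and record names are illustrative assumptions,
    not the schema used by GLUE.
    """
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE sequence (
            id TEXT PRIMARY KEY,
            species TEXT
        );
        CREATE TABLE alignment (
            name TEXT PRIMARY KEY,
            ref_seq_id TEXT REFERENCES sequence(id)
        );
        CREATE TABLE alignment_member (
            alignment_name TEXT REFERENCES alignment(name),
            seq_id TEXT REFERENCES sequence(id)
        );
        """
    )
    # Hypothetical records: two sequences of one virus, one of another.
    db.executemany(
        "INSERT INTO sequence VALUES (?, ?)",
        [("REF_A", "virus A"), ("SEQ_1", "virus A"), ("SEQ_2", "virus B")],
    )
    # One alignment, constrained to the reference sequence REF_A.
    db.execute("INSERT INTO alignment VALUES (?, ?)", ("AL_A", "REF_A"))
    db.executemany(
        "INSERT INTO alignment_member VALUES (?, ?)",
        [("AL_A", "REF_A"), ("AL_A", "SEQ_1")],
    )
    db.commit()
    return db


def members_of(db, alignment_name):
    """Return (sequence id, species) for each member of an alignment."""
    return db.execute(
        """
        SELECT s.id, s.species
        FROM alignment_member AS m
        JOIN sequence AS s ON s.id = m.seq_id
        WHERE m.alignment_name = ?
        ORDER BY s.id
        """,
        (alignment_name,),
    ).fetchall()
```

Calling `members_of(build_toy_project_db(), "AL_A")` returns the two sequences recorded as members of the toy alignment. Because the links are stored once in the database, downstream analyses can query them without re-parsing files.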


GLUE resources: server deployment illustration

Some examples of 'sequence-based resources' built for viruses using GLUE include:


What does building the Flavivirid-GLUE project offer?

Flavivirid-GLUE contains aligned, annotated reference genome sequences for all flavivirid species and endogenous viral elements (EVEs) derived from flavivirids. It offers a number of advantages for performing comparative sequence analysis of flavivirids:

  1. Reproducibility. Bioinformatics analyses are notoriously difficult to reproduce, for many reasons. The GLUE framework supports the implementation of fully reproducible comparative genomics through the introduction of data standards and the use of a relational database to capture the semantic links between data items.
  2. Reusable data objects and analysis logic. For many, if not most, comparative genomic analyses, data preparation is nine tenths of the battle. The GLUE framework has been designed to ensure that work spent preparing high-value data items such as multiple sequence alignments need only be performed once. Hosting of GLUE projects in an online version control system such as GitHub allows for collaborative management of important data items and community testing of hypotheses.
  3. Validation. Building GLUE projects entails mapping the semantic links between data items (e.g. sequences, tabular data, multiple sequence alignments). This process provides an opportunity for cross-validation, and thereby enforces a high level of data integrity.
  4. Standardisation of the genomic coordinate space. GLUE projects allow all sequences to utilise the coordinate space of a chosen reference sequence. Contingencies associated with insertions and deletions (indels) are handled in a systematic way.
  5. Predefined, fully annotated reference sequences. This project includes fully annotated reference sequences for major lineages within the family Flaviviridae.
  6. Alignment trees. GLUE allows linking of alignments constructed at distinct taxonomic levels via an "alignment tree" data structure. In the alignment tree, each alignment is constrained to a standard reference sequence, thus all multiple sequence alignments are linked to one another via a standardised coordinate system.
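The idea of a standardised coordinate space can be illustrated with a short Python sketch. This is not GLUE code, and it makes no claim about how GLUE itself represents indels. It assumes only that a query sequence has already been aligned to a reference, with gaps written as `-`. The function name `map_to_reference` and the choice to map insertions to `None` are assumptions made for this example.

```python
def map_to_reference(aligned_ref, aligned_query, gap="-"):
    """Map 1-based query positions to 1-based reference positions.

    Both arguments are rows from the same pairwise alignment and must
    have equal length. Query residues that sit opposite a gap in the
    reference (insertions relative to the reference) map to None.
    Reference positions with no query residue (deletions) are simply
    absent from the values of the returned dict.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned rows must have equal length")

    mapping = {}
    ref_pos = 0    # last reference position consumed so far
    query_pos = 0  # last query position consumed so far
    for ref_char, query_char in zip(aligned_ref, aligned_query):
        ref_has_residue = ref_char != gap
        query_has_residue = query_char != gap
        if ref_has_residue:
            ref_pos += 1
        if query_has_residue:
            query_pos += 1
            mapping[query_pos] = ref_pos if ref_has_residue else None
    return mapping


# Toy alignment: the query lacks reference positions 2 and 5 (deletions)
# and carries one extra residue after reference position 3 (insertion).
example = map_to_reference("ATG-CCA", "A-GTC-A")
# example == {1: 1, 2: 3, 3: None, 4: 4, 5: 6}
```

Once every sequence in an alignment is expressed this way, a feature annotated on the reference (for example, a gene spanning given reference positions) can be located in any member sequence by looking up the corresponding query positions.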
GLUE project

On computers with GLUE installed, the Flavivirid-GLUE project can be instantiated by navigating to the project folder, initiating GLUE, and issuing the following command in the GLUE shell:

    Mode path: /
    GLUE> run file buildCompleteProject.glue

Contributors

Robert J. Gifford (robert.gifford@glasgow.ac.uk)

Rhys Parry

Connor Bamford

William Marciel de Souza

Related Publications

Bamford CGG, de Souza WM, Parry R, and RJ Gifford (2021) Comparative analysis of genome-encoded viral sequences reveals the evolutionary history of the Flaviviridae. [preprint]

Singer JB, Thomson EC, McLauchlan J, Hughes J, and RJ Gifford (2018) GLUE: A flexible software system for virus sequence data. BMC Bioinformatics

Zhu H, Dennis T, Hughes J, and RJ Gifford (2018) Database-integrated genome screening (DIGS): exploring genomes heuristically using sequence similarity search tools and a relational database. [preprint]

License

This project is licensed under the GNU Affero General Public License v. 3.0.