Comparative genomic analysis of flavivirids using GLUE
------------------------------------------------------
This is Flavivirid-GLUE, a GLUE project for the flavivirids (family Flaviviridae).

The Flaviviridae comprise enveloped, positive-strand RNA viruses, many of which pose serious risks to human health on a global scale. Arthropod-borne flaviviruses such as Zika virus (ZIKV), dengue virus (DENV), and yellow fever virus (YFV) cause large-scale outbreaks that result in millions of human infections every year, while the bloodborne hepatitis C virus (HCV) is a major cause of chronic liver disease.

Figure: Projected urbanisation in 2027 (from The Economist magazine). Urbanisation is often associated with the emergence and spread of mosquito-borne diseases because it creates favourable conditions for the survival of mosquito vector species. Genome data can directly inform efforts to control diseases caused by mosquito-borne flaviviruses.

Since the emergence of the SARS-CoV-2 pandemic, many have become familiar with the use of virus genome data to track the spread and evolution of pathogenic viruses, e.g. via tools such as Nextstrain. However, it is less widely appreciated that the same kinds of datasets and comparative genomic approaches can also be used to explore the structural and functional basis of virus adaptations.

The GLUE software framework provides an extensible platform for implementing computational genomic analysis of viruses in an efficient, standardised and reproducible way. GLUE projects can incorporate not only the data items typically used in comparative genomic analysis (e.g. sequences, alignments, genome feature annotations) but also the complex semantic links between these data items, which are represented in a relational database. This 'poises' sequences and associated data for application in computational analysis, minimising the requirement for labour-intensive pre-processing of datasets.

GLUE projects are as well suited to exploratory work (e.g. using virus genome data to investigate structural and functional properties of viruses) as they are to operational procedures (e.g. producing standardised reports in a public or animal health setting).

Hosting of GLUE projects in an online version control system (e.g. GitHub) provides a mechanism for their stable, collaborative development, as shown below.

What is a GLUE project?
-----------------------
GLUE is an open, integrated software toolkit that provides functionality for storage and interpretation of sequence data. It supports the development of “projects” containing the data items required for comparative genomic analysis (e.g. sequences, multiple sequence alignments, genome feature annotations, and other sequence-associated data).

Projects are loaded into the GLUE "engine", creating a relational database that represents the semantic relationships between data items. This provides a robust foundation for the implementation of systematic comparative analyses and the development of sequence-based resources. The database schema can be extended to accommodate the idiosyncrasies of different projects. GLUE provides a scripting layer (based on JavaScript) for developing custom analysis tools.
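
To make the idea of "semantic relationships in a relational database" more concrete, here is a minimal sketch using Python's standard-library sqlite3 module. It is not GLUE's actual schema or engine: the table names (sequence, alignment, alignment_member), the column names, and the identifiers SEQ_A, SEQ_B and AL_DEMO are all hypothetical, chosen only to show how foreign keys can tie sequences to the alignments that contain them.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical three-table schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite does not enforce foreign keys unless asked to.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE sequence (
            seq_id  TEXT PRIMARY KEY,
            species TEXT NOT NULL
        );
        CREATE TABLE alignment (
            name       TEXT PRIMARY KEY,
            ref_seq_id TEXT NOT NULL REFERENCES sequence(seq_id)
        );
        CREATE TABLE alignment_member (
            alignment_name TEXT NOT NULL REFERENCES alignment(name),
            seq_id         TEXT NOT NULL REFERENCES sequence(seq_id),
            PRIMARY KEY (alignment_name, seq_id)
        );
        """
    )
    # Placeholder records (assumed for illustration, not real data).
    conn.executemany(
        "INSERT INTO sequence VALUES (?, ?)",
        [("SEQ_A", "virus A"), ("SEQ_B", "virus B")],
    )
    conn.execute("INSERT INTO alignment VALUES ('AL_DEMO', 'SEQ_A')")
    conn.executemany(
        "INSERT INTO alignment_member VALUES (?, ?)",
        [("AL_DEMO", "SEQ_A"), ("AL_DEMO", "SEQ_B")],
    )
    conn.commit()
    return conn


def members_of(conn: sqlite3.Connection, alignment_name: str):
    """Follow the links from an alignment to its member sequences."""
    rows = conn.execute(
        """
        SELECT s.seq_id, s.species
        FROM alignment_member AS m
        JOIN sequence AS s ON s.seq_id = m.seq_id
        WHERE m.alignment_name = ?
        ORDER BY s.seq_id
        """,
        (alignment_name,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(members_of(db, "AL_DEMO"))
    try:
        # A member that points at a non-existent sequence is rejected.
        db.execute("INSERT INTO alignment_member VALUES ('AL_DEMO', 'MISSING')")
    except sqlite3.IntegrityError as err:
        print("rejected:", err)
```

In this toy schema the foreign keys do two jobs: they let a query move from an alignment to its sequences, and they reject records whose links do not resolve.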

Some examples of 'sequence-based resources' built for viruses using GLUE include:
- CoV-GLUE: A GLUE resource for tracking genetic variation in SARS-CoV-2. CoV-GLUE contains a database of amino acid replacements, insertions and deletions observed in GISAID hCoV-19 sequences sampled from the pandemic.
- RABV-GLUE: Tailored toward epidemiological tracking of rabies virus (RABV). Includes a database of RABV sequences and metadata from NCBI, updated daily and arranged into major and minor clades, and an analysis tool providing genotyping, analysis and visualisation of submitted FASTA sequences.
- HCV-GLUE: Aims to support analysis of drug resistance and vaccine escape in hepatitis C virus (HCV). Includes a database of HCV sequences and metadata from NCBI, updated daily and arranged into clades (genotypes, subtypes). As well as pre-built multiple sequence alignments of NCBI sequences, it includes an analysis tool providing genotyping, drug resistance analysis and visualisation of submitted FASTA sequences.

What does building the Flavivirid-GLUE project offer?
-----------------------------------------------------
Flavivirid-GLUE contains aligned, annotated reference genome sequences for all flavivirid species, and for endogenous viral elements (EVEs) derived from flavivirids. It offers a number of advantages for performing comparative sequence analysis of flavivirids:
- Reproducibility. For many reasons, bioinformatics analyses are notoriously difficult to reproduce. The GLUE framework supports the implementation of fully reproducible comparative genomics through the introduction of data standards and the use of a relational database to capture the semantic links between data items.
- Reusable data objects and analysis logic. For many, if not most, comparative genomic analyses, data preparation is nine tenths of the battle. The GLUE framework has been designed so that the effort spent preparing high-value data items such as multiple sequence alignments need only be expended once. Hosting of GLUE projects in an online version control system such as GitHub allows for collaborative management of important data items and community testing of hypotheses.
- Validation. Building GLUE projects entails mapping the semantic links between data items (e.g. sequences, tabular data, multiple sequence alignments). This process provides an opportunity for cross-validation, and thereby promotes a high level of data integrity.
- Standardisation of the genomic coordinate space. GLUE projects allow all sequences to use the coordinate space of a chosen reference sequence. Contingencies associated with insertions and deletions (indels) are handled in a systematic way.
- Predefined, fully annotated reference sequences. This project includes fully annotated reference sequences for major lineages within the Flaviviridae family.
- Alignment trees. GLUE allows linking of alignments constructed at distinct taxonomic levels via an "alignment tree" data structure. In the alignment tree, each alignment is constrained to a standard reference sequence, so that all multiple sequence alignments are linked to one another via a standardised coordinate system.
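
The last three points (a reference coordinate space, systematic handling of indels, and alignments linked through a tree) can be illustrated with a small Python sketch. This is a simplified model under stated assumptions, not GLUE's implementation: the class AlignmentNode, the function map_to_reference, the names AL_ROOT, AL_CHILD, REF_ROOT, REF_CHILD and QUERY, and the toy nucleotide strings are all hypothetical. Each alignment is constrained to one reference, and a child alignment's reference is also a member of its parent alignment, so coordinates can be carried up the tree.

```python
from dataclasses import dataclass
from typing import Dict, Optional

GAP = "-"


def map_to_reference(ref_row: str, member_row: str) -> Dict[int, int]:
    """Map 1-based member positions to 1-based reference positions.

    Both rows come from the same alignment. Member residues aligned to a
    gap in the reference (insertions) have no reference coordinate and
    are left out; reference residues with no member residue (deletions)
    simply never appear as values.
    """
    if len(ref_row) != len(member_row):
        raise ValueError("aligned rows must have equal length")
    mapping: Dict[int, int] = {}
    ref_pos = member_pos = 0
    for ref_char, member_char in zip(ref_row, member_row):
        if ref_char != GAP:
            ref_pos += 1
        if member_char != GAP:
            member_pos += 1
            if ref_char != GAP:
                mapping[member_pos] = ref_pos
    return mapping


@dataclass
class AlignmentNode:
    """One alignment in a toy alignment tree (hypothetical structure)."""

    name: str
    reference: str            # name of the constraining reference sequence
    rows: Dict[str, str]      # sequence name -> aligned row, reference included
    parent: Optional["AlignmentNode"] = None

    def to_own_reference(self, member: str) -> Dict[int, int]:
        return map_to_reference(self.rows[self.reference], self.rows[member])

    def to_root_reference(self, member: str) -> Dict[int, int]:
        """Compose mappings node by node up to the root reference."""
        mapping = self.to_own_reference(member)
        node = self
        while node.parent is not None:
            # This node's reference is a member of the parent alignment.
            step = node.parent.to_own_reference(node.reference)
            mapping = {pos: step[ref] for pos, ref in mapping.items() if ref in step}
            node = node.parent
        return mapping


# Toy data: REF_CHILD lacks three bases relative to REF_ROOT,
# and QUERY lacks the first base of REF_CHILD.
root = AlignmentNode(
    name="AL_ROOT",
    reference="REF_ROOT",
    rows={"REF_ROOT": "ATGGCCAAA", "REF_CHILD": "ATG---AAA"},
)
child = AlignmentNode(
    name="AL_CHILD",
    reference="REF_CHILD",
    rows={"REF_CHILD": "ATGAAA", "QUERY": "-TGAAA"},
    parent=root,
)

if __name__ == "__main__":
    print(child.to_root_reference("QUERY"))
```

One simplification to note: here the reference row may itself contain gaps, and the member's inserted residues are dropped from the mapping. A real system needs an explicit policy for such insertions.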

GLUE project
------------
On computers with GLUE installed, the Flavivirid-GLUE project can be instantiated by navigating to the project folder, initiating GLUE, and issuing the following command in the GLUE shell:

    Mode path: /
    GLUE> run file buildCompleteProject.glue

Contributors
------------
- Robert J. Gifford (robert.gifford@glasgow.ac.uk)
- Rhys Parry
- Connor Bamford
- William Marciel de Souza

Related Publications
--------------------
Bamford CGG, de Souza WM, Parry R, and Gifford RJ (2021). Comparative analysis of genome-encoded viral sequences reveals the evolutionary history of the Flaviviridae. Preprint.

Singer JB, Thomson EC, McLauchlan J, Hughes J, and Gifford RJ (2018). GLUE: A flexible software system for virus sequence data. BMC Bioinformatics.

Zhu H, Dennis T, Hughes J, and Gifford RJ (2018). Database-integrated genome screening (DIGS): exploring genomes heuristically using sequence similarity search tools and a relational database. Preprint.

License
-------
This project is licensed under the GNU Affero General Public License v3.0.