+ Comparative genomic analysis of flavivirids using GLUE
+
+
+
+
+ This is Flavivirid-GLUE, a GLUE project for the flavivirids (family Flaviviridae).
+
+
+
+ The Flaviviridae comprise enveloped, positive-strand RNA viruses,
+ many of which pose serious risks to human health on a global scale.
+ Arthropod-borne flaviviruses such as Zika virus (ZIKV),
+ dengue virus (DENV), and
+ yellow fever virus (YFV)
+ are the causative agents of large-scale outbreaks that result
+ in millions of human infections every year, while the bloodborne hepatitis C virus
+ (HCV)
+ is a major cause of chronic liver disease.
+
+
+
+
+
+
+
+
+
+
+
+ Projected urbanisation in 2027 (from The Economist magazine).
+ Urbanisation is often associated with the emergence and spread of mosquito-borne diseases
+ by creating favourable conditions for the survival of mosquito vector species.
+ Genome data can directly inform efforts to control diseases caused by mosquito-borne flaviviruses.
+
+
+
+
+
+
+
+
+
+ Since the emergence of the SARS-CoV-2 pandemic, many have become familiar with
+ the use of virus genome data to track the spread and evolution of pathogenic viruses
+ (e.g. via tools such as Nextstrain).
+ However, it is less widely appreciated that the same kinds of data sets and comparative genomic approaches
+ can also be used to explore the structural and functional basis of virus adaptations.
+
+
+
+ The GLUE software framework
+ provides an extensible platform for implementing computational genomic
+ analysis of viruses in an efficient, standardised and reproducible way.
+ GLUE projects can not only incorporate all of the data items typically used in
+ comparative genomic analysis
+ (e.g. sequences, alignments, genome feature annotations) but can also represent the complex
+ semantic links between these data items via a relational database.
+ This 'poises' sequences and associated data for application in computational
+ analysis, minimising the requirement for labour-intensive pre-processing of datasets.
+
+
+
+ GLUE projects are as well suited to exploratory work
+ (e.g. using virus genome data to investigate structural and functional properties of viruses)
+ as they are to implementing operational procedures (e.g. producing
+ standardised reports
+ in a public or animal health setting).
+
+
+
+
+ Hosting of GLUE projects in an online version control system (e.g. GitHub) provides
+ a mechanism for their stable, collaborative development, as shown below.
+
+
+
+
+
+
+
+ What is a GLUE project?
+
+
+
+
+ GLUE is an open, integrated
+ software toolkit that provides functionality for storage and interpretation of
+ sequence data. It supports the development of “projects” containing the data items
+ required for comparative genomic analysis
+ (e.g. sequences, multiple sequence alignments, genome feature annotations,
+ and other sequence-associated data).
+
+
+
+
+
+
+
+
+
+ Projects are loaded into the GLUE "engine", creating a relational database
+ that represents the semantic relationships between data items.
+ This provides a robust foundation for the implementation of systematic
+ comparative analyses and the development of sequence-based resources.
+ The database schema can be extended to accommodate the idiosyncrasies of different projects.
+ GLUE provides a scripting layer (based on JavaScript)
+ for developing custom analysis tools.
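
As a rough illustration of what "semantic links in a relational database" can mean in practice, the sketch below uses Python's built-in `sqlite3` module. It is not the GLUE schema: the table names, column names and sequence IDs are all hypothetical and chosen only for this example. It shows how foreign keys let the database itself reject an alignment member that refers to a sequence which has not been loaded.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only.
# Table and column names are assumptions, not the GLUE engine's own schema.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces links only when asked
conn.executescript("""
CREATE TABLE sequence (
    id     TEXT PRIMARY KEY,
    source TEXT NOT NULL
);
CREATE TABLE alignment (
    name            TEXT PRIMARY KEY,
    ref_sequence_id TEXT NOT NULL REFERENCES sequence(id)
);
CREATE TABLE alignment_member (
    alignment_name TEXT NOT NULL REFERENCES alignment(name),
    sequence_id    TEXT NOT NULL REFERENCES sequence(id),
    PRIMARY KEY (alignment_name, sequence_id)
);
""")

# Placeholder records; the IDs are made up.
conn.executemany(
    "INSERT INTO sequence (id, source) VALUES (?, ?)",
    [("REF_A", "reference"), ("SEQ_1", "public"), ("SEQ_2", "public")],
)
conn.execute(
    "INSERT INTO alignment (name, ref_sequence_id) VALUES (?, ?)",
    ("AL_A", "REF_A"),
)
conn.executemany(
    "INSERT INTO alignment_member (alignment_name, sequence_id) VALUES (?, ?)",
    [("AL_A", "SEQ_1"), ("AL_A", "SEQ_2")],
)
conn.commit()


def members_of(alignment_name):
    """Return the IDs of all sequences linked to an alignment, sorted."""
    rows = conn.execute(
        "SELECT sequence_id FROM alignment_member "
        "WHERE alignment_name = ? ORDER BY sequence_id",
        (alignment_name,),
    ).fetchall()
    return [row[0] for row in rows]


def add_member(alignment_name, sequence_id):
    """Try to link a sequence to an alignment; return False if the link is invalid."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO alignment_member (alignment_name, sequence_id) "
                "VALUES (?, ?)",
                (alignment_name, sequence_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Calling `add_member("AL_A", "NOT_LOADED")` returns `False` because no such sequence exists, while `members_of("AL_A")` still lists only the two valid members. The point of the sketch is the design choice: once relationships are declared in the schema, checking them is no longer left to ad hoc scripts.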
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Some examples of 'sequence-based resources' built for viruses using GLUE include:
+
+
+
+
+
+
+
+
+
+
+
+ CoV-GLUE:
+ A GLUE resource for tracking genetic variation in SARS-CoV-2.
+ CoV-GLUE contains a database of amino acid replacements, insertions and
+ deletions which have been observed in GISAID hCoV-19 sequences sampled from the pandemic.
+
+
+
+
+ RABV-GLUE:
+ Tailored toward epidemiological tracking of rabies virus (RABV).
+ Includes a database of RABV sequences and metadata from NCBI, updated daily and arranged into major and minor clades, and
+ an analysis tool providing genotyping, analysis and visualisation of submitted FASTA sequences.
+
+
+
+
+ HCV-GLUE:
+ This GLUE resource aims to support analysis of drug resistance and vaccine
+ escape in hepatitis C virus (HCV).
+ It includes a database of HCV sequences and metadata from NCBI, updated daily and arranged
+ into clades (genotypes, subtypes). As well as pre-built multiple-sequence
+ alignments of NCBI sequences, it includes an analysis tool providing genotyping,
+ drug resistance analysis and visualisation of submitted FASTA sequences.
+
+
+
+
+
+
+
+
+
+
+ What does building the Flavivirid-GLUE project offer?
+
+
+
+
+
+
+
+ Flavivirid-GLUE contains aligned, annotated reference genome sequences for all
+ flavivirid species and
+ endogenous viral elements (EVEs) derived from flavivirids.
+
+ It offers a number of advantages for performing comparative sequence
+ analysis of flavivirids:
+
+
+
+
+
Reproducibility.
+ For many reasons, bioinformatics analyses are notoriously difficult to reproduce.
+ The GLUE framework supports the implementation of fully reproducible
+ comparative genomics through the introduction of data standards and the use
+ of a relational database to capture the semantic links between data items.
+
+
+
+
Reusable data objects and analysis logic.
+ For many, if not most, comparative genomic analyses, data preparation is nine
+ tenths of the battle. The GLUE framework has been designed to ensure that
+ work spent preparing high-value data items such as multiple sequence alignments
+ need only be performed once. Hosting of GLUE projects in an online version control
+ system such as GitHub allows for collaborative management of important data items
+ and community testing of hypotheses.
+
+
+
+
+
Validation.
+ Building GLUE projects entails mapping the semantic links between data items
+ (e.g. sequences, tabular data, multiple sequence alignments).
+ This process provides an opportunity
+ for cross-validation, and thereby enforces a high level of data integrity.
+
+
+
+
Standardisation of the genomic co-ordinate space. GLUE
+ projects allow all sequences to utilise the coordinate space of a chosen
+ reference sequence. Contingencies associated with insertions and deletions
+ (indels) are handled in a systematic way.
+
+
+
+
Predefined, fully annotated reference sequences:
+ This project includes fully annotated reference sequences for major lineages
+ within the Flaviviridae family.
+
+
+
+
Alignment trees:
+ GLUE allows linking of alignments constructed at distinct taxonomic levels
+ via an "alignment tree"
+ data structure. In the alignment tree, each alignment
+ is constrained to a standard reference sequence, thus all multiple sequence
+ alignments are linked to one another via a standardised coordinate system.
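
The coordinate standardisation and alignment tree ideas in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not GLUE's implementation: the class names, the gap-free segment representation, and the rule that a child alignment's reference sequence is also a member of its parent alignment are all assumptions made for the example, and the sequence names are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Segment:
    """A gap-free block pairing member positions 1:1 with reference positions.

    Coordinates are 1-based and inclusive (an assumption of this sketch).
    """
    ref_start: int
    ref_end: int
    member_start: int
    member_end: int


@dataclass
class Alignment:
    """One node of a hypothetical alignment tree."""
    name: str
    reference: str                      # sequence defining this node's coordinates
    parent: Optional["Alignment"] = None
    members: Dict[str, List[Segment]] = field(default_factory=dict)

    def to_reference(self, member: str, pos: int) -> Optional[int]:
        """Map a member position to this alignment's reference coordinate.

        Returns None when the position is not aligned, e.g. it lies in an
        insertion relative to the reference.
        """
        for seg in self.members[member]:
            if seg.member_start <= pos <= seg.member_end:
                return seg.ref_start + (pos - seg.member_start)
        return None


def to_root(alignment: Alignment, member: str, pos: int) -> Optional[int]:
    """Walk up the tree, re-expressing a position in the root reference's coordinates."""
    current, name, coord = alignment, member, pos
    while True:
        mapped = current.to_reference(name, coord)
        if mapped is None or current.parent is None:
            return mapped
        # The child's reference is assumed to be a member of the parent alignment.
        name, coord, current = current.reference, mapped, current.parent


# Placeholder data: REF_GENUS carries a 3-nt insertion (its positions 101-103)
# relative to REF_ROOT; SEQ_X lacks REF_GENUS positions 51-53 (a deletion).
root = Alignment("AL_ROOT", reference="REF_ROOT")
root.members["REF_GENUS"] = [Segment(1, 100, 1, 100), Segment(101, 200, 104, 203)]

genus = Alignment("AL_GENUS", reference="REF_GENUS", parent=root)
genus.members["SEQ_X"] = [Segment(1, 50, 1, 50), Segment(54, 150, 51, 147)]
```

With this data, `to_root(genus, "SEQ_X", 60)` gives 63 and `to_root(genus, "SEQ_X", 120)` gives 120, while `to_root(genus, "SEQ_X", 100)` gives `None` because that position maps into the insertion that REF_GENUS carries relative to REF_ROOT. Storing alignments as segments against a fixed reference is what makes indels explicit rather than something each analysis has to rediscover.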
+
+
+
+
+
+
+
+
+
+
+
+ GLUE project
+
+
+
+
+
+
+
+ On computers with GLUE installed,
+ the Flavivirid-GLUE project can be instantiated by navigating to the project folder,
+ initiating GLUE, and issuing the following command in the GLUE shell:
+
+
+ Mode path: /
+ GLUE> run file buildCompleteProject.glue
- Since the emergence of the SARS-COV2 pandemic, many have become familiar with
- the use of virus genome data to track the spread and evolution of pathogenic viruses
- - e.g. via tools such as NextStrain.
- However, it is less widely appreciated that the same kinds of data sets and comparative genomic approaches
- can also be used to explore the structural and functional basis of virus adaptations.
-
-
-
- The GLUE software framework
- provides an extensible platform for implementing computational genomic
- analysis of viruses in an efficient, standardised and reproducible way.
- GLUE projects can not only incorporate all of the data items typically used in
- comparative genomic analysis
- (e.g. sequences, alignments, genome feature annotations) but can also represent the complex
- semantic links between these data items via a relational database.
- This 'poises' sequences and associated data for application in computational
- analysis, minimising the requirement for labour-intensive pre-processing of datasets.
-
-
-
- GLUE projects are equally suited for carrying out exploratory work
- (e.g. using virus genome data to investigate structural and functional properties of viruses)
- as they are for implementing operational procedures (e.g. producing
- standardised reports
- in a public or animal health setting).
-
-
-
-
- Hosting of GLUE projects in an online version control system (e.g. GitHub) provides
- a mechanism for their stable, collaborative development, as shown below.
-
-
-
-
-
-
-
- What is a GLUE project?
-
-
-
-
- GLUE is an open, integrated
- software toolkit that provides functionality for storage and interpretation of
- sequence data. It supports the development of “projects” containing the data items
- required for comparative genomic analysis
- (e.g. sequences, multiple sequence alignments, genome feature annotations,
- and other sequence-associated data).
-
-
-
-
-
-
-
-
-
- Projects are loaded into the GLUE "engine", creating a relational database
- that represents the semantic relationships between data items.
- This provides a robust foundation for the implementation of systematic
- comparative analyses and the development of sequence-based resources.
- The database schema can be extended to accommodate the idiosyncrasies of different projects.
- GLUE provides a scripting layer (based on JavaScript)
- for developing custom analysis tools.
-
-
-
-
-
-
-
-
-
-
-
-
-
- Some examples of 'sequence-based resources' built for viruses using GLUE include:
-
-
-
-
-
-
-
-
-
-
-
- COV-GLUE:
- A GLUE resource for tracking genetic variation in SARS-COV2.
- CoV-GLUE contains a database of amino acid replacements, insertions and
- deletions which have been observed in GISAID hCoV-19 sequences sampled from the pandemic
-
-
-
-
- RABV-GLUE:
- Tailored toward epidemiological tracking of rabies virus (RABV).
- Includes a database of RABV sequences and metadata from NCBI, updated daily and arranged into major and minor clades, and
- an analysis tool providing genotyping, analysis and visualisation of submitted FASTA sequences.
-
-
-
-
- HCV-GLUE:
- This GLUE resource aims to support analysis of drug resistance and vaccine
- escape in hepatitis C virus (HCV).
- A database of HCV sequences and metadata from NCBI, updated daily and arranged
- into clades (genotypes, subtypes). As well as pre-built multiple-sequence
- alignments of NCBI sequences, it includes an analysis tool providing genotyping,
- drug resistance analysis and visualisation of submitted FASTA sequences.
-
-
-
-
-
-
-
-
-
-
- What does building the Flavivirid-GLUE project offer?
-
-
-
-
-
-
-
- Flavivirid-GLUE contains aligned, annotated reference genome sequences for all
- flavivirid species and
- endogenous viral elements (EVEs) derived from flavivirids.
-
- It offers a number of advantages for performing comparative sequence
- analysis of flavivirids:
-
-
-
-
Reproducibility.
- For many reasons, bioinformatics analyses are notoriously difficult to reproduce.
- The GLUE framework supports the implementation of fully reproducible
- comparative genomics through the introduction of data standards and the use
- of a relational database to capture the semantic links between data items.
-
-
-
-
Reusable data objects and analysis logic.
- For many - if not most - comparative genomic analyses, data preparation is nine
- tenths of the battle. The GLUE framework has been designed to ensure that
- work spent preparing high-value data items such as multiple sequence alignments
- need only be performed once. Hosting of GLUE projects in an online version control
- system such as GitHub allows for collaborative management of important data items
- and community testing of hypotheses.
-
-
-
-
-
Validation.
- Building GLUE projects entails mapping the semantic links between data items
- (e.g. sequences, tabular data, multiple sequence alignments).
- This process provides an opportunity
- for cross-validation, and thereby enforces a high level of data integrity.
-
-
-
-
Standardisation of the genomic co-ordinate space. GLUE
- projects allow all sequences to utilise the coordinate space of a chosen
- reference sequence. Contingencies associated with insertions and deletions
- (indels) are handled in a systematic way.
-
-
-
-
Predefined, fully annotated reference sequences:
- This project includes fully-annotated reference sequences for major lineages
- within the Hepadnaviridae family.
-
-
-
-
Alignment trees:
- GLUE allows linking of alignments constructed at distinct taxonomic levels
- via an ""alignment tree"
- data structure. In the alignment tree, each alignment
- is constrained to a standard reference sequence, thus all multiple sequence
- alignments are linked to one another via a standardised coordinate system.
-
-
-
-
-
-
-
-
-
-
-
- GLUE project
-
-
-
-
-
-
-
- On computers with GLUE installed,
- the Flavivirid-GLUE project can be instantiated by navigating to the project folder,
- initiating GLUE, and issuing the following command in the GLUE shell:
-
-
- Mode path: /
- GLUE> run file buildCompleteProject.glue