UEA-Cancer-Genetics-Lab/Pancancer_Microbial_Pathways

Pancancer_Microbial_Pathways

This repository provides access to the predicted microbial pathways of non-human contigs assembled from cancer sequence data.

An HTML version with an interactive table is available at https://UEA-Cancer-Genetics-Lab.github.io/Pancancer_Microbial_Pathways/

Methods

SEPATH was run on approximately 10,000 cancer whole-genome sequencing (WGS) samples from the Genomics England 100,000 Genomes Project.

Cancer types include: bladder, breast, childhood, endocrine, endometrial, adult glioma, haematological, hepatopancreatobiliary, lung, melanoma, nasopharyngeal, oral/oropharyngeal, other, ovarian, prostate, renal, sarcoma, sinonasal, testicular, unknown, uppergi.

The resulting non-human reads were pooled by cancer type and subjected to metagenomic assembly using MEGAHIT. Colorectal cancer was excluded from the analysis due to technical difficulties and limitations in assembling the data.

Contigs were taxonomically classified with DIAMOND against the NCBI non-redundant protein database. All contigs assigned to a mammalian genus were removed, as were all contigs identified as contaminants according to Salter et al. Prokka was then used to predict proteins from the assembled contigs. The putative proteins were subjected to pathway prediction with InterProScan across multiple databases. The number of pathway 'hits' (the number of times a particular function matched a certain protein across all databases) was summed over all cancer types to form the TSV file in the data directory.
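The hit-summing step described above can be sketched as follows. This is a minimal illustration, not the code used to build the repository's TSV file: the input layout (one list of protein/database/pathway matches per cancer type), the function names, the `pathway` and `hits` column headers, and the placeholder identifiers such as `PATHWAY_A` are all assumptions made for the example.

```python
from collections import Counter


def sum_pathway_hits(hits_by_cancer_type):
    """Sum pathway hits over all cancer types.

    hits_by_cancer_type maps a cancer type to a list of
    (protein_id, database, pathway_id) matches (hypothetical layout).
    Each match counts as one hit, so the same pathway matched to one
    protein in two databases contributes two hits.
    """
    totals = Counter()
    for matches in hits_by_cancer_type.values():
        for _protein_id, _database, pathway_id in matches:
            totals[pathway_id] += 1
    return totals


def to_tsv(totals):
    """Render totals as TSV text, most frequent pathway first."""
    lines = ["pathway\thits"]  # assumed column names
    ordered = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    for pathway_id, hits in ordered:
        lines.append(f"{pathway_id}\t{hits}")
    return "\n".join(lines) + "\n"


# Placeholder data; identifiers are invented for illustration only.
example = {
    "bladder": [
        ("protein_1", "DB1", "PATHWAY_A"),
        ("protein_1", "DB2", "PATHWAY_A"),
        ("protein_2", "DB1", "PATHWAY_B"),
    ],
    "breast": [
        ("protein_3", "DB1", "PATHWAY_A"),
    ],
}

totals = sum_pathway_hits(example)
print(to_tsv(totals), end="")
```

Because every match is counted, a pathway reported by several databases for the same protein is counted more than once, which is one reason the totals should not be read as a measure of pathway activity.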

Caveats

There are some caveats to this analysis that should be considered. First and foremost, the contigs are highly likely to contain sequencing contaminants. Conversely, biologically informative contigs may have been discarded when common sequencing contaminants were removed. The number of pathway hits does not necessarily indicate how actively a function is being carried out, and likewise a number of pathway 'hits' is not definitive evidence that a metabolic pathway exists in a sample. The data presented in this repository should be used for hypothesis generation only.

Acknowledgements

This research was made possible through access to the data and findings generated by the 100,000 Genomes Project; http://www.genomicsengland.co.uk.
