Pancancer_Microbial_Pathways

This repository provides access to the predicted microbial pathways of non-human contigs assembled from cancer sequence data.

An HTML version with an interactive table is available at https://UEA-Cancer-Genetics-Lab.github.io/Pancancer_Microbial_Pathways/

Methods

SEPATH was run on approximately 10,000 cancer WGS samples from the Genomics England 100,000 Genomes Project.

Cancer types include: bladder, breast, childhood, endocrine, endometrial, adult glioma, haematological, hepatopancreatobiliary, lung, melanoma, nasopharyngeal, oral/oropharyngeal, other, ovarian, prostate, renal, sarcoma, sinonasal, testicular, unknown, uppergi.

The resulting non-human reads were pooled by cancer type and subjected to metagenomic assembly using MEGAHIT. Colorectal cancer has been excluded from the analysis due to technical difficulties and limitations in assembling the data.

Taxonomic classifications were assigned with DIAMOND against the NCBI non-redundant protein database. All contigs assigned to a mammalian genus were removed, as were all contigs identified as contaminants according to Salter et al. Prokka was then used to predict proteins from the assembled contigs. The putative proteins were subjected to pathway prediction with InterProScan, using multiple databases. The number of pathway 'hits' (the number of times a particular function matched a certain protein across all databases) was summed across all cancer types to form the TSV file in the data directory.
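
The final summing step can be illustrated with a minimal Python sketch. This is not the code used to build the TSV file: the record layout `(cancer_type, pathway_id, hits)`, the function name `sum_pathway_hits`, and the pathway identifiers in the example are all assumptions made for illustration.

```python
from collections import Counter


def sum_pathway_hits(records):
    """Sum pathway hits across all cancer types.

    `records` is an iterable of (cancer_type, pathway_id, hits) tuples.
    This layout is hypothetical and does not describe the repository's files.
    """
    totals = Counter()
    for _cancer_type, pathway_id, hits in records:
        # The cancer type is ignored: hits are pooled per pathway.
        totals[pathway_id] += hits
    return dict(totals)


# Made-up example records; the identifiers are placeholders, not real results.
example = [
    ("bladder", "PWY-0001", 3),
    ("breast", "PWY-0001", 2),
    ("breast", "PWY-0002", 5),
]
totals = sum_pathway_hits(example)
```

Pooling in this way discards the per-cancer-type breakdown, so the resulting totals describe the whole cohort rather than any single cancer type.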

Caveats

There are some caveats to this analysis that should be considered. First and foremost, the contigs are highly likely to contain sequencing contaminants. Conversely, biologically informative contigs may have been removed along with the common sequencing contaminants. The number of pathway 'hits' is not necessarily indicative of how actively a function is being carried out, and likewise a number of pathway 'hits' is not definitive evidence that a metabolic pathway exists in a sample. The data presented in this repository should be used for hypothesis generation only.

Acknowledgements

This research was made possible through access to the data and findings generated by the 100,000 Genomes Project: http://www.genomicsengland.co.uk.
