alignment-viewer

purpose

This tool provides a simple way to examine and explore coverage and mutations in sequence alignments.

implementation

The alignment-viewer tool was built around the metagenome dataset of Lake Washington sediment microbes used in Janet Matsen's thesis dissertation that can be found here. The tool is built as a Jupyter notebook to allow visual exploration of different genes and regions.

inputs

The tool looks for one directory per metagenome/reference pair.

Each data directory (base_directory/referenceID/sampleID/) must contain at least:

a .coverage.bed file containing coverage depth data along the genome
a .vcf file containing sequence variant data along the genome
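
As an illustration of the variant input, the following minimal sketch reads the fixed leading columns of a VCF file (CHROM, POS, ID, REF, ALT) using only the Python standard library. The function name read_vcf_records and the example records are hypothetical; they are not taken from the notebook, which may parse variants differently.

```python
def read_vcf_records(lines):
    """Yield (chrom, pos, ref, alts) tuples from an iterable of VCF text lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, "##" meta-information and the "#CHROM" header row
        fields = line.split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        # ALT may list several alleles separated by commas
        yield chrom, int(pos), ref, alt.split(",")


# Hypothetical example lines, made up for illustration only
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "contig_1\t120\t.\tA\tG\t50\t.\tDP=12\n",
    "contig_1\t305\t.\tC\tT,G\t31\t.\tDP=8\n",
]
records = list(read_vcf_records(example))
```

For a real file, the same function can be passed an open file handle, for example read_vcf_records(open(path)).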

Additionally, the base directory must contain two .csv reference files:

aligned_isolate_genomes.csv - contains the human-readable names of the reference genomes and the names of the corresponding directories
fastq_sample_lookup.csv - contains metadata about the metagenomes included in the analysis

See the miscellaneous directory for examples of the .csv reference files.

Details on generating these files and the directory organization can be found below, and in the documentation for the bash scripts in the scripts directory.

usage

dependencies:

The notebook depends on pybedtools, among other packages. I found some dependency conflicts on my machine between samtools and pybedtools. A .yml file with the specifications for the environment that worked for me can be found in the miscellaneous directory.

data organization:

Organized data directories were generated using the generate_align_tasklist.sh script that can be found in the scripts directory. The directories should be organized in a nested fashion as follows:

base_directory
|   aligned_isolate_genomes.csv
|   fastq_sample_lookup.csv
|
└───referenceID (directory named with reference genome ID)
    |
    └───sampleID (directory named with metagenome sample ID)
            sampleID_referenceID.coverage.bed
            sampleID_referenceID.vcf
            sampleID_referenceID.extension (other data files, such as .bam)
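
The naming convention in the tree above can be expressed as a small path helper. This is a sketch under the assumption that file names follow the sampleID_referenceID pattern exactly as shown; the function names expected_files and missing_files are hypothetical and do not come from the notebook.

```python
from pathlib import Path

# The two file types the tool requires in every data directory
REQUIRED_SUFFIXES = (".coverage.bed", ".vcf")


def expected_files(base_directory, reference_id, sample_id):
    """Return the paths where the required files should live for one pair."""
    data_dir = Path(base_directory) / reference_id / sample_id
    stem = f"{sample_id}_{reference_id}"
    return [data_dir / f"{stem}{suffix}" for suffix in REQUIRED_SUFFIXES]


def missing_files(base_directory, reference_id, sample_id):
    """Return the required paths that do not exist on disk."""
    return [
        path
        for path in expected_files(base_directory, reference_id, sample_id)
        if not path.is_file()
    ]
```

An empty list from missing_files means both required files are in place for that metagenome/reference pair.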

file generation:

Alignment output files (.coverage.bed & .vcf) were generated using the align_coverage_call.sh script that can be found in the scripts directory. This workflow runs the following steps:

uses bwa to run a Burrows-Wheeler alignment, outputting a .sam file
uses samtools to convert the .sam to a .bam file
uses samtools to sort the .bam file, outputting a .sorted.bam file
uses samtools to index the .sorted.bam file
uses bedtools to calculate coverage across the alignment, outputting a .coverage.bed file
uses bcftools to calculate sequence variants, outputting a .vcf file
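
To make the order of these steps concrete, the sketch below builds one shell command per step and returns them as strings without running anything. It is not a copy of align_coverage_call.sh: the subcommands and flags (bwa mem, samtools view -b, bedtools genomecov -bga, bcftools mpileup piped to bcftools call -mv) are assumptions about one common way to perform each step, and the sketch assumes the reference has already been indexed with bwa index. The function name pipeline_commands and its parameters are hypothetical.

```python
import shlex


def pipeline_commands(reference_fasta, reads_fastq, prefix):
    """Build (but do not run) one shell command per workflow step.

    Assumed tools and flags; the real script may differ.
    """
    q = shlex.quote  # protect paths that contain spaces
    ref, reads = q(reference_fasta), q(reads_fastq)
    sam = q(f"{prefix}.sam")
    bam = q(f"{prefix}.bam")
    sorted_bam = q(f"{prefix}.sorted.bam")
    bed = q(f"{prefix}.coverage.bed")
    vcf = q(f"{prefix}.vcf")
    return [
        f"bwa mem {ref} {reads} > {sam}",                    # align reads
        f"samtools view -b -o {bam} {sam}",                  # .sam to .bam
        f"samtools sort -o {sorted_bam} {bam}",              # sort alignments
        f"samtools index {sorted_bam}",                      # index sorted .bam
        f"bedtools genomecov -bga -ibam {sorted_bam} > {bed}",  # coverage depth
        f"bcftools mpileup -f {ref} {sorted_bam} | bcftools call -mv -o {vcf}",  # variants
    ]


commands = pipeline_commands("reference.fasta", "sample.fastq", "sampleID_referenceID")
for command in commands:
    print(command)
```

Returning strings keeps the sketch easy to inspect; the script in the scripts directory remains the authoritative description of the workflow.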

running the notebook:

Import packages
Point the notebook to the correct base directory for the data by editing the base_directory global variable
Run the code cell to define the core functionality
Render the user input widgets and use them to identify the region of interest
Check that the .coverage.bed and .vcf files exist in the expected location for the data of interest
Use the notebook to render the visualization and explore the data of interest
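
Before choosing a region in the widgets, it can help to see which metagenome/reference pairs are ready to view. The sketch below walks the base directory and lists every pair whose directory holds both required files. It assumes the base_directory/referenceID/sampleID layout and the sampleID_referenceID file naming described in this README; the function name complete_pairs is hypothetical and is not part of the notebook.

```python
from pathlib import Path


def complete_pairs(base_directory):
    """Return sorted (reference_id, sample_id) pairs that have both required files."""
    pairs = []
    base = Path(base_directory)
    for ref_dir in sorted(p for p in base.iterdir() if p.is_dir()):
        for sample_dir in sorted(p for p in ref_dir.iterdir() if p.is_dir()):
            stem = f"{sample_dir.name}_{ref_dir.name}"
            required = (
                sample_dir / f"{stem}.coverage.bed",
                sample_dir / f"{stem}.vcf",
            )
            # Keep the pair only when every required file is present
            if all(path.is_file() for path in required):
                pairs.append((ref_dir.name, sample_dir.name))
    return pairs
```

Pairs that are absent from the result are missing a .coverage.bed file, a .vcf file, or both.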
