GitHub - JappyPing/strain_identify: repo for strain identification project

This program contains code for identifying within-host diversity.

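The required Python 3 packages are: collections, copy, pandas, numpy, scipy.sparse, typing, argparse, statistics. Of these, pandas, numpy and scipy are third-party packages that must be installed; the rest are part of the Python standard library.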
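Before running the pipeline, it can help to confirm that the packages listed above are importable. The following is a minimal sketch, not part of the repository; the function name `missing_packages` is hypothetical.

```python
from importlib.util import find_spec

# Package names taken from the requirements list above.
REQUIRED = [
    "collections", "copy", "pandas", "numpy",
    "scipy.sparse", "typing", "argparse", "statistics",
]


def missing_packages(names):
    """Return the names that cannot be located in the current environment."""
    missing = []
    for name in names:
        try:
            if find_spec(name) is None:
                missing.append(name)
        except ModuleNotFoundError:
            # Raised for dotted names (e.g. scipy.sparse) when the parent is absent.
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Missing packages:", missing_packages(REQUIRED))
```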

Steps to run the program:

1. Obtain the read files, preferably in FASTQ format.
2. Obtain the reference sequence in FASTA format.
   2.1. (Optional) Correct the FASTQ reads; see the folder ErrorCorrection.
3. Run the strain identification process:
       /path/to/find_sub.sh -r reference.fa -1 read_1.fastq -2 read_2.fastq -m tog
   If the read files are in FASTA format, add -f:
       /path/to/find_sub.sh -r reference.fa -1 read_1.fasta -2 read_2.fasta -m tog -f
   If the read file is single-ended:
       find_sub.sh -r reference.fa -0 read_file.fastq
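The three invocations of find_sub.sh above differ only in their flags, so they can be assembled by a small helper when scripting several runs. This is a sketch under stated assumptions: the helper `build_find_sub_command` is hypothetical, it uses only the flags shown above (-r, -1, -2, -0, -m tog, -f), and whether -f also applies to single-ended input is not stated in this README.

```python
import subprocess


def build_find_sub_command(script, reference, reads, fasta=False):
    """Assemble the find_sub.sh argument list for one or two read files."""
    cmd = [script, "-r", reference]
    if len(reads) == 2:
        # Paired-end reads, as in the first two commands above.
        cmd += ["-1", reads[0], "-2", reads[1], "-m", "tog"]
    elif len(reads) == 1:
        # Single-ended reads.
        cmd += ["-0", reads[0]]
    else:
        raise ValueError("expected one or two read files")
    if fasta:
        # Assumption: -f is shown in the README only for the paired-end case.
        cmd.append("-f")
    return cmd


def run_find_sub(script, reference, reads, fasta=False):
    """Run find_sub.sh and raise if it exits with a non-zero status."""
    subprocess.run(build_find_sub_command(script, reference, reads, fasta), check=True)
```

For example, `run_find_sub("/path/to/find_sub.sh", "reference.fa", ["read_1.fastq", "read_2.fastq"])` would correspond to the first command in step 3.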

The output consists of two kinds of files: the nucleotide sequences of the detected strains, named "final_strain_x_reference.fa", where x is the numerical label of the strain, and "subbed_read_x.fa", the set of reads that belong to that strain and differ from the reference sequence.
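Since each strain label x ties a strain sequence to its read set, the output files can be paired up by label. The sketch below is illustrative only: `group_outputs` is a hypothetical helper, and it assumes x is a non-negative integer and treats the text after the label in "final_strain_x_..." as arbitrary.

```python
import re

# Patterns follow the naming scheme described above; the exact suffix after
# the label in the strain file name is an assumption.
STRAIN_RE = re.compile(r"^final_strain_(\d+)_.*\.fa$")
READS_RE = re.compile(r"^subbed_read_(\d+)\.fa$")


def group_outputs(filenames):
    """Map each strain label to its strain sequence file and read file."""
    groups = {}
    for name in filenames:
        for key, pattern in (("strain", STRAIN_RE), ("reads", READS_RE)):
            match = pattern.match(name)
            if match:
                label = int(match.group(1))
                groups.setdefault(label, {})[key] = name
    return groups
```

In practice the file names could come from `os.listdir()` on the output directory.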

3.1. (Optional) Verification. Start by obtaining relevant samples, then run:
       verify.py N original_reference.fa -p sample1_r1.fastq sample1_r2.fastq sample2_r1.fastq sample2_r2.fastq
   Use -p for paired-end reads and -s for single-end reads. N is the numerical label of the strain from step 3.

(Optional) Inferring synonymous state. This step determines whether the changes in the nucleotide sequences are synonymous or non-synonymous:
       python synonymous_stat.py original_reference_sequence.fa subbed_read_N.sam translation_code.txt protein_pos.txt
   Here translation_code.txt is the translation table from nucleotide bases to amino acids (the repository includes a file named translation_table.txt), and protein_pos.txt gives the positions of proteins in the form protein_name:start..end.
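To illustrate what "synonymous" means here: a substitution is synonymous when the affected codon still translates to the same amino acid. The sketch below is not the code in synonymous_stat.py. It assumes the standard genetic code (built in the code rather than read from the translation table file, whose format is not described here), 1-based coordinates, and a forward-strand coding region; `classify_substitution` is a hypothetical name.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(reference, cds_start, position, new_base):
    """Classify a single-base change inside a coding region.

    reference: reference nucleotide sequence (string)
    cds_start: 1-based position of the first base of the coding region
    position:  1-based position of the substituted base
    new_base:  the base observed in the read
    """
    offset = position - cds_start
    codon_start = (cds_start - 1) + (offset // 3) * 3  # 0-based index of the codon
    ref_codon = reference[codon_start:codon_start + 3].upper()
    index_in_codon = offset % 3
    alt_codon = (
        ref_codon[:index_in_codon] + new_base.upper() + ref_codon[index_in_codon + 1:]
    )
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    return "non-synonymous"
```

For the toy reference ATGGCTAAA with the coding region starting at position 1, changing position 6 from T to C turns GCT into GCC (both alanine), while changing position 4 from G to A turns GCT into ACT (alanine to threonine).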

The protein position file lists protein start and end positions in the format protein:start1..end1; see protein_pos.txt for an example.
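The protein:start..end format is simple to read programmatically. The following sketch assumes one protein per line with a single start..end range and integer coordinates; `parse_protein_positions` and the gene names in the usage note are hypothetical and are not taken from protein_pos.txt.

```python
def parse_protein_positions(lines):
    """Parse lines of the form 'protein_name:start..end' into a dict."""
    positions = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last colon so protein names containing ':' still work.
        name, _, span = line.rpartition(":")
        start, _, end = span.partition("..")
        positions[name] = (int(start), int(end))
    return positions
```

For example, `parse_protein_positions(["geneA:10..250", "geneB:300..900"])` returns a dict mapping each name to a `(start, end)` tuple; a file can be passed directly with `parse_protein_positions(open("protein_pos.txt"))`.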

Folders and files:

ErrorCorrection
bowtie2-2.4.4-linux-x86_64
README.md
build_matrix.py
combine_align.sh
find_sub.sh
get_ori_half.py
identify_strain.py
protein_pos.txt
strain_finder.py
synonymous_stat.py
translation_table.txt
verify.py
