sanger-sequence-trim.py
Pipeline for automated trimming and QC of Sanger sequences (ab1 files from SeqStudio)
- biopython >=1.79
This script processes a directory of .ab1 files into Mott algorithm-trimmed FASTAs.
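As a rough illustration of what Mott trimming does, the sketch below works on a plain list of Phred quality scores. It is modeled on the modified Mott logic in Biopython's `abi-trim` reader (cutoff 0.05, minimum length 20); whether this script uses that reader or its own implementation is an assumption, and the function name `mott_trim_bounds` is hypothetical.

```python
def mott_trim_bounds(quals, cutoff=0.05, min_len=20):
    """Return (start, end) slice bounds of the segment kept by Mott trimming.

    Hypothetical helper: `quals` is a list of per-base Phred scores.
    """
    # Very short reads are returned untrimmed.
    if len(quals) <= min_len:
        return 0, len(quals)

    # Per-base score: cutoff minus the base's error probability 10^(-Q/10).
    scores = [cutoff - 10 ** (q / -10.0) for q in quals]

    # Running sum, floored at zero; the first base is always scored as 0.
    running = [0.0]
    start = None
    for score in scores[1:]:
        total = running[-1] + score
        if total < 0:
            running.append(0.0)
        else:
            running.append(total)
            if start is None:
                # First position where the running sum is not reset to zero.
                start = len(running) - 1

    if start is None:
        # No base ever pushed the running sum above the floor.
        return 0, 0

    # The segment ends where the running sum peaks.
    end = running.index(max(running))
    return start, end


# Made-up example: low-quality ends around a high-quality core.
quals = [5] * 10 + [40] * 30 + [5] * 10
start, end = mott_trim_bounds(quals)
trimmed = quals[start:end]
```

The bounds are meant to be used as a Python slice, so the base at the peak position itself is excluded, as it is in Biopython's reader.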
For general usage with the default settings, the minimal invocation is:
sanger-sequence-trim.py <ab1_path>
The -t/--min_trace and -p/--min_pup arguments can be used with -q/--filter_qc to filter input sequences.
The -c/--concat flag can be used to concatenate files together based on their primers.
⚠️ WARNING!: Files must use the standardized naming scheme sample-name_primer-F/R_rundate_runtime.ab1.
Check the appendices below for more information.
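To make the naming scheme concrete, here is a minimal sketch of how such a file name could be split into its fields. It is not taken from the script: the helper `parse_ab1_name`, the `Ab1Name` tuple, and the example file name are all hypothetical, and the sketch assumes the primer field ends in `-F` or `-R` and that the run date and run time contain no underscores.

```python
from pathlib import Path
from typing import NamedTuple


class Ab1Name(NamedTuple):
    """Fields of a file name (hypothetical container, not from the script)."""
    sample: str
    primer: str
    direction: str
    run_date: str
    run_time: str


def parse_ab1_name(filename):
    """Split sample-name_primer-F/R_rundate_runtime.ab1 into its parts."""
    path = Path(filename)
    if path.suffix.lower() != ".ab1":
        raise ValueError(f"not an .ab1 file: {filename}")

    # Split from the right so the sample name may itself contain underscores.
    parts = path.stem.rsplit("_", 3)
    if len(parts) != 4:
        raise ValueError(f"unexpected number of fields: {filename}")
    sample, primer_field, run_date, run_time = parts

    # Assumption: the primer field ends in "-F" or "-R".
    primer, sep, direction = primer_field.rpartition("-")
    if not sep or direction not in ("F", "R"):
        raise ValueError(f"primer field must end in -F or -R: {filename}")

    # Dates and times are kept as strings; their format is not specified here.
    return Ab1Name(sample, primer, direction, run_date, run_time)


# Made-up file name for illustration only.
fields = parse_ab1_name("sampleA_ITS1-F_20230101_120000.ab1")
```

A check like this can be run over a directory before processing to catch files that would otherwise be grouped under the wrong primer.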
To allow lower-quality sequences to be processed, set looser minimums for the median PUP score and the trace score:
sanger-sequence-trim.py <ab1_path> \
--filter_qc \
--min_pup 15 \
--min_trace 25
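The effect of these thresholds can be sketched as a simple filter over per-read summary scores. This is an assumption about the behaviour, not the script's code: the `ReadQC` tuple, the `filter_qc` function, the inclusive `>=` comparison, and the example scores are all made up for illustration.

```python
from typing import NamedTuple


class ReadQC(NamedTuple):
    """Per-read QC summary (hypothetical container)."""
    name: str
    trace_score: float
    median_pup: float


def filter_qc(reads, min_trace, min_pup):
    """Keep reads whose trace score and median PUP score meet both minimums.

    Assumption: the comparison is inclusive (>=).
    """
    return [
        read for read in reads
        if read.trace_score >= min_trace and read.median_pup >= min_pup
    ]


# Made-up scores for illustration only.
reads = [
    ReadQC("good", trace_score=40, median_pup=30),
    ReadQC("borderline", trace_score=25, median_pup=15),
    ReadQC("poor", trace_score=12, median_pup=4),
]
kept = filter_qc(reads, min_trace=25, min_pup=15)
```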
usage: sanger-sequence-trim.py [-h] [-o OUTPUT_PATH] [-c] [-s] [-q] [-t MIN_TRACE] [-p MIN_PUP] ab1_path

Process ab1 files and get Mott algorithm-trimmed sequences

positional arguments:
  ab1_path              Path to ab1 sequence files.

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT_PATH, --output OUTPUT_PATH
                        Output path for metadata file and fastas.
  -c, --concat          Flag to concatenate by primers.
  -s, --oneline         Flag to output each fasta entry onto one line.

qc_options:
  -q, --filter_qc       Flag to filter by trace score and median PUP score.
  -t MIN_TRACE, --min_trace MIN_TRACE
                        Minimum trace score for a sequence to be passed.
  -p MIN_PUP, --min_pup MIN_PUP
                        Minimum median PUP score for a sequence to be passed.
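For readers who want to see how the options fit together, the help text above can be reproduced with a small `argparse` parser. This is a reconstruction from the help output only, not the script's source: the `build_parser` name, the `float` types, and the absence of default values are assumptions.

```python
import argparse


def build_parser():
    """Rebuild the command-line interface from the help text (a sketch)."""
    parser = argparse.ArgumentParser(
        prog="sanger-sequence-trim.py",
        description="Process ab1 files and get Mott algorithm-trimmed sequences",
    )
    parser.add_argument("ab1_path", help="Path to ab1 sequence files.")
    parser.add_argument(
        "-o", "--output", dest="output_path",
        help="Output path for metadata file and fastas.",
    )
    parser.add_argument(
        "-c", "--concat", action="store_true",
        help="Flag to concatenate by primers.",
    )
    parser.add_argument(
        "-s", "--oneline", action="store_true",
        help="Flag to output each fasta entry onto one line.",
    )

    # QC options are shown under their own heading in the help output.
    qc = parser.add_argument_group("qc_options")
    qc.add_argument(
        "-q", "--filter_qc", action="store_true",
        help="Flag to filter by trace score and median PUP score.",
    )
    # Assumption: thresholds are numeric; real defaults are not documented here.
    qc.add_argument(
        "-t", "--min_trace", type=float,
        help="Minimum trace score for a sequence to be passed.",
    )
    qc.add_argument(
        "-p", "--min_pup", type=float,
        help="Minimum median PUP score for a sequence to be passed.",
    )
    return parser


# Parse a made-up command line instead of sys.argv.
args = build_parser().parse_args(
    ["reads/", "--filter_qc", "--min_pup", "15", "--min_trace", "25"]
)
```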
V1.0.0