sanger-sequence-trim.py

Erick Samera edited this page Mar 10, 2023 · 2 revisions

Pipeline for automated trimming and QC of Sanger sequences (ab1 files from SeqStudio)

Requirements

  • biopython>=1.79

General usage

This script is intended for processing a directory of .ab1 files into Mott algorithm-trimmed FASTAs.
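As a rough illustration of what Mott-style trimming does, the sketch below computes trim boundaries from a list of Phred quality scores. It is not taken from the script itself: the function name `mott_trim_bounds`, the 0.05 cutoff, and the 20-base minimum length are assumptions chosen for illustration, and the script's own implementation may differ.

```python
def mott_trim_bounds(phred_scores, cutoff=0.05, min_length=20):
    """Return (start, end) slice bounds of the segment kept by Mott-style trimming.

    cutoff and min_length are illustrative defaults (assumptions), not values
    confirmed by the script's documentation.
    """
    # Reads at or below the minimum length are returned untrimmed.
    if len(phred_scores) <= min_length:
        return 0, len(phred_scores)

    # Per-base score: cutoff minus the base-call error probability.
    base_scores = [cutoff - 10 ** (q / -10.0) for q in phred_scores]

    # Running sum floored at zero; the first base is always treated as trimmed.
    running = [0.0]
    start = None
    for score in base_scores[1:]:
        total = max(0.0, running[-1] + score)
        running.append(total)
        if total > 0 and start is None:
            # First position where the running sum becomes positive.
            start = len(running) - 1

    # No base scored well enough to open a segment: keep nothing.
    if start is None:
        return 0, 0

    # The segment ends where the running sum peaks.
    end = running.index(max(running))
    return start, end


# Hypothetical read: low-quality ends around a high-quality core.
quals = [5] * 5 + [40] * 30 + [5] * 5
start, end = mott_trim_bounds(quals)
trimmed_quals = quals[start:end]
```

For real .ab1 files, Biopython's `SeqIO` can read a trace with the `"abi-trim"` format, which applies this kind of trimming while parsing; whether the script uses that route is not stated on this page.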

For general usage with default settings, the minimal invocation is:

sanger-sequence-trim.py <ab1_path>

The -t/--min_trace and -p/--min_pup arguments can be used with -q/--filter_qc to filter input sequences.
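The filtering described above can be pictured as a pair of threshold checks. This is only a sketch under assumptions: the names `passes_qc` and `filter_reads` are hypothetical, the scores are supplied as plain numbers rather than read from .ab1 files, and treating the thresholds as inclusive (`>=`) is a guess about the script's behavior.

```python
def passes_qc(trace_score, median_pup, min_trace, min_pup):
    """True when both scores meet their minimums (inclusive comparison assumed)."""
    return trace_score >= min_trace and median_pup >= min_pup


def filter_reads(reads, min_trace, min_pup):
    """Keep the names of reads whose (trace_score, median_pup) pass both checks.

    reads maps a file name to a (trace_score, median_pup) tuple.
    """
    return [
        name
        for name, (trace_score, median_pup) in reads.items()
        if passes_qc(trace_score, median_pup, min_trace, min_pup)
    ]


# Hypothetical scores for three reads.
reads = {
    "a.ab1": (40, 30),  # passes both thresholds
    "b.ab1": (20, 30),  # trace score too low
    "c.ab1": (25, 15),  # exactly at both thresholds
}
kept = filter_reads(reads, min_trace=25, min_pup=15)
```

With these made-up values, only the read whose trace score falls below 25 is dropped.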

The -c/--concat flag can be used to concatenate files together based on their primers.

⚠️ WARNING!: Files must use the standardized naming scheme sample-name_primer-F/R_rundate_runtime.ab1.
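To make the naming scheme concrete, here is a minimal sketch that splits a file name into its fields and groups files by primer. It rests on several assumptions: the four fields are separated by underscores, the primer field ends in `-F` or `-R`, and only the sample name may itself contain underscores. The names `Ab1Name`, `parse_ab1_name`, and `group_by_primer`, and the example file names, are hypothetical; how `--concat` actually groups files is not documented here.

```python
from collections import defaultdict
from pathlib import Path
from typing import NamedTuple


class Ab1Name(NamedTuple):
    sample: str
    primer: str
    direction: str  # "F" or "R"
    run_date: str
    run_time: str


def parse_ab1_name(filename):
    """Split sample-name_primer-F/R_rundate_runtime.ab1 into its fields."""
    path = Path(filename)
    if path.suffix.lower() != ".ab1":
        raise ValueError(f"not an .ab1 file: {filename}")

    # Split from the right so the sample name may contain underscores (assumption).
    parts = path.stem.rsplit("_", 3)
    if len(parts) != 4:
        raise ValueError(f"expected 4 underscore-separated fields: {filename}")
    sample, primer_field, run_date, run_time = parts

    # The primer field is assumed to end in -F or -R.
    primer, sep, direction = primer_field.rpartition("-")
    if not sep or not primer or direction not in ("F", "R"):
        raise ValueError(f"primer field must end in -F or -R: {filename}")

    return Ab1Name(sample, primer, direction, run_date, run_time)


def group_by_primer(filenames):
    """Map each primer name to the list of files that used it."""
    groups = defaultdict(list)
    for filename in filenames:
        groups[parse_ab1_name(filename).primer].append(filename)
    return dict(groups)


# Hypothetical file names following the scheme.
files = [
    "sample-1_ITS-F_2023-03-10_14-05-00.ab1",
    "sample-2_ITS-R_2023-03-10_14-35-00.ab1",
    "sample-1_COI-F_2023-03-10_15-05-00.ab1",
]
groups = group_by_primer(files)
```

A file name that does not fit the scheme raises `ValueError` in this sketch, which is one way to surface naming problems before any trimming is attempted.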

Check the appendices below for more information.

Examples

To use looser minimum values for the median PUP score and trace score, allowing lower-quality sequences to be processed:

sanger-sequence-trim.py <ab1_path> \
    --filter_qc \
    --min_pup 15 \
    --min_trace 25

Appendices

Appendix A: sanger-sequence-trim.py arguments

usage: sanger-sequence-trim.py [-h] [-o OUTPUT_PATH] [-c] [-s] [-q] [-t MIN_TRACE] [-p MIN_PUP] ab1_path

Process ab1 files and get Mott algorithm-trimmed sequences

positional arguments:
  ab1_path              Path to ab1 sequence files.

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT_PATH, --output OUTPUT_PATH
                        Output path for metadata file and fastas.
  -c, --concat          Flag to concatenate by primers.
  -s, --oneline         Flag to output each fasta entry onto one line.

qc_options:
  -q, --filter_qc       Flag to filter by trace score and median PUP score.
  -t MIN_TRACE, --min_trace MIN_TRACE
                        Minimum trace score for a sequence to be passed.
  -p MIN_PUP, --min_pup MIN_PUP
                        Minimum median PUP score for a sequence to be passed.

V1.0.0