Recording Variations in AAV Atlas

Robert J. Gifford edited this page Jan 5, 2025 · 1 revision

AAV Atlas identifies and catalogs amino acid replacements in AAV sequences relative to reference sequences. The analysis is restricted to coding features, so the catalog covers only positions where an amino acid comparison is meaningful. This page explains the core principles behind the variation analysis script used in AAV Atlas.

Overview of the Process

The script follows a structured workflow to systematically capture amino acid differences between AAV sequences and reference sequences. The key steps include:

  1. Identifying Relevant Alignments -- The script retrieves a list of tip alignments. Tip alignments are those without child alignments, representing the most granular level of sequence data.
  2. Filtering for Coding Features -- Only coding features are considered, so non-coding regions are excluded. This filtering is needed because amino acid replacements are only defined within coding regions.
  3. Mapping Reference Features -- The script maps each reference sequence to its corresponding coding features, creating a comprehensive list of amino acid locations to analyze.
  4. Processing Alignment Members -- For each alignment, the script processes individual member sequences, comparing their amino acid composition to the reference.
  5. Recording Variations -- Any differences in amino acid sequences between the alignment member and reference are cataloged. Each variation is assigned a unique identifier, ensuring it can be tracked and revisited.

Step-by-Step Breakdown

Step 1: Retrieve Tip Alignments

The script begins by identifying tip alignments. These alignments are crucial because they represent terminal nodes in the alignment hierarchy, directly reflecting raw sequence data.

getTipAlignments(tipAlignments);

The function getTipAlignments ensures that only alignments without child nodes are included, refining the pool of alignments to the most informative subset.
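
The selection rule can be illustrated on plain data. The following is a minimal sketch, not the AAV Atlas implementation: the page does not show the body of getTipAlignments, so the function findTipAlignmentNames, the `name` and `parentName` fields, and the alignment names are all hypothetical.

```javascript
// Hypothetical in-memory stand-in for the alignment tree. In AAV Atlas the
// real data comes from GLUE, not from a literal array like this one.
var alignments = [
    { name: "AL_ROOT", parentName: null },
    { name: "AL_CLADE_A", parentName: "AL_ROOT" },
    { name: "AL_CLADE_B", parentName: "AL_ROOT" },
    { name: "AL_CLADE_A1", parentName: "AL_CLADE_A" }
];

// Return the names of alignments that no other alignment lists as its parent.
function findTipAlignmentNames(allAlignments) {
    var hasChild = {};
    allAlignments.forEach(function(al) {
        if (al.parentName !== null) {
            hasChild[al.parentName] = true;
        }
    });
    return allAlignments
        .filter(function(al) { return !hasChild[al.name]; })
        .map(function(al) { return al.name; });
}

var tipNames = findTipAlignmentNames(alignments);
```

In this toy tree, AL_ROOT and AL_CLADE_A both have children, so only AL_CLADE_B and AL_CLADE_A1 are kept as tips.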

Step 2: Identify Coding Features

To ensure only relevant amino acid replacements are analyzed, the script retrieves coding features marked by the metatag CODES_AMINO_ACIDS.

var featuresList = glue.tableToObjects(
    glue.command(["list", "feature", "-w", "featureMetatags.name = 'CODES_AMINO_ACIDS'"])
);

Each feature is stored in codingFeaturesMap, keyed by feature name, so that later steps can quickly check whether a given feature is coding.
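
A minimal sketch of how such a map might be built from the listed features. It assumes each row returned by glue.tableToObjects has a `name` property; the helper buildCodingFeaturesMap and the sample feature names are illustrative only and are not taken from the AAV Atlas script.

```javascript
// Sample rows shaped like the assumed output of glue.tableToObjects.
// The feature names are placeholders chosen for illustration.
var sampleFeaturesList = [
    { name: "Rep78" },
    { name: "VP1" }
];

// Index the features by name so membership checks are a single lookup.
function buildCodingFeaturesMap(featuresList) {
    var map = {};
    featuresList.forEach(function(feature) {
        map[feature.name] = feature;
    });
    return map;
}

var sampleCodingFeaturesMap = buildCodingFeaturesMap(sampleFeaturesList);
var vp1IsCoding = Boolean(sampleCodingFeaturesMap["VP1"]);
var itrIsCoding = Boolean(sampleCodingFeaturesMap["ITR"]);
```

A feature that was not returned by the metatag query (here the placeholder "ITR") has no entry in the map, so the lookup is falsy.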

Step 3: Map Features to References

The script iterates over each alignment and maps coding features to their respective reference sequences. For each reference sequence, it keeps only the feature locations whose feature name appears in codingFeaturesMap.

refFeaturesMap[refseqName] = _.filter(featureLocations, function(featureLoc) {
    return codingFeaturesMap[featureLoc["feature.name"]];
});
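
The same filter can be tried on sample data without GLUE or Underscore. This sketch uses the built-in Array.prototype.filter in place of _.filter; the reference names, the feature names, and the featureLocationsByRef structure are assumptions made for the example.

```javascript
// Illustrative inputs. The names are placeholders, not actual AAV Atlas records.
var exampleCodingFeatures = { "Rep78": true, "VP1": true };
var featureLocationsByRef = {
    "REF_EXAMPLE_1": [
        { "feature.name": "VP1" },
        { "feature.name": "ITR" }
    ],
    "REF_EXAMPLE_2": [
        { "feature.name": "Rep78" },
        { "feature.name": "VP1" }
    ]
};

// For every reference, keep only the feature locations of coding features.
var exampleRefFeaturesMap = {};
Object.keys(featureLocationsByRef).forEach(function(refseqName) {
    exampleRefFeaturesMap[refseqName] = featureLocationsByRef[refseqName].filter(
        function(featureLoc) {
            // A truthy lookup means the feature is in the coding features map.
            return Boolean(exampleCodingFeatures[featureLoc["feature.name"]]);
        }
    );
});
```

Here REF_EXAMPLE_1 keeps only its VP1 location, while REF_EXAMPLE_2 keeps both of its locations.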

Step 4: Identify Amino Acid Replacements

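For each alignment member, the script compares the amino acid at each codon position to the reference amino acid at the same position. Positions with no corresponding reference amino acid are skipped.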

if (refAaObj && refAaObj.definiteAas !== memberAaObj.definiteAas) {
    // Record mismatch
}

Mismatched amino acids are flagged as replacements and cataloged for further analysis.
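
A minimal sketch of the comparison loop, under stated assumptions: the amino acid objects carry a `codonLabel` field alongside `definiteAas`, and the identifier format (feature name, then reference amino acid, codon label, and member amino acid) is hypothetical. The page says each variation receives a unique identifier but does not give its format.

```javascript
// Hypothetical ID format, e.g. "VP1:L3I". The real format is not documented here.
function makeReplacementId(featureName, codonLabel, refAa, memberAa) {
    return featureName + ":" + refAa + codonLabel + memberAa;
}

// Compare member amino acids to reference amino acids codon by codon.
function findReplacements(featureName, refAas, memberAas) {
    var refByCodon = {};
    refAas.forEach(function(aaObj) {
        refByCodon[aaObj.codonLabel] = aaObj;
    });
    var replacements = [];
    memberAas.forEach(function(memberAaObj) {
        var refAaObj = refByCodon[memberAaObj.codonLabel];
        // Same test as in the excerpt above: skip codons with no reference
        // amino acid, record the rest only when the residues differ.
        if (refAaObj && refAaObj.definiteAas !== memberAaObj.definiteAas) {
            replacements.push({
                id: makeReplacementId(featureName, memberAaObj.codonLabel,
                                      refAaObj.definiteAas, memberAaObj.definiteAas),
                codonLabel: memberAaObj.codonLabel,
                refAa: refAaObj.definiteAas,
                memberAa: memberAaObj.definiteAas
            });
        }
    });
    return replacements;
}

var refAas = [
    { codonLabel: "1", definiteAas: "M" },
    { codonLabel: "2", definiteAas: "A" },
    { codonLabel: "3", definiteAas: "L" }
];
var memberAas = [
    { codonLabel: "1", definiteAas: "M" },
    { codonLabel: "2", definiteAas: "A" },
    { codonLabel: "3", definiteAas: "I" },
    { codonLabel: "4", definiteAas: "K" }
];
var foundReplacements = findReplacements("VP1", refAas, memberAas);
```

With this input, only codon 3 is reported (L in the reference, I in the member). Codon 4 is ignored because the reference has no amino acid there.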

Step 5: Record and Classify Variations

Once replacements are identified, they are stored in a custom table. The script also calculates biochemical distances, such as Grantham and Miyata distances, which quantify how physicochemically different the reference and replacement amino acids are.

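glue.command(["create", "custom-table-row", "aav_replacement", replacementObj.id]);
glue.inMode("custom-table-row/aav_replacement/" + replacementObj.id, function() { // enter the new row's mode
    glue.command(["set", "field", "grantham_distance_double", grantham_distance_double]);
});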
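
To make the distance step concrete, here is a minimal sketch of a Grantham lookup and a simple classification. It is not the AAV Atlas code: the table holds only a small illustrative subset of the published Grantham matrix, the function names are hypothetical, and the 50/100/150 class boundaries follow one commonly cited convention that AAV Atlas may or may not use.

```javascript
// Illustrative subset of Grantham distances, keyed by the two one-letter
// codes in alphabetical order. A full implementation needs all 190 pairs.
var GRANTHAM_SUBSET = { "IL": 5, "FY": 22, "KR": 26, "DE": 45, "CW": 215 };

// Build an order-independent key so that (L, I) and (I, L) match the same entry.
function pairKey(aa1, aa2) {
    return [aa1, aa2].sort().join("");
}

// Return the distance, 0 for identical residues, or null if the pair is
// missing from this partial table.
function granthamDistance(refAa, memberAa) {
    if (refAa === memberAa) {
        return 0;
    }
    var value = GRANTHAM_SUBSET[pairKey(refAa, memberAa)];
    return value === undefined ? null : value;
}

// Assumed class boundaries (0-50, 51-100, 101-150, above 150).
function classifyGrantham(distance) {
    if (distance === null) { return "unknown"; }
    if (distance <= 50) { return "conservative"; }
    if (distance <= 100) { return "moderately conservative"; }
    if (distance <= 150) { return "moderately radical"; }
    return "radical";
}

var leuToIle = granthamDistance("L", "I");
var trpToCys = granthamDistance("W", "C");
var trpToCysClass = classifyGrantham(trpToCys);
```

A leucine to isoleucine change scores low because the two residues are chemically similar, whereas a tryptophan to cysteine change falls in the most radical class.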

Why This Matters

Recording amino acid variations in this structured manner allows researchers to:

  • Monitor evolutionary changes in AAV sequences
  • Identify potentially significant mutations
  • Maintain a comprehensive, accessible catalog of sequence variations

By focusing on coding regions and systematically logging replacements, AAV Atlas provides a reliable tool for gene therapy and genomic research. This approach supports the development of improved AAV vectors and enhances our understanding of AAV diversity.