Steve Bond edited this page Sep 24, 2015 · 5 revisions

BuddySuite Module Section

The core Buddy classes (SeqBuddy(), AlignBuddy(), PhyloBuddy(), and DbBuddy()) handle all preprocessing of input data, including format detection, parsing, and organizing the data into useful structures; they also contain a number of methods (see below) for transforming and outputting the data. Objects instantiated from these classes are the standardized input/output for all functions in the Suite, which allows 'daisy-chaining' (i.e., data can be passed easily from function to function). File handles, file paths, stdin, lists of the base data type (e.g., SeqRecords, alignments, or trees), or even plain text can all be used as input for the Buddy classes. Examples are provided only for SeqBuddy below, but the same syntax can be used for all Buddy classes.
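The daisy-chaining idea can be illustrated without BuddySuite installed. The sketch below is a stand-in only: MiniBuddy, uppercase(), and reverse_sequences() are hypothetical names invented for this example and are not part of the Suite. It shows the pattern described above, where every function accepts a container object and returns the same kind of object, so calls can be nested.

```python
class MiniBuddy:
    """Hypothetical stand-in for a Buddy object: holds (id, sequence) pairs."""

    def __init__(self, records):
        self.records = list(records)


def uppercase(buddy):
    """Upper-case every sequence and hand the same object back."""
    buddy.records = [(rec_id, seq.upper()) for rec_id, seq in buddy.records]
    return buddy


def reverse_sequences(buddy):
    """Reverse every sequence and hand the same object back."""
    buddy.records = [(rec_id, seq[::-1]) for rec_id, seq in buddy.records]
    return buddy


# Because each function returns the container, calls can be chained directly.
buddy = MiniBuddy([("cds1", "atgcgc"), ("cds2", "atgccg")])
result = reverse_sequences(uppercase(buddy))
print(result.records)
```

The only design point being made is that the input type and the return type match; the real Suite functions operate on SeqBuddy, AlignBuddy, PhyloBuddy, and DbBuddy objects in the same spirit.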

SeqBuddy

sb_obj = SeqBuddy(_input, _in_format=None, _out_format=None, _alpha=None)
Attribute Description
alpha An IUPAC object from Bio.Alphabet (one of: IUPAC.protein, IUPAC.ambiguous_dna, or IUPAC.ambiguous_rna). Plain text representatives of each (e.g., 'dna', 'prot', 'protein', 'r') will be understood by the SeqBuddy __init__() method, or the alphabet will be guessed if not explicitly set.
in_format The flat file format sequences are read from. If explicitly set, SeqBuddy will only attempt to read the file in the given format (returning no sequences if the wrong format is specified), otherwise it will guess the format.
out_format Controls the format used when SeqBuddy objects are written. By default this will be the same as in_format.
records The list of Bio.SeqRecord objects derived from your input data.
Method Description
to_dict() Return a dictionary of everything in self.records using SeqRecord.id as keys
print() Write all records to stdout using out_format
write(_file_path) Write all records to file using out_format
examples:
# File path
sb_obj = SeqBuddy('/path/to/seq_file.gb')
# File handle
with open('seq_file.fa', 'r') as ifile:
    sb_obj = SeqBuddy(ifile)
# Plain text 
sb_obj1 = SeqBuddy("ATGTCGCTGATGCTAGCTAGATAGCT", 'raw')
sb_obj2 = SeqBuddy('''\
>cds1
ATGCGCTTAGTCGTAGCTGATCGT
>cds2
ATGCCGCTCGCTCGCTAGCTGCTG
''')
# List of Biopython SeqRecord objects (requires: from Bio import SeqIO)
from Bio import SeqIO
with open('seq_file.fa', 'r') as ifile:
    seq_recs = list(SeqIO.parse(ifile, "fasta"))
    sb_obj = SeqBuddy(seq_recs)
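To make the to_dict() description concrete, here is a minimal sketch of a dictionary keyed by record id. It does not use SeqBuddy or Biopython: Record, parse_fasta(), and this standalone to_dict() are hypothetical stand-ins written for illustration. In this sketch a later record with a duplicate id overwrites an earlier one; how SeqBuddy itself treats duplicate ids is not stated on this page.

```python
from collections import namedtuple

# Hypothetical stand-in for Bio.SeqRecord, holding only the fields used here.
Record = namedtuple("Record", ["id", "seq"])


def parse_fasta(text):
    """Parse FASTA-formatted text into a list of Record tuples."""
    records = []
    rec_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if rec_id is not None:
                records.append(Record(rec_id, "".join(chunks)))
            header = line[1:].split()
            rec_id, chunks = (header[0] if header else ""), []
        elif rec_id is not None:
            chunks.append(line)
    if rec_id is not None:
        records.append(Record(rec_id, "".join(chunks)))
    return records


def to_dict(records):
    """Map each record's id to the record itself."""
    return {rec.id: rec for rec in records}


fasta_text = """\
>cds1
ATGCGCTTAGTCGTAGCTGATCGT
>cds2
ATGCCGCTCGCTCGCTAGCTGCTG
"""
lookup = to_dict(parse_fasta(fasta_text))
print(lookup["cds2"].seq)
```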

AlignBuddy

alb_obj = AlignBuddy(_input, _in_format=None, _out_format=None)
Attribute Description
alpha An IUPAC object from Bio.Alphabet, same as in SeqBuddy. The constructor does not accept this explicitly; it is guessed from the sequences in the alignment(s).
in_format The flat file format sequences are read from. If explicitly set, AlignBuddy will only attempt to read the file in the given format (returning no alignments if the wrong format is specified), otherwise it will guess the format.
out_format Controls the format used when AlignBuddy objects are written. By default, this will be the same as in_format.
alignments A list of Bio.Align objects.
Method Description
print() Write all alignments to stdout using out_format
write(_file_path) Write all alignments to file using out_format

PhyloBuddy

pb_obj = PhyloBuddy(_input, _in_format=None, _out_format=None)
Attribute Description
in_format The format of incoming trees (Newick, NEXUS, or NeXML). If explicitly set, PhyloBuddy will only attempt to read the file in the given format (returning no trees if the wrong format is specified), otherwise it will guess the format.
out_format Controls the format used when PhyloBuddy objects are written. By default, this will be the same as in_format.
trees A list of dendropy.datamodel.treemodel.Tree objects.
Method Description
print() Write all trees to stdout using out_format
write(_file_path) Write all trees to file using out_format
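The in_format/out_format rules shared by the Buddy classes (use the explicit format if one is given, otherwise guess; default out_format to in_format) can be sketched as follows. The function names guess_tree_format() and resolve_formats() are hypothetical, and the detection heuristic is a rough assumption for illustration, not PhyloBuddy's actual logic.

```python
def guess_tree_format(text):
    """Rough heuristic for telling Newick, NEXUS, and NeXML text apart."""
    stripped = text.strip()
    if stripped.upper().startswith("#NEXUS"):
        return "nexus"  # NEXUS files open with a '#NEXUS' token
    if stripped.startswith("<"):
        return "nexml"  # NeXML is an XML dialect
    if stripped.startswith("(") and stripped.endswith(";"):
        return "newick"  # parenthesised tree terminated by ';'
    return None  # unknown: this simple heuristic gives up


def resolve_formats(text, in_format=None, out_format=None):
    """Apply the rules: guess in_format if unset, default out_format to it."""
    if in_format is None:
        in_format = guess_tree_format(text)
    if out_format is None:
        out_format = in_format
    return in_format, out_format


print(resolve_formats("((A,B),C);"))
print(resolve_formats("#NEXUS\nbegin trees;\nend;", out_format="newick"))
```

A real detector would need to cope with cases this heuristic misses, such as a Newick tree that begins with a label instead of a parenthesis.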

DbBuddy

dbb_obj = DbBuddy(_input, _database=None, _out_format="summary")
Attribute Description
databases A list of databases that DbBuddy will query. Valid options include "all", "uniprot", "ensembl", and "ncbi".
failures A dictionary of Failure objects, with the key being the hash of the failure. The Failure class is used to track issues encountered while communicating with the public databases.
out_format Controls the format used when DbBuddy objects are written. Valid formats include "summary", "full_summary", "ids", "accessions", and any of the supported BioPython SeqIO formats.
records Dictionary of Record objects (careful, these are not BioPython records!), using accession numbers as keys.
search_terms List of search terms used to query public databases
server_clients List of server client objects
trash_bin Also a dictionary of records, but which have been filtered out of the main records dict (i.e., will not be output by print())
Method Description
filter_records(regex, mode) Move data between 'records' and 'trash_bin' based on 'regex'. Valid 'mode's are 'keep' and 'remove' to move records to the trash_bin, and 'restore' to bring records back from the trash_bin.
record_breakdown() Return a dictionary with counts for 'full', 'partial', and 'accession' only records.
server(_server) Instantiate a server client object and append it to the 'server_clients' list. Valid '_server' values are 'ensembl', 'ncbi', and 'uniprot'.
trash_breakdown() Same as 'record_breakdown()', except that the data comes from the trash_bin
print(_num=0, quiet=False, columns=None, destination=None, group='records') Write 'group' ('records' or 'trash_bin') to stdout or to a path (set with 'destination'). The number of records and the columns displayed can be set with their respective arguments.
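The movement of records between 'records' and 'trash_bin' can be sketched with two plain dictionaries. This is a stand-in, not DbBuddy's code: the standalone filter_records() below is hypothetical, it matches the regex against the accession key only (an assumption; the real method may search other record fields), and it reads 'keep' as "trash whatever does not match" and 'remove' as "trash whatever matches".

```python
import re


def filter_records(records, trash_bin, regex, mode="keep"):
    """Move entries between two dicts based on a regex over their keys."""
    pattern = re.compile(regex)
    if mode == "keep":
        # Assumption: everything that does NOT match goes to the trash.
        to_move = [acc for acc in records if not pattern.search(acc)]
        source, target = records, trash_bin
    elif mode == "remove":
        # Everything that matches goes to the trash.
        to_move = [acc for acc in records if pattern.search(acc)]
        source, target = records, trash_bin
    elif mode == "restore":
        # Matching entries come back out of the trash.
        to_move = [acc for acc in trash_bin if pattern.search(acc)]
        source, target = trash_bin, records
    else:
        raise ValueError("mode must be 'keep', 'remove', or 'restore'")
    for acc in to_move:
        target[acc] = source.pop(acc)


# Placeholder accessions and values, made up for the example.
records = {"P12345": "rec a", "Q67890": "rec b", "NP_000001": "rec c"}
trash_bin = {}

filter_records(records, trash_bin, r"^NP_", "remove")
filter_records(records, trash_bin, r"^P", "keep")
filter_records(records, trash_bin, r"^NP_", "restore")
print(sorted(records), sorted(trash_bin))
```

Keeping filtered entries in a second dictionary instead of deleting them is what makes 'restore' possible.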
