Kim Rutherford edited this page Mar 13, 2024 · 6 revisions

We have a priority list of evidence codes. Currently:

inferred from physical interaction   # special case: only when there is no "with"
inferred from biological aspect of ancestor
inferred from biological aspect of descendant
inferred by curator
inferred from sequence orthology
inferred from sequence or structural similarity
inferred from sequence model
traceable author statement
non-traceable author statement
inferred from expression pattern
inferred from electronic annotation

As a special case, we also remove IPI (inferred from physical interaction) annotations that have no "with".

See the process function in GOFilter.pm for the up-to-date evidence code priority order.

For IEA annotations we have this priority:

GO_REF:0000116 RHEA
GO_REF:0000003 EC
GO_REF:0000041 UniPathway
GO_REF:0000002 InterPro
GO_REF:0000104 UniRule
GO_REF:0000043 UniProtKB-KW
GO_REF:0000044 UniProtKB-SubCell
GO_REF:0000117 ARBA
GO_REF:0000118 TreeGrafter

That is, RHEA annotations have priority over EC annotations, EC over UniPathway, and so on down the list.
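The IEA ordering can be read as a simple rank lookup. The following is a minimal Python sketch, not the code in GOFilter.pm: the names `IEA_REF_PRIORITY` and `best_iea_reference` are hypothetical, and it assumes that when several IEA annotations compete, the one whose reference appears earliest in the list wins, with unlisted references ranked last.

```python
# Reference priority for IEA annotations, highest priority first
# (order copied from the list above).
IEA_REF_PRIORITY = [
    "GO_REF:0000116",  # RHEA
    "GO_REF:0000003",  # EC
    "GO_REF:0000041",  # UniPathway
    "GO_REF:0000002",  # InterPro
    "GO_REF:0000104",  # UniRule
    "GO_REF:0000043",  # UniProtKB-KW
    "GO_REF:0000044",  # UniProtKB-SubCell
    "GO_REF:0000117",  # ARBA
    "GO_REF:0000118",  # TreeGrafter
]

# Map each reference to its position in the list (0 = highest priority).
RANK = {ref: i for i, ref in enumerate(IEA_REF_PRIORITY)}


def best_iea_reference(refs):
    """Return the highest-priority reference from a non-empty list.

    Assumption: references missing from the list rank below all listed ones.
    """
    return min(refs, key=lambda ref: RANK.get(ref, len(RANK)))


# An EC annotation and a RHEA annotation compete: RHEA wins.
winner = best_iea_reference(["GO_REF:0000003", "GO_REF:0000116"])
```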

In the evidence code list, annotations with evidence codes nearer the bottom are kept in preference to those nearer the top.

  • We iterate through the evidence codes above in order. For each evidence code:

    • for each annotation with that evidence code, delete the annotation if there is another annotation for the same gene with:

      • a more specific term
      • or the same term but a different evidence code
    • if there are duplicate annotations (same term and gene) with that evidence code, delete all but one
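
The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation in GOFilter.pm: the `Annotation` class, the `filter_annotations` function, the `ancestors` mapping (term to the set of its less specific ancestor terms), and the term names are all hypothetical. It also assumes that deletions take effect immediately, so later checks only see annotations that are still present, and it leaves out the IPI special case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    gene: str
    term: str
    evidence: str


def filter_annotations(annotations, evidence_order, ancestors):
    """Filter annotations by walking the evidence codes in priority order.

    ancestors maps a term to the set of its (less specific) ancestor terms;
    this stand-in for an ontology lookup is an assumption of the sketch.
    """
    kept = list(annotations)
    for code in evidence_order:
        # Step 1: delete annotations made redundant by another annotation
        # for the same gene (more specific term, or same term with a
        # different evidence code).
        for ann in [a for a in kept if a.evidence == code]:
            redundant = any(
                other.gene == ann.gene
                and (
                    ann.term in ancestors.get(other.term, set())
                    or (other.term == ann.term and other.evidence != code)
                )
                for other in kept
                if other is not ann
            )
            if redundant:
                kept = [a for a in kept if a is not ann]
        # Step 2: among annotations with this evidence code, keep only one
        # per (gene, term) pair.
        seen = set()
        deduped = []
        for a in kept:
            if a.evidence == code:
                key = (a.gene, a.term)
                if key in seen:
                    continue
                seen.add(key)
            deduped.append(a)
        kept = deduped
    return kept


TAS = "traceable author statement"
IEA = "inferred from electronic annotation"
ancestors = {"TERM:child": {"TERM:parent"}}  # hypothetical ontology fragment

result = filter_annotations(
    [
        Annotation("gene1", "TERM:parent", IEA),  # dropped: gene1 has the child term
        Annotation("gene1", "TERM:child", TAS),
        Annotation("gene2", "TERM:parent", TAS),  # dropped: same term, other code
        Annotation("gene2", "TERM:parent", IEA),
        Annotation("gene3", "TERM:parent", IEA),
        Annotation("gene3", "TERM:parent", IEA),  # dropped: duplicate
    ],
    [TAS, IEA],  # a subset of the full list, in the same relative order
    ancestors,
)
```

In this example the gene2 IEA annotation survives and the TAS annotation does not, because TAS is processed first while the IEA annotation for the same term is still present.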