Merge pull request #90 from yutanagano/improve-junction-standardisation
Improve junction standardisation
yutanagano authored Jan 12, 2025
2 parents 4d21d09 + f91170a commit 6e13561
Showing 2 changed files with 20 additions and 13 deletions.
29 changes: 18 additions & 11 deletions src/tidytcells/junction/_standardize.py
@@ -59,25 +59,25 @@ def standardize(
:return:
If possible, a standardized version of the input string is returned.
- If the input string cannot be standardized, the function follows the behaviour as set by ``on_fail``.
+ If the input string cannot be standardized, the function follows the behaviour as set by `on_fail`.
:rtype:
Union[str, None]
.. topic:: Example usage
Strings that look like junction sequences will be accepted, and returned in capitalised form.
- >>> tt.junction.standardize("csadaff")
- 'CSADAFF'
+ >>> tt.junction.standardize("csadaf")
+ 'CSADAF'
- Strings that are valid amino acid sequences but do not start and end with the appropriate residues will have a C and an F appended to its beginning and end respectively.
+ Strings that are valid amino acid sequences but do not start and end with the appropriate residues will have a C and an F appended to its beginning and end as required.
- >>> tt.junction.standardize("sadaf")
- 'CSADAFF'
+ >>> tt.junction.standardize("sada")
+ 'CSADAF'
- However, setting ``strict`` to ``True`` will cause these cases to be rejected.
+ However, setting `strict` to ``True`` will cause these cases to be rejected.
- >>> result = tt.junction.standardize("sadaf", strict=True)
+ >>> result = tt.junction.standardize("sada", strict=True)
Input sadaf was rejected as it is not a valid junction sequence.
>>> print(result)
None
@@ -92,11 +92,11 @@ def standardize(
IF input sequence contains non-amino acid symbols:
set standardization status to failed
- IF input sequence does not start with C and end with F:
+ IF input sequence does not start with C and end with W / F:
IF strict is set to True:
set standardization status to failed
ELSE:
- add C to the beginning and F to the end of the input sequence
+ add C to the beginning and F to the end of the input sequence as required
set standardization status to successful
ELSE:
set standardization status to successful
@@ -150,10 +150,17 @@ def standardize(
logger.warning(
f"Failed to standardize {original_input}: not a valid junction sequence."
)

if on_fail == "reject":
return None

return original_input
- seq = "C" + seq + "F"

+ if not seq.startswith("C"):
+     seq = "C" + seq

+ if not JUNCTION_MATCHING_REGEX.match(seq):
+     seq = seq + "F"

return seq
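
The revised correction logic can be sketched as a standalone function. This is a simplified illustration, not the library's implementation: the regex patterns, the `AMINO_ACIDS` constant, and the function name `standardize_junction_sketch` are assumptions made for the example (the definition of `JUNCTION_MATCHING_REGEX` is not shown in this diff), and the `on_fail` handling is reduced to returning `None`.

```python
import re
from typing import Optional

# Assumed definitions for illustration only; the real patterns live
# elsewhere in the package and are not part of this diff.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AMINO_ACID_REGEX = re.compile(f"^[{AMINO_ACIDS}]+$")
JUNCTION_REGEX = re.compile(f"^C[{AMINO_ACIDS}]*[FW]$")


def standardize_junction_sketch(seq: str, strict: bool = False) -> Optional[str]:
    """Hypothetical stand-in for the flow described in the docstring."""
    seq = seq.upper()

    # Reject anything that is not a plain amino acid sequence.
    if not AMINO_ACID_REGEX.match(seq):
        return None

    # Already starts with C and ends with F or W: nothing to fix.
    if JUNCTION_REGEX.match(seq):
        return seq

    # In strict mode, sequences that need correcting are rejected.
    if strict:
        return None

    # Add only the residues that are actually missing.
    if not seq.startswith("C"):
        seq = "C" + seq
    if not JUNCTION_REGEX.match(seq):
        seq = seq + "F"

    return seq
```

Under these assumptions, `"CASQY"` and `"ASQYF"` both become `"CASQYF"` rather than `"CCASQYF"` and `"CASQYFF"`, which is the behaviour the updated test cases expect.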

4 changes: 2 additions & 2 deletions tests/test_junction.py
@@ -30,8 +30,8 @@ def test_various_rejections(self, seq, caplog):
(
("casqyf", "CASQYF"),
("ASQY", "CASQYF"),
- ("CASQY", "CCASQYF"),
- ("ASQYF", "CASQYFF"),
+ ("CASQY", "CASQYF"),
+ ("ASQYF", "CASQYF"),
),
)
def test_various_corrections(self, seq, expected):