Improve junction standardisation #90

Merged: 2 commits, Jan 12, 2025
29 changes: 18 additions & 11 deletions src/tidytcells/junction/_standardize.py
@@ -59,25 +59,25 @@ def standardize(

     :return:
         If possible, a standardized version of the input string is returned.
-        If the input string cannot be standardized, the function follows the behaviour as set by ``on_fail``.
+        If the input string cannot be standardized, the function follows the behaviour as set by `on_fail`.
     :rtype:
         Union[str, None]

     .. topic:: Example usage

         Strings that look like junction sequences will be accepted, and returned in capitalised form.

-        >>> tt.junction.standardize("csadaff")
-        'CSADAFF'
+        >>> tt.junction.standardize("csadaf")
+        'CSADAF'

-        Strings that are valid amino acid sequences but do not start and end with the appropriate residues will have a C and an F appended to its beginning and end respectively.
+        Strings that are valid amino acid sequences but do not start and end with the appropriate residues will have a C and an F appended to its beginning and end as required.

-        >>> tt.junction.standardize("sadaf")
-        'CSADAFF'
+        >>> tt.junction.standardize("sada")
+        'CSADAF'

-        However, setting ``strict`` to ``True`` will cause these cases to be rejected.
+        However, setting `strict` to ``True`` will cause these cases to be rejected.

-        >>> result = tt.junction.standardize("sadaf", strict=True)
+        >>> result = tt.junction.standardize("sada", strict=True)
         Input sadaf was rejected as it is not a valid junction sequence.
         >>> print(result)
         None
@@ -92,11 +92,11 @@ def standardize(
 IF input sequence contains non-amino acid symbols:
     set standardization status to failed

-IF input sequence does not start with C and end with F:
+IF input sequence does not start with C and end with W / F:
     IF strict is set to True:
         set standardization status to failed
     ELSE:
-        add C to the beginning and F to the end of the input sequence
+        add C to the beginning and F to the end of the input sequence as required
         set standardization status to successful
 ELSE:
     set standardization status to successful
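
The updated decision logic above can be approximated with a small self-contained sketch. The function name `standardize_junction_sketch` and both regular expressions are assumptions made for illustration; they are not the patterns tidytcells uses. Failure handling is also simplified: the sketch always returns `None` on failure, which roughly corresponds to `on_fail="reject"`.

```python
import re
from typing import Optional

# Hypothetical stand-ins for the library's own patterns (assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AMINO_ACID_REGEX = re.compile(f"^[{AMINO_ACIDS}]+$")
JUNCTION_REGEX = re.compile(f"^C[{AMINO_ACIDS}]*[FW]$")


def standardize_junction_sketch(seq: str, strict: bool = False) -> Optional[str]:
    """Return a standardized junction string, or None if standardization fails."""
    seq = seq.upper()

    # Reject anything containing non-amino-acid symbols.
    if not AMINO_ACID_REGEX.match(seq):
        return None

    # Already starts with C and ends with F or W: nothing to fix.
    if JUNCTION_REGEX.match(seq):
        return seq

    # In strict mode, malformed junctions are rejected rather than repaired.
    if strict:
        return None

    # Add only the residues that are actually missing.
    if not seq.startswith("C"):
        seq = "C" + seq

    if not JUNCTION_REGEX.match(seq):
        seq = seq + "F"

    return seq
```

Under these assumptions, `standardize_junction_sketch("CASQY")` and `standardize_junction_sketch("ASQYF")` both give `'CASQYF'`, whereas unconditionally adding both residues would give `'CCASQYF'` and `'CASQYFF'`.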
@@ -150,10 +150,17 @@ def standardize(
                 logger.warning(
                     f"Failed to standardize {original_input}: not a valid junction sequence."
                 )
+
             if on_fail == "reject":
                 return None
+
             return original_input
-        seq = "C" + seq + "F"
+
+        if not seq.startswith("C"):
+            seq = "C" + seq
+
+        if not JUNCTION_MATCHING_REGEX.match(seq):
+            seq = seq + "F"

     return seq

4 changes: 2 additions & 2 deletions tests/test_junction.py
@@ -30,8 +30,8 @@ def test_various_rejections(self, seq, caplog):
         (
             ("casqyf", "CASQYF"),
             ("ASQY", "CASQYF"),
-            ("CASQY", "CCASQYF"),
-            ("ASQYF", "CASQYFF"),
+            ("CASQY", "CASQYF"),
+            ("ASQYF", "CASQYF"),
         ),
     )
     def test_various_corrections(self, seq, expected):