Convert raw strings to non-raw when fixes add escape sequences (#13294) #13882

ThatsJustCheesy · 2024-10-23T04:37:01Z

Summary

This aims to resolve #13294 by implementing raw to non-raw string conversion, as suggested by @dscorbett. The conversion is comprehensive (as far as I'm aware), but it does introduce an unfortunate amount of complexity to the rule.

I'm new to this codebase and not a Rust expert, so my code might not be idiomatic.

Test Plan

I have added the following test cases:

raw_single_singlequote = r'\ \' " �'
raw_triple_singlequote = r'''\ ' " �'''
raw_single_doublequote = r"\ ' \" �"
raw_triple_doublequote = r"""\ ' " �"""
raw_single_singlequote_multiline = r'\' \
" \
'
raw_triple_singlequote_multiline = r'''' \
" \
�'''
raw_single_doublequote_multiline = r"' \
\" \
�"
raw_triple_doublequote_multiline = r"""' \
" \
�"""
raw_nested_fstrings = rf'\ {rf'\ {rf'\' '}'}'


github-actions · 2024-10-23T04:55:46Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

dscorbett · 2024-10-23T17:28:45Z

It would also be good to test triple-quoted strings whose final characters match the outer quotation marks:

r"""\""""
r'''\''''

The raw_nested_fstrings tests don’t include any of the relevant control characters so they are not effective tests for these rules’ fixes.


ThatsJustCheesy · 2024-10-23T18:23:57Z

Good points, thank you!

dscorbett · 2024-10-23T20:40:31Z

My previous example was incomplete: quotation marks in triple-quoted strings need escaping when they precede two more of the same quotation mark. This can also happen in the middle of a string:

r"""\"""..."""
r'''\'''...'''
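The rule described above can be sketched in Rust as follows. This is not the PR's code: `escape_triple_quoted_body` is a hypothetical helper that assumes it receives the body of a raw triple-quoted string plus the quote character. It doubles every backslash and escapes a quote when it precedes two more of the same quote, or when it is the last character of the body.

```rust
/// Hypothetical helper (not taken from the PR): re-escape the body of a raw
/// triple-quoted string so that it can be written as a non-raw string.
fn escape_triple_quoted_body(raw_body: &str, quote: char) -> String {
    let chars: Vec<char> = raw_body.chars().collect();
    let mut out = String::with_capacity(raw_body.len());

    for (i, &c) in chars.iter().enumerate() {
        if c == '\\' {
            // A backslash is literal in a raw string, so it must be doubled.
            out.push_str("\\\\");
        } else if c == quote {
            // A quote followed by two more of the same quote would close the string early.
            let starts_triplet =
                chars.get(i + 1) == Some(&quote) && chars.get(i + 2) == Some(&quote);
            // A trailing quote would merge with the closing delimiter.
            let is_last = i + 1 == chars.len();
            if starts_triplet || is_last {
                out.push('\\');
            }
            out.push(c);
        } else {
            out.push(c);
        }
    }
    out
}
```

Under these assumptions, the body `\"""...` becomes `\\\"""...`, and a body that ends in the quote character gets that final quote escaped.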

MichaReiser

Thanks for this contribution.

I remain hesitant about adding raw-to-regular string conversion, considering how rare raw strings are. The goal with our fixes is to fix the majority of cases. It's okay if Ruff can't fix all cases. In this case, being able to fix 1-2 raw strings doesn't justify the added complexity.

My suggestion is that we simply don't provide a fix if the string is a raw string, or that we make it an unsafe fix.

MichaReiser · 2024-10-24T08:24:05Z

crates/ruff_linter/src/rules/pylint/rules/invalid_string_characters.rs

+                        && string_content
+                            .as_bytes()
+                            .get(column + 1)
+                            .is_some_and(|c2| char::from(*c2) != c)


Using bytes to get the next character panics if the next character is a non-ASCII character. We should use the chars iterator instead (it is cheap to clone).
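A rough sketch of the character-based lookahead, assuming the goal is to find occurrences of a character that are followed by a different character. The function name and signature are made up for illustration and are not part of the PR. Cloning the `char_indices` iterator lets us peek at the next full character without consuming it, whereas a byte lookup only sees one UTF-8 byte.

```rust
/// Hypothetical helper: byte offsets of every `target` character that is
/// followed by some other character.
fn offsets_followed_by_different_char(content: &str, target: char) -> Vec<usize> {
    let mut found = Vec::new();
    let mut chars = content.char_indices();

    while let Some((offset, c)) = chars.next() {
        if c != target {
            continue;
        }
        // The clone is cheap and advances independently of `chars`,
        // so this peeks at the next character without skipping it.
        let next = chars.clone().next().map(|(_, next_char)| next_char);
        if next.is_some_and(|next_char| next_char != c) {
            found.push(offset);
        }
    }
    found
}
```

The offsets come from `char_indices`, so they always fall on character boundaries, even when the neighbouring character is multi-byte.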

MichaReiser · 2024-10-24T08:39:29Z

crates/ruff_linter/src/rules/pylint/rules/invalid_string_characters.rs

+        for (column, match_) in prefix.match_indices(&['r', 'R']) {
+            let c = match_.chars().next().unwrap();
+
+            let entire_string_range = match kind {
+                TokenKind::String => range,
+                _ => last_fstring_start.unwrap().range(),
+            };
+            let location = entire_string_range.start() + TextSize::try_from(column).unwrap();
+            let range = TextRange::at(location, c.text_len());
+
+            string_conversion_edits.push(Edit::range_deletion(range));
+        }


Nit: there can only ever be at most one r or R prefix, so it should not be necessary to iterate. Instead, you can use find (or is it position?) to get the position of the r or R character.
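For reference, `str::find` accepts the same character-set pattern as `match_indices` and returns the byte offset of the first match. A minimal sketch, where `raw_marker_offset` is a hypothetical name and `prefix` is assumed to hold only the string prefix characters:

```rust
/// Hypothetical helper: byte offset of the `r`/`R` marker within a string
/// prefix such as "rb" or "Rf", or `None` if the prefix is not raw.
fn raw_marker_offset(prefix: &str) -> Option<usize> {
    // A prefix contains at most one `r` or `R`, so the first match is the only one.
    prefix.find(&['r', 'R'])
}
```

Both `r` and `R` are one byte long, so the range to delete would start at the returned offset and have length 1.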

MichaReiser · 2024-10-24T08:40:08Z

crates/ruff_linter/src/rules/pylint/rules/invalid_string_characters.rs

+                TextSize::try_from(text.len()).unwrap() - string_flags.quote_len(),
+            ),
+            _ => (0.into(), text.len().try_into().unwrap()),
+        };


You can use text_len and return a TextRange. Returning a TextRange has the advantage that you can index the text directly by range:

let string_content = &text[range];

Suggested change

-                TextSize::try_from(text.len()).unwrap() - string_flags.quote_len(),
-            ),
-            _ => (0.into(), text.len().try_into().unwrap()),
-        };
+                text.text_len() - string_flags.quote_len(),
+            ),
+            _ => TextRange::new(0.into(), text.text_len()),
+        };
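The same idea can be shown with only the standard library, using `Range<usize>` as a stand-in for `TextRange`. This is an analogy rather than the Ruff API: `content_range`, `prefix_len` and `quote_len` are hypothetical names, and the sketch assumes `text` is a complete string literal.

```rust
use std::ops::Range;

/// Hypothetical helper: byte range of the content between the opening and
/// closing quotes. Assumes `text` is a complete literal, so it is at least
/// `prefix_len + 2 * quote_len` bytes long.
fn content_range(text: &str, prefix_len: usize, quote_len: usize) -> Range<usize> {
    (prefix_len + quote_len)..(text.len() - quote_len)
}
```

Because the function returns a range rather than a pair of offsets, the caller can write `&text[content_range(text, prefix_len, quote_len)]` directly.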

MichaReiser · 2024-10-24T08:56:48Z

crates/ruff_linter/src/rules/pylint/rules/invalid_string_characters.rs

+            diagnostic: Diagnostic::new(rule, range),
+            // This is integrated with other fixes and attached to the diagnostic below.
+            edit: Edit::range_replacement(replacement.to_string(), range),
+        });


We also need to handle the case where the invalid character is the last character before the closing quotes in a triple-quoted string:

"""test"<invalid>"""

Let's say <invalid> is the invalid character. Removing <invalid> then results in """test"""", which is not a valid non-raw string.

ThatsJustCheesy · 2024-10-24T13:32:14Z

> In this case, being able to fix 1-2 raw strings doesn't justify the added complexity.

I kind of agree… imho, the fix should be marked unsafe for raw strings. I can open a new PR for that if preferred.

Regardless, this was a good excuse to learn Rust better :)

dscorbett · 2024-10-24T13:38:08Z

I think the rules should either offer safe fixes for raw strings or no fixes. Keeping the current fixes but marking them unsafe doesn’t seem useful, because they are incorrect.

MichaReiser · 2024-10-24T13:43:04Z

I agree. We should not offer fixes if we know they're incorrect. We could consider offering fixes for raw-strings if they don't contain any quotes or backslashes.
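A sketch of what that check might look like, assuming the rule has the raw string's body available as a `&str`. The name `raw_body_is_simple` is hypothetical. If the body contains no backslash and no quote character, dropping the raw prefix leaves every existing character with the same meaning.

```rust
/// Hypothetical check: true if the body of a raw string contains no
/// backslashes and no quote characters.
fn raw_body_is_simple(raw_body: &str) -> bool {
    !raw_body.contains(|c: char| matches!(c, '\\' | '\'' | '"'))
}
```

Under this approach, a fix would only be offered for raw strings when the check returns `true`, and skipped otherwise.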

Convert raw strings to non-raw when fixes add escape sequences (astral-sh#13294)

5696f1e

ThatsJustCheesy requested review from MichaReiser and dhruvmanila as code owners October 23, 2024 04:37

ThatsJustCheesy added 2 commits October 23, 2024 14:21

Invalid characters fix: Correct raw fstrings test case

17bd63a

I missed adding the invalid characters to the final test file.

Invalid characters fix: escape quotes at the end of triple-quoted strings

30ae80c

Invalid characters fix: escape quote triplets in triple-quoted strings

d4051f1

ThatsJustCheesy force-pushed the string-escape-seqs-unraw branch from 705ac51 to d4051f1 Compare October 23, 2024 21:37

MichaReiser reviewed Oct 24, 2024
