Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert raw strings to non-raw when fixes add escape sequences (#13294) #13882

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

ThatsJustCheesy
Copy link

Summary

This aims to resolve #13294 by implementing raw to non-raw string conversion, as suggested by @dscorbett. The conversion is comprehensive (as far as I'm aware), but it does introduce an unfortunate amount of complexity to the rule.

I'm new to this codebase and not a Rust expert, so my code might not be idiomatic.

Test Plan

I have added the following test cases:

raw_single_singlequote = r'\ \' " �'
raw_triple_singlequote = r'''\ ' " �'''
raw_single_doublequote = r"\ ' \" �"
raw_triple_doublequote = r"""\ ' " �"""
raw_single_singlequote_multiline = r'\' \
" \
​'
raw_triple_singlequote_multiline = r'''' \
" \
�'''
raw_single_doublequote_multiline = r"' \
\" \
�"
raw_triple_doublequote_multiline = r"""' \
" \
�"""
raw_nested_fstrings = rf'\ {rf'\ {rf'\' '}'}'

Copy link
Contributor

github-actions bot commented Oct 23, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@dscorbett
Copy link

dscorbett commented Oct 23, 2024

It would also be good to test triple-quoted strings whose final characters match the outer quotation marks:

r"""​\""""
r'''​\''''

The raw_nested_fstrings tests don’t include any of the relevant control characters so they are not effective tests for these rules’ fixes.

@ThatsJustCheesy
Copy link
Author

Good points, thank you!

@dscorbett
Copy link

My previous example was incomplete: quotation marks in triple-quoted strings need escaping when they precede two more of the same quotation mark. This can also happen in the middle of a string:

r"""​\"""..."""
r'''​\'''...'''

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this contribution.

I remain hesitant adding raw to regular string conversion considering how rare raw strings is. The goal with our fixes is to fix the majority of cases. It's okay if Ruff can't fix all cases. In this case, being able to fix 1-2 raw strings doesn't justify the added complexity.

My suggestion is that we simply don't provide a fix if the string is a raw-string or that we make it an unsafe fix.

Comment on lines +273 to +276
&& string_content
.as_bytes()
.get(column + 1)
.is_some_and(|c2| char::from(*c2) != c)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using bytes to get the next characters panics if the next character is a non-ASCII character. We should use the chars iterator instead (they're cheap clonable)

Comment on lines +241 to +252
for (column, match_) in prefix.match_indices(&['r', 'R']) {
let c = match_.chars().next().unwrap();

let entire_string_range = match kind {
TokenKind::String => range,
_ => last_fstring_start.unwrap().range(),
};
let location = entire_string_range.start() + TextSize::try_from(column).unwrap();
let range = TextRange::at(location, c.text_len());

string_conversion_edits.push(Edit::range_deletion(range));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: there can always only be at most one r or R prefix. That's why it should not be necessaryto iterate, instead you can use find (or is it position?) to get the position of the r or R character

Comment on lines +258 to +261
TextSize::try_from(text.len()).unwrap() - string_flags.quote_len(),
),
_ => (0.into(), text.len().try_into().unwrap()),
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use text_len and return a TextRange. Returning a text range has the advantage that you can index directly by range

let string_content = &text[range];

Suggested change
TextSize::try_from(text.len()).unwrap() - string_flags.quote_len(),
),
_ => (0.into(), text.len().try_into().unwrap()),
};
text.text_len() - string_flags.quote_len(),
),
_ => TextRange::new((0.into(), text.text_len()),
};

Comment on lines +223 to +226
diagnostic: Diagnostic::new(rule, range),
// This is integrated with other fixes and attached to the diagnostic below.
edit: Edit::range_replacement(replacement.to_string(), range),
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to handle the case where the invalid character was the last character before the quotes in a triple quoted strings:

"""test"<invalid>"""

Let's say <invalid> is the invalid character. Removing <invalid> then results in """test"""" which is not a valid non-raw strings.

@ThatsJustCheesy
Copy link
Author

In this case, being able to fix 1-2 raw strings doesn't justify the added complexity.

I kind of agree… imho, the fix should be marked unsafe for raw strings. I can open a new PR for that if preferred.

Regardless, this was a good excuse to learn Rust better :)

@dscorbett
Copy link

I think the rules should either offer safe fixes for raw strings or no fixes. Keeping the current fixes but marking them unsafe doesn’t seem useful, because they are incorrect.

@MichaReiser
Copy link
Member

I agree. We should not offer fixes if we know they're incorrect. We could consider offering fixes for raw-strings if they don't contain any quotes or backslashes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PLE2510, PLE2512, PLE2513, PLE2514, and PLE2515 fixes add escape sequences to raw strings
3 participants