-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert raw strings to non-raw when fixes add escape sequences (#13294) #13882
base: main
Are you sure you want to change the base?
Convert raw strings to non-raw when fixes add escape sequences (#13294) #13882
Conversation
|
It would also be good to test triple-quoted strings whose final characters match the outer quotation marks: r"""\""""
r'''\'''' The |
I missed adding the invalid characters to the final test file.
Good points, thank you! |
My previous example was incomplete: quotation marks in triple-quoted strings need escaping when they precede two more of the same quotation mark. This can also happen in the middle of a string: r"""\"""..."""
r'''\'''...''' |
705ac51
to
d4051f1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this contribution.
I remain hesitant adding raw to regular string conversion considering how rare raw strings is. The goal with our fixes is to fix the majority of cases. It's okay if Ruff can't fix all cases. In this case, being able to fix 1-2 raw strings doesn't justify the added complexity.
My suggestion is that we simply don't provide a fix if the string is a raw-string or that we make it an unsafe fix.
&& string_content | ||
.as_bytes() | ||
.get(column + 1) | ||
.is_some_and(|c2| char::from(*c2) != c) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using bytes to get the next characters panics if the next character is a non-ASCII character. We should use the chars
iterator instead (they're cheap clonable)
for (column, match_) in prefix.match_indices(&['r', 'R']) { | ||
let c = match_.chars().next().unwrap(); | ||
|
||
let entire_string_range = match kind { | ||
TokenKind::String => range, | ||
_ => last_fstring_start.unwrap().range(), | ||
}; | ||
let location = entire_string_range.start() + TextSize::try_from(column).unwrap(); | ||
let range = TextRange::at(location, c.text_len()); | ||
|
||
string_conversion_edits.push(Edit::range_deletion(range)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: there can always only be at most one r
or R
prefix. That's why it should not be necessaryto iterate, instead you can use find
(or is it position?) to get the position of the r
or R
character
TextSize::try_from(text.len()).unwrap() - string_flags.quote_len(), | ||
), | ||
_ => (0.into(), text.len().try_into().unwrap()), | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use text_len
and return a TextRange
. Returning a text range has the advantage that you can index directly by range
let string_content = &text[range];
TextSize::try_from(text.len()).unwrap() - string_flags.quote_len(), | |
), | |
_ => (0.into(), text.len().try_into().unwrap()), | |
}; | |
text.text_len() - string_flags.quote_len(), | |
), | |
_ => TextRange::new((0.into(), text.text_len()), | |
}; |
diagnostic: Diagnostic::new(rule, range), | ||
// This is integrated with other fixes and attached to the diagnostic below. | ||
edit: Edit::range_replacement(replacement.to_string(), range), | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need to handle the case where the invalid character was the last character before the quotes in a triple quoted strings:
"""test"<invalid>"""
Let's say <invalid>
is the invalid character. Removing <invalid>
then results in """test""""
which is not a valid non-raw strings.
I kind of agree… imho, the fix should be marked unsafe for raw strings. I can open a new PR for that if preferred. Regardless, this was a good excuse to learn Rust better :) |
I think the rules should either offer safe fixes for raw strings or no fixes. Keeping the current fixes but marking them unsafe doesn’t seem useful, because they are incorrect. |
I agree. We should not offer fixes if we know they're incorrect. We could consider offering fixes for raw-strings if they don't contain any quotes or backslashes. |
Summary
This aims to resolve #13294 by implementing raw to non-raw string conversion, as suggested by @dscorbett. The conversion is comprehensive (as far as I'm aware), but it does introduce an unfortunate amount of complexity to the rule.
I'm new to this codebase and not a Rust expert, so my code might not be idiomatic.
Test Plan
I have added the following test cases: