Don't normalize strings in the CLI #127

DavisVaughan · 2025-01-03T21:25:12Z

Closes #90
Closes #123

Follow up to #78

The CLI now never normalizes line endings. This lets --check work correctly and lets us take advantage of an optimization where we detect that no changes occurred during formatting.
The LSP continues to always normalize line endings to Unix line endings. Since everything uses Unix line endings there, we could add the optimization there too (and pre-emptively return None in the response to the format request if we detect no changes, rather than running an expensive diff algorithm).
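
To make the "no changes" optimization concrete, here is a minimal sketch. The `FormatOutcome` enum and both function names are hypothetical, chosen for illustration, and are not the actual types in the CLI or LSP. The point is only that the comparison is done against the untouched source, so a CRLF file whose formatted output is byte-identical is reported as unchanged.

```rust
/// Hypothetical result type: the real code base may model this differently.
#[derive(Debug, PartialEq)]
enum FormatOutcome {
    /// The formatted output is byte-for-byte identical to the source.
    Unchanged,
    /// The formatted output differs and should be written or sent back.
    Changed(String),
}

/// Compare the original source against the formatter output.
/// Assumption: `source` has NOT been line-ending normalized beforehand,
/// otherwise a CRLF file would be compared against the wrong baseline.
fn detect_changes(source: &str, formatted: String) -> FormatOutcome {
    if source == formatted.as_str() {
        FormatOutcome::Unchanged
    } else {
        FormatOutcome::Changed(formatted)
    }
}

/// Sketch of the LSP side: answer with `None` when nothing changed,
/// instead of computing a diff.
fn to_lsp_response(outcome: FormatOutcome) -> Option<String> {
    match outcome {
        FormatOutcome::Unchanged => None,
        FormatOutcome::Changed(text) => Some(text),
    }
}
```

With this shape, `--check` only needs to look for `Changed`, and an already formatted CRLF file short-circuits before any write or diff happens.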

As mentioned in #90, our parser and Biome's formatter are happy with alternative line endings when they appear in the trivia, but if non-Unix line endings appear in a token then the formatter panics. This happens in multiline strings, so the formatter now normalizes string tokens itself, returning a Cow::Borrowed when nothing changed so that the common case does not allocate.
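
For readers unfamiliar with the `Cow` pattern, a rough sketch of this kind of normalization follows. The function name is hypothetical and the body is not the code from this PR. In particular, converting a lone `\r` to `\n` is an assumption made for the example.

```rust
use std::borrow::Cow;

/// Hypothetical helper, not the implementation in `string_literal.rs`.
/// Converts `\r\n` (and, as an assumption, a lone `\r`) to `\n`.
fn normalize_line_endings(text: &str) -> Cow<'_, str> {
    // Fast path: no carriage return means the text is already normalized,
    // so hand back the input without allocating.
    if !text.contains('\r') {
        return Cow::Borrowed(text);
    }

    let mut out = String::with_capacity(text.len());
    let mut chars = text.chars().peekable();

    while let Some(c) = chars.next() {
        if c == '\r' {
            // Swallow the `\n` of a `\r\n` pair so the pair becomes one `\n`.
            if chars.peek() == Some(&'\n') {
                chars.next();
            }
            out.push('\n');
        } else {
            out.push(c);
        }
    }

    Cow::Owned(out)
}
```

Most string tokens contain no `\r` at all, so they take the borrowed path and only multiline strings from CRLF files pay for a new `String`.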

The way this all flows through Biome is (as implemented by rome/tools#1672):

1. We send a parse tree with CRLF line endings into the formatter
2. The formatter normalizes all trivia to Unix line endings
3. We normalize all tokens to Unix line endings (only multiline strings have this issue)
4. At Print time, the formatter turns each Unix line ending into the user-requested LineEnding

The last step there is why the Printer doesn't allow any non-Unix line endings internally. It really just looks for \n at print time to decide when to apply LineEnding, so \r\n would make it behave incorrectly.
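
A toy model of that last step shows the failure mode. The `LineEnding` enum and `print_with_line_ending` function below are simplified stand-ins written for this illustration, not Biome's actual Printer API.

```rust
/// Simplified stand-in for the user-requested line ending option.
#[derive(Clone, Copy)]
enum LineEnding {
    Lf,
    Crlf,
}

impl LineEnding {
    fn as_str(self) -> &'static str {
        match self {
            LineEnding::Lf => "\n",
            LineEnding::Crlf => "\r\n",
        }
    }
}

/// Toy printer: every `\n` is replaced by the requested line ending and
/// every other character is copied through untouched. It assumes the
/// input only contains Unix line endings.
fn print_with_line_ending(normalized: &str, ending: LineEnding) -> String {
    let mut out = String::with_capacity(normalized.len());
    for c in normalized.chars() {
        if c == '\n' {
            out.push_str(ending.as_str());
        } else {
            out.push(c);
        }
    }
    out
}
```

Feeding this model `"a\nb"` with `LineEnding::Crlf` gives `"a\r\nb"` as intended, while an un-normalized `"a\r\nb"` comes out as `"a\r\r\nb"`, because the stray `\r` is copied through and the `\n` is expanded again.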

lionel- · 2025-01-07T16:22:28Z

crates/air_r_formatter/src/string_literal.rs

+    }
+}
+
+/// Normalize a string, returning a [`Cow::Borrowed`] if the input was already normalized


Maybe mention why using the line_ending crate won't work here, because it mutates strings in place?

I also anticipate a small chance that we might do further "normalization" here, like using a consistent quote style or something similar

DavisVaughan requested a review from lionel- January 6, 2025 18:07

lionel- approved these changes Jan 7, 2025

DavisVaughan added 5 commits January 7, 2025 14:46

Normalize multiline strings in the formatter so we don't have to norm…

8c1549e

…alize in the CLI Allowing us to actually take advantage of the `Unchanged` optimization with CRLF endings, and correctly handle `--check` too

Add a parser snapshot test for multiline strings with CRLF line endings

ab2c867

To prove we can parse these line endings, and to prove that the CRLF ends up in the `RStringValue`

Add CHANGELOG bullets

3f1980c

Mention why no line_ending crate usage

121c5d4

Don't use Cell after all, since it's more mental overhead than it's…

c6bf4b8

… worth

DavisVaughan force-pushed the feature/no-cli-normalize branch from 985bf67 to c6bf4b8 Compare January 7, 2025 19:46

DavisVaughan merged commit f41a27e into main Jan 7, 2025
4 checks passed

DavisVaughan deleted the feature/no-cli-normalize branch January 7, 2025 20:16
