fix: Remove StringWriter that modifies input in place (#12219)
Summary: StringWriter supports two flavors: one that writes new strings to a result Vector, and one that modifies existing strings in place. The latter is used in only two places, where it provides fast paths for upper/lower and replace/replace_first.

Using the fuzzer, I found a related bug in replace/replace_first that occurs when the StringViews in the first argument's Vector point to overlapping ranges of the same string buffers. In this case, the changes made for one row are accidentally applied to multiple rows, producing incorrect results.

Since we explicitly allow operations that produce a substring of the original string to do so with a no-copy implementation, StringViews with overlapping ranges can occur. For example, a SimpleFunction can apply this no-copy optimization to a string argument that may be constant, while other, non-constant arguments determine which range of the original string to take (as in trim).

I discussed this offline with a few folks. Since the above is allowed and the in-place optimization is so rarely used, the consensus was to treat the string buffers in a FlatVector as immutable. Therefore, this change removes the flavor of StringWriter that modifies strings in place. This fixes the bug in replace/replace_first.

upper/lower was using the in-place flavor correctly because the optimization was only applied to ASCII strings, the function takes a single argument, and the modification is idempotent (the value of a byte in the string doesn't depend on any other bytes in the string or on any other arguments, and the modification can be reapplied without consequences). However, the optimization is precarious: if any of those conditions changed, the result would be bugs that are difficult to detect. Allowing upper/lower to mutate the string in place would also invite others to do the same in the future, potentially leading to more bugs like the one in replace/replace_first. I think losing this fast path is worth the added safety.

I also updated the documentation to clarify that the string buffers in a FlatVector should be treated as immutable.

Differential Revision: D68924324
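The aliasing problem described in the summary can be reproduced outside of Velox with a small, self-contained sketch. This is not the StringWriter API: `Row`, `replaceFirstInPlace`, `replaceFirstCopy`, `runInPlace`, and `runCopying` are hypothetical names made up for illustration, and a plain `std::string` stands in for the shared string buffer. Two rows reference overlapping ranges of one buffer, and each flavor of "replace first `X` with `Y`" is applied to every row.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a string view: a range inside a shared buffer.
struct Row {
  std::size_t offset;
  std::size_t size;
};

// In-place flavor: overwrites the first occurrence of `from` inside the
// row's range directly in the shared buffer. Restricted to equal-length
// replacements to keep the sketch simple.
void replaceFirstInPlace(std::string& buffer, const Row& row,
                         const std::string& from, const std::string& to) {
  assert(from.size() == to.size());
  auto begin = buffer.begin() + row.offset;
  auto end = begin + row.size;
  auto it = std::search(begin, end, from.begin(), from.end());
  if (it != end) {
    std::copy(to.begin(), to.end(), it);
  }
}

// Copying flavor: leaves the shared buffer untouched and returns a new string.
std::string replaceFirstCopy(const std::string& buffer, const Row& row,
                             const std::string& from, const std::string& to) {
  std::string value = buffer.substr(row.offset, row.size);
  auto pos = value.find(from);
  if (pos != std::string::npos) {
    value.replace(pos, from.size(), to);
  }
  return value;
}

// Rows 0 and 1 overlap: row 0 is "aXbXc", row 1 is its suffix "bXc".
const std::vector<Row> kRows = {{0, 5}, {2, 3}};

std::vector<std::string> runInPlace() {
  std::string buffer = "aXbXc";
  for (const auto& row : kRows) {
    replaceFirstInPlace(buffer, row, "X", "Y");
  }
  // Read the rows back only after every row has been processed.
  std::vector<std::string> results;
  for (const auto& row : kRows) {
    results.push_back(buffer.substr(row.offset, row.size));
  }
  return results;
}

std::vector<std::string> runCopying() {
  const std::string buffer = "aXbXc";
  std::vector<std::string> results;
  for (const auto& row : kRows) {
    results.push_back(replaceFirstCopy(buffer, row, "X", "Y"));
  }
  return results;
}
```

With the in-place flavor, row 0 ends up as `aYbYc`: its second `X` was overwritten while processing row 1, even though only the first occurrence should have been replaced. The copying flavor yields `aYbXc` and `bYc`, which is why treating the shared buffer as immutable removes this class of bug.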
1 parent 49292da, commit 7639307
Showing 23 changed files with 122 additions and 288 deletions.