Suggestion: `conservative_normalize!` #475

jarthod · 2022-08-23T15:31:06Z

After removing my patch for #459 now that it's released in 2.8.1 (thanks 🎉), I noticed I had also written this little additional method, which only normalizes the URL if needed and otherwise keeps it as-is (to avoid changing reserved chars that can legally be either encoded or not):

module Addressable
  class URI
    # only normalize if we see obviously disallowed chars, to avoid changing
    # user preference (encoded or not) for some reserved chars
    def conservative_normalize!
      self.path = normalized_path if path&.match?(/[^#{Addressable::URI::CharacterClasses::PATH + "%"}]/)
      self.query = normalized_query if query&.match?(/[^#{Addressable::URI::CharacterClasses::QUERY + "%"}]/)
      self.host = normalized_host if host&.match?(/[^#{Addressable::URI::CharacterClasses::HOST}]/)
      self.userinfo = normalized_userinfo if userinfo&.match?(/[^#{Addressable::URI::CharacterClasses::UNRESERVED + Addressable::URI::CharacterClasses::SUB_DELIMS + ":"}]/)
      self
    end
  end
end

I just thought I would share it in case you (or somebody else) are interested in it.

Here is an example URL taken from my service and redacted (note the presence of / and %2F in the param):

http://domain.net/?param=fil:for(webp)/http%3A%2F%2Fwin.net%2Fpics.jpg

The current normalize! method would change the %2F back into /:

http://domain.net/?param=fil:for(webp)/http://win.net/pics.jpg

Both are valid URLs, yes, but it's not what the person who wrote this webservice intended, and as they are doing some parsing before unencoding, it actually breaks their server (they probably expected an unencoded slash to split on, with the rest encoded).

So I had to write this conservative_normalize! to avoid messing with the existing encoding as long as it's legal, and only normalize if necessary.
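
To make the difference concrete without depending on Addressable, here is a minimal stdlib-only sketch of the same idea applied to a query string. The names `QUERY_ALLOWED`, `full_normalize_query` and `conservative_normalize_query` are hypothetical, and the character class is an approximation of the RFC 3986 query rule rather than Addressable's own `CharacterClasses::QUERY`.

```ruby
# Assumption: approximates the RFC 3986 "query" characters; this is not
# Addressable's own character class.
QUERY_ALLOWED = "A-Za-z0-9\\-._~!$&'()*+,;=:@/?"
ILLEGAL_IN_QUERY = /[^#{QUERY_ALLOWED}%]/

# Hypothetical stand-in for a full normalization: decodes every escape whose
# character is legal in a query, then percent-encodes whatever is illegal.
def full_normalize_query(query)
  unescaped = query.gsub(/%(\h\h)/) do |escape|
    char = Regexp.last_match(1).hex.chr
    char.match?(/\A[#{QUERY_ALLOWED}]\z/) ? char : escape.upcase
  end
  unescaped.gsub(ILLEGAL_IN_QUERY) do |char|
    char.bytes.map { |byte| format("%%%02X", byte) }.join
  end
end

# Conservative variant: leave the query alone unless it contains a
# character that is not legal there.
def conservative_normalize_query(query)
  query.match?(ILLEGAL_IN_QUERY) ? full_normalize_query(query) : query
end

legal = "param=fil:for(webp)/http%3A%2F%2Fwin.net%2Fpics.jpg"
puts full_normalize_query(legal)          # param=fil:for(webp)/http://win.net/pics.jpg
puts conservative_normalize_query(legal)  # unchanged, escapes preserved
puts conservative_normalize_query("q=a b%2Fc")  # q=a%20b/c
```

Note that, as in the method above, once an illegal character triggers normalization the existing escapes in that component are rewritten too; the guard only protects components that are already legal.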


sporkmonger · 2022-09-07T19:01:27Z

Interesting concept. I've generally recommended an approach where normalization is done component-by-component when people have had similar needs, and it's partly why escaping methods already take character classes as parameters. That said, I could see an argument for why this is a common-enough use-case to warrant something like this.
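
As a rough illustration of the component-by-component approach, the sketch below passes the allowed character class in as a parameter so that each component gets its own rules. It is stdlib-only and deliberately independent of Addressable: `escape_component` and the class constants are hypothetical names, and the classes are simplified approximations of RFC 3986.

```ruby
# Assumption: simplified RFC 3986 character classes, written as the inside
# of a regex character class.
UNRESERVED  = "A-Za-z0-9\\-._~"
SUB_DELIMS  = "!$&'()*+,;="
PATH_CLASS  = UNRESERVED + SUB_DELIMS + ":@/"
QUERY_CLASS = PATH_CLASS + "?"

# Hypothetical helper: percent-encodes every character outside `allowed`
# and leaves existing escapes (anything starting with "%") untouched.
def escape_component(component, allowed)
  component.gsub(/[^#{allowed}%]/) do |char|
    char.bytes.map { |byte| format("%%%02X", byte) }.join
  end
end

puts escape_component("/a b/c?d", PATH_CLASS)    # /a%20b/c%3Fd
puts escape_component("x=a b/c?d", QUERY_CLASS)  # x=a%20b/c?d
```

The same input character ("?" here) is escaped in the path but kept in the query, which is the kind of per-component control a single whole-URL normalization cannot express.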

sporkmonger · 2022-09-07T19:04:36Z

#472 is probably related and might make this issue redundant once fixed?

jarthod · 2022-09-08T09:46:17Z

Ah yes indeed, I did not have the courage to check the spec to verify whether it was acceptable to normalize this way, so I chose the path of least resistance: I considered the normalize method correct and wrote my own less invasive version. But if it's actually acceptable to modify the normalize method to avoid doing this, it might make my method useless.

dentarg · 2023-07-19T08:04:40Z

Close this one in favour of #366? Looks like the normalization should be fixed, counted at least 5 different issues reported about it (besides this one)

jarthod · 2023-07-19T13:28:32Z

It's indeed possible that once #366 is fixed this won't be needed any more; I would have to run some tests on my end to see if I detect other problematic cases. In any case this method is monkey-patched on my end, so yes, the issue can be closed. It was just posted as a suggestion and/or for others.

dentarg · 2023-07-19T20:25:00Z

I'll keep it open until there is a PR for #366 as a reminder about you double checking the work :)

jarthod · 2023-07-19T22:37:36Z

Yep ok 👍

dentarg mentioned this issue Jul 19, 2023

Normalization: don't decode percent-encoded reserved characters #366

Open

dentarg added the Duplicate label Jul 19, 2023
