Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When should percent decoding occur by default with paths? #42

Open
evancz opened this issue Feb 10, 2021 · 4 comments
Open

When should percent decoding occur by default with paths? #42

evancz opened this issue Feb 10, 2021 · 4 comments
Labels
breaking would require a MAJOR release meta

Comments

@evancz
Copy link
Member

evancz commented Feb 10, 2021

Problem

Some issues on this repo are saying that more decoding should happen on path segments by default:

Issue #20 also says that this decoding should not occur in all cases, referencing this issue that points to implementations in many languages that do not call percentDecode in a naive way.

What should be done?

Current Status

As mentioned in #16, it is currently possible to use Url.Parser.custom to create custom path parsers that have whatever behavior you personally think is best. For example, custom "STRING" Url.percentDecode would do very aggressive percent decoding.

So nobody is blocked on this topic. It is a question of defaults, and any change should be considered a breaking change that triggers a MAJOR version bump.

Goal

Make a table of scenarios to try to find the ideal defaults for a broad range of people. The default options for path parsing and building are:

  1. no percent decoding
  2. percent decoding for specific characters
  3. percent decoding for all characters

We currently do (1) but maybe it'd be good to make a table to show the various options. Right now I think it would be ideal for someone interested in this topic to:

  1. Build a table of interesting paths and see how they all work under different defaults.
  2. Check the defaults of implementations in other languages.

Hopefully this will reveal more information / some sort of consensus. I think being thorough about this is very important, since any change here could break a lot of peoples' code in ways that might be very hard to detect. Please share your efforts here or on the Elm Discourse.

@choonkeat
Copy link

choonkeat commented May 12, 2021

Since an Elm application produces a [ href ] links, and upon clicking it, the Browser.application is also expected to Url.Parser.parse router url, I think the important bit is: the value need to survive "round trip"

  • given a String value that we provide to Url.Builder.absolute
  • it must produce a usable link where after we click
  • and parse with Url.Parser.string
  • we must get back the same String value without further intervention

Current elm/url does not round trip.

Given String Url.Builder.absolute produces Url.Parser.string returns String Round trip
"hello" "hello" "hello" ✔️
"👍" "👍" "%F0%9F%91%8D"
"искать" "искать" "%D0%B8%D1%81%D0%BA%D0%B0%D1%82%D1%8C"

If we use the auto encode and auto decode versions

Given String Url.Builder.absolute+encode produces Url.Parser.string+decode returns String Round trip
"hello" "hello" "hello" ✔️
"👍" "%F0%9F%91%8D" "👍" ✔️
"искать" "%D0%B8%D1%81%D0%BA%D0%B0%D1%82%D1%8C" "искать" ✔️

We can try it out in this tiny app https://elm-url-test.netlify.app (source)

  1. Fill in the values
  2. Choose the mode
    • NoPercentEncodeDecode uses Url.Builder.absolute and Url.Parser.string as-is
    • PercentEncodeDecodePath uses Url.Builder.absolute + encode and Url.Parser.string + decode
  3. And most importantly, click on the generated link to see how the round trip turns out. Are we able to parse and obtain our original String values back

NOTE: if you gave Set Path a value of !@#$%^&*() and run the following js in browser console,

Array.from(document.getElementsByTagName('a')).forEach(function(a) { console.log(a.href) })

you'll notice that the a[href] wasn't escaped. this is a problem since # is treated literally as url hash

https://elm-url-test.netlify.app/path/!@#$%^&*()?query=%F0%9F%91%8D

if you did the same but with Set mode: [x] PercentEncodeDecodePath and run the same js, you'll notice that the a[href] is properly escaped

https://elm-url-test.netlify.app/path/!%40%23%24%25%5E%26*()?query=%F0%9F%91%8D

since any change here could break a lot of peoples' code in ways that might be very hard to detect

The function names would best be renamed to avoid this problem

  • Url.Parser.string becomes Url.Parser.rawString and we add Url.Parser.decodedString
  • Url.Builder.absolute et al will need to accept List String List PathSegment (counter part to List QueryParameter) but that means there's no generating of un-encoded paths

@choonkeat
Copy link

choonkeat commented May 13, 2021

using strings as-is, vuejs mostly round trips properly except it got thrown off by # (fixed by using named route for !@#$%^&*())

<router-link to="/path/foo?query=foo">Go to foo</router-link>
<router-link to="/path/👍?query=👍">Go to 👍</router-link>
<router-link :to="{ name: 'pathRoute', params: { id: '!@#$%^&*()' }}">Go to !@#$%^&*()</router-link>
<router-link to="/path/{}[]<>;/?query=/{}[]<>;/">Go to {}[]<>;/</router-link>
<router-link to="/path/日本?query=日本">Go to 日本</router-link>
<router-link to="/path/خحجث?query=خحجث">Go to خحجث</router-link>

the generated href attribute values are escaped

https://vue-url-test.netlify.app/path/foo?query=foo
https://vue-url-test.netlify.app/path/%F0%9F%91%8D?query=%F0%9F%91%8D
https://vue-url-test.netlify.app/path/!@%23$%25%5E&*()
https://vue-url-test.netlify.app/path/%7B%7D[]%3C%3E;/?query=%2F%7B%7D%5B%5D%3C%3E%3B%2F
https://vue-url-test.netlify.app/path/%E6%97%A5%E6%9C%AC?query=%E6%97%A5%E6%9C%AC
https://vue-url-test.netlify.app/path/%D8%AE%D8%AD%D8%AC%D8%AB?query=%D8%AE%D8%AD%D8%AC%D8%AB

https://vue-url-test.netlify.app/ (source)

@choonkeat
Copy link

choonkeat commented May 13, 2021

using strings as-is, react router mostly round trips properly. # character gave runtime error; solved with manual encodeURI

<li><Link to={{ pathname: "/path/foo", search: "?query=foo" }}>foo</Link></li>
<li><Link to={{ pathname: "/path/👍", search: "?query=👍" }}>👍</Link></li>
<li><Link to={{ pathname: "/path/!@#$%^&*()", search: "?query=!@#$%^&*()" }}>!@#$%^&*()</Link></li>
<li><Link to={{ pathname: "/path/" + encodeURI("!@#$%^&*()"), search: "?query=!@#$%^&*()" }}>!@#$%^&*()</Link> fixed</li>
<li><Link to={{ pathname: "/path/{}[]<>;", search: "?query={}[]<>;" }}>{}[]<>;</Link></li>
<li><Link to={{ pathname: "/path/日本", search: "?query=日本" }}>日本</Link></li>
<li><Link to={{ pathname: "/path/خحجث", search: "?query=خحجث" }}>خحجث</Link></li>

https://codesandbox.io/s/react-router-url-parameters-forked-sno3x?file=/example.js

Array.from(document.getElementsByTagName('a')).forEach(function(a) { console.log(a.href) })

running the above js in codesandbox's own console (bottom right), you'll notice most of the a[href] values are escaped, except for the 2 with !@#$%^&*()

https://sno3x.csb.app/path/foo?query=foo 
https://sno3x.csb.app/path/%F0%9F%91%8D?query=%F0%9F%91%8D
https://sno3x.csb.app/path/!@#$%^&*()?query=!@#$%^&*()
https://sno3x.csb.app/path/!@#$%25%5E&*()?query=!@#$%^&*()
https://sno3x.csb.app/path/%7B%7D[]%3C%3E;?query={}[]%3C%3E;
https://sno3x.csb.app/path/%E6%97%A5%E6%9C%AC?query=%E6%97%A5%E6%9C%AC
https://sno3x.csb.app/path/%D8%AE%D8%AD%D8%AC%D8%AB?query=%D8%AE%D8%AD%D8%AC%D8%AB

@choonkeat
Copy link

choonkeat commented May 13, 2021

  • Not auto-encoding Url.Builder.absolute path segments means we rely on browser (or http client) implementation of Postel's Law to fixup urls for us
  • Not auto-decoding Url.Parser.string means the values we provide to build links does not come back to us correctly

So autoencoding Url.Builder.absolute and auto decoding Url.Parser.string are both required.

[very sorry this came in as multiple comments. I didn't know how much time I had or needed]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking would require a MAJOR release meta
Projects
None yet
Development

No branches or pull requests

2 participants