jq/urlencode.jq: update for full UTF-8 support
RayPlante committed Jul 7, 2024
1 parent 3930bc8 commit f02e86f
Showing 1 changed file with 31 additions and 14 deletions.
45 changes: 31 additions & 14 deletions jq/urldecode.jq
@@ -15,34 +15,51 @@ def until(condition; next):
   def u: if condition then . else (next|u) end;
   u;

+# interpret a string as a number in some base system
+#
+# Input: string
+# Output: number
+#
+def to_i(base):
+  explode
+  | reverse
+  | map(if 65 <= . and . <= 90 then . + 32 else . end) # downcase
+  | map(if . > 96 then . - 87 else . - 48 end) # "a" ~ 97 => 10 ~ 87
+  | reduce .[] as $c
+      # base: [power, ans]
+      ([1,0]; (.[0] * base) as $b | [$b, .[1] + (.[0] * $c)]) | .[1];
+
+def hex2utf8(shift; off):
+  [to_i(16)-off+((shift|to_i(16))*64)] | implode;
+
+def hex2utf8(shift):
+  hex2utf8(shift; 128);
+
+def hex2utf8:
+  hex2utf8("0"; 0);
+
 # replace all url-encodings (%XX) in an input string with their unencoded
-# characters.
+# characters. This decoder should be fully UTF-8 compliant, recognizing
+# the %CX%XX pattern.
 #
 # Input: string
 # Output: string
 #
 def url_decode:
-  # The helper function converts the input string written in the given
-  # "base" to an integer
-  def to_i(base):
-    explode
-    | reverse
-    | map(if 65 <= . and . <= 90 then . + 32 else . end) # downcase
-    | map(if . > 96 then . - 87 else . - 48 end) # "a" ~ 97 => 10 ~ 87
-    | reduce .[] as $c
-        # base: [power, ans]
-        ([1,0]; (.[0] * base) as $b | [$b, .[1] + (.[0] * $c)]) | .[1];
-
   . as $in
   | length as $length
   | [0, ""] # i, answer
   | until ( .[0] >= $length;
       .[0] as $i
       | if $in[$i:$i+1] == "%"
-        then [ $i + 3, .[1] + ([$in[$i+1:$i+3] | to_i(16)] | implode) ]
+        then
+          if $in[$i+1:$i+2] == "C" and $in[$i+3:$i+4] == "%"
+          then [ $i + 6, .[1] + ($in[$i+4:$i+6] | hex2utf8($in[$i+2:$i+3])) ]
+          else [ $i + 3, .[1] + ($in[$i+1:$i+3] | hex2utf8) ]
+          end
         else [ $i + 1, .[1] + $in[$i:$i+1] ]
         end)
-  | .[1]; # answer
+  | .[1]; # answer

 # replace url-encodings, including pluses (+), with their corresponding
 # characters. This is like url_encode, except that it also replaces each
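
Note on the arithmetic in `hex2utf8(shift; 128)`: for a `%CX%YY` pair it computes `YY - 128 + X * 64`, which appears to be the standard two-byte UTF-8 formula specialised to lead bytes `C2` through `CF`. The sketch below is a Python cross-check, not part of the commit; `two_byte_code_point` and `matches_utf8_for_all_cx_pairs` are hypothetical names introduced only for this illustration. Reading the hunk as shown, any other lead byte (for example `%D0` or `%E2`), and a lowercase `%c3`, would take the single-byte `else` branch instead.

```python
from urllib.parse import unquote


def two_byte_code_point(lead_low_nibble: str, continuation_hex: str) -> int:
    """Mirror the arithmetic of hex2utf8(shift; 128) for a %CX%YY pair.

    lead_low_nibble  -- the X in %CX (one hex digit)
    continuation_hex -- the YY of the second escape (two hex digits)
    """
    return int(continuation_hex, 16) - 128 + int(lead_low_nibble, 16) * 64


def matches_utf8_for_all_cx_pairs() -> bool:
    """Compare the formula with Python's UTF-8 decoder for every valid
    two-byte sequence whose lead byte is 0xC2..0xCF.

    0xC0 and 0xC1 are skipped because they only occur in overlong
    (invalid) encodings.
    """
    for lead in range(0xC2, 0xD0):
        for cont in range(0x80, 0xC0):
            expected = bytes([lead, cont]).decode("utf-8")
            got = chr(two_byte_code_point(f"{lead & 0x0F:X}", f"{cont:02X}"))
            if got != expected:
                return False
    return True


if __name__ == "__main__":
    # "%C3%A9" is the percent-encoded UTF-8 form of U+00E9.
    cp = two_byte_code_point("3", "A9")
    print(hex(cp), chr(cp) == unquote("%C3%A9"))
    print(matches_utf8_for_all_cx_pairs())
```

If the check holds, the `%CX%XX` branch covers code points U+0080 through U+03FF; sequences with other lead bytes would need additional branches to decode as UTF-8.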
