Base64url support? #66

Open
jrch2k20 opened this issue Feb 28, 2020 · 7 comments

Comments

@jrch2k20

Hi aklomp, I was searching for a nice base64 library and ended up here, but my low-level C is a bit rusty. Could this library be used to handle the base64url variant, or could you at least give some tips on how to add it? (I would post patches back if I got it working.)

Thank you very much for your time.

@aklomp
Owner

aklomp commented Mar 3, 2020

Hi! Currently this library only supports the standard Base64 alphabet.

It would probably be feasible, but difficult, to add support for additional alphabets. It could be non-trivial to add that support to the SIMD codecs, because they do arithmetic on the raw character values. Here's an encoder example and here's a decoder.

My intuition is that it would probably be possible to find an arithmetic-based solution that works with the alternative alphabet since the differences are small. However, it would require duplicating a lot of code, and adding some sort of user-visible flag to the API to indicate which alphabet to use. Maybe it could be a compile-time flag to not incur runtime penalties or complexities.
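To make "arithmetic on the raw character values" concrete, here is a minimal scalar sketch of an encoder step that maps a 6-bit value to its character by adding a per-range offset. The function name `enc_char` and the `url` flag are hypothetical and not part of this library's API; the point is only that the two alphabets differ in the offsets for values 62 and 63.

```c
#include <stdint.h>

/* Hypothetical helper: map a 6-bit value (0..63) to its Base64 character
 * by adding a per-range offset. `url` selects the base64url alphabet. */
static char enc_char(uint8_t v, int url)
{
	if (v < 26)
		return (char)(v + 65);              /* 0..25  -> 'A'..'Z' */
	if (v < 52)
		return (char)(v + 71);              /* 26..51 -> 'a'..'z' */
	if (v < 62)
		return (char)(v - 4);               /* 52..61 -> '0'..'9' */
	if (v == 62)
		return (char)(v + (url ? -17 : -19)); /* '-' (45) or '+' (43) */
	return (char)(v + (url ? 32 : -16));      /* '_' (95) or '/' (47) */
}
```

A SIMD codec would apply the same offsets to many bytes at once, which is why changing the last two offsets touches every vectorized code path rather than a single table.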

@gfoidl

gfoidl commented Mar 4, 2020

> since the differences are small. However, it would require duplicating a lot of code

Yeah, it's possible. I've done this for C#, but as @aklomp says, there's a lot of duplication, and the nice tricks applied to standard base64 don't work as nicely with base64url, especially on the decoding side for input validation.

@jrch2k20
Author

jrch2k20 commented Mar 4, 2020

Thank you very much for your time.

Yeah, I see your point. It will probably be easier to handle the base64url-to-base64 translation externally on the C++17 side of things, since the chunks are small and this library already gives me a nice speedup, so I have some wiggle room.
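For reference, an external translation before decoding could look roughly like the sketch below. It is written in C rather than C++17, and the helper name `b64url_to_b64` is hypothetical. It copies the input, swaps `-` and `_` for `+` and `/`, and restores the `=` padding that base64url text often omits. Note that it passes `+` and `/` through unchanged, so it accepts mixed alphabets; a strict version would have to reject them.

```c
#include <stddef.h>

/* Hypothetical helper: write base64url text from `src` into `dst` as
 * standard, padded Base64. `dst` must have room for srclen + 2 bytes.
 * Returns 1 on success, 0 if the length can never be valid Base64. */
static int b64url_to_b64(const char *src, size_t srclen,
                         char *dst, size_t *dstlen)
{
	size_t i, n = srclen;

	/* A length of 4k+1 characters cannot encode a whole number of bytes. */
	if (srclen % 4 == 1)
		return 0;

	for (i = 0; i < srclen; i++) {
		char c = src[i];
		dst[i] = (c == '-') ? '+' : (c == '_') ? '/' : c;
	}

	/* Restore the padding so a standard decoder accepts the result. */
	while (n % 4 != 0)
		dst[n++] = '=';

	*dstlen = n;
	return 1;
}
```

The output can then be handed to the regular decoder; for small chunks the extra pass should be cheap compared to the decode itself.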

@mayeut
Contributor

mayeut commented Mar 4, 2020

There's an example of translation in https://github.com/mayeut/pybase64/blob/1e2f3ec63549085f06b3118671818edb969c1e3d/pybase64/_pybase64.c#L71

The translation is done in place for encoding.
The translation is done out of place for decoding. (Warning: the translation is not safe here, in order to mimic Python behavior; cf. the inline comment.)
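As an illustration of the in-place direction only (this is not the pybase64 code), a sketch of post-processing an already encoded standard Base64 buffer might look like this. The helper name is hypothetical, and stripping the `=` padding is an assumption: base64url text is frequently written without padding, but some consumers expect it to be kept.

```c
#include <stddef.h>

/* Hypothetical helper: rewrite standard Base64 output in place as
 * base64url and return the new length with trailing '=' removed. */
static size_t b64_to_b64url_inplace(char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		if (buf[i] == '+')
			buf[i] = '-';
		else if (buf[i] == '/')
			buf[i] = '_';
	}

	/* Assumption: the caller wants unpadded output. */
	while (len > 0 && buf[len - 1] == '=')
		len--;

	return len;
}
```

Working in place is safe on the encoding side because the encoder's output is trusted and the translation never grows the buffer.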

@jrch2k20
Author

jrch2k20 commented Mar 5, 2020

Well, for now a very simple std implementation seems to do the job, with very small assembly output:

https://godbolt.org/z/8J2zKG

@ashundi-tibco

What is being asked is just to accept the base64url format when decoding. Replacing characters 62 and 63 is best done at https://github.com/aklomp/base64/blob/master/lib/tables/tables.c
Supporting encoding would require more work...
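A rough sketch of that idea for a plain table-driven decoder follows. It does not reproduce the layout of the library's `tables.c`; the function name and the `B64_INVALID` sentinel are assumptions made for illustration. The table maps both `+` and `-` to 62 and both `/` and `_` to 63, so either alphabet decodes.

```c
#include <stdint.h>
#include <string.h>

#define B64_INVALID 255 /* hypothetical "not in the alphabet" marker */

/* Fill a 256-entry decode table that accepts both the standard and the
 * base64url alphabet. */
static void build_dec_table(uint8_t table[256])
{
	static const char alphabet[] =
		"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
	int i;

	memset(table, B64_INVALID, 256);
	for (i = 0; i < 64; i++)
		table[(unsigned char)alphabet[i]] = (uint8_t)i;

	/* base64url spellings of values 62 and 63 */
	table['-'] = 62;
	table['_'] = 63;
}
```

This only covers a scalar, table-based decode path. As noted earlier in the thread, the SIMD codecs validate and translate input arithmetically, so a table change alone would not reach them.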

@emmansun

A sample lookup table for the base64url alphabet, as used in https://github.com/emmansun/base64:

// The input consists of six character sets in the Base64 alphabet, which we
// need to map back to the 6-bit values they represent. There are three ranges,
// two singles, and then there's the rest.
//
//  #  From       To        Add  Characters
//  1  [45]       [62]      +17  -
//  2  [48..57]   [52..61]   +4  0..9
//  3  [65..90]   [0..25]   -65  A..Z
//  4  [95]       [63]      -32  _
//  5  [97..122]  [26..51]  -71  a..z
// (6) Everything else => invalid input
//
// We will use lookup tables for character validation and offset computation.
//
// For offsets:
// Perfect hash for lut = ((src >> 4) & 0x0F) - ((src > 0x5e) ? 0xFF : 0x00)
// 0000 = garbage
// 0001 = garbage
// 0010 = -
// 0011 = 0-9
// 0100 = A-Z
// 0101 = A-Z
// 0110 = _
// 0111 = a-z
// 1000 = a-z
// > 1000 = garbage
//
// For validation, here's the table.
// A character is valid if and only if the AND of the 2 lookups equals 0:
//
// hi \ lo              0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
//      LUT             0x15 0x11 0x11 0x11 0x11 0x11 0x11 0x11 0x11 0x11 0x13 0x1B 0x1B 0x1A 0x1B 0x33
//
// 0000 0x10 char        NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL   BS   HT   LF   VT   FF   CR   SO   SI
//           andlut     0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10
//
// 0001 0x10 char        DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB  CAN   EM  SUB  ESC   FS   GS   RS   US
//           andlut     0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10
//
// 0010 0x01 char               !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
//           andlut     0x01 0x01 0x01 0x01 0x01 0x01 0x01 0x01 0x01 0x01 0x01 0x01 0x01 0x00 0x01 0x01
//
// 0011 0x02 char          0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
//           andlut     0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x02 0x02 0x02 0x02 0x02 0x02
//
// 0100 0x04 char          @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
//           andlut     0x04 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
//
// 0101 0x08 char          P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
//           andlut     0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x08 0x08 0x08 0x08 0x00
//
// 0110 0x04 char          `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
//           andlut     0x04 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
//
// 0111 0x28 char          p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~  DEL
//           andlut     0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x08 0x08 0x08 0x08 0x20
//
// 1000 0x10 andlut     0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10
// 1001 0x10 andlut     0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10
// 1010 0x10 andlut     0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10
// 1011 0x10 andlut     0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10
// 1100 0x10 andlut     0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10
// 1101 0x10 andlut     0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10
// 1110 0x10 andlut     0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10
// 1111 0x10 andlut     0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10
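A scalar model of the scheme described in the comment above can help when checking the tables by hand. The sketch below is not taken from emmansun/base64; the function and array names are hypothetical, and the table values are transcribed from the comment. A real implementation would perform the same nibble lookups on a whole vector of bytes at once with a byte-shuffle instruction.

```c
#include <stdint.h>

/* Validation bitmasks, indexed by the low and high nibble of the input. */
static const uint8_t lut_lo[16] = {
	0x15, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11,
	0x11, 0x11, 0x13, 0x1B, 0x1B, 0x1A, 0x1B, 0x33
};
static const uint8_t lut_hi[16] = {
	0x10, 0x10, 0x01, 0x02, 0x04, 0x08, 0x04, 0x28,
	0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10
};

/* Offsets indexed by the perfect hash; unused slots are 0. */
static const int8_t lut_roll[16] = {
	0, 0, 17, 4, -65, -65, -32, -71, -71, 0, 0, 0, 0, 0, 0, 0
};

/* Decode one base64url character to its 6-bit value, or -1 if invalid. */
static int b64url_dec_char(uint8_t c)
{
	uint8_t hi = c >> 4;
	uint8_t lo = c & 0x0F;
	uint8_t idx;

	/* Valid if and only if the AND of the two lookups is zero. */
	if (lut_lo[lo] & lut_hi[hi])
		return -1;

	/* Subtracting the 0xFF compare mask is the same as adding 1 mod 256. */
	idx = (uint8_t)(hi + (c > 0x5E ? 1 : 0));

	return c + lut_roll[idx];
}
```

Because every valid character is at most 0x7A, the hash never exceeds 8 for accepted input, which matches the nine populated offset slots listed in the comment.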
