[Request] De-/Encoding of adapted base64 #12

Open
Vlix opened this issue Mar 31, 2020 · 10 comments
Labels
enhancement New feature or request

Comments

@Vlix

Vlix commented Mar 31, 2020

I found that some libraries out there use a slightly different base64 encoding, namely the "adapted base64 encoding", which is the same as regular base64 except that + is replaced by . and there are no padding characters.
It would be nice (and not too much work, I think?) to add something like Data.ByteString.Base64.Adapted to handle this type of base64 encoding.
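
For reference, the encoding described above can be sketched in a few lines of Python on top of the standard library. This is only an illustration of the description in this comment, not the proposed Haskell module; the function name adapted_b64_encode is made up for the example.

```python
import base64


def adapted_b64_encode(data: bytes) -> str:
    """Encode as standard base64, then swap '+' for '.' and drop '=' padding.

    Hypothetical helper: it mirrors the description in this comment and is
    not taken from any existing library.
    """
    std = base64.b64encode(data).decode("ascii")
    # '.' never occurs in standard base64 output, so the swap is unambiguous.
    return std.replace("+", ".").rstrip("=")
```

Since the padding only depends on the length of the input, dropping it loses no information; a decoder can restore it from the length of the encoded text.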

@emilypi
Owner

emilypi commented Apr 2, 2020

Oh? This sounds interesting! Do you have any examples of such libraries? It wouldn't be too much trouble to implement something like this - it would just come down to adding a new encoding and decoding table, and then writing the module api.

@Vlix
Author

Vlix commented Apr 2, 2020

I'm working on the password library, which offers an easy interface to all kinds of password algorithms, and I've noticed that some implementations (including bcrypt) use irregular base64 encodings. I'm still unsure whether it's exactly the same as regular base64 except for having . instead of + and no padding; some sources seem to say that instead of the regular A-Za-z0-9+/ order it uses ./0-9A-Za-z, or something similar...

The passlib library from Python apparently has some functions that handle this:

@Vlix
Author

Vlix commented Apr 2, 2020

Oh and it's not always obvious from the documentation if any decode... functions can be used for unpadded base64.
In Data.ByteString.Base64 the decodeBase64 seems to allow unpadded input? And it refers to decodeBase64Unpadded, which is not in the module. In Data.ByteString.Base64.URL, the decodeBase64 is more explicit in its documentation to allow it. And there are more encode functions that remove the padding. Why not also have those in regular Data.ByteString.Base64?
In Data.Text.Encoding.Base64, the decodeBase64Lenient is the one that seems to allow unpadded input, but decodeBase64 has no such "Note: ..." like the ByteString variant has, so does it work the same or not?

... Should I maybe make a new issue for this inconsistency in documentation and availability of module functions? (I'd like a Data.ByteString.Base64.encodeBase64Unpadded for example)

@emilypi
Owner

emilypi commented Apr 3, 2020

Ah, I must have missed some documentation when I removed unpadded std alphabet base64 support; thank you for bringing that up.

So far, only the URL-safe alphabet is supported by an RFC calling for optionally padded encodings. It is kept that way because it's RFC compliant, and doing otherwise makes for a confusing API. However, you have decodeBase64Unpadded precisely because some consumers require unpadded exclusively. I should probably add a decodeBase64Padded for symmetry - I agree that the lack of that function is slightly confusing.

Because it's important to be spec compliant, I am not willing to do an unpadded version of the std alphabet, but I'm happy to do this with any nonstandard alphabets, since they are not governed by an RFC!

@Vlix
Author

Vlix commented Apr 3, 2020

Ah, OK, that sounds reasonable. And on second thought, you're right, there's no case where I'd want to use a Data.ByteString.Base64.encodeBase64Unpadded. I was thinking of the "Hash64" alphabet, since that's the one that doesn't pad in my use cases.

I hope the examples that the passlib library gives can be used to decipher how this non-RFC style encoding works.

@Vlix
Author

Vlix commented Apr 4, 2020

Hmmm, I've been scouring some more, and found it's also called radix-64 encoding, unix/crypt encoding and some more things. I've found a page for a Linux function that should just do what you expect? Maybe?

And this bcrypt source code also has the order of:
"./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
^ this also has the encodeBase64 function at the bottom (though it's in C)

EDIT: Oh man, the Base64 wiki shows that the unix/crypt encoding is slightly different from the bcrypt encoding >.<
It also shows there are tons of non-standard encodings... Maybe for the library it'd be best to only add the unix/crypt variant, as it's (probably?) the most used. Or make a module that has different tables, so the user of the API can just write rawBytes = decodeBase64Other UnixCrypt "someEncodedString", where UnixCrypt is one of the non-standard tables you can give (or a data constructor for an EncodingTable sum type that will choose the table for you in the decodeBase64Other function)? I dunno, I'm just brainstorming...
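
To make the brainstormed API shape concrete, here is a rough Python analogue: an enum stands in for the EncodingTable sum type and one function dispatches on it. The names EncodingTable and decode_base64_other are hypothetical and only mirror the idea above. The sketch assumes each table differs from standard base64 only in its alphabet, with standard big-endian bit packing and no padding; as far as I know, real crypt(3) hashes pack their bits differently, so this shows the API shape rather than a faithful crypt decoder.

```python
import base64
from enum import Enum

# Standard base64 alphabet (RFC 4648), used as the translation target.
STD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


class EncodingTable(Enum):
    """Hypothetical table selector; alphabets as quoted in this thread."""

    UNIX_CRYPT = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
    BCRYPT = "./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"


def decode_base64_other(table: EncodingTable, text: str) -> bytes:
    """Decode unpadded text written in the given alphabet.

    Assumption: only the alphabet differs from standard base64.
    """
    alphabet = table.value
    if any(ch not in alphabet for ch in text):
        raise ValueError("character outside the selected table")
    # Map every symbol to the standard symbol with the same 6-bit value.
    std = text.translate(str.maketrans(alphabet, STD))
    # Restore the '=' padding that the stdlib decoder expects.
    std += "=" * (-len(std) % 4)
    return base64.b64decode(std)
```

Keeping the tables as data and the decoder generic means a new alphabet costs one extra enum member instead of another copy of the module.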

EDIT2: OK, the main reason I got into this is that I'm trying to parse Python's passlib formats, where the PBKDF2 formats have the . and no padding... and I tried the following literally in Repl.it:

```python
from passlib.utils.binary import ab64_encode

# A bytes literal and print() keep this runnable on both Python 2 and Python 3.
val = ab64_encode(b'>>>')
print(val)
```

And it gives "Pj4." *sigh* Which is literally standard base64 but with the + replaced by a ., so it doesn't even use the unix/crypt alphabet or anything... WHY!? </rant>

@emilypi
Owner

emilypi commented Apr 5, 2020

Thanks for hunting these down @Vlix. To address some of the issues raised here, I threw together an omnibus PR yesterday for all the things I want to get in before I do this: #13

I'm probably going to punt on those TODOs in favor of getting yours in. I just need to sit at my screen on Monday and make sure it all makes sense to me first :)

@Vlix
Author

Vlix commented Apr 6, 2020

Yeah, it requires a bit of reading up, but I think it would make sense to have this in the library, even though some of it is somewhat obscure. No rush, I've found the encoding I was looking for isn't even really a different one, so I can just s/\./+/ and add padding ='s and keep going 👍
Good luck with the library! And don't hesitate to ask, I'm fairly responsive.
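
The workaround mentioned above (turn . back into +, re-add the = padding, then decode as usual) could look roughly like this in Python. The helper name adapted_b64_decode is made up for illustration, and the sketch assumes the input really is standard base64 with only those two changes.

```python
import base64


def adapted_b64_decode(text: str) -> bytes:
    """Undo the '.'-for-'+' swap, restore padding, decode as standard base64.

    Hypothetical helper sketching the workaround from this comment.
    """
    # A literal replacement: every '.' becomes '+', nothing else changes.
    std = text.replace(".", "+")
    # Base64 text length must be a multiple of 4; pad with '=' as needed.
    std += "=" * (-len(std) % 4)
    return base64.b64decode(std)
```

Note that the replacement has to be a literal one; in most regex dialects an unescaped . matches any character.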

emilypi added the enhancement label on Aug 15, 2020
@emilypi
Owner

emilypi commented Dec 13, 2020

So just to follow up on this: I still plan on doing this, but I don't want to start until Backpack is supported in Stack. I'm just repeating the same module over and over, and it's not scalable.

@Vlix
Author

Vlix commented Jan 26, 2022

Just for future reference, I've found Passlib's charmaps which show:

  • Base64 (A-Za-z0-9+/)
    • The Classic
  • Alternate Base64 (A-Za-z0-9./)
    • s/+/./
  • Hash64 (./0-9A-Za-z)
    • "This encoding system appears to have originated with des_crypt, but is used by md5_crypt, sha256_crypt, and others. Within Passlib, this encoding is referred as the “hash64” encoding, to distinguish it from normal base64 and others."
  • Bcrypt (./A-Za-z0-9)
    • "Base64 character map used by bcrypt."
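
Spelled out, the four character maps listed above expand to the following 64-character strings. This is a small Python sketch that only expands the ranges exactly as written in the list; the dictionary keys and the helper symbol_for are names chosen for the example, not Passlib identifiers. It says nothing about bit ordering, which as far as I know also differs for the hash64 variant.

```python
import string

UPPER = string.ascii_uppercase   # A-Z
LOWER = string.ascii_lowercase   # a-z
DIGITS = string.digits           # 0-9

# Ranges expanded in the order given in the list above.
CHARMAPS = {
    "base64": UPPER + LOWER + DIGITS + "+/",
    "alternate base64": UPPER + LOWER + DIGITS + "./",
    "hash64": "./" + DIGITS + UPPER + LOWER,
    "bcrypt": "./" + UPPER + LOWER + DIGITS,
}

# Every map must assign a distinct symbol to each of the 64 six-bit values.
for name, chars in CHARMAPS.items():
    assert len(chars) == 64 and len(set(chars)) == 64, name


def symbol_for(charmap: str, value: int) -> str:
    """Return the symbol that encodes the 6-bit value in the named charmap."""
    return CHARMAPS[charmap][value]
```

For example, the 6-bit value 62 is written as + in classic base64, . in alternate base64, y in hash64 and 8 in the bcrypt map, which is why text in one variant cannot be fed to a decoder for another without translating it first.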
