Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HMAC key should be an array of bytes, not a character string #308

Open
ErwanLegrand opened this issue Aug 30, 2023 · 9 comments
Open

HMAC key should be an array of bytes, not a character string #308

ErwanLegrand opened this issue Aug 30, 2023 · 9 comments
Milestone

Comments

@ErwanLegrand
Copy link
Contributor

ErwanLegrand commented Aug 30, 2023

What would you like to see added?

Better support for high entropy secrets

Additional Context

Proper keys for symmetric cryptographic algorithms are produced by a cryptographically secure pseudorandom number generator (CSPRNG) or a key derivation function (KDF). CSPRNGs and KDFs do not typically output character strings. They produce streams of bytes (or bits).

This is the reason why the OpenSSL functions take arrays of unsigned char as input. Note the unsigned qualifier. This is OpenSSL's way of saying "these are not characters". (uint8_t was not available when SSLeay was originally designed.)

While an array of bytes could be turned into a character string using hexadecimal encoding or base64, this is not convenient.

Moreover, accepting a character string as input gives the wrong impression to users lacking the relevant cryptography knowledge. They are likely to use low entropy secrets (AKA passwords) instead of high entropy secrets (AKA keys). This introduces a vulnerability since using a low entropy secret as HMAC key makes brute force attacks against the JWTs possible.

(Whenever a password is used in a properly designed cryptographic system, it is always passed first to a password-based key derivation function such as Argon2. Only then are keys produced by the KDF used with symmetric algorithms such as ciphers or MACs.)

Finally, supporting symmetric JWK keys (with "kty" set to "oct") requires decoding the base64url encoded key, which yields an array of octets, not a character string. (See https://datatracker.ietf.org/doc/html/rfc7518#section-6.4)

Now that I am done with the explanations, here is what I suggest:

  1. Support arrays of bytes as the HMAC keys.
  2. Find ways to warn users that they are probably doing something wrong if they use a string as the key.
@ErwanLegrand
Copy link
Contributor Author

ErwanLegrand commented Aug 30, 2023

Rewriting the code example which builds a token signed with HS256 and "secret" as the key could be a first step toward 2.

@ErwanLegrand
Copy link
Contributor Author

hashcat includes support for cracking JWts: https://hashcat.net/hashcat/.

@prince-chrismc
Copy link
Collaborator

There's no objections to offering a more secure API and better examples. A contribution like this would be amazing!

Also is this you https://www.linkedin.com/in/erwanlegrand/? You seem very well informed 💭If so, I am happy this is the biggest issue 🙃

Given https://www.openssl.org/docs/man1.1.1/man3/HMAC.html and https://www.openssl.org/docs/man3.1/man3/HMAC.html have always taken a const void *key I do not share your characterization on the authors intent.

That being said, I would love to see one which takes a BIGNUM taking advantage of BN_rand for example to help guide less experienced dev. This matches what you are suggestion as well.

HMAC is generally not recommended https://research.securitum.com/jwt-json-web-token-security/ or https://owasp.org/www-project-web-security-testing-guide/latest/4-Web_Application_Security_Testing/06-Session_Management_Testing/10-Testing_JSON_Web_Tokens or https://auth0.com/blog/brute-forcing-hs256-is-possible-the-importance-of-using-strong-keys-to-sign-jwts/#Brute-Forcing-a-HS256-JSON-Web-Token so I personally am not so pressed on seeing JWK support. As you put it

gives the wrong impression to users lacking the relevant cryptography knowledge.

I would prefer to see ECDSA or RSA to help encourage those being picked. If there was demand or contributions, I'd be very open, but I have the impression you are a security researcher not a user.

@Thalhammer
Copy link
Owner

While an array of bytes could be turned into a character string using hexadecimal encoding or base64, this is not convenient.

jwt-cpp passes the string/byte array passed to it directly to openssl, it does not parse it in any way. If you encode the key in any way it will cause wrong results.

Moreover, accepting a character string as input gives the wrong impression to users lacking the relevant cryptography knowledge. They are likely to use low entropy secrets (AKA passwords) instead of high entropy secrets (AKA keys).

A password is not necessarily low entropy. You can easily compensate the unused bits by choosing a slightly longer password.

This introduces a vulnerability since using a low entropy secret as HMAC key makes brute force attacks against the JWTs possible.

It doesn't introduce a vulnerability. At best it reduces the time it takes for brute forcing a tiny bit. That said, if it takes 16x the age of the universe to brute force or 4x hardly matters. Above that I'd argue if someone doesn't understand that using a short string as the key is a bad idea, he's are probably not the right person to implement this portion of the codebase.

Finally, supporting symmetric JWK keys (with "kty" set to "oct") requires decoding the base64url encoded key, which yields an array of octets, not a character string.

I feel like you have a misunderstanding when it comes to "array of octets" and "character string". Those are fundamentally the same thing. Both represent 8bits per element of usable space. The only difference is their typical usage and the range, which is 0-255 for unsigned and -128 - 127 for signed. It has no implications for entropy. 0x48,0x65,0x6c,0x6c,0x6f has exactly the same entropy as "Hello".

In C++ std::string isn't always a string, it doubles as a convenient container for raw bytes because it has some benefits over std::vector<uint8_t> (e.g. sso, convenient substring, etc.), but fundamentally they are the same thing. Nothing in the api stops you from using bytes outside the printable range as a key and I highly encourage you to do so, however virtually every software I have encountered so far (that allows you configure the secret) expects you to supply a random string thats directly used as the key. No encoding, no key derivation, just a plain raw string. The reason for this is that they expect you to use a long enough (>24 characters) key and so does jwt-cpp.

Lets make a simple calculation:
We only use printable ascii (codepoint 32 - 127) and have a rtx3090, which can do ~65000 MHash/s.
96^24/65000000000 = 5775588411186190950089257469772814530s = ≈ 1.3×10^19 × age of the universe
Compared to ≈ 2.2×10^29 × age of the universe if we use a fully random 24 byte key instead.
Sure, the "real key" takes 10^10 times longer. But who cares. Both will take longer than the universe has existed, which is my line of "safe enough". Not to mention that nothing is stopping you from using the latter.

The point I am trying to make is that jwt-cpp makes it simple to use, while aligning with every other implementation in terms of usage (even the official jwt.io website uses ascii keys for hmac by default).

I agree that in theory it makes it less secure. I have nothing against adding an overload that takes a uint8_t array, but I certainly won't deprecate/remove the existing string api (and break countless users code) for a purely theoretical attack.

However if you think otherwise:
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.4HW86rk_yniOszQ2y7IM_bqtXT-Mh_tuycNZGkcUhH8
break the key and sign a new jwt with my username as the "name" claim. The key is (according to you) completely insecure and bruteforceable. In fact it doesn't even use special characters, only numbers and lower/uppercase ascii. But please don't use all our universes electricity for doing so.

@ErwanLegrand
Copy link
Contributor Author

ErwanLegrand commented Sep 1, 2023

I thank you, @prince-chrismc, for the fast reply!

There's no objections to offering a more secure API and better examples. A contribution like this would be amazing!

I can certainly try to write a better example. As a for improving the API, I'd love to, but I don't think I am the right person for this. The truth is I stopped writing C++ code on a regular basis back in 2000. This was a long time ago. I have forgotten a lot since then and the language has changed a lot, too! C++98 was not widely adopted back then. g++ did not even provide a STL implementation, I had to use something called stlport which was derived from the original SGI STL written by Stepanov. And g++ did not support some perfectly legitimate C++ constructs. I think you'll get the idea.

Also is this you https://www.linkedin.com/in/erwanlegrand/? You seem very well informed 💭If so, I am happy this is the biggest issue 🙃

Yes, this is me. Jokes aside, I am not familiar enough with your code base at this point to tell whether or not this is the biggest issue. Lack of entropy will be seen as a big problem by anyone who has studied cryptography, though.

Given https://www.openssl.org/docs/man1.1.1/man3/HMAC.html and https://www.openssl.org/docs/man3.1/man3/HMAC.html have always taken a const void *key I do not share your characterization on the authors intent.

My bad! I was sure I knew this part of OpenSSL's API and it turns out I didn't. This API is highly inconsistent! :-( I see no good reason why the key and the data have different types. Both are arrays of bytes which are concatenated before they are fed to whatever message digest function is used in conjunction with HMAC.

That being said, I would love to see one which takes a BIGNUM taking advantage of BN_rand for example to help guide less experienced dev. This matches what you are suggestion as well.

Public key cryptography is essentially algebra done with very large numbers. Thus, BIGNUMs are used when you do things like RSA, DH, ECDH or ECDSA. The keys used by the implementation of these algorithms in OpenSSL are actually BIGNUMs. Symmetric cryptography is a different kind of animal. Here the algorithms manipulate arrays or streams of bits (or in practice octets, bytes, words...) Thus, what you should use to generate a secret for use with symmetric cryptography (HMAC, AES...) is a function of the RAND_bytes() family.

HMAC is generally not recommended https://research.securitum.com/jwt-json-web-token-security/ or https://owasp.org/www-project-web-security-testing-guide/latest/4-Web_Application_Security_Testing/06-Session_Management_Testing/10-Testing_JSON_Web_Tokens or https://auth0.com/blog/brute-forcing-hs256-is-possible-the-importance-of-using-strong-keys-to-sign-jwts/#Brute-Forcing-a-HS256-JSON-Web-Token so I personally am not so pressed on seeing JWK support. As you put it

gives the wrong impression to users lacking the relevant cryptography knowledge.

I would prefer to see ECDSA or RSA to help encourage those being picked.

HS256 is perfectly fine provided the entity which signs the JWT and the one which verifies the signature are the same and it is used with a high entropy secret (a "strong key"). I realize this is not typically the case within OAuth2/OIDC, but this is a case I have to deal with.

This Auth0 blog entry is worrying to say the least. The authors have a partial understanding of the issue. They should have read the JWS RFC: https://www.rfc-editor.org/rfc/rfc7515#section-10.1

Let me quote it:

Keys are only as strong as the amount of entropy used to generate
them. A minimum of 128 bits of entropy should be used for all keys,
and depending upon the application context, more may be required.
Implementations must randomly generate public/private key pairs, MAC
keys, and padding values. The use of inadequate pseudorandom number
generators (PRNGs) to generate cryptographic keys can result in
little or no security. An attacker may find it much easier to
reproduce the PRNG environment that produced the keys, searching the
resulting small set of possibilities rather than brute-force
searching the whole key space. The generation of quality random
numbers is difficult. RFC 4086 offers important guidance in this area.

Now, these are people who know what they are talking about, as opposed to the Auth0 folks! "secret" might be 48 bits in length encoded in ASCII, it does not qualify as a 48 bit cryptographic secret. In cryptography, entropy is a measure of unpredictability not a measure of data length. There is no way an attacker is going to make 2^48 attempts to break this key. The entropy of this "secret" is close to 1, not 2^48. May this time I managed to give a better understanding of what the problem is?

Reading this and what @Thalhammer says regarding jwt.io it seems that the people at Auth0, not you, are the origin of the problem. Probably this explain why I see the same bad examples in so many JWT libraries.

It is hard to believe that the people behind Auth0 and jw.io can't even read the JWS RFC and provide guidance based on... On what exactly? I have no idea! Honesty, this is infuriating!

If there was demand or contributions, I'd be very open, but I have the impression you are a security researcher not a user.

While I do have a research background (I graduated in mathematics / theoretical computer science and studied cryptography back in the 1990's) I have been writing production code for the past 25 years or so. What I am concerned with here is real world issues, not theoretical problems.

@ErwanLegrand
Copy link
Contributor Author

ErwanLegrand commented Sep 1, 2023

@Thalhammer I will not reply to everything you have written. If you could please give me pointers to these examples on jwt.io which use text as HMAC keys, though, I will ask the people behind Auth0 and jwt.io to clean up the mess they have created, if they will.

And if you will heed my advice, please trust the RFC's, not what the people at Auth0/jwt.io write!

@ErwanLegrand
Copy link
Contributor Author

Also, the reason why passwords are commonly called "low entropy secrets" by cryptographers is that passwords are secrets which are generated, remembered and managed by humans and humans are really bad at these things. If there is no good reason for a secret to be human-generated, then you don't want it to be human-generated. Hopefully this clarifies things a little bit?

@ErwanLegrand
Copy link
Contributor Author

This appears to generate keys for HS256 the right way: https://github.com/rakutentech/jwkgen

A classic example showing that password length and entropy are two different things: https://www.reddit.com/r/Bitcoin/comments/1ptuf3/brain_wallet_disaster/

@ErwanLegrand
Copy link
Contributor Author

I've been giving this more thought. I think I can suggest a plan to address the issue and maybe contribute some code, after all. (Do not expect the code to be top quality C++, though, given what I said earlier.)

@prince-chrismc prince-chrismc added this to the 0.8.0 milestone Nov 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants
@ErwanLegrand @Thalhammer @prince-chrismc and others