Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bind out crc64 and fix various hash bidning issues #583

Merged
merged 16 commits into from
Oct 3, 2024
Merged

Conversation

DmitriyMusatkin
Copy link
Contributor

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

export class Md5Hash {
private hash?: Crypto.WordArray;
function hashToUint8Array(hash: Crypto.WordArray) {
return Uint8Array.from(hash.toString(Crypto.enc.Latin1).split('').map(c => c.charCodeAt(0)));;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we use Crypto.enc.Latin1?

brix/crypto-js#274
Looks like there were some discussions about this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its mostly for simplicity purposes. Dumping to hex and then parsing from it is slightly more code.
I tried out parsing the array out of the words initally and it was not that much of perf difference and would require updating types dependency, which breaks mqtt code. Instead of mucking further with cryptoJS and its deps at that point i think it would be better to spend that time on dropping cryptoJS altogether, so i just went with whatever works with current deps.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe renaming to wordArrayToUint8Array?

Maybe add some comments about how Crypto.enc.Latin1 and charCodeAt to convert the buffer back and forth?

So, firstly, I thought we should use base64 as we commonly used. But, after digging, it seems not a directly match when converted to string. And the documentation of charCodeAt said it used utf-16?

Agreed on whatever works, but I feel those encoding are really confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

completely agree and i think eventually it makes sense to move off CryptoJS and use WebCrypto as well.
current complication is that webcrypto only has shas and we need alternatives for crc's and md5 (if we want to keep a browser version).
I added a comment, but the jist of it is that latin1 is 8bit encoding (my understanding its the same as 8bit ascii variant) and charCodeAt just retrieves the corresponding byte value (it can return values bigger than a byte for unicode, but in this case we are limited by chars not being unicode to byte size values)

package.json Outdated Show resolved Hide resolved
source/checksums.c Outdated Show resolved Hide resolved
uint64_t previous = 0;
if (!aws_napi_is_null_or_undefined(env, node_args[1])) {
if (aws_byte_buf_init_from_napi(&previous_buf, env, node_args[1])) {
napi_throw_type_error(env, NULL, "previous argument must be undefined or a positive number");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous is confusing to be a name of the variable or a real word.

Suggested change
napi_throw_type_error(env, NULL, "previous argument must be undefined or a positive number");
napi_throw_type_error(env, NULL, "`previous` argument must be undefined or a positive number");

Also, a positive number doesn't seem to descriptive enough as a 64bit data? Oh, I guess you were copying from the crc32 case. So, we should update this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, previous naming was copied from crc32 case. let me update the naming to indicate its previous crc

export class Md5Hash {
private hash?: Crypto.WordArray;
function hashToUint8Array(hash: Crypto.WordArray) {
return Uint8Array.from(hash.toString(Crypto.enc.Latin1).split('').map(c => c.charCodeAt(0)));;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe renaming to wordArrayToUint8Array?

Maybe add some comments about how Crypto.enc.Latin1 and charCodeAt to convert the buffer back and forth?

So, firstly, I thought we should use base64 as we commonly used. But, after digging, it seems not a directly match when converted to string. And the documentation of charCodeAt said it used utf-16?

Agreed on whatever works, but I feel those encoding are really confusing.

Copy link
Contributor

@TingDaoK TingDaoK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just trivial comments

if (napi_get_value_uint32(env, node_args[1], &previous)) {
napi_throw_type_error(env, NULL, "previous argument must be undefined or a positive number");
if (napi_get_value_uint32(env, node_args[1], &previous_crc)) {
napi_throw_type_error(env, NULL, "previous value argument must be undefined or a positive number");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this error is meant to point the variable name in nodejs function here

Maybe rename the nodejs variable name to previous_crc, and use previous_crc here as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im on a fence about it. previous_crc is a local variable that does not have much meaning to caller in js land. from that perspective i think just referring to it as previous value makes more sense

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly because all the other place we have the error throw out is referring to the variable name used in the nodejs land. I think it's better to be consistent

source/checksums.c Outdated Show resolved Hide resolved
@DmitriyMusatkin DmitriyMusatkin merged commit 83d197d into main Oct 3, 2024
30 checks passed
@DmitriyMusatkin DmitriyMusatkin deleted the crc64 branch October 3, 2024 16:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants