Allow decoding raw strings #235

Open
wants to merge 2 commits into
base: main
26 changes: 15 additions & 11 deletions README.md
@@ -148,12 +148,15 @@ Name|Type|Default
extensionCodec | ExtensionCodec | `ExtensionCodec.defaultCodec`
context | user-defined | -
useBigInt64 | boolean | false
rawStrings | boolean | false
maxStrLength | number | `4_294_967_295` (UINT32_MAX)
maxBinLength | number | `4_294_967_295` (UINT32_MAX)
maxArrayLength | number | `4_294_967_295` (UINT32_MAX)
maxMapLength | number | `4_294_967_295` (UINT32_MAX)
maxExtLength | number | `4_294_967_295` (UINT32_MAX)

To skip UTF-8 decoding of strings, `rawStrings` can be set to `true`. In this case, strings are decoded into `Uint8Array`.

You can use `max${Type}Length` to limit the length of each type decoded.

### `decodeMulti(buffer: ArrayLike<number> | BufferSource, options?: DecoderOptions): Generator<unknown, void, unknown>`
@@ -498,18 +501,19 @@ null, undefined|nil|null (*1)
boolean (true, false)|bool family|boolean (true, false)
number (53-bit int)|int family|number
number (64-bit float)|float family|number
-string|str family|string
-ArrayBufferView |bin family|Uint8Array (*2)
+string|str family|string (*2)
+ArrayBufferView |bin family|Uint8Array (*3)
Array|array family|Array
-Object|map family|Object (*3)
-Date|timestamp ext family|Date (*4)
-bigint|N/A|N/A (*5)
+Object|map family|Object (*4)
+Date|timestamp ext family|Date (*5)
+bigint|N/A|N/A (*6)

* *1 Both `null` and `undefined` are mapped to `nil` (`0xC0`) type, and are decoded into `null`
-* *2 Any `ArrayBufferView`s including NodeJS's `Buffer` are mapped to `bin` family, and are decoded into `Uint8Array`
-* *3 In handling `Object`, it is regarded as `Record<string, unknown>` in terms of TypeScript
-* *4 MessagePack timestamps may have nanoseconds, which will lost when it is decoded into JavaScript `Date`. This behavior can be overridden by registering `-1` for the extension codec.
-* *5 bigint is not supported in `useBigInt64: false` mode, but you can define an extension codec for it.
+* *2 If you'd like to skip UTF-8 decoding of strings, set `rawStrings: true`. In this case, strings are decoded into `Uint8Array`.
+* *3 Any `ArrayBufferView`s including NodeJS's `Buffer` are mapped to `bin` family, and are decoded into `Uint8Array`
+* *4 In handling `Object`, it is regarded as `Record<string, unknown>` in terms of TypeScript
+* *5 MessagePack timestamps may have nanoseconds, which will lost when it is decoded into JavaScript `Date`. This behavior can be overridden by registering `-1` for the extension codec.
+* *6 bigint is not supported in `useBigInt64: false` mode, but you can define an extension codec for it.

If you set `useBigInt64: true`, the following mapping is used:

@@ -519,15 +523,15 @@ null, undefined|nil|null
boolean (true, false)|bool family|boolean (true, false)
**number (32-bit int)**|int family|number
**number (except for the above)**|float family|number
-**bigint**|int64 / uint64|bigint (*6)
+**bigint**|int64 / uint64|bigint (*7)
string|str family|string
ArrayBufferView |bin family|Uint8Array
Array|array family|Array
Object|map family|Object
Date|timestamp ext family|Date


-* *6 If the bigint is larger than the max value of uint64 or smaller than the min value of int64, then the behavior is undefined.
+* *7 If the bigint is larger than the max value of uint64 or smaller than the min value of int64, then the behavior is undefined.

## Prerequisites

28 changes: 24 additions & 4 deletions src/Decoder.ts
@@ -20,6 +20,17 @@ export type DecoderOptions<ContextType = undefined> = Readonly<
*/
useBigInt64: boolean;

/**
* By default, string values will be decoded as UTF-8 strings. However, if this option is true,
* string values will be returned as Uint8Arrays without additional decoding.
*
* This is useful if the strings may contain invalid UTF-8 sequences.
*
* Note that this option only applies to string values, not map keys. Additionally, when
* enabled, raw string length is limited by the maxBinLength option.
*/
Review comment on lines +29 to +31:

@jannotti (Oct 30, 2023):

Why not do this for map keys as well?

Author:

This library doesn't support binary keys, so I decided to skip that case.
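
To illustrate the point about binary keys, here is a minimal sketch of what happens when a `Uint8Array` is used as a key of a plain JavaScript object. It is not code from this PR; `rawKey` and `decoded` are hypothetical names, and it assumes maps keep decoding into plain objects (`Record<string, unknown>`), as the README describes.

```typescript
// Hypothetical illustration, not part of the library or this PR.
// The bytes of the string "foo", as a raw (undecoded) key would arrive.
const rawKey = Uint8Array.from([0x66, 0x6f, 0x6f]);

// Plain objects only accept string (or symbol) property keys,
// so a byte array has to be converted to a string before it can be used.
const decoded: Record<string, unknown> = {};
decoded[String(rawKey)] = "value";

// The key is the comma-joined list of byte values, not the original bytes.
const keys = Object.keys(decoded);
console.log(keys); // [ '102,111,111' ]
```

A raw key would therefore lose its identity as bytes unless maps were decoded into something other than a plain object, which is consistent with leaving keys as UTF-8 strings here.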

rawStrings: boolean;

/**
* Maximum string length.
*
@@ -195,6 +206,7 @@ export class Decoder<ContextType = undefined> {
private readonly extensionCodec: ExtensionCodecType<ContextType>;
private readonly context: ContextType;
private readonly useBigInt64: boolean;
private readonly rawStrings: boolean;
private readonly maxStrLength: number;
private readonly maxBinLength: number;
private readonly maxArrayLength: number;
@@ -215,6 +227,7 @@ export class Decoder<ContextType = undefined> {
this.context = (options as { context: ContextType } | undefined)?.context as ContextType; // needs a type assertion because EncoderOptions has no context property when ContextType is undefined

this.useBigInt64 = options?.useBigInt64 ?? false;
this.rawStrings = options?.rawStrings ?? false;
this.maxStrLength = options?.maxStrLength ?? UINT32_MAX;
this.maxBinLength = options?.maxBinLength ?? UINT32_MAX;
this.maxArrayLength = options?.maxArrayLength ?? UINT32_MAX;
@@ -399,7 +412,7 @@ export class Decoder<ContextType = undefined> {
} else {
// fixstr (101x xxxx) 0xa0 - 0xbf
const byteLength = headByte - 0xa0;
-object = this.decodeUtf8String(byteLength, 0);
+object = this.decodeString(byteLength, 0);
}
} else if (headByte === 0xc0) {
// nil
@@ -451,15 +464,15 @@ export class Decoder<ContextType = undefined> {
} else if (headByte === 0xd9) {
// str 8
const byteLength = this.lookU8();
-object = this.decodeUtf8String(byteLength, 1);
+object = this.decodeString(byteLength, 1);
} else if (headByte === 0xda) {
// str 16
const byteLength = this.lookU16();
-object = this.decodeUtf8String(byteLength, 2);
+object = this.decodeString(byteLength, 2);
} else if (headByte === 0xdb) {
// str 32
const byteLength = this.lookU32();
-object = this.decodeUtf8String(byteLength, 4);
+object = this.decodeString(byteLength, 4);
} else if (headByte === 0xdc) {
// array 16
const size = this.readU16();
@@ -637,6 +650,13 @@ export class Decoder<ContextType = undefined> {
this.stack.pushArrayState(size);
}

private decodeString(byteLength: number, headerOffset: number): string | Uint8Array {
if (!this.rawStrings || this.stateIsMapKey()) {
return this.decodeUtf8String(byteLength, headerOffset);
}
return this.decodeBinary(byteLength, headerOffset);
}

private decodeUtf8String(byteLength: number, headerOffset: number): string {
if (byteLength > this.maxStrLength) {
throw new DecodeError(
43 changes: 43 additions & 0 deletions test/decode-raw-strings.test.ts
@@ -0,0 +1,43 @@
import assert from "assert";
import { encode, decode } from "../src";
import type { DecoderOptions } from "../src";

describe("decode with rawStrings specified", () => {
const options = { rawStrings: true } satisfies DecoderOptions;

it("decodes string as binary", () => {
const actual = decode(encode("foo"), options);
const expected = Uint8Array.from([0x66, 0x6f, 0x6f]);
assert.deepStrictEqual(actual, expected);
});

it("decodes invalid UTF-8 string as binary", () => {
const invalidUtf8String = Uint8Array.from([61, 180, 118, 220, 39, 166, 43, 68, 219, 116, 105, 84, 121, 46, 122, 136, 233, 221, 15, 174, 247, 19, 50, 176, 184, 221, 66, 188, 171, 36, 135, 121]);
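// Head byte 217 (0xd9) is str 8, followed by the length 32. A head byte of 196 (0xc4) would be bin 8, which never goes through the rawStrings path.
const encoded = Uint8Array.from([217, 32, 61, 180, 118, 220, 39, 166, 43, 68, 219, 116, 105, 84, 121, 46, 122, 136, 233, 221, 15, 174, 247, 19, 50, 176, 184, 221, 66, 188, 171, 36, 135, 121]);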

const actual = decode(encoded, options);
assert.deepStrictEqual(actual, invalidUtf8String);
});

it("decodes object keys as strings", () => {
const actual = decode(encode({ key: "foo" }), options);
const expected = { key: Uint8Array.from([0x66, 0x6f, 0x6f]) };
assert.deepStrictEqual(actual, expected);
});

it("ignores maxStrLength", () => {
const lengthLimitedOptions = { ...options, maxStrLength: 1 } satisfies DecoderOptions;

const actual = decode(encode("foo"), lengthLimitedOptions);
const expected = Uint8Array.from([0x66, 0x6f, 0x6f]);
assert.deepStrictEqual(actual, expected);
});

it("respects maxBinLength", () => {
const lengthLimitedOptions = { ...options, maxBinLength: 1 } satisfies DecoderOptions;

assert.throws(() => {
decode(encode("foo"), lengthLimitedOptions);
}, /max length exceeded/i);
});
});
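
As a reading aid for the `str` head bytes handled in `src/Decoder.ts` and exercised by these tests, the following is a minimal standalone sketch of how a single `str`-family value could be decoded either as text or as raw bytes. It is not the library's `Decoder`: `decodeStrValue` is a hypothetical helper, it ignores the `max*Length` limits and map-key handling, and it assumes a runtime that provides the standard `TextDecoder` global (modern browsers and Node.js do).

```typescript
// Hypothetical standalone helper for illustration only; not the library's Decoder.
// Decodes one MessagePack str-family value located at the start of `bytes`.
function decodeStrValue(bytes: Uint8Array, rawStrings: boolean): string | Uint8Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const headByte = view.getUint8(0);

  let headerSize: number;
  let byteLength: number;
  if (headByte >= 0xa0 && headByte <= 0xbf) {
    // fixstr (101x xxxx): the length lives in the low 5 bits of the head byte
    headerSize = 1;
    byteLength = headByte - 0xa0;
  } else if (headByte === 0xd9) {
    // str 8: one length byte
    headerSize = 2;
    byteLength = view.getUint8(1);
  } else if (headByte === 0xda) {
    // str 16: two length bytes, big-endian (the DataView default)
    headerSize = 3;
    byteLength = view.getUint16(1);
  } else if (headByte === 0xdb) {
    // str 32: four length bytes, big-endian
    headerSize = 5;
    byteLength = view.getUint32(1);
  } else {
    throw new Error(`not a str-family head byte: 0x${headByte.toString(16)}`);
  }

  const body = bytes.subarray(headerSize, headerSize + byteLength);
  if (body.byteLength !== byteLength) {
    throw new RangeError("truncated string body");
  }

  // Raw mode returns a copy of the bytes untouched; otherwise decode as UTF-8.
  // A non-fatal TextDecoder replaces invalid sequences with U+FFFD.
  return rawStrings ? body.slice() : new TextDecoder().decode(body);
}

// str 8, length 2, where 0xff can never appear in valid UTF-8.
const encodedStr = Uint8Array.from([0xd9, 0x02, 0xff, 0x61]);
const asBytes = decodeStrValue(encodedStr, true); // Uint8Array [0xff, 0x61]
const asText = decodeStrValue(encodedStr, false); // "\ufffda": the 0xff byte is lost
```

The last two lines show why a raw mode can matter: once an invalid byte has been replaced with U+FFFD, the original payload cannot be recovered from the decoded string.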