Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser tweaks #64

Merged
merged 5 commits into from
Nov 11, 2024
Merged

Parser tweaks #64

merged 5 commits into from
Nov 11, 2024

Conversation

defnull
Copy link
Owner

@defnull defnull commented Oct 31, 2024

Multiple parser improvements

fix: Allow CRLF in front of first boundary even in strict mode

Browsers do not do this, but some HTTP client libaries do and it's
technically allowed.

fix: Fail on invalid first boundary instead of skipping the first segment

As per spec, the start boundary must be at position zero, or start with CRLF
to separate it from the preamble. Boundaries are forbidden in segment
bodies, but not in the preamble. This means that a preamble can actually
contain the boundary, as long as it does not start with CRLF. This is
nonsense, so let's ignore the spec here. A preamble that contains the
boundary is so rare and suspicious that we assume a broken or malicious
client and fail fast, even in non-strict mode. This is way better than
silently skipping the first segment and loosing data.

fix: Accept tiny (1 byte) chunks and arbitrary chunk borders

There were edge cases where a very uncommon chunk border would break the
parser. This is fixed and properly tested now.

fix: Fail if message did not contain the boundary at all.

In strict mode, we fail very fast (because the message is either empty, or
the message does not start with a boundary. In non-strict mode, we fail
at the end when the parser closes in PREAMBLE state.

@defnull defnull mentioned this pull request Nov 10, 2024
fix: Allow CRLF in front of first boundary even in strict mode

  Browsers do not do this, but some HTTP client libaries do and it's
  technically allowed.

fix: Fail on invalid first boundary instead of skipping the first segment

  As per spec, the start boundary must be at position zero, or start with CRLF
  to separate it from the preamble. Boundaries are forbidden in segment
  bodies, but not in the preamble. This means that a preamble can actually
  contain the boundary, as long as it does not start with CRLF. This is
  nonsense, so let's ignore the spec here. A preamble that contains the
  boundary is so rare and suspicious that we assume a broken or malicious
  client and fail fast, even in non-strict mode. This is way better than
  silently skipping the first segment and loosing data.

fix: Accept tiny (1 byte) chunks and arbitrary chunk borders

  There were edge cases where a very uncommon chunk border would break the
  parser. This is fixed and properly tested now.

fix: Fail if message did not contain the boundary at all.

  In strict mode, we fail very fast (because the message is either empty, or
  the message does not start with a boundary. In non-strict mode, we fail
  at the end when the parser closes in PREAMBLE state.
We deliberately break spec and reject input that is technically valid. The test that ensures this should be clear on why that's a good idea.
@defnull
Copy link
Owner Author

defnull commented Nov 10, 2024

I'll update the changelog, then this can be merged I think.

@defnull defnull merged commit 203b1d1 into master Nov 11, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant