Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize IO::Delimited #11242

Merged
merged 15 commits into from
Jul 25, 2023
Merged

Optimize IO::Delimited #11242

merged 15 commits into from
Jul 25, 2023

Conversation

asterite
Copy link
Member

IO::Delimited will now use the underlying's IO#peek if available

When IO#peek is available we can do much better!

Benchmark:

require "benchmark"
require "io/delimited"

string = ("a" * 10_000) + "\n" + ("b" * 10_000)
File.write("text.txt", string)

delimiters = ["\n", "\nbbbbbbbbbb"]

file = File.open("text.txt")
buffer = uninitialized UInt8[20_000]

Benchmark.ips do |x|
  delimiters.each do |delimiter|
    x.report("delimiter: #{delimiter.inspect}") do
      file.rewind
      delimited = IO::Delimited.new(file, read_delimiter: delimiter)
      delimited.read(buffer.to_slice)
    end
  end
end

Output before this PR:

          delimiter: "\n"  11.15k ( 89.67µs) (±11.49%)  160B/op   1.14× slower
delimiter: "\nbbbbbbbbbb"  12.74k ( 78.49µs) (± 6.13%)  160B/op        fastest

Output after this PR:

          delimiter: "\n" 634.93k (  1.57µs) (± 4.41%)  160B/op        fastest
delimiter: "\nbbbbbbbbbb" 633.12k (  1.58µs) (± 4.34%)  160B/op   1.00× slower

IO::Delimited now implements peek

As it turns out, IO::Delimited is used in the standard library for multipart parsing:

body_io = IO::Delimited.new(@io, read_delimiter: @delimiter)
headers = parse_headers(body_io)

while line = io.gets(chomp: false)

But, as you can see, it calls gets on the IO::Delimited. gets works best when the IO is peekable, because then it can quickly scan the delimiter (say, "\n" if no arguments are given) in the peek buffer.

So, this PR also implements peek for IO::Delimited, so gets can use the optimized version.

require "benchmark"
require "io/delimited"

string = ("a" * 10_000) + "\n" + ("a" * 10) + ("b" * 10_000)
File.write("text.txt", string)

file = File.open("text.txt")
buffer = uninitialized UInt8[20_000]

Benchmark.ips do |x|
  x.report("gets") do
    file.rewind
    delimited = IO::Delimited.new(file, read_delimiter: "bbb")
    delimited.gets
  end
end

Output before this PR:

gets   5.57k (179.61µs) (± 3.40%)  42.5kB/op  fastest

Output after this PR:

gets 177.35k (  5.64µs) (± 2.29%)  26.2kB/op  fastest

An extra fix

IO#gets, when peek was available, would use that. However, when it exhausted the peek buffer it would call peek again. If peek returned nil, it would just stop consuming input right there. That's wrong! When peek returns nil it means the IO can't be peeked. In reality this could probably never happen again because there was no IO where peek would sometimes return nil and sometimes a slice. But in IO::Delimited's case that can happen: if the peek buffer contains a portion of the delimiter right at the beginning of the peek buffer, but not in its entirety, we can't know if the rest of the delimiter will come next or not. So we can't peek, and nil is returned.

How to review

I recommend going commit by commit.

src/io/delimited.cr Outdated Show resolved Hide resolved
src/io/delimited.cr Outdated Show resolved Hide resolved
src/io/delimited.cr Outdated Show resolved Hide resolved
src/io/delimited.cr Outdated Show resolved Hide resolved
@asterite
Copy link
Member Author

@straight-shoota wow, thanks for the through review! ❤️

I'll send fixes later today or the next week.

@asterite
Copy link
Member Author

As a follow up, I just realized IO::Buffered#peek returns a slice to the internal buffer. Client's aren't supposed to write to that slice, but if they do then it will change the data returned by that IO. We could easily change that by returning a different slice that's read_only, so that mistake/bugs goes away. But this is for another PR.

straight-shoota
straight-shoota previously approved these changes Dec 4, 2021
src/io.cr Show resolved Hide resolved
src/io.cr Outdated Show resolved Hide resolved
src/io/delimited.cr Outdated Show resolved Hide resolved
src/io.cr Show resolved Hide resolved
@asterite
Copy link
Member Author

@straight-shoota I think I missed it or after so much time I forgot... what's the potential bug?

@asterite
Copy link
Member Author

Oops, now Windows CI suddenly exits. Maybe something is broken with this code...

@asterite
Copy link
Member Author

Should we eventually merge this? It's a good optimization.

@straight-shoota
Copy link
Member

straight-shoota commented Sep 29, 2022

Yes, I think it's a good change. But it's also delicate. We need to be careful about vetting the behaviour for correctness.

This concern from my last review is still unaddressed: #11242 (comment)

@asterite
Copy link
Member Author

Ah, I think I didn't address it because I didn't understand it. Too much time has passed now that I lost context, so I don't think I'll be able to continue working on this.

@jgaskins
Copy link
Contributor

jgaskins commented Jul 21, 2023

This PR just came up on the forum and it turns out to be a massive performance boost for large multipart uploads being processed on the server. With this code:

require "http"

http = HTTP::Server.new do |context|
  MIME::Multipart.parse context.request do |headers, io|
    file_path = "public"
    File.open(file_path, "w") do |f|
      IO.copy(io, f)
    end
  end
end

http.listen 3000

Using the latest release of Crystal, I uploaded a ~2GB file to it, and it transfers at 162MB/sec, taking about 10 seconds:

➜  upload_demo git:(main) curl -X POST  -F "image1=@file" http://localhost:3000/upload -o test
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 56 1839M    0     0   56 1034M      0   162M  0:00:11  0:00:06  0:00:05  162M

With this patch, it transfers at 1.7GB/sec in barely over a second:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1839M    0     0  100 1839M      0  1678M  0:00:01  0:00:01 --:--:-- 1685M

EDIT: I'd forgotten to compile the app in release mode.

@zw963
Copy link
Contributor

zw963 commented Jul 23, 2023

Really cool improvement, we should merge it to avoid it become stale again.

@asterite
Copy link
Member Author

@straight-shoota I just pushed another commit that addresses your concern.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants