Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the abilty to get a binary value for a key #42

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jgaskins
Copy link
Owner

@jgaskins jgaskins commented May 1, 2023

See discussion starting here. This PR does not resolve everything in that issue, but does allow getting non-String data as discussed in some of the comments.

Turns out accepting String | Bytes in set isn't as trivial as I thought it would be. Connection#run emits the command to the debug log and Bytes is not a valid value type for Log::Metadata. 😕 I don't know what would be good to do for that.

@z64
Copy link

z64 commented May 2, 2023

Connection#run emits the command to the debug log and Bytes is not a valid value type for Log::Metadata. confused I don't know what would be good to do for that.

I would just do command.inspect. I find that is usually the easiest way out when logging incompatible metadata values. (IMO, stdlib should just fall back to inspect...).

Should be fine for such debug-level logging.

# redis.get("foo", as: Bytes) # => "bar"
# ```
def get_bytes(key : String)
get(key).try(&.to_slice)
Copy link

@z64 z64 May 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Observations:

  • As I see it, the real issue is that the lower level reader (parser) assumes io.read_string.

    • If we're talking about efficiency, ideally the reader would just give bytes, then it is decided what to do with it based on what the caller wants, instead of invoking utf8 machinery always.
    • Unfortunately Crystal makes it a hassle to make a new string from a slice without copying. It is totally fine to do in this context, but you would just have to write the boilerplate.
    • The Rust driver copies it. (to_string() heap copies a vec) Same as String#new(slice) would.
  • get_bytes sounds fine and all, but have you thought about how to support all other commands?

  • Strictly speaking, you can have binary keys. UUID is probably a legitimate use-case. Duck typing to_slice might work.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the lower level reader (parser) assumes io.read_string.

This is a good callout. I didn't consider the String-vs-Bytes distinction when I wrote this originally.

It's great that there is prior art for this. However, last time I benchmarked, the Rust driver was 2-3x slower than this one, measured at the CPU for the most common operations (SET, GET returning a String, GET returning nil, and INCR). Given the following benchmark, copying vec to strings could be a contributing factor for that.

Benchmark code
require "benchmark"

require "../src/errors"

[
  0,
  1,
  10,
  100,
  1_000,
  10_000,
  100_000,
].each do |bytesize|
  string = "." * bytesize # 10KB
  io = IO::Memory.new("$#{string.bytesize}\r\n#{string}\r\n")

  puts
  puts "#{bytesize.humanize} bytes"
  Benchmark.ips do |x|
    x.report "string" { Parser.new(io.rewind).read_string }
    x.report "bytes" { Parser.new(io.rewind).read_bytes }
    x.report "bytes->string" { String.new(Parser.new(io.rewind).read_bytes.not_nil!) }
  end
end

struct Parser
  def initialize(@io : IO)
  end

  def read_string
    if (byte_marker = @io.read_byte) == '$'.ord
      size = parse_int
      @io.skip 2
      if size >= 0
        value = @io.read_string size
        @io.skip 2
        value
      end
    else
      raise Redis::Error.new("Expected '$', got #{byte_marker.try(&.chr).inspect}")
    end
  end

  def read_bytes
    if (byte_marker = @io.read_byte) == '$'.ord
      size = parse_int
      @io.skip 2
      if size >= 0
        bytes = Bytes.new(size)
        @io.read_fully bytes
        @io.skip 2
        bytes
      end
    else
      raise Redis::Error.new("Expected '$', got #{byte_marker.try(&.chr).inspect}")
    end
  end

  # Copied from the existing parser
  private def parse_int
    int = 0i64
    negative = false
    loop do
      if peek = @io.peek
        case next_byte = peek[0]
        when nil
          break
        when '-'
          negative = true
          @io.skip 1
        when '0'.ord..'9'.ord
          int = (int * 10) + (next_byte - '0'.ord)
          @io.skip 1
        else
          break
        end
      else
        break
      end
    end

    if negative
      -int
    else
      int
    end
  end
end
0.0 bytes
       string  39.21M ( 25.50ns) (± 4.57%)  16.0B/op        fastest
        bytes  37.19M ( 26.89ns) (± 1.80%)  16.0B/op   1.05× slower
bytes->string  35.44M ( 28.22ns) (± 5.37%)  16.0B/op   1.11× slower

1.0 bytes
       string  13.69M ( 73.07ns) (± 2.12%)  16.0B/op   1.25× slower
        bytes  17.13M ( 58.39ns) (± 2.24%)  16.0B/op        fastest
bytes->string   8.74M (114.41ns) (± 2.32%)  32.0B/op   1.96× slower

10.0 bytes
       string  10.78M ( 92.76ns) (± 3.45%)  32.0B/op   1.21× slower
        bytes  13.09M ( 76.39ns) (± 2.04%)  16.0B/op        fastest
bytes->string   6.67M (149.91ns) (± 2.42%)  48.0B/op   1.96× slower

100 bytes
       string   9.12M (109.69ns) (± 2.59%)  128B/op        fastest
        bytes   8.79M (113.76ns) (± 2.69%)  112B/op   1.04× slower
bytes->string   5.01M (199.48ns) (± 1.83%)  240B/op   1.82× slower

1.0k bytes
       string   2.50M (400.74ns) (± 1.43%)  1.0kB/op        fastest
        bytes   2.44M (409.12ns) (± 1.33%)  1.0kB/op   1.02× slower
bytes->string   1.29M (776.61ns) (± 1.72%)  2.0kB/op   1.94× slower

10.0k bytes
       string 488.07k (  2.05µs) (± 1.08%)   9.8kB/op        fastest
        bytes 381.95k (  2.62µs) (± 1.26%)   9.8kB/op   1.28× slower
bytes->string 193.47k (  5.17µs) (± 0.80%)  19.5kB/op   2.52× slower

100k bytes
       string  63.69k ( 15.70µs) (± 0.99%)   98kB/op        fastest
        bytes  58.27k ( 17.16µs) (± 2.28%)   98kB/op   1.09× slower
bytes->string  30.15k ( 33.17µs) (± 1.04%)  195kB/op   2.11× slower
  • If we're talking about efficiency, ideally the reader would just give bytes

I ran the benchmark above on one of our production servers (a C3 instance on GCP) because I wanted to test this theory since Crystal zeroes out memory. Turns out, which way is more efficient depends on the size of the value. 🙃 I received these results consistently between runs of the benchmark code, which effectively rules out unlucky timing.

Since the copy operation requires additional memory and takes 1.8x-2.5x as long, and since storing text in Redis is an extremely common use case, it seems that as long as Crystal doesn't object to storing binary data in String instances, it may be a good compromise. It's not strictly correct, but I haven't run into any issues with it despite using it all the time, so it seems like an acceptable compromise to get the performance boost. And honestly, it's nice knowing that this is one of the most performant Redis clients in existence (even if we've gotta cut a corner like this to make that happen 😄) while still having a pretty nice API. Maybe Rust pays the copy tax in this scenario because it doesn't allow cutting this particular corner?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants