Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the abilty to get a binary value for a key #42

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions spec/redis_spec.cr
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,11 @@ describe Redis::Client do
redis.get(key).should eq nil
end

test "can get a string key as Bytes" do
redis.set key, "bar"
redis.get_bytes(key).should eq "bar".to_slice
end

test "can set expiration timestamps on keys" do
redis.set key, "foo", ex: 10.milliseconds.from_now
redis.get(key).should eq "foo"
Expand Down
11 changes: 11 additions & 0 deletions src/connection.cr
Original file line number Diff line number Diff line change
Expand Up @@ -339,6 +339,17 @@ module Redis
xrevrange: Array,
xtrim: Int64,
})


# Get the value for the specified key
#
# ```
# redis.set "foo", "bar"
# redis.get("foo", as: Bytes) # => "bar"
# ```
def get_bytes(key : String)
get(key).try(&.to_slice)
Copy link

@z64 z64 May 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Observations:

  • As I see it, the real issue is that the lower level reader (parser) assumes io.read_string.

    • If we're talking about efficiency, ideally the reader would just give bytes, then it is decided what to do with it based on what the caller wants, instead of invoking utf8 machinery always.
    • Unfortunately Crystal makes it a hassle to make a new string from a slice without copying. It is totally fine to do in this context, but you would just have to write the boilerplate.
    • The Rust driver copies it. (to_string() heap copies a vec) Same as String#new(slice) would.
  • get_bytes sounds fine and all, but have you thought about how to support all other commands?

  • Strictly speaking, you can have binary keys. UUID is probably a legitimate use-case. Duck typing to_slice might work.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the lower level reader (parser) assumes io.read_string.

This is a good callout. I didn't consider the String-vs-Bytes distinction when I wrote this originally.

It's great that there is prior art for this. However, last time I benchmarked, the Rust driver was 2-3x slower than this one, measured at the CPU for the most common operations (SET, GET returning a String, GET returning nil, and INCR). Given the following benchmark, copying vec to strings could be a contributing factor for that.

Benchmark code
require "benchmark"

require "../src/errors"

[
  0,
  1,
  10,
  100,
  1_000,
  10_000,
  100_000,
].each do |bytesize|
  string = "." * bytesize # 10KB
  io = IO::Memory.new("$#{string.bytesize}\r\n#{string}\r\n")

  puts
  puts "#{bytesize.humanize} bytes"
  Benchmark.ips do |x|
    x.report "string" { Parser.new(io.rewind).read_string }
    x.report "bytes" { Parser.new(io.rewind).read_bytes }
    x.report "bytes->string" { String.new(Parser.new(io.rewind).read_bytes.not_nil!) }
  end
end

struct Parser
  def initialize(@io : IO)
  end

  def read_string
    if (byte_marker = @io.read_byte) == '$'.ord
      size = parse_int
      @io.skip 2
      if size >= 0
        value = @io.read_string size
        @io.skip 2
        value
      end
    else
      raise Redis::Error.new("Expected '$', got #{byte_marker.try(&.chr).inspect}")
    end
  end

  def read_bytes
    if (byte_marker = @io.read_byte) == '$'.ord
      size = parse_int
      @io.skip 2
      if size >= 0
        bytes = Bytes.new(size)
        @io.read_fully bytes
        @io.skip 2
        bytes
      end
    else
      raise Redis::Error.new("Expected '$', got #{byte_marker.try(&.chr).inspect}")
    end
  end

  # Copied from the existing parser
  private def parse_int
    int = 0i64
    negative = false
    loop do
      if peek = @io.peek
        case next_byte = peek[0]
        when nil
          break
        when '-'
          negative = true
          @io.skip 1
        when '0'.ord..'9'.ord
          int = (int * 10) + (next_byte - '0'.ord)
          @io.skip 1
        else
          break
        end
      else
        break
      end
    end

    if negative
      -int
    else
      int
    end
  end
end
0.0 bytes
       string  39.21M ( 25.50ns) (± 4.57%)  16.0B/op        fastest
        bytes  37.19M ( 26.89ns) (± 1.80%)  16.0B/op   1.05× slower
bytes->string  35.44M ( 28.22ns) (± 5.37%)  16.0B/op   1.11× slower

1.0 bytes
       string  13.69M ( 73.07ns) (± 2.12%)  16.0B/op   1.25× slower
        bytes  17.13M ( 58.39ns) (± 2.24%)  16.0B/op        fastest
bytes->string   8.74M (114.41ns) (± 2.32%)  32.0B/op   1.96× slower

10.0 bytes
       string  10.78M ( 92.76ns) (± 3.45%)  32.0B/op   1.21× slower
        bytes  13.09M ( 76.39ns) (± 2.04%)  16.0B/op        fastest
bytes->string   6.67M (149.91ns) (± 2.42%)  48.0B/op   1.96× slower

100 bytes
       string   9.12M (109.69ns) (± 2.59%)  128B/op        fastest
        bytes   8.79M (113.76ns) (± 2.69%)  112B/op   1.04× slower
bytes->string   5.01M (199.48ns) (± 1.83%)  240B/op   1.82× slower

1.0k bytes
       string   2.50M (400.74ns) (± 1.43%)  1.0kB/op        fastest
        bytes   2.44M (409.12ns) (± 1.33%)  1.0kB/op   1.02× slower
bytes->string   1.29M (776.61ns) (± 1.72%)  2.0kB/op   1.94× slower

10.0k bytes
       string 488.07k (  2.05µs) (± 1.08%)   9.8kB/op        fastest
        bytes 381.95k (  2.62µs) (± 1.26%)   9.8kB/op   1.28× slower
bytes->string 193.47k (  5.17µs) (± 0.80%)  19.5kB/op   2.52× slower

100k bytes
       string  63.69k ( 15.70µs) (± 0.99%)   98kB/op        fastest
        bytes  58.27k ( 17.16µs) (± 2.28%)   98kB/op   1.09× slower
bytes->string  30.15k ( 33.17µs) (± 1.04%)  195kB/op   2.11× slower
  • If we're talking about efficiency, ideally the reader would just give bytes

I ran the benchmark above on one of our production servers (a C3 instance on GCP) because I wanted to test this theory since Crystal zeroes out memory. Turns out, which way is more efficient depends on the size of the value. 🙃 I received these results consistently between runs of the benchmark code, which effectively rules out unlucky timing.

Since the copy operation requires additional memory and takes 1.8x-2.5x as long, and since storing text in Redis is an extremely common use case, it seems that as long as Crystal doesn't object to storing binary data in String instances, it may be a good compromise. It's not strictly correct, but I haven't run into any issues with it despite using it all the time, so it seems like an acceptable compromise to get the performance boost. And honestly, it's nice knowing that this is one of the most performant Redis clients in existence (even if we've gotta cut a corner like this to make that happen 😄) while still having a pretty nice API. Maybe Rust pays the copy tax in this scenario because it doesn't allow cutting this particular corner?

end
end

set_return_types!
Expand Down