-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add the abilty to get a binary value for a key #42
base: master
Are you sure you want to change the base?
Conversation
I would just do Should be fine for such debug-level logging. |
# redis.get("foo", as: Bytes) # => "bar" | ||
# ``` | ||
def get_bytes(key : String) | ||
get(key).try(&.to_slice) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Observations:
-
As I see it, the real issue is that the lower level reader (
parser
) assumesio.read_string
.- If we're talking about efficiency, ideally the reader would just give bytes, then it is decided what to do with it based on what the caller wants, instead of invoking utf8 machinery always.
- Unfortunately Crystal makes it a hassle to make a new string from a slice without copying. It is totally fine to do in this context, but you would just have to write the boilerplate.
- The Rust driver copies it. (
to_string()
heap copies a vec) Same asString#new(slice)
would.
-
get_bytes
sounds fine and all, but have you thought about how to support all other commands? -
Strictly speaking, you can have binary keys. UUID is probably a legitimate use-case. Duck typing
to_slice
might work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the lower level reader (
parser
) assumesio.read_string
.
This is a good callout. I didn't consider the String
-vs-Bytes
distinction when I wrote this originally.
- The Rust driver copies it. (
to_string()
heap copies a vec) Same asString#new(slice)
would.
It's great that there is prior art for this. However, last time I benchmarked, the Rust driver was 2-3x slower than this one, measured at the CPU for the most common operations (SET
, GET
returning a String
, GET
returning nil
, and INCR
). Given the following benchmark, copying vec
to strings could be a contributing factor for that.
Benchmark code
require "benchmark"
require "../src/errors"
[
0,
1,
10,
100,
1_000,
10_000,
100_000,
].each do |bytesize|
string = "." * bytesize # 10KB
io = IO::Memory.new("$#{string.bytesize}\r\n#{string}\r\n")
puts
puts "#{bytesize.humanize} bytes"
Benchmark.ips do |x|
x.report "string" { Parser.new(io.rewind).read_string }
x.report "bytes" { Parser.new(io.rewind).read_bytes }
x.report "bytes->string" { String.new(Parser.new(io.rewind).read_bytes.not_nil!) }
end
end
struct Parser
def initialize(@io : IO)
end
def read_string
if (byte_marker = @io.read_byte) == '$'.ord
size = parse_int
@io.skip 2
if size >= 0
value = @io.read_string size
@io.skip 2
value
end
else
raise Redis::Error.new("Expected '$', got #{byte_marker.try(&.chr).inspect}")
end
end
def read_bytes
if (byte_marker = @io.read_byte) == '$'.ord
size = parse_int
@io.skip 2
if size >= 0
bytes = Bytes.new(size)
@io.read_fully bytes
@io.skip 2
bytes
end
else
raise Redis::Error.new("Expected '$', got #{byte_marker.try(&.chr).inspect}")
end
end
# Copied from the existing parser
private def parse_int
int = 0i64
negative = false
loop do
if peek = @io.peek
case next_byte = peek[0]
when nil
break
when '-'
negative = true
@io.skip 1
when '0'.ord..'9'.ord
int = (int * 10) + (next_byte - '0'.ord)
@io.skip 1
else
break
end
else
break
end
end
if negative
-int
else
int
end
end
end
0.0 bytes
string 39.21M ( 25.50ns) (± 4.57%) 16.0B/op fastest
bytes 37.19M ( 26.89ns) (± 1.80%) 16.0B/op 1.05× slower
bytes->string 35.44M ( 28.22ns) (± 5.37%) 16.0B/op 1.11× slower
1.0 bytes
string 13.69M ( 73.07ns) (± 2.12%) 16.0B/op 1.25× slower
bytes 17.13M ( 58.39ns) (± 2.24%) 16.0B/op fastest
bytes->string 8.74M (114.41ns) (± 2.32%) 32.0B/op 1.96× slower
10.0 bytes
string 10.78M ( 92.76ns) (± 3.45%) 32.0B/op 1.21× slower
bytes 13.09M ( 76.39ns) (± 2.04%) 16.0B/op fastest
bytes->string 6.67M (149.91ns) (± 2.42%) 48.0B/op 1.96× slower
100 bytes
string 9.12M (109.69ns) (± 2.59%) 128B/op fastest
bytes 8.79M (113.76ns) (± 2.69%) 112B/op 1.04× slower
bytes->string 5.01M (199.48ns) (± 1.83%) 240B/op 1.82× slower
1.0k bytes
string 2.50M (400.74ns) (± 1.43%) 1.0kB/op fastest
bytes 2.44M (409.12ns) (± 1.33%) 1.0kB/op 1.02× slower
bytes->string 1.29M (776.61ns) (± 1.72%) 2.0kB/op 1.94× slower
10.0k bytes
string 488.07k ( 2.05µs) (± 1.08%) 9.8kB/op fastest
bytes 381.95k ( 2.62µs) (± 1.26%) 9.8kB/op 1.28× slower
bytes->string 193.47k ( 5.17µs) (± 0.80%) 19.5kB/op 2.52× slower
100k bytes
string 63.69k ( 15.70µs) (± 0.99%) 98kB/op fastest
bytes 58.27k ( 17.16µs) (± 2.28%) 98kB/op 1.09× slower
bytes->string 30.15k ( 33.17µs) (± 1.04%) 195kB/op 2.11× slower
- If we're talking about efficiency, ideally the reader would just give bytes
I ran the benchmark above on one of our production servers (a C3 instance on GCP) because I wanted to test this theory since Crystal zeroes out memory. Turns out, which way is more efficient depends on the size of the value. 🙃 I received these results consistently between runs of the benchmark code, which effectively rules out unlucky timing.
Since the copy operation requires additional memory and takes 1.8x-2.5x as long, and since storing text in Redis is an extremely common use case, it seems that as long as Crystal doesn't object to storing binary data in String
instances, it may be a good compromise. It's not strictly correct, but I haven't run into any issues with it despite using it all the time, so it seems like an acceptable compromise to get the performance boost. And honestly, it's nice knowing that this is one of the most performant Redis clients in existence (even if we've gotta cut a corner like this to make that happen 😄) while still having a pretty nice API. Maybe Rust pays the copy tax in this scenario because it doesn't allow cutting this particular corner?
See discussion starting here. This PR does not resolve everything in that issue, but does allow getting non-
String
data as discussed in some of the comments.Turns out accepting
String | Bytes
inset
isn't as trivial as I thought it would be.Connection#run
emits the command to the debug log andBytes
is not a valid value type forLog::Metadata
. 😕 I don't know what would be good to do for that.