-
Notifications
You must be signed in to change notification settings - Fork 838
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce SQL sanitizer allocations #2136
base: master
Are you sure you want to change the base?
Conversation
fix benchmmark script fix benchmark script
check new quoteBytes
use lexer pool
benchmark diffs for concrete optimisations
|
LGTM. But this is obviously a very security critical part of the code, so I'd like if we can get some more eyes on this before merging. |
lgtm, i'm try to check on my hot path in next few days. |
In my tests i don't saw any issues. |
It would be very much appreciated if you could suggest someone I can tag on this issue. I'm also in the process of writing more tests for SQL injection + more fuzzing |
I wish I could. Unfortunately, I don't know of anyone.
👍 It's been a couple weeks since I reviewed the code, so I can review it again with fresh eyes now. It's not quite as good as multiple reviewers, but at least it will be multiple reviews. I'll wait until you add the additional tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the inline suggested changes. These small set of optimizations reduced the sec/op from 607.9n
down to 604.9n
(-0.49%). I can push this as a different PR or these changes can be incorporated into this branch.
$ bash benchmmark.sh 2ec900454bfe65daa9648488e93f7627c26b810c 82642726914a8b054ca123fd87c4d984da6d78eb 431e11b61c809c2373128ecf63ed48cf8bdf4dd4 71c3b107187b02ea44dbc7d38e931115ca7286c7
$ benchstat benchmarks/*.bench
goos: darwin
goarch: arm64
pkg: github.com/jackc/pgx/v5/internal/sanitize
cpu: Apple M3 Pro
│ benchmarks/1_fix_preallocations_of_quoted_string.bench │ benchmarks/2_optimize_QuoteBytes.bench │ benchmarks/2_rework_QuoteString_to_iterate_over_bytes.bench │ benchmarks/3_optimize_QuoteBytes.bench │
│ sec/op │ sec/op vs base │ sec/op vs base │ sec/op vs base │
Sanitize-12 307.1n ± 1% 300.6n ± 2% -2.10% (p=0.001 n=10) 305.3n ± 1% -0.57% (p=0.015 n=10) 304.2n ± 1% -0.93% (p=0.003 n=10)
SanitizeSQL-12 1.204µ ± 2% 1.207µ ± 1% ~ (p=0.100 n=10) 1.204µ ± 2% ~ (p=0.697 n=10) 1.203µ ± 2% ~ (p=0.898 n=10)
geomean 607.9n 602.3n -0.91% 606.3n -0.26% 604.9n -0.49%
│ benchmarks/1_fix_preallocations_of_quoted_string.bench │ benchmarks/2_optimize_QuoteBytes.bench │ benchmarks/2_rework_QuoteString_to_iterate_over_bytes.bench │ benchmarks/3_optimize_QuoteBytes.bench │
│ B/op │ B/op vs base │ B/op vs base │ B/op vs base │
Sanitize-12 424.0 ± 0% 424.0 ± 0% ~ (p=1.000 n=10) ¹ 424.0 ± 0% ~ (p=1.000 n=10) ¹ 424.0 ± 0% ~ (p=1.000 n=10) ¹
SanitizeSQL-12 552.0 ± 0% 552.0 ± 0% ~ (p=1.000 n=10) ¹ 552.0 ± 0% ~ (p=1.000 n=10) ¹ 552.0 ± 0% ~ (p=1.000 n=10) ¹
geomean 483.8 483.8 +0.00% 483.8 +0.00% 483.8 +0.00%
¹ all samples are equal
│ benchmarks/1_fix_preallocations_of_quoted_string.bench │ benchmarks/2_optimize_QuoteBytes.bench │ benchmarks/2_rework_QuoteString_to_iterate_over_bytes.bench │ benchmarks/3_optimize_QuoteBytes.bench │
│ allocs/op │ allocs/op vs base │ allocs/op vs base │ allocs/op vs base │
Sanitize-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ 2.000 ± 0% ~ (p=1.000 n=10) ¹ 2.000 ± 0% ~ (p=1.000 n=10) ¹
SanitizeSQL-12 10.00 ± 0% 10.00 ± 0% ~ (p=1.000 n=10) ¹ 10.00 ± 0% ~ (p=1.000 n=10) ¹ 10.00 ± 0% ~ (p=1.000 n=10) ¹
geomean 4.472 4.472 +0.00% 4.472 +0.00% 4.472 +0.00%
¹ all samples are equal
} | ||
|
||
# Sanitized commmit message | ||
commit_message=$(git log -1 --pretty=format:"%s" | tr ' ' '_') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to escape /
:
commit_message=$(git log -1 --pretty=format:"%s" | tr -c '[:alnum:]-_' '_')
bench_files+=("$bench_file") | ||
done | ||
|
||
benchstat "${bench_files[@]}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you prefix with a small comment: # go install golang.org/x/perf/cmd/benchstat@latest
|
||
dst = append(dst, quote...) | ||
|
||
return dst | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is purely a style nit, but I don't like reslicing for these types of functions because it's not idiomatic and hard to follow. I took the above QuoteString()
and replaced it with something that uses an iterator:
func QuoteString(dst []byte, str string) []byte {
const quote = '\''
// Preallocate space for the worst case scenario
dst = slices.Grow(dst, len(str)*2+2)
// Add opening quote
dst = append(dst, quote)
// Iterate through the string without allocating
for i := 0; i < len(str); i++ {
if str[i] == quote {
dst = append(dst, quote, quote)
} else {
dst = append(dst, str[i])
}
}
// Add closing quote
dst = append(dst, quote)
return dst
}
dst = append(dst, p...) | ||
|
||
dst = append(dst, `'`...) | ||
return dst | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to measure an improvement by optimizing this function:
func QuoteBytes(dst, buf []byte) []byte {
if len(buf) == 0 {
return append(dst, `'\x'`...)
}
// Calculate required length
requiredLen := 3 + hex.EncodedLen(len(buf)) + 1
// Ensure dst has enough capacity
if cap(dst)-len(dst) < requiredLen {
newDst := make([]byte, len(dst), len(dst)+requiredLen)
copy(newDst, dst)
dst = newDst
}
// Record original length and extend slice
origLen := len(dst)
dst = dst[:origLen+requiredLen]
// Add prefix
dst[origLen] = '\''
dst[origLen+1] = '\\'
dst[origLen+2] = 'x'
// Encode bytes directly into dst
hex.Encode(dst[origLen+3:len(dst)-1], buf)
// Add suffix
dst[len(dst)-1] = '\''
return dst
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the inline suggested changes. These small set of optimizations reduced the sec/op from 607.9n
down to 604.9n
(-0.49%). I can push this as a different PR or these changes can be incorporated into this branch.
$ bash benchmmark.sh 2ec900454bfe65daa9648488e93f7627c26b810c 82642726914a8b054ca123fd87c4d984da6d78eb 431e11b61c809c2373128ecf63ed48cf8bdf4dd4 71c3b107187b02ea44dbc7d38e931115ca7286c7
$ benchstat benchmarks/*.bench
goos: darwin
goarch: arm64
pkg: github.com/jackc/pgx/v5/internal/sanitize
cpu: Apple M3 Pro
│ benchmarks/1_fix_preallocations_of_quoted_string.bench │ benchmarks/2_optimize_QuoteBytes.bench │ benchmarks/2_rework_QuoteString_to_iterate_over_bytes.bench │ benchmarks/3_optimize_QuoteBytes.bench │
│ sec/op │ sec/op vs base │ sec/op vs base │ sec/op vs base │
Sanitize-12 307.1n ± 1% 300.6n ± 2% -2.10% (p=0.001 n=10) 305.3n ± 1% -0.57% (p=0.015 n=10) 304.2n ± 1% -0.93% (p=0.003 n=10)
SanitizeSQL-12 1.204µ ± 2% 1.207µ ± 1% ~ (p=0.100 n=10) 1.204µ ± 2% ~ (p=0.697 n=10) 1.203µ ± 2% ~ (p=0.898 n=10)
geomean 607.9n 602.3n -0.91% 606.3n -0.26% 604.9n -0.49%
│ benchmarks/1_fix_preallocations_of_quoted_string.bench │ benchmarks/2_optimize_QuoteBytes.bench │ benchmarks/2_rework_QuoteString_to_iterate_over_bytes.bench │ benchmarks/3_optimize_QuoteBytes.bench │
│ B/op │ B/op vs base │ B/op vs base │ B/op vs base │
Sanitize-12 424.0 ± 0% 424.0 ± 0% ~ (p=1.000 n=10) ¹ 424.0 ± 0% ~ (p=1.000 n=10) ¹ 424.0 ± 0% ~ (p=1.000 n=10) ¹
SanitizeSQL-12 552.0 ± 0% 552.0 ± 0% ~ (p=1.000 n=10) ¹ 552.0 ± 0% ~ (p=1.000 n=10) ¹ 552.0 ± 0% ~ (p=1.000 n=10) ¹
geomean 483.8 483.8 +0.00% 483.8 +0.00% 483.8 +0.00%
¹ all samples are equal
│ benchmarks/1_fix_preallocations_of_quoted_string.bench │ benchmarks/2_optimize_QuoteBytes.bench │ benchmarks/2_rework_QuoteString_to_iterate_over_bytes.bench │ benchmarks/3_optimize_QuoteBytes.bench │
│ allocs/op │ allocs/op vs base │ allocs/op vs base │ allocs/op vs base │
Sanitize-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ 2.000 ± 0% ~ (p=1.000 n=10) ¹ 2.000 ± 0% ~ (p=1.000 n=10) ¹
SanitizeSQL-12 10.00 ± 0% 10.00 ± 0% ~ (p=1.000 n=10) ¹ 10.00 ± 0% ~ (p=1.000 n=10) ¹ 10.00 ± 0% ~ (p=1.000 n=10) ¹
geomean 4.472 4.472 +0.00% 4.472 +0.00% 4.472 +0.00%
¹ all samples are equal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[review comment was posted twice for some reason]
#2124
Result:
Main optimizations:
sync.Pool
for byte buffers, lexers and parsed query structsint64
,float64
andtime.Time
+bytes.Buffer.AvailableBuffer
QuoteString
andQuoteBytes
to append-style (with tests for backwards compatibility)Misc changes:
Query.Sanitize
andSanitizeSQL
functionsQuoteString
andQuoteBytes
(I did'n find any problems for 1h of fuzzing, but you can never be sure for 100%)Since optimization is an extremely hard problem, I think it's worth checking some more benchmarks.
I would be very grateful for your opinion on this and recommendations/advice, @jackc @vtolstov