-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🌱 add benchmark pipeline #1651
base: main
Are you sure you want to change the base?
🌱 add benchmark pipeline #1651
Conversation
✅ Deploy Preview for olmv1 ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1651 +/- ##
=======================================
Coverage 67.45% 67.45%
=======================================
Files 61 61
Lines 5245 5245
=======================================
Hits 3538 3538
Misses 1446 1446
Partials 261 261
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
jiazha-mac:e2e jiazha$ go test -run=^$ -bench=. -count=10 -memprofile=mem.out -cpuprofile=cpu.out
goos: darwin
goarch: arm64
pkg: github.com/operator-framework/operator-controller/test/e2e
cpu: Apple M1 Pro
BenchmarkCreateClusterCatalog-10 1 2093042334 ns/op
BenchmarkCreateClusterCatalog-10 4 611432146 ns/op
BenchmarkCreateClusterCatalog-10 10 224809304 ns/op
BenchmarkCreateClusterCatalog-10 13 92630189 ns/op
BenchmarkCreateClusterCatalog-10 6 174559444 ns/op
BenchmarkCreateClusterCatalog-10 12 107088038 ns/op
BenchmarkCreateClusterCatalog-10 1 1003581583 ns/op
BenchmarkCreateClusterCatalog-10 3 379469264 ns/op
BenchmarkCreateClusterCatalog-10 2 606936271 ns/op
BenchmarkCreateClusterCatalog-10 1 2773485917 ns/op
PASS
ok github.com/operator-framework/operator-controller/test/e2e 34.115s
jiazha-mac:e2e jiazha$ go tool pprof mem.out
File: e2e.test
Type: alloc_space
Time: Jan 27, 2025 at 5:22pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 8433.33kB, 62.21% of 13555.35kB total
Showing top 10 nodes out of 126
flat flat% sum% cum cum%
1536.16kB 11.33% 11.33% 1536.16kB 11.33% golang.org/x/net/http2.(*ClientConn).roundTrip
1536.09kB 11.33% 22.66% 1536.09kB 11.33% path.Join
1184.27kB 8.74% 31.40% 1184.27kB 8.74% runtime/pprof.StartCPUProfile
902.59kB 6.66% 38.06% 1553.21kB 11.46% compress/flate.NewWriter
650.62kB 4.80% 42.86% 650.62kB 4.80% compress/flate.(*compressor).init
553.04kB 4.08% 46.94% 553.04kB 4.08% github.com/gogo/protobuf/proto.RegisterType
528.17kB 3.90% 50.84% 528.17kB 3.90% regexp.(*bitState).reset
516.01kB 3.81% 54.64% 516.01kB 3.81% google.golang.org/protobuf/internal/filedesc.(*File).initDecls
513.69kB 3.79% 58.43% 513.69kB 3.79% regexp.mergeRuneSets.func2
512.69kB 3.78% 62.21% 512.69kB 3.78% regexp/syntax.(*compiler).inst
(pprof) exit
jiazha-mac:e2e jiazha$ go tool pprof cpu.out
File: e2e.test
Type: cpu
Time: Jan 27, 2025 at 5:22pm (CST)
Duration: 33.19s, Total samples = 380ms ( 1.14%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 310ms, 81.58% of 380ms total
Showing top 10 nodes out of 154
flat flat% sum% cum cum%
90ms 23.68% 23.68% 90ms 23.68% runtime.kevent
50ms 13.16% 36.84% 50ms 13.16% runtime.pthread_cond_signal
50ms 13.16% 50.00% 50ms 13.16% syscall.syscall
40ms 10.53% 60.53% 40ms 10.53% runtime.pthread_cond_wait
30ms 7.89% 68.42% 60ms 15.79% runtime.scanobject
10ms 2.63% 71.05% 20ms 5.26% k8s.io/client-go/rest.(*Request).tryThrottleWithInfo
10ms 2.63% 73.68% 10ms 2.63% runtime.(*itabTableType).find
10ms 2.63% 76.32% 10ms 2.63% runtime.(*mheap).allocSpan
10ms 2.63% 78.95% 10ms 2.63% runtime.(*mspan).heapBitsSmallForAddr
10ms 2.63% 81.58% 10ms 2.63% runtime.(*unwinder).resolveInternal
(pprof) exit |
test/e2e/benchmark_test.go
Outdated
} | ||
|
||
// GetRandomString generates a random string of the given length | ||
func getRandomString(length int) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
k8s.io/apimachinery has a rand.String
function that can be used instead of us maintaining a separate implementation. https://pkg.go.dev/k8s.io/apimachinery/pkg/util/rand#String
"time" | ||
) | ||
|
||
func BenchmarkCreateClusterCatalog(b *testing.B) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this benchmark tells us a whole lot because it is mostly measuring:
- client CPU/memory used to submit a CREATE request to the apiserver
- client CPU/memory used to submit a DELETE request to the apiserver
But it notably isn't able to account for the resources or time spent by the controller to reconcile the ClusterCatalog, which I think is what we actually want to measure.
But regardless, in my mind the biggest open question in is NOT "what machinery do we use to take measurements?". The real question is "where/how do we store/retrieve historical measurements that we can use as a baseline for comparison?"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The real question is "where/how do we store/retrieve historical measurements that we can use as a baseline for comparison?"
The test results will be stored on the GCS bucket if we run these test cases in Prow CI. And then, we can use Big Query to retrieve historical measurements. Similar to Sippy.
I think it's a little bit complex. How about storing the baseline in benchmarks/baseline.txt
? The initial benchmark results as the baseline? Once a new version is released, update it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it notably isn't able to account for the resources or time spent by the controller to reconcile the ClusterCatalog, which I think is what we actually want to measure.
Yes, however, I was trying to cover the "real world" case instead of the unit test. Do we need to benchmark some functions? Similar to the unit test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about storing the baseline in benchmarks/baseline.txt?
Maybe? But if, for example, we wanted to build a histogram of response times of the catalogd web server over multiple commits and/or CI runs in order to get a smoothed out baseline, that seems hard if a file is committed to the repo because we'd have to update it often.
Maybe another option would be to make use of github's upload-artifact/download-artifact actions? If an artifact can be uploaded/download and shared between GH actions runs, then maybe we could do something like:
- download a snapshotted prometheus DB
- run prometheus with the DB, and configure scarping with a short interval to pull metrics from our components into the DB
- verify via promql that we haven't broken our thresholds, if we have fail the run
- snapshot the updated prometheus DB
- upload the snapshot
- success!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! If so, we don't need to use the benchstat
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to use the Prometheus.
Test on 4.19.0-0.nightly-2025-02-04-230011, GCP. jiazha-mac:e2e jiazha$ export CATALOG_IMG=registry.redhat.io/redhat/redhat-operator-index:v4.18
jiazha-mac:e2e jiazha$ go test -run=^$ -bench=. -count=10 -memprofile=mem.out -cpuprofile=cpu.out
goos: darwin
goarch: arm64
pkg: github.com/operator-framework/operator-controller/test/e2e
cpu: Apple M1 Pro
BenchmarkCreateClusterCatalog-10 1 1274109792 ns/op
BenchmarkCreateClusterCatalog-10 18 85367590 ns/op
BenchmarkCreateClusterCatalog-10 21 98587806 ns/op
BenchmarkCreateClusterCatalog-10 20 80611660 ns/op
BenchmarkCreateClusterCatalog-10 15 95789072 ns/op
BenchmarkCreateClusterCatalog-10 15 87705625 ns/op
BenchmarkCreateClusterCatalog-10 14 186946500 ns/op
BenchmarkCreateClusterCatalog-10 12 166680833 ns/op
BenchmarkCreateClusterCatalog-10 9 177902880 ns/op
BenchmarkCreateClusterCatalog-10 4 447856864 ns/op
PASS
ok github.com/operator-framework/operator-controller/test/e2e 38.205s |
.github/workflows/benchmark.yaml
Outdated
- name: Compare with baseline | ||
run: | | ||
go install golang.org/x/perf/cmd/benchstat@latest | ||
benchstat benchmarks/baseline.txt new.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
benchstat is nice to get comparisons, but we'd have to be careful to somehow generate and commit a baseline from GH CI's system to avoid issues like "Jian's laptop is beefier than Joe's, and both are beefier than a GHA VM"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah
I really think we need to get a Brief and/or RFC written up to get agreement and consensus on the approaches before we merge stuff. But I also think the kind of prototyping and brainstorming that you're doing here and that @OchiengEd did will be necessary to get to the point that we are confident in our approach. |
3cde584
to
e0e379d
Compare
e0e379d
to
6ea7de2
Compare
dd32769
to
8b1afa9
Compare
A Brief is drafting here: [WIP]Brief: Benchmarking test OLMv1 |
ff8faec
to
08a24d1
Compare
Benchmark test pass: https://github.com/operator-framework/operator-controller/actions/runs/13214623057/job/36892420269?pr=1651 jiazha-mac:~ jiazha$ tree Downloads/benchmark-artifacts/
Downloads/benchmark-artifacts/
├── new.txt
└── output
1 directory, 2 files
jiazha-mac:~ jiazha$ cat Downloads/benchmark-artifacts/new.txt
goos: linux
goarch: amd64
pkg: github.com/operator-framework/operator-controller/test/e2e
cpu: AMD EPYC 7763 64-Core Processor
BenchmarkCreateClusterCatalog
BenchmarkCreateClusterCatalog-4 81 82695425 ns/op 36570 B/op 397 allocs/op
BenchmarkCreateClusterCatalog-4 12 99872913 ns/op 37266 B/op 404 allocs/op
BenchmarkCreateClusterCatalog-4 12 99852972 ns/op 37327 B/op 402 allocs/op
BenchmarkCreateClusterCatalog-4 12 99882173 ns/op 37409 B/op 405 allocs/op
BenchmarkCreateClusterCatalog-4 12 100024048 ns/op 37350 B/op 405 allocs/op
BenchmarkCreateClusterCatalog-4 12 100098746 ns/op 37568 B/op 406 allocs/op
BenchmarkCreateClusterCatalog-4 12 100037742 ns/op 38134 B/op 405 allocs/op
BenchmarkCreateClusterCatalog-4 12 99984867 ns/op 37121 B/op 403 allocs/op
BenchmarkCreateClusterCatalog-4 12 99855796 ns/op 38886 B/op 406 allocs/op
BenchmarkCreateClusterCatalog-4 12 99946190 ns/op 38851 B/op 404 allocs/op
PASS
ok github.com/operator-framework/operator-controller/test/e2e 20.427s
jiazha-mac:~ jiazha$ cat Downloads/benchmark-artifacts/output
goos: darwin
goarch: arm64
pkg: github.com/operator-framework/operator-controller/test/e2e
cpu: Apple M1 Pro
│ benchmarks/baseline.txt │
│ sec/op │
CreateClusterCatalog-10 85.11m ± 16%
│ benchmarks/baseline.txt │
│ B/op │
CreateClusterCatalog-10 36.21Ki ± 6%
│ benchmarks/baseline.txt │
│ allocs/op │
CreateClusterCatalog-10 394.5 ± 1%
goos: linux
goarch: amd64
cpu: AMD EPYC 7763 64-Core Processor
│ /tmp/artifacts/new.txt │
│ sec/op │
CreateClusterCatalog-4 99.91m ± 0%
│ /tmp/artifacts/new.txt │
│ B/op │
CreateClusterCatalog-4 36.50Ki ± 4%
│ /tmp/artifacts/new.txt │
│ allocs/op │
CreateClusterCatalog-4 404.5 ± 1% |
521a49f
to
9305638
Compare
88d2518
to
dba035d
Compare
dba035d
to
982fa8c
Compare
a1d1b5a
to
24b9450
Compare
24b9450
to
17c9d51
Compare
This benchmark pipeline logic as follows:
cat << EOF > prometheus.yml
global:
scrape_interval: 5s
scrape_configs:
- job_name: 'benchmark_metrics'
static_configs:
- targets: ['$HOST_IP:9000']
EOF
docker run -d --name prometheus -p 9090:9090 \
--user=root \
-v ${{ github.workspace }}/prometheus.yml:/etc/prometheus/prometheus.yml \
-v ${{ github.workspace }}/prometheus-data:/prometheus \
prom/prometheus --config.file=/etc/prometheus/prometheus.yml \
--storage.tsdb.path=/prometheus \
--storage.tsdb.retention.time=1h \
--web.enable-admin-api
MAX_TIME_NS=1200000000 # 1.2s
MAX_ALLOCS=4000
MAX_MEM_BYTES=450000
# Query Prometheus Metrics, get the max value
time_ns=$(curl -s "http://localhost:9090/api/v1/query?query=max(benchmark_createclustercatalog_ns)" | jq -r '.data.result[0].value[1]')
allocs=$(curl -s "http://localhost:9090/api/v1/query?query=max(benchmark_createclustercatalog_allocs)" | jq -r '.data.result[0].value[1]')
mem_bytes=$(curl -s "http://localhost:9090/api/v1/query?query=max(benchmark_createclustercatalog_mem_bytes)" | jq -r '.data.result[0].value[1]')
Hi @joelanford , I have implemented the logic you suggested above, could you help have a review when you get a chance? Thanks! |
I download the snapshot from https://github.com/operator-framework/operator-controller/actions/runs/13304141348 jiazha-mac:prometheus-3.1.0.darwin-arm64 jiazha$ tree data
data
├── 01JHSRNJT32YH9858HS49WKGNS
│ ├── chunks
│ │ └── 000001
│ ├── index
│ ├── meta.json
│ └── tombstones
├── 01JKZ7W6XKK9WQ4B9JM5GCNZMW
│ ├── chunks
│ │ └── 000001
│ ├── index
│ ├── meta.json
│ ├── tmp_dbro_sandbox3927239696
│ └── tombstones
├── 01JKZ9ZF06MSA9VCPFBRSRWB4J
│ ├── chunks
│ │ └── 000001
│ ├── index
│ ├── meta.json
│ └── tombstones
├── chunks_head
├── prometheus.tar
├── queries.active
└── wal
├── 00000000
├── 00000001
└── 00000002
10 directories, 17 files
jiazha-mac:prometheus-3.1.0.darwin-arm64 jiazha$ ./prometheus
time=2025-02-13T09:38:09.214Z level=INFO source=main.go:636 msg="No time or size retention was set so using the default time retention" duration=15d
time=2025-02-13T09:38:09.214Z level=INFO source=main.go:683 msg="Starting Prometheus Server" mode=server version="(version=3.1.0, branch=HEAD, revision=7086161a93b262aa0949dbf2aba15a5a7b13e0a3)"
... http://localhost:9090/query?g0.expr=benchmark_createclustercatalog_allocs&g0.show_tree=0&g0.tab=graph&g0.range_input=1h&g0.res_type=auto&g0.res_density=medium&g0.display_mode=lines&g0.show_exemplars=0 |
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Description
See more: #920
Reviewer Checklist