`benchmark baseline check` doesn't respect threshold-tolerances when checking baseline against static thresholds #275

MahdiBM · 2024-09-18T02:13:37Z

`benchmark baseline check` doesn't respect threshold-tolerances when checking an already-acquired baseline that is on disk against static threshold .json files.
Apparently this is because the benchmarks are not loaded, so the tool falls back to the strict threshold-tolerances.

This makes our benchmarks fail even though we have less than 1% and 100ms of fluctuations.
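To make the "1% and 100ms" figures concrete, here is a minimal sketch of the arithmetic behind a tolerance check. The `deviation` helper is hypothetical and is not how package-benchmark implements its checks. It only reports the absolute and relative drift of a measured value from a stored threshold, and whether each stays within a given tolerance. How the two results are combined is left to the caller.

```shell
# Hypothetical helper, not part of package-benchmark.
# Usage: deviation <measured> <threshold> <max_relative_percent> <max_absolute>
deviation() {
  awk -v m="$1" -v t="$2" -v relmax="$3" -v absmax="$4" 'BEGIN {
    d = m - t
    if (d < 0) d = -d                      # absolute deviation
    pct = (t == 0) ? 0 : 100 * d / t       # relative deviation in percent
    printf "abs=%.2f rel=%.2f%% abs_ok=%s rel_ok=%s\n", d, pct, (d <= absmax ? "yes" : "no"), (pct <= relmax ? "yes" : "no")
  }'
}
```

For example, `deviation 1005 1000 1 100` reports a drift of 5 units (0.5%), which is inside both a 1% relative and a 100-unit absolute tolerance.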


MahdiBM · 2024-10-04T11:04:27Z

@hassila I'm writing some real benchmarks now, and the benchmarks target needs to depend on a big monolithic target.
The problem is, when I do `benchmark run` and then `benchmark check`, it builds that monolithic target twice, each time taking 11 minutes on a bare-metal ARM machine with 64 CPUs and 128 GB of RAM.

While I see this as mostly a Swift/SwiftPM problem that builds take soooo long, I also notice that the benchmark package doesn't need to build the benchmarking target the second time as we're checking static thresholds vs already-calculated baselines.

So hopefully there can be a command or something so "check" can skip building or running anything at all, and only perform checks with what is saved on disk.

hassila · 2024-10-04T11:52:37Z

Yes, it is a known problem unfortunately:

swiftlang/swift-package-manager#7210

Would be great if you can ping in that issue and explain the issue above - maybe we can add a "--skip-build" to work around it (but it really should be done by SwiftPM, we just use the official API and it rebuilds always...).

MahdiBM · 2024-10-04T12:10:47Z

Sure, but I'm skeptical about when the SwiftPM maintainers will get to any issues.
It would be very, very nice to have this feature while we're waiting a possible few years for SwiftPM.

As always I'm open to submitting a PR if needed.

hassila · 2024-10-04T12:47:08Z

It is a bit tricky, as we use the build to also get us the paths to the resulting built executables - which are needed.

I tried a quick and dirty --skip-builds flag, but it won't fly.

I am open to suggestions/PRs (still worth pinging in that issue though, I do think the more noise there is about something, the more likely it is to be fixed... The squeaky wheel gets the grease.)

hassila · 2024-10-04T12:47:27Z

This issue should be fixed by #284 now, if you can please have a go testing against that (it will be merged fairly soon).

MahdiBM · 2024-10-04T16:15:46Z

@hassila

          swift package --disable-sandbox \
            benchmark thresholds check \
            "${{ github.head_ref || github.ref_name }}" \
            --check-absolute-path $PWD/Benchmarks/Thresholds/ \
            --no-progress \
            --format markdown

did still build the benchmarks, unlike benchmark read.

hassila · 2024-10-04T16:17:53Z

--check-absolute-path is not used for that, you want to use --path

It needs to build the benchmarks for the check operation to get the latest threshold tolerances from the code itself.

MahdiBM · 2024-10-04T19:31:20Z

For what it's worth this issue is resolved now.

Also, for some reason, using:

          swift package --disable-sandbox \
            benchmark baseline update \
            "${{ github.head_ref || github.ref_name }}"

          swift package --disable-sandbox \
            benchmark thresholds check \
            "${{ github.head_ref || github.ref_name }}" \
            --path $PWD/Benchmarks/Thresholds/ \
            --no-progress \
            --format markdown

No longer builds the whole benchmarks target twice.
I notice that it does still try to build the benchmark target in the second command, because (1) I've confirmed threshold-tolerances do work, and (2) I can see some warnings about a dependency of the benchmark target (Swift 6 regression). But it pretty much just reuses the previous build and does not waste time building the target again.
No complaints as this solves my whole problem, but would be nice to know what's going on?!

hassila · 2024-10-04T20:04:26Z

Glad to hear it is working better now - the build is a bit of a mystery of swiftpm I think - it should only rebuild what's needed between runs. Are there any differences on os/runtime environments between your tests when it failed vs now?

Should try to nail down the related swiftpm bug probably - maybe this can help track down some more info.

MahdiBM · 2024-10-04T20:08:42Z

Not really, no environment differences.
Maybe the fact that I'm using --disable-sandbox helped? But I think I had already enabled that earlier and it didn't help?
Can try that later to see if it's helping anything 🤔.

hassila · 2024-10-04T20:14:20Z

Perhaps - I haven't nailed down exactly when it rebuilds, just that it's way too often. Happy to hear results if you try it.

hassila · 2024-10-04T20:20:40Z

But maybe close this issue then and we can discuss the builds separately?

MahdiBM · 2024-10-04T20:32:02Z

I mean we can even close this issue and keep talking 😛

MahdiBM · 2024-10-06T11:25:15Z

@hassila I couldn't find out why SwiftPM decides not to rebuild the benchmark target from the ground up.

Right now, the majority of the time it won't rebuild the whole target. But sometimes it does, even with no code changes (e.g. rerunning the same GitHub Action can result in different behavior... perhaps the .build cache was updated and caused the difference).

I did try removing --disable-sandbox. No difference.
Though I feel like I only started getting this "mostly uses the cache but sometimes rebuilds the whole target" behavior after I removed --disable-sandbox?! Not sure.
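One way to collect data on when the full rebuild happens is to time each CI step and flag the slow ones. The sketch below is an assumption-laden illustration: `timed` is a hypothetical helper, and the limit is a guess that a run far slower than a cached one probably rebuilt the target.

```shell
# Hypothetical helper: runs a command, reports its wall-clock duration on
# stderr, and flags it when it exceeds a limit (in seconds).
# Usage: timed <label> <limit_seconds> <command> [args...]
timed() {
  label=$1; limit=$2; shift 2
  start=$(date +%s)
  "$@"
  status=$?
  elapsed=$(( $(date +%s) - start ))
  if [ "$elapsed" -gt "$limit" ]; then
    echo "[$label] ${elapsed}s: slower than ${limit}s, possible full rebuild" >&2
  else
    echo "[$label] ${elapsed}s" >&2
  fi
  # Preserve the wrapped command's exit status so CI still fails correctly.
  return "$status"
}
```

In a workflow this could wrap the check step, for example `timed check 300 swift package benchmark thresholds check ...`, so the logs of repeated runs show which ones rebuilt.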

hassila · 2024-10-06T12:25:20Z

Yeah... thanks for trying - hope we eventually figure it out so we can try to fix the root cause...

MahdiBM · 2024-10-11T11:28:37Z

@hassila This "caching sometimes works and sometimes doesn't" situation is flaky enough that I decided not to rely on it and to do just one check, instead of running the comparison, then reading the result, then doing the check.
Just reporting back on what we've ended up doing (so far).

MahdiBM · 2024-10-11T11:29:42Z

This is also why I've filed this issue: #288

MahdiBM · 2024-10-16T08:16:33Z

@hassila interesting observation:

It's been a few days since I added -c release to all swift package calls and there haven't been any double-builds 🤔.
I noticed the benchmarking package builds some stuff in release mode anyway to be able to reliably benchmark performance, so I thought maybe having both debug and release builds together goes too hard on SwiftPM.
I'll monitor for the next few days as well and see if the double-build happens again or not.

Also for the record, using -c release did change the result of one of our two benchmarks from 310ms -> 290ms. Not sure why that should happen, might be a sign that we should use -c release.
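A minimal sketch of applying `-c release` uniformly to the benchmark calls from earlier in the thread. The `bench` wrapper and the `DRY_RUN` variable are hypothetical, and placing `-c release` right after `swift package` is an assumption about where the flag was added.

```shell
# Hypothetical wrapper: adds "-c release" to every benchmark plugin call.
# With DRY_RUN=1 it only prints the command instead of running it.
bench() {
  set -- swift package -c release --disable-sandbox benchmark "$@"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

With this in place, the two CI steps would become `bench baseline update <name>` followed by `bench thresholds check <name> --path <dir>`, so both share the same build configuration.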

hassila · 2024-10-17T09:52:36Z

Hmm, that is weird - if that is correct, it would imply that building in debug somehow invalidates the release build - we're just calling into the SwiftPM API and asking for a release build (and all normal caching should happen...).

MahdiBM · 2024-10-17T10:17:30Z

@hassila right ... this is definitely a bug somewhere ...

> it would imply that building in debug somehow invalidates the release build

It would have been nice if it was that simple. The problem is it isn't consistent in invalidating the release build either. Or at least I haven't noticed the pattern.

Just to confirm, -c release is still working like I mentioned above, around 24h ago. I'll still monitor the situation.

MahdiBM mentioned this issue Sep 18, 2024: fix: threshold-tolerances sometimes are not respected #276 (Closed)

hassila mentioned this issue Oct 4, 2024: feat: Add explicit support for static thresholds #284 (Merged)

MahdiBM closed this as completed Oct 4, 2024

MahdiBM mentioned this issue Oct 27, 2024: SwiftPM unconditionally rebuilds artefacts that should be cached for command plugins swiftlang/swift-package-manager#7210 (Open)
