Improve busy loop stability #90

Open · wants to merge 2 commits into master

douglas-raillard-arm

The current implementation of the busy loop seems to produce somewhat non-reproducible calibration values, and potentially a different duty cycle when the same workload is executed twice. Old versions of rt-app seem to behave well, but for some reason building a recent version with the same toolchain leads to these issues.

To solve this:

  • Shield the busy loop against compiler optimizations, since the function is "useless" from the compiler's point of view (it has no side effects). It's cheap to do and should avoid future pain.
  • Use a simpler loop body that avoids branching to other functions. Hopefully, the behaviour of such a body will stay the same in the future.

However, this PR will change the behavior of rt-app on asymmetric systems with so-called CPU PELT invariance. The invariance described by CPU capacities only holds for a given instruction mix. Since CPU capacities have typically been established using some benchmark X (presumably Dhrystone), the duty cycle of any other periodic workload will scale differently when the task moves between CPUs. Changing the rt-app loop body will therefore change the utilization of the task when it runs on a little CPU. This can be accounted for when writing the JSON if the task is pinned to a given class of CPUs, but there is no real solution when the task is free to migrate to any CPU.
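To make the utilization argument concrete, here is a small worked example in C. It relies on a deliberately simplified steady-state model (an assumption, not the kernel's PELT code): with invariance, the util of a periodic task is roughly duty cycle times capacity on the 0..1024 scale, ignoring PELT's geometric decay. It also assumes the task executes a fixed number of loop iterations per period, so its running time on a little CPU grows by the ratio of the two CPUs' calibration values (`slowdown`). Both helper names are hypothetical.

```c
/*
 * Simplified steady-state model (assumption): with CPU invariance, the PELT
 * util of a periodic task is roughly duty_cycle * capacity on the 0..1024
 * scale. PELT's geometric decay and frequency scaling are ignored.
 */
static double invariant_util(double duty_cycle, double capacity)
{
    return duty_cycle * capacity;
}

/*
 * duty_big:   fraction of each period spent running on the big CPU
 *             (capacity 1024)
 * slowdown:   calib_little / calib_big, i.e. how much longer the same number
 *             of loop iterations takes on the little CPU; this depends on
 *             the loop body's instruction mix
 * cap_little: little CPU capacity advertised to the kernel (0..1024)
 */
static double util_on_little(double duty_big, double slowdown, double cap_little)
{
    double duty_little = duty_big * slowdown;

    if (duty_little > 1.0)
        duty_little = 1.0; /* cannot run for more than the whole period */
    return invariant_util(duty_little, cap_little);
}
```

With a 25% duty cycle on the big CPU (util of about 256) and a little CPU rated at capacity 512, `util_on_little(0.25, 2.0, 512)` gives 256 again: invariance holds only because this loop body happens to be exactly 2x slower there. If the loop body is 2.5x slower on that CPU instead, `util_on_little(0.25, 2.5, 512)` gives 320.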

Since there is no way to guarantee that rt-app calibration values will be inversely proportional to CPU capacities, this is a lost battle, so IMO we should aim for reproducible results instead. People who need to reproduce very accurate util signals should update the in-kernel capacities of their CPUs based on rt-app calibration values before running their tests.
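A minimal sketch of deriving such capacities, assuming the per-CPU calibration values are expressed in ns per loop (smaller means faster): capacities are the inverse ratios, normalized so that the fastest CPU gets 1024. The helper is hypothetical, and how the resulting values are handed to the kernel depends on the platform, so that part is left out.

```c
#include <stddef.h>

/*
 * Hypothetical helper: derive per-CPU capacities (0..1024 scale) from
 * rt-app calibration values, assumed to be ns per loop measured on each CPU.
 * nr_cpus must be at least 1. The fastest CPU gets 1024.
 */
static void capacities_from_calib(const unsigned int *ns_per_loop,
                                  unsigned int *capacity, size_t nr_cpus)
{
    unsigned int fastest = ns_per_loop[0];

    for (size_t i = 1; i < nr_cpus; i++)
        if (ns_per_loop[i] < fastest)
            fastest = ns_per_loop[i];

    for (size_t i = 0; i < nr_cpus; i++)
        /* Capacity is inversely proportional to ns/loop; round to nearest */
        capacity[i] = (1024u * fastest + ns_per_loop[i] / 2) / ns_per_loop[i];
}
```

For example, calibration values of {100, 100, 250, 250} ns per loop give capacities {1024, 1024, 410, 410}.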

Fixes #89

douglas-raillard-arm changed the title from "[RFC] Improve busy loop stability" to "Improve busy loop stability" on Oct 9, 2019

Since the function is pure, the compiler is free to do anything it wants with
it, including removing all its call sites. To avoid any such issues:

* disable optimizations for that function
* forbid inlining
* add a no-op statement that the compiler is guaranteed to treat as a side
  effect.
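The three points above could look like this with GCC. This is a sketch, not the code from this PR: the function name is hypothetical, `optimize("O0")` is a GCC-specific attribute (Clang ignores it with a warning), and an empty `asm volatile` statement with a "memory" clobber plays the role of the no-op side effect.

```c
/*
 * Hypothetical shielded busy loop, assuming GCC:
 *  - noinline:      the loop stays a real function, called as written
 *  - optimize("O0"): no optimization of the loop body (GCC-specific)
 */
__attribute__((noinline, optimize("O0")))
unsigned long busy_loop_shielded(unsigned long loops)
{
    unsigned long i;

    for (i = 0; i < loops; i++) {
        /*
         * Emits no instruction, but "volatile" plus the "memory" clobber
         * forces the compiler to treat it as a side effect, so the function
         * is no longer pure and its call sites cannot be dropped.
         */
        __asm__ volatile("" ::: "memory");
    }
    return i;
}
```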

Signed-off-by: Douglas RAILLARD <[email protected]>

The previous implementation, based on ldexp(), did not always give consistent
results from one run to another. Using more basic operations without extra
branches makes the execution time of the loop body much more predictable,
leading to more stable calibration values and more precise reproduction of
duty cycles.
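For illustration, a loop body built only from basic integer operations, plus a naive calibration routine that times it with `clock_gettime(CLOCK_MONOTONIC)`. Both functions are hypothetical and are not the body this PR actually uses; the multiply/add body and taking the minimum over several runs are assumptions chosen to keep the per-iteration cost predictable.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/*
 * Hypothetical loop body: integer multiply/add only, no calls (unlike an
 * ldexp()-based body) and no branches other than the loop condition.
 */
__attribute__((noinline, optimize("O0")))
unsigned long busy_loop_simple(unsigned long loops)
{
    unsigned long acc = 1;

    for (unsigned long i = 0; i < loops; i++) {
        acc = acc * 3 + i;                /* unsigned wrap-around is well-defined */
        __asm__ volatile("" : "+r"(acc)); /* make acc opaque to the optimizer */
    }
    return acc;
}

static unsigned long long now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (unsigned long long)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

/*
 * Hypothetical calibration: time `loops` iterations `runs` times and keep
 * the smallest average ns per loop, which filters out preemption and
 * frequency ramp-up noise.
 */
double calibrate_ns_per_loop(unsigned long loops, int runs)
{
    double best = -1.0;

    for (int r = 0; r < runs; r++) {
        unsigned long long t0 = now_ns();

        busy_loop_simple(loops);
        double ns = (double)(now_ns() - t0) / (double)loops;
        if (best < 0.0 || ns < best)
            best = ns;
    }
    return best;
}
```

`calibrate_ns_per_loop(1000000, 10)` returns the best average over 10 timed runs of one million iterations; pinning the calling thread to the CPU being calibrated is left out of the sketch.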

Signed-off-by: Douglas RAILLARD <[email protected]>

Successfully merging this pull request may close this issue: rtapp execution gives unreliable actual duty cycle (#89)