RFC-14: MCS: Adding budget limit thresholds to endpoints for SC Donation #24

Open · wants to merge 1 commit into main from 0140-budget-limit-threshold
Conversation

@lsf37 (Member) commented on Jun 17, 2024

Original Jira issue and discussion

Rendered version of the RFC text.

@lsf37 force-pushed the 0140-budget-limit-threshold branch from dffaf1c to 0d64b60 on June 17, 2024 01:43
@kent-mcleod (Member) commented:

If the server thread was told how much budget was available on the donated SC when it is unblocked from seL4_Recv, then it could immediately return an error to the client based on some policy it can implement itself. This in combination with the proposed mechanism that allows a well-behaving client to yield-until-budget before making the call gives more power to both client and server and doesn't require adding more complexity to endpoints.

A single threshold value on an endpoint would have to be suitable for all calls over that endpoint whereas a server implementing a policy at user level could enforce a different threshold limit that's specific to each unique call.
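For illustration, a minimal sketch of what such a user-level policy could look like on the server side. It assumes some mechanism (the one suggested above) tells the server how much budget the donated SC carries when it is unblocked; the operation names, threshold values, and helper are made up, not part of any seL4 API:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-operation budget requirements, chosen by the server. */
enum op { OP_READ, OP_WRITE, OP_FLUSH, OP_COUNT };

static const uint64_t budget_needed_us[OP_COUNT] = {
    [OP_READ]  = 50,
    [OP_WRITE] = 200,
    [OP_FLUSH] = 1000,
};

/* donated_budget_us is whatever budget the server learns it has on the
 * donated SC when it is unblocked from seL4_Recv; the mechanism for
 * learning it is exactly what is under discussion in this thread. */
static bool call_may_proceed(enum op op, uint64_t donated_budget_us)
{
    /* If the client did not bring enough budget for this particular call,
     * the server replies with an application-defined error instead of
     * starting the work. */
    return donated_budget_us >= budget_needed_us[op];
}
```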

I couldn't find a user-level approach discussed in the proposal or the thesis or the comments.

@lsf37 (Member, Author) commented on Jul 22, 2024

I would also welcome some more thinking on alternatives to this approach. That doesn't mean we have to take them, but the design space is large and the RFC at least does not show much exploration of it -- it's the prototype of one implementation that happens to work, but for me it has two main problems:

  • To my understanding we are now introducing a loop into the IPC fastpath(!) that is a priori bounded only by the absolute maximum number of refills a user can create a scheduling context with. As the performance number tables show, that means there is no "IPC takes x cycles" for specific scenarios any more -- it all depends on how things were scheduled before and how many refills happen to be in the scheduling context. The user does not really have much control over that apart from setting the maximum number.

  • The other aspect is verification cost: this is the IPC path -- even small modifications are very expensive there, and even the simplest version of the RFC is not a small modification. This would need significant verification funding first to proceed (it does not look to me like this can be made conditional easily -- if it can, then it could be made available in non-verification configs first).

Would it make sense to keep a "remaining overall budget" field in the SC that is updated each time an SC is charged and re-charged? (I.e. explicitly keep the sum of all refill budgets.) This would avoid the loop and make Kent's alternative of a server (and even a well-behaved client) checking the remaining budget quite fast and easy to implement, without introducing new kernel mechanisms on endpoints.

Edit: sorry, that last sentence doesn't make sense. The sum of all refill budgets would always be the overall budget of the SC. It's just the potentially mergeable head refills that we want to know about, which is what the loop currently computes.

@lsf37 (Member, Author) commented on Aug 4, 2024

Status update from 2024-07-25 TSC meeting: awaiting change of prototype implementation by Indan.

@Indanz commented on Aug 5, 2024

If the server thread was told how much budget was available on the donated SC when it is unblocked from seL4_Recv, then it could immediately return an error to the client based on some policy it can implement itself. This in combination with the proposed mechanism that allows a well-behaving client to yield-until-budget before making the call gives more power to both client and server and doesn't require adding more complexity to endpoints.

The problem is that we don't know how much budget is available when the server gets unblocked. To know that, we need to take a timestamp on the fast path, which currently doesn't happen.

So from a performance point of view your proposal would require all IPC calls to take a timestamp and an extra yield-until-budget syscall for tasks wanting this feature.

With this proposal, the cost is one extra if-check against the budget threshold on all IPC calls, and the timestamp is taken only when a budget threshold is set.
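As a rough sketch of that cost argument (placeholder names only, not the prototype's actual code):

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch only: budget_threshold would come from the endpoint (0 meaning
 * "no threshold configured"), head_refill_amount from the donated SC, and
 * usage_since_timestamp from a timestamp taken only in the threshold case. */
static inline bool fastpath_budget_ok(uint64_t budget_threshold,
                                      uint64_t head_refill_amount,
                                      uint64_t usage_since_timestamp)
{
    if (budget_threshold == 0) {
        /* No threshold on this endpoint: the only added fastpath cost is
         * this single branch, and no timestamp is needed. */
        return true;
    }
    /* Threshold configured: the caller pays for a timestamp to compute
     * usage_since_timestamp, and the remaining head-refill budget is
     * compared against the threshold. */
    if (usage_since_timestamp >= head_refill_amount) {
        return false;
    }
    return head_refill_amount - usage_since_timestamp >= budget_threshold;
}
```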

@Indanz commented on Aug 5, 2024

  • To my understanding we are now introducing a loop into the IPC fastpath(!)

I'm getting rid of those new loops and will go to the slow path if the head budget isn't sufficient.

The other aspect is verification cost

The loop to gather enough budget is the same loop as the one at the end of refill_budget_check, except that the budget threshold is used instead of MIN_BUDGET. refill_unblock_check already merges overlapping refills (I don't understand how they can happen, though; it seems easier to avoid them than to fix them up afterwards).
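Schematically, the loop in question looks roughly like the following (a simplified paraphrase, not the verified kernel source; the refill-list representation and the rTime handling in particular are simplified). The existing loop uses MIN_BUDGET as the threshold; the proposal would pass the endpoint's budget threshold instead:

```c
#include <stdint.h>

typedef struct {
    uint64_t rTime;   /* when this refill becomes available */
    uint64_t rAmount; /* how much budget it carries */
} refill_t;

#define MAX_REFILLS 8

typedef struct {
    refill_t refills[MAX_REFILLS]; /* circular buffer of refills */
    unsigned head;
    unsigned count;
} sched_context_t;

/* Fold refills into the head until the head alone carries at least
 * `threshold` budget, or there is nothing left to merge. */
static void merge_head_refills(sched_context_t *sc, uint64_t threshold)
{
    while (sc->count > 1 && sc->refills[sc->head].rAmount < threshold) {
        refill_t old_head = sc->refills[sc->head];
        sc->head = (sc->head + 1) % MAX_REFILLS;
        sc->count--;
        sc->refills[sc->head].rAmount += old_head.rAmount;
        /* Simplified: the real kernel's rTime adjustment differs. */
        sc->refills[sc->head].rTime = old_head.rTime;
    }
}
```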

But yes, if you want to prove that the budget threshold feature does what it should, the verification cost will be high no matter how it's implemented.

it does not look to me like this can be made conditional easily -- if it can, then it could be made available in non-verification configs first

I think it could be made conditional easily, but the question is whether you want that.

@kent-mcleod (Member) commented:

The problem is that we don't know how much budget is available when the server gets unblocked. To know that, we need to take a timestamp on the fast path, which currently doesn't happen.

There is currently a timestamp taken on every slowpath entry to the kernel, because every time the kernel is entered it calculates whether the time remaining in the head refill is greater than MIN_BUDGET: refill_head(sc)->rAmount - usage in refill_sufficient(). The result of this calculation is also the same as the time until the currently programmed timer interrupt will fire, and is the amount of budget available to the server before a timeout fault would be generated.
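As a standalone paraphrase of that check (for illustration only; MIN_BUDGET's real value is platform-specific and defined in the kernel):

```c
#include <stdbool.h>
#include <stdint.h>

static const uint64_t MIN_BUDGET = 20; /* illustrative value; roughly 2 * WCET in ticks */

/* Budget left in the head refill after charging what has been used since
 * the last timestamp. */
static inline uint64_t capacity(uint64_t head_refill_amount, uint64_t usage)
{
    return usage >= head_refill_amount ? 0 : head_refill_amount - usage;
}

/* True when enough budget remains to safely return to user level; this is
 * also the time until the currently programmed timeout interrupt fires. */
static inline bool sufficient(uint64_t head_refill_amount, uint64_t usage)
{
    return capacity(head_refill_amount, usage) >= MIN_BUDGET;
}
```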

If this check isn't also in the fastpath then it would need to be added regardless.

@Indanz commented on Aug 5, 2024

If this check isn't also in the fastpath then it would need to be added regardless.

Why? We have already paid most of the cost of a syscall; handling the fastpath is always faster than doing anything else. MIN_BUDGET applies to generic system calls because it is bound to WCET; it does not apply to fastpath calls, which are on the order of one microsecond.

At worst the called task will be interrupted within MIN_BUDGET by the timer interrupt to end its time-slice. You can usually do 10 or more IPC calls within MIN_BUDGET time.
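For a rough sense of scale (purely illustrative numbers): if WCET were 10 us, MIN_BUDGET = 2 * WCET would be 20 us, and at roughly 1 us per fastpath IPC that is on the order of 20 calls within one MIN_BUDGET.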

The IPC fastpath is a context switch without an SC change, and because of that it can currently avoid doing anything timing-related.

(For SMP I think we should take the timestamp before taking the kernel lock, see #844, and always taking the timestamp at entry would simplify some things, so I'm not against it in principle. But it will add extra overhead, especially on RISCV where it usually traps.)

@kent-mcleod (Member) commented:

Why? We have already paid most of the cost of a syscall; handling the fastpath is always faster than doing anything else. MIN_BUDGET applies to generic system calls because it is bound to WCET; it does not apply to fastpath calls, which are on the order of one microsecond.

In the kernel design, each fastpath is an optimization of a specific slowpath kernel operation. So if refill_sufficient() returns false in the slowpath and results in a thread switch, then the fastpath has to produce the same thread switch, or else fall back to the slowpath. To facilitate this, the fastpath needs to perform the same check.

The IPC fastpath is a context switch without an SC change, and because of that it can currently avoid doing anything timing-related.

It arguably can't avoid performing the budget check. The kernel has to ensure there is enough remaining budget to get back into the kernel whenever it returns to user level. It has to guard against a thread calling late enough in its time slice and amplifying its budget via a non-preemptible kernel operation.

I think there's a design inconsistency here: the fastpath is always expected to behave the same as a particular path through the slowpath, but the slowpath will now abort an operation if it doesn't have enough time to complete, while the fastpath is merely hoped to be fast enough that there is always enough time available to complete it.

Shouldn't the fastpath also abort an operation if it doesn't have enough time to complete?

We either want to assume that:

  1. the fastpath time rounds to zero (and, as a consequence, the check is not needed),
  2. the fastpath time doesn't round to zero and so a check is needed, and because of the design requirement that the fastpath follows the same behavior as the slowpath, the check is the same as the check in the slowpath. In this case maybe we could define the check to be parameterized by input arguments to the kernel, and in the fastpath case have it compare against a smaller time value?
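As a sketch of what option 2 could look like (illustrative names and reserve values only; nothing here is existing kernel API):

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { ENTRY_SLOWPATH, ENTRY_FASTPATH } entry_path_t;

static const uint64_t SLOWPATH_RESERVE = 20; /* e.g. MIN_BUDGET ~ 2 * WCET */
static const uint64_t FASTPATH_RESERVE = 2;  /* much smaller fastpath bound */

/* The sufficiency check is parameterized by the entry path, so the
 * fastpath condition compares against a smaller time value while keeping
 * the same overall behaviour as the slowpath check. */
static inline bool budget_sufficient(entry_path_t path,
                                     uint64_t head_refill_amount,
                                     uint64_t usage)
{
    uint64_t reserve = (path == ENTRY_FASTPATH) ? FASTPATH_RESERVE
                                                : SLOWPATH_RESERVE;
    uint64_t remaining = (usage >= head_refill_amount)
                             ? 0
                             : head_refill_amount - usage;
    return remaining >= reserve;
}
```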

@Indanz commented on Aug 24, 2024

Thinking about time is always tricky, hence my delayed response.

It arguably can't avoid performing the budget check. The kernel has to ensure there is enough remaining budget to get back into the kernel whenever it returns to user level. It has to guard against a thread calling late enough in its time slice and amplifying its budget via a non-preemptible kernel operation.

The kernel does not check at syscall entry whether there is a higher-priority thread scheduled within WCET time. Hence the worst-case expected scheduling delay is equal to WCET. Even if a task calls a fastpath syscall just before its timeslice ends, the worst-case scheduling delay for the next task is much less than WCET.

The budget amplification is negligible. However, avoiding that amplification will not reduce the total amount of kernel operation time, though it might avoid taking a timer interrupt. I'm not convinced that stealing MIN_BUDGET worth of budget from a task is better than letting it overrun by a fraction.

We either want to assume that:

1. the fastpath time rounds to zero (and, as a consequence, the check is not needed),

This is the assumption I am making. Scheduling is at 1 us granularity currently, so anything close to that is a rounding error.

2. the fastpath time doesn't round to zero and so a check is needed, and because of the design requirement that the fastpath follows the same behavior as the slowpath, the check is the same as the check in the slowpath. In this case maybe we could define the check to be parameterized by input arguments to the kernel, and in the fastpath case have it compare against a smaller time value?

I have the feeling that the kernel currently abuses WCET for kernel entry and exit overhead, while those should be values separate from WCET (the only explanation I have for MIN_BUDGET = 2 * WCET). As WCET is used everywhere, it's unclear what purpose each margin has.

In the end it's about what timing guarantees seL4 provides. In my view seL4 should be suitable for hard real-time, which implies (ignoring IRQs):

  1. Tasks start when they should. This currently has an error of Es = WCET.
  2. Tasks get enough budget in a timely manner. This currently has an error of at least Eb = Es + MIN_BUDGET because of the budget check. But because notifications cause LIFO scheduling behaviour, no guarantees whatsoever can be given at the moment.
  3. Scheduling errors should not accumulate. Es and Eb are non-accumulating. (Notifications causing LIFO scheduling are exactly such a big problem because that error does accumulate, but let's ignore it for this analysis.)

Note that preventing all budget overruns is not a design goal in itself. It may be necessary in order to make certain guarantees, but I'm not fully convinced of that yet. It does avoid accumulating scheduling delays in case other tasks of the same priority all do budget amplification. However, that error is at most WCET per task (much less in reality), while the budget check causes tasks to lose up to MIN_BUDGET per period. I'd argue that a scheduling jitter of MIN_BUDGET is much worse than a scheduling jitter bounded by WCET and much less in practice.

In other words, currently tasks needing a budget of b need to set budget = b + Eb if they have a time limit and budget = b + MIN_BUDGET otherwise. Without the budget check they can set budget = b + Es or budget = b for all tasks. Overall time can be divided more efficiently without a budget check at each syscall entry and scheduling jitter will be lower. As far as I can remember, the original sporadic scheduling paper did not check budget at every kernel entry either.
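As a purely illustrative instance of that sizing argument (assuming WCET = 10 us, so MIN_BUDGET = 2 * WCET = 20 us, Es = 10 us and Eb = Es + MIN_BUDGET = 30 us): a task needing b = 100 us currently has to configure a budget of 130 us with a time limit, or 120 us without one, whereas without the per-entry budget check 110 us, respectively 100 us, would suffice. The difference of up to MIN_BUDGET per period is the over-provisioning and jitter described above.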

@Indanz commented on Aug 24, 2024

To clarify, I think a slight budget overrun is fine if it means the task will get that much less budget in the next period, to avoid accumulating errors.
