Rapid Failure Protection - How to implement #2022
-
Hi. There is currently custom logic in the service's ExecuteAsync method to restart the long-running task when it fails.
This restart logic is conceptually similar to the IIS App Pool Rapid Failure Protection. If the long-running task fails, it should be restarted up to x times within y minutes, with the delay between restarts increasing each time. If it has been more than y minutes since the last restart, the retry counter needs to be reset to 0 and the cycle starts over. However, if max retries is reached within y minutes, then the whole service needs to be aborted. The Polly Retry Strategy will handle the restart and track the failure count, but it doesn't reset the retry counter until the task successfully completes, which would never happen with a long-running task. The other strategies don't seem to help with what I'm trying to achieve, so I am trying to figure out whether this is something Polly can already support.
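(For illustration only, hand-rolled restart logic of this kind usually looks something like the sketch below; the names, limits, and RunAsync placeholder are not the actual service code.)

```csharp
using Microsoft.Extensions.Hosting;

public sealed class WorkerService : BackgroundService
{
    private const int MaxRetries = 5;                                   // restart x times...
    private static readonly TimeSpan Window = TimeSpan.FromMinutes(5);  // ...within y minutes

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        var retries = 0;
        var windowStart = DateTime.UtcNow;

        while (!stoppingToken.IsCancellationRequested)
        {
            try
            {
                await RunAsync(stoppingToken); // the long-running task
            }
            catch (Exception) when (!stoppingToken.IsCancellationRequested)
            {
                if (DateTime.UtcNow - windowStart > Window)
                {
                    retries = 0;                    // quiet period elapsed: start a new cycle
                    windowStart = DateTime.UtcNow;
                }

                if (++retries > MaxRetries)
                {
                    throw;                          // rapid failure: abort the whole service
                }

                await Task.Delay(TimeSpan.FromSeconds(retries), stoppingToken); // growing delay
            }
        }
    }

    // Placeholder for the actual long-running work.
    private Task RunAsync(CancellationToken ct) => Task.CompletedTask;
}
```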
For all I know, I could have the wrong approach or just be fixated on using the wrong tool. Thank you for your advice. -marc
-
The short answer is yes, I think you can achieve something like what you have described.
-
Hi @peter-csala, I am using the new V8 API. Thanks -marc
-
Okay, let's build the sample code together. (If you are interested only in the end result, scroll to the end of this answer.)

**Start with some basic stuff**

```csharp
var retry = new ResiliencePipelineBuilder()
    .AddRetry(new()
    {
        ShouldHandle = args => PredicateResult.True(),
        Delay = TimeSpan.FromSeconds(1),
        BackoffType = DelayBackoffType.Linear,
        MaxRetryAttempts = int.MaxValue,
        OnRetry = args =>
        {
            Console.WriteLine($"Attempt: {args.AttemptNumber} Delay: {args.RetryDelay} Duration: {args.Duration}");
            return default;
        }
    })
    .Build();

await retry.ExecuteAsync(ct => throw new MyException());
```

So, we have a retry strategy which retries unconditionally and indefinitely, and the delay between retry attempts increases linearly. The output looks something like this: each retry is logged with its attempt number, the (growing) delay, and the duration of the failed attempt.
**Use context for elapsed time management**

```csharp
ResiliencePropertyKey<DateTime> cycleStartKey = new(nameof(cycleStartKey)); // newly added
var context = ResilienceContextPool.Shared.Get(); // newly added
var retry = new ResiliencePipelineBuilder()
    .AddRetry(new()
    {
        ShouldHandle = args => // rewritten
        {
            var cycleStart = args.Context.Properties.GetValue(cycleStartKey, DateTime.MinValue);
            var diff = DateTime.UtcNow - cycleStart;
            return diff <= TimeSpan.FromSeconds(10) ? PredicateResult.True() : PredicateResult.False();
        },
        Delay = TimeSpan.FromSeconds(1),
        BackoffType = DelayBackoffType.Linear,
        MaxRetryAttempts = int.MaxValue,
        OnRetry = args =>
        {
            Console.WriteLine($"Attempt: {args.AttemptNumber} Delay: {args.RetryDelay} Duration: {args.Duration}");
            return default;
        }
    })
    .Build();

context.Properties.Set(cycleStartKey, DateTime.UtcNow); // newly added
await retry.ExecuteAsync(ctx => throw new MyException(), context); // rewritten
ResilienceContextPool.Shared.Return(context); // newly added
```

So, we add a `cycleStartKey` property to the `ResilienceContext` to record when the retry cycle started, and the rewritten `ShouldHandle` only allows further retries while we are still inside the 10-second window. The output looks something like this: the attempts are logged as before, but once the window has elapsed no further retry is kicked off.
So, before each retry attempt we assess whether we still have time to kick off a new attempt or not.

**Retry the retries**

```csharp
ResiliencePropertyKey<DateTime> cycleStartKey = new(nameof(cycleStartKey));
var context = ResilienceContextPool.Shared.Get();
var retry = new ResiliencePipelineBuilder()
    .AddRetry(new()
    {
        MaxRetryAttempts = int.MaxValue,
        OnRetry = args => // newly added
        {
            args.Context.Properties.Set(cycleStartKey, DateTime.UtcNow);
            Console.WriteLine($"Reset retry, new cycle {args.AttemptNumber + 1} will start");
            return default;
        }
    })
    .AddRetry(new()
    {
        ShouldHandle = args =>
        {
            var cycleStart = args.Context.Properties.GetValue(cycleStartKey, DateTime.MinValue);
            var diff = DateTime.UtcNow - cycleStart;
            return diff <= TimeSpan.FromSeconds(10) ? PredicateResult.True() : PredicateResult.False();
        },
        Delay = TimeSpan.FromSeconds(1),
        BackoffType = DelayBackoffType.Linear,
        MaxRetryAttempts = int.MaxValue,
        OnRetry = args =>
        {
            Console.WriteLine($"Attempt: {args.AttemptNumber} Delay: {args.RetryDelay} Duration: {args.Duration}");
            return default;
        }
    })
    .Build();

context.Properties.Set(cycleStartKey, DateTime.UtcNow);
await retry.ExecuteAsync(ctx => throw new MyException(), context);
ResilienceContextPool.Shared.Return(context);
```

As a final piece we added a new (outer) retry to the pipeline. If the inner retry exceeds the threshold, it lets the outer retry handle the situation. The outer retry simply resets the cycle start timestamp in its `OnRetry` callback, so the inner retry begins a fresh cycle. The output looks something like this: the inner attempts are logged as before, and each time the 10-second window elapses a "Reset retry, new cycle ... will start" line appears.
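To connect this back to the original question, such a pipeline could replace the hand-rolled loop inside the hosted service's ExecuteAsync. A rough sketch only; the hosting wiring and type names here are assumptions, not part of the discussion above:

```csharp
using Microsoft.Extensions.Hosting;
using Polly;

public sealed class WorkerService : BackgroundService
{
    // Must use the same key name as the pipeline's cycleStartKey property.
    private static readonly ResiliencePropertyKey<DateTime> CycleStartKey = new("cycleStartKey");

    private readonly ResiliencePipeline _pipeline;

    public WorkerService(ResiliencePipeline pipeline) => _pipeline = pipeline;

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        var context = ResilienceContextPool.Shared.Get(stoppingToken);
        context.Properties.Set(CycleStartKey, DateTime.UtcNow);
        try
        {
            // The pipeline restarts RunAsync according to the retry rules above.
            await _pipeline.ExecuteAsync(async ctx => await RunAsync(ctx.CancellationToken), context);
        }
        finally
        {
            ResilienceContextPool.Shared.Return(context);
        }
    }

    // Placeholder for the actual long-running work.
    private Task RunAsync(CancellationToken ct) => Task.CompletedTask;
}
```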
**Why don't we use a timeout?**

That could be a really good question. This approach lets the attempt finish (either succeed or fail). A timeout-based approach would intrusively stop the execution of the given attempt. Based on the original question I had the impression that you don't want a disruptive solution, but rather to gently reset the retry counter after a threshold has been exceeded. If I misunderstood your question, please let me know.
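For contrast, a sketch of the disruptive variant wraps each attempt in Polly's timeout strategy; the 10-second value simply mirrors the window used above.

```csharp
var retry = new ResiliencePipelineBuilder()
    .AddRetry(new()
    {
        ShouldHandle = args => PredicateResult.True(),
        Delay = TimeSpan.FromSeconds(1),
        BackoffType = DelayBackoffType.Linear,
        MaxRetryAttempts = int.MaxValue
    })
    // Inner timeout: cancels the attempt itself after 10 seconds,
    // instead of letting it run to completion and resetting the cycle.
    .AddTimeout(TimeSpan.FromSeconds(10))
    .Build();
```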
-
Hi @peter-csala. Referencing this feature request. Here is what I've run into. I need to be able to do the following:
Below is the code that I have come up with. I added a new class CycleStats that I used to capture the overall start time, and the start/stop of each attempt.
The IsLastAttempt method is modified to check whether there have been too many attempts within the FailureInterval window.
The CycleStats class:
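The class itself is not reproduced in this thread; a minimal sketch of what such a CycleStats type might look like, with member names guessed from the description above (the TooManyFailures helper reflects the modified IsLastAttempt check):

```csharp
// Tracks the overall operation start plus the start/stop of each attempt,
// so the strategy can decide whether failures are happening "rapidly".
internal sealed class CycleStats
{
    private readonly List<(DateTime Start, DateTime? Stop)> _attempts = new();

    public DateTime OperationStart { get; private set; } = DateTime.UtcNow;

    public int AttemptCount => _attempts.Count;

    public void AttemptStarted() => _attempts.Add((DateTime.UtcNow, null));

    public void AttemptStopped() => _attempts[^1] = (_attempts[^1].Start, DateTime.UtcNow);

    // Called after each execution: a run that outlived the failure window starts a new cycle.
    public void Reset()
    {
        _attempts.Clear();
        OperationStart = DateTime.UtcNow;
    }

    // "Rapid failure": the attempt cap was hit while still inside the failure window.
    public bool TooManyFailures(int maxAttempts, TimeSpan failureInterval)
        => AttemptCount >= maxAttempts && DateTime.UtcNow - OperationStart <= failureInterval;
}
```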
After every execution of the callback, we need to check whether the code ran longer than the FailureInterval. If so, the attempt counter and the operation start time need to be reset. If we haven't run longer than the FailureInterval, then we just carry on and let the logic in ShouldHandle and OnRetry determine how to proceed. The idea behind rapid failure protection is that if the code fails x times in y minutes, then we shouldn't restart, or should at least add a longer delay. The current code base doesn't restart, due to an issue with the RabbitMQ library being used.

I did look into using both a single retry strategy and two retry strategies. They work; however, I was managing multiple context properties, including my own failure counter. I felt like I was overriding a lot of what the retry strategy was doing, so it made more sense to implement my own strategy. If I need to use the existing Retry Strategy, I would have code similar to this:
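Roughly, that two-retry variant with manually managed context properties might look like the sketch below. This is a reconstruction for illustration, not the original snippet; the key names and limits (5 failures within 5 minutes) are placeholders.

```csharp
ResiliencePropertyKey<DateTime> cycleStartKey = new("cycleStart");
ResiliencePropertyKey<int> failureCountKey = new("failureCount");

var pipeline = new ResiliencePipelineBuilder()
    // Outer retry: only restarts the cycle when the window elapsed with fewer than
    // the maximum number of failures; otherwise the exception bubbles up (abort).
    .AddRetry(new()
    {
        ShouldHandle = args =>
        {
            var failures = args.Context.Properties.GetValue(failureCountKey, 0);
            return failures < 5 ? PredicateResult.True() : PredicateResult.False();
        },
        MaxRetryAttempts = int.MaxValue,
        OnRetry = args =>
        {
            args.Context.Properties.Set(cycleStartKey, DateTime.UtcNow);
            args.Context.Properties.Set(failureCountKey, 0);
            return default;
        }
    })
    // Inner retry: counts failures and only keeps retrying inside the failure window.
    .AddRetry(new()
    {
        ShouldHandle = args =>
        {
            var started = args.Context.Properties.GetValue(cycleStartKey, DateTime.MinValue);
            var failures = args.Context.Properties.GetValue(failureCountKey, 0);
            var insideWindow = DateTime.UtcNow - started <= TimeSpan.FromMinutes(5);
            return insideWindow && failures < 5 ? PredicateResult.True() : PredicateResult.False();
        },
        Delay = TimeSpan.FromSeconds(1),
        BackoffType = DelayBackoffType.Linear,
        MaxRetryAttempts = int.MaxValue,
        OnRetry = args =>
        {
            var failures = args.Context.Properties.GetValue(failureCountKey, 0);
            args.Context.Properties.Set(failureCountKey, failures + 1);
            return default;
        }
    })
    .Build();

// Seed cycleStartKey on the ResilienceContext before the first ExecuteAsync,
// as in the earlier examples.
```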
This is corporate development; I always assume the person who will be maintaining my code is not me, so it needs to follow the KISS principle. Having the rapid failure logic split over two different handlers is not optimal. -marc
-
Let me share with you my prototype:
- `ResilienceStrategyOptions` to capture the parameters of the cyclic retry
- `ResilienceStrategy` to set the cycle start
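The prototype code itself is not included here; under the Polly v8 extensibility model, such a pair might look roughly like the sketch below (class and property names are assumptions).

```csharp
using Polly;

// Options for the cyclic retry; the property names are illustrative.
public sealed class CyclicRetryStrategyOptions : ResilienceStrategyOptions
{
    public int MaxRetryAttempts { get; set; } = 5;
    public TimeSpan FailureInterval { get; set; } = TimeSpan.FromMinutes(5);
}

// Proactive strategy that stamps the cycle start into the ResilienceContext,
// so that downstream strategies (e.g. an inner retry) can read it.
public sealed class CycleStartStrategy : ResilienceStrategy
{
    public static readonly ResiliencePropertyKey<DateTime> CycleStartKey = new("CycleStart");

    protected override async ValueTask<Outcome<TResult>> ExecuteCore<TResult, TState>(
        Func<ResilienceContext, TState, ValueTask<Outcome<TResult>>> callback,
        ResilienceContext context,
        TState state)
    {
        // Record when this cycle started, then delegate to the rest of the pipeline.
        context.Properties.Set(CycleStartKey, DateTime.UtcNow);
        return await callback(context, state).ConfigureAwait(context.ContinueOnCapturedContext);
    }
}

// Registration, roughly:
// var pipeline = new ResiliencePipelineBuilder()
//     .AddStrategy(_ => new CycleStartStrategy(), new CyclicRetryStrategyOptions())
//     .Build();
```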