Accurate documentation is needed for how PubSub + Inbound Resiliency Policy augment together on delivery attempts #3650

Closed
olitomlinson opened this issue Jul 30, 2023 · 3 comments · Fixed by #4325
Labels
content/missing-information More information requested or needed

Comments

@olitomlinson
Contributor

Conversation on Discord:

OP

Hi all! I'm trying to set up a retry policy for a pubsub component, but I don't know if I'm misunderstanding how resiliency policies work or if there is a problem with my configuration. I'm publishing an event that fails with a retriable return code, and I have this resiliency component:

  constantRetry:
    policy: constant
    duration: 5s
    maxRetries: 5
targets:
  components:
    pubsubComponent:
      inbound:
        retry: constantRetry

I was expecting the subscription to be called 5 times in a span of 25 seconds and then just stop retrying, but I'm getting groups of those 5 requests every ~1 min (see attached screenshot of the traces). Does anyone know what I'm doing wrong or missing here? Thanks!

[image: screenshot of the traces]
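
For readers comparing this against their own setup: the fragment above omits the enclosing resource. A minimal sketch of how it would likely sit inside a full Dapr `Resiliency` manifest, assuming the standard `dapr.io/v1alpha1` layout. The resource name `myresiliency` is a placeholder that does not appear in the original post, and `pubsubComponent` is assumed to be the name of the pubsub component being targeted.

```yaml
apiVersion: dapr.io/v1alpha1
kind: Resiliency
metadata:
  name: myresiliency        # hypothetical name, not from the original post
spec:
  policies:
    retries:
      constantRetry:        # policy name taken from the OP's fragment
        policy: constant
        duration: 5s        # wait between retries
        maxRetries: 5       # as discussed in this thread, applied on top of the component's own redelivery
  targets:
    components:
      pubsubComponent:      # assumed to match the pubsub component's metadata.name
        inbound:            # applies to messages delivered from the component to the app
          retry: constantRetry
```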


Oli

I'm not an authority, but from what I've heard, I think what's happening here is that each PubSub component has its own 'in-built' retry behaviours. (Just as a note: these behaviours are not guaranteed to be consistent across the various pubsub components, btw.)

In your case, the given pubsub component will make an initial delivery attempt plus 3 retries, with each retry attempt being ~1 minute apart.

What's happening is that the resiliency policy is augmenting the in-built retry, hence why you're getting the repeated clustering of messages.

To prove this theory, turn down the maxRetries in your policy to just 1, or 3, and see if your visualisation matches that.

I'm sure berndverst (maintainer) can correct me here if I've got this wrong 🙂


Bernd

That is correct. Not sure where the 1min comes from though. Resiliency policies today only multiply the retry behavior of each component.


OP

I see, so if I understand correctly, the 1 min should be coming from the implementation of the component (in this case redis)? If that's the case, what is the recommended setup then? I was thinking of setting a dead letter topic so the event stops retrying. Would that work?
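
The dead letter question is not answered in this conversation. For context only, a declarative subscription with a dead letter topic might be sketched as below, assuming the `dapr.io/v2alpha1` `Subscription` format. Every name here (`order-sub`, `orders`, `/orders`, `deadLetterOrders`) is hypothetical, and nothing in this thread confirms whether a dead letter topic stops the component's own redelivery.

```yaml
apiVersion: dapr.io/v2alpha1
kind: Subscription
metadata:
  name: order-sub                     # hypothetical subscription name
spec:
  pubsubname: pubsubComponent         # assumed to be the pubsub component from the OP's config
  topic: orders                       # hypothetical topic
  routes:
    default: /orders                  # hypothetical app endpoint
  deadLetterTopic: deadLetterOrders   # hypothetical topic for messages that could not be delivered
```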


OP

Yep, it was redis. I set redeliverInterval to "0" and the message was not retried after the initial burst. Thanks Oli and berndverst (maintainer)!
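
The change described above can be sketched as a Redis pubsub component definition. This is an illustration under stated assumptions: the component name and the `redisHost` and `redisPassword` values are placeholders that are not given in the conversation. Only `redeliverInterval` set to `"0"` comes from the OP's message.

```yaml
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: pubsubComponent     # assumed to match the resiliency target name
spec:
  type: pubsub.redis
  version: v1
  metadata:
  - name: redisHost
    value: localhost:6379   # placeholder, not from the conversation
  - name: redisPassword
    value: ""               # placeholder, not from the conversation
  - name: redeliverInterval
    value: "0"              # per the OP: no redelivery after the initial burst
```

With the component's own redelivery switched off like this, the OP reported seeing only the initial burst of attempts.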


Oli

Great! Glad you got there! This has just reconfirmed my sketchy knowledge, so thanks 🙂

FWIW - You're definitely not the first person to find the behaviour weird. It tripped me up, and my team. I think the problem here is that when you explicitly provide a resiliency policy, you would think it would override any implicit/built-in retry policy. But that's not the case. So it feels a little weird. But once you know, it's not so bad I guess 🙂


Bernd

There is an issue open to expand resiliency policies to cover built-in retry behavior, but it's a ridiculous amount of work that is custom for each component. At this point I don't see it happening before 1.14.

@olitomlinson olitomlinson added the content/missing-information More information requested or needed label Jul 30, 2023
@kendallroden kendallroden moved this from Triaged to 🚧 Needs Review for Attendees in Grace Hopper Conference 2023 Sep 14, 2023
@sicoyle
Contributor

sicoyle commented Sep 18, 2023

Add the default retry logic to the documentation, and clarify that the Dapr resiliency policy is applied on top of that.

@sicoyle sicoyle moved this from 🚧 Needs Review for Attendees to Triaged in Grace Hopper Conference 2023 Sep 18, 2023
@olitomlinson
Contributor Author

Another end-user on Discord not understanding the interop between PubSub component retries and PubSub inbound resiliency
