TD maybe bump the claim queue backoff or something when messages start timing out.
So if you only have 1 job in progress, stop sending claim requests to Lightning. Because Lightning is busy! So back off and let the work finish.
I wonder if this is something like: take the average Lightning reply time, and if that exceeds some threshold, increase the claim backoff by it. In a trivial case, if the average message round trip is 9 seconds, then your backoff is +9 seconds.
That would help decrease load when Lightning is struggling and reduce the chance of lost runs.
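A minimal sketch of that idea, in Python purely for illustration. The class name, the 2-second threshold, the 1-second base backoff and the 10-sample window are all assumptions, not values taken from the worker. It follows the additive reading of the example (average round trip of 9 seconds means +9 seconds of backoff):

```python
from collections import deque


class AdaptiveClaimBackoff:
    """Hypothetical helper: grows the claim backoff while replies are slow."""

    def __init__(self, base_backoff_s=1.0, threshold_s=2.0, window=10):
        self.base_backoff_s = base_backoff_s  # assumed normal claim backoff
        self.threshold_s = threshold_s        # assumed "replies are slow" cutoff
        self.samples = deque(maxlen=window)   # most recent round-trip times

    def record_reply(self, round_trip_s):
        # Call this with the measured round trip of every worker -> Lightning message.
        self.samples.append(round_trip_s)

    def average_reply_s(self):
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)

    def next_backoff_s(self):
        # While the average round trip is over the threshold, add it to the backoff.
        avg = self.average_reply_s()
        if avg > self.threshold_s:
            return self.base_backoff_s + avg
        return self.base_backoff_s
```

Using a bounded window means the backoff drops back to normal once recent replies are fast again, so a single slow spell does not penalise claims forever.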
If any worker -> Lightning message times out on the websocket (i.e. because it took 10 seconds to reply), the run is currently lost.
We can do better than this! We should surely be able to report the timeout somewhere, or continue retrying.
We may need help on the Lightning side to recognise that message responses are slow.
Everyone will understand if the system is under load and running slow - so long as the work does get done eventually.
Probably the answer here is just to retry the message, or back off and retry.
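A rough sketch of "back off and retry", again in Python for illustration only. `send_with_retry`, the attempt limit and the doubling delay are assumptions; `send` stands in for whatever actually pushes the message over the websocket and is assumed to raise `TimeoutError` when no reply arrives in time:

```python
import time


def send_with_retry(send, max_attempts=4, base_delay_s=1.0, sleep=time.sleep):
    """Hypothetical wrapper: retry a timed-out message instead of losing the run."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return send()
        except TimeoutError as err:
            last_error = err
            # Wait longer after each timeout (1s, 2s, 4s, ...) before trying again.
            if attempt < max_attempts - 1:
                sleep(base_delay_s * 2 ** attempt)
    # Every attempt timed out: surface the error so it can be reported somewhere.
    raise last_error
```

Retrying only makes sense if handling the same message twice is safe on the receiving side, which is one place where help from Lightning may be needed.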