What can I do about "Retry header did not contain a valid timeout"?
#1421
Hey @meichstedt, sorry about the trouble you're experiencing. If at all possible, could you share some logs from around when this happened, and which URL/API was returning no retry delay information in its header? That would help us dig into this further. I can then dig into the relevant backend API code to see how this could come about. At this point, I don't have a good answer for your first two questions, but with more information like logs and which APIs, we could take a deeper look. As for workarounds, what about wrapping the API invocation with a try/catch, inspecting the error message for "Retry header did not contain a valid timeout", and if it contains that, defaulting to some reasonable delay before trying again? Certainly not ideal; the intention behind the retry and queueing features of web-api is that you don't have to think about that! Ideally we could get to the bottom of the underlying issue and address that.
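A minimal sketch of that workaround, assuming the consumer controls the call site (the delay value and helper names are illustrative, not part of the SDK):

```ts
import { WebClient } from '@slack/web-api';

const client = new WebClient(process.env.SLACK_TOKEN);
const FALLBACK_DELAY_MS = 30_000; // assumed "reasonable default", not an SDK value

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

// Wrap a single API invocation; if the retry header could not be parsed,
// wait a fixed delay and try once more instead of surfacing the error.
async function callWithFallback<T>(invoke: () => Promise<T>): Promise<T> {
  try {
    return await invoke();
  } catch (error) {
    const message = error instanceof Error ? error.message : '';
    // Matching on the message string is brittle; a stable error code
    // would be preferable if the SDK exposes one for this case.
    if (message.includes('Retry header did not contain a valid timeout')) {
      await sleep(FALLBACK_DELAY_MS);
      return invoke();
    }
    throw error;
  }
}

// Example usage with a real Web API method:
// await callWithFallback(() => client.conversations.list({ limit: 200 }));
```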
Hey @filmaj thanks for getting back to me so quickly.
I also have IP addresses of the machine that did the requests; I can share those if you have a non-public way to exchange them. The code in question is only using the […], where I'm happy to catch those errors and retry later; we have similar workarounds in other places. The main thing that worries me in this case is that the client doesn't seem to have made many requests at all. If you could verify that using logs/IP addresses, that would be great. Let me know if the timestamps help, and whether I can share the IP somehow.
Hey @meichstedt, thanks a lot for the extra information, that's really helpful. I am asking around internally to see if/how I can get access to the production logs for these APIs based on the timestamps and potentially the IPs you could provide. I am relatively new to the company, so I'm still getting my bearings, and it might take me a little while to figure out where to look for this stuff, but as soon as I do, I'll report back. FYI it will probably have to wait until after the weekend.
Hey @meichstedt, I can indeed peruse our logs to try to identify the requests and see what happened. Beyond timestamps, can you also provide the timezone of your machines (unless the timestamps are UTC)? I can also use the IP addresses to help filter through the information. Any other info you can provide to help narrow down the requests, like channel IDs, bot IDs, or app IDs, would be helpful too. Ideally you would provide all of this over a secure channel. I use keybase.io (keybase.io/filmaj); would that work for you?
Hi @filmaj, perfect, thank you so much. Yes, the timestamps are all UTC; I'll send you all that information via keybase.
Got the info, thanks, I will dig around!
Alright, I've dug around, and I believe I can find the relevant logs from the backend for these calls. I see many calls to the […]. Unfortunately, on our side, while we do log the request parameters and some other helpful tidbits, we do not log the full response returned. As such, I am limited to working backwards through the logic of the backend code to see how this situation could arise. That said, we do log a specific SHA of the backend code as well as a kind of stack trace through it, so I can see the separate paths of logic executed between the rate-limited call and the successful calls. I will list out my assumptions based on what I've seen from the backend code and how node-slack-sdk parses the headers:
I think the key to solving this problem lies in identifying what was returned in these rate-limited response headers. Perhaps as a baby step towards helping with this issue, I can improve the exception being raised so that it records the values returned in the HTTP response's retry-after header. Sorry this is so indirect / taking so long! However, if you have the time and patience to work with me on this issue, I hope we can get to the bottom of it 😃
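As a rough sketch of what that improvement could look like (the function name is hypothetical; the actual change landed in #1426):

```ts
// Hypothetical sketch: carry the raw header value into the error message so
// that consumer logs reveal what the server actually returned.
function invalidRetryHeaderError(rawHeader: string | undefined): Error {
  return new Error(
    'Retry header did not contain a valid timeout ' +
      `(retry-after header: ${JSON.stringify(rawHeader)})`,
  );
}
```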
…eader value when throwing an error due to unparseable/invalid header value (#1426)
Thanks for the details and for improving the error already, @filmaj! I'll look into what we're doing there; we should definitely not hammer the API like this 🤔
@meichstedt the improved error has been merged to `main`. Another idea that came up as I was talking with a backend engineer at Slack about the potential issue with rate limiting retry headers was to change the behaviour of this SDK when an invalid header is returned: perhaps, instead of throwing an exception, the API client could fall back to some sensible default behaviour, for example a long retry delay of 30 seconds? I am not sure myself, but thought it is worth a discussion.
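As a sketch of that idea (the 30-second default is the value floated above, not a decided behaviour):

```ts
// Hypothetical fallback parse: if the retry-after header is missing or not a
// non-negative integer, use a conservative default instead of throwing.
const DEFAULT_RETRY_AFTER_SEC = 30; // value suggested in the discussion above

function parseRetryAfter(header: string | undefined): number {
  const parsed = Number.parseInt(header ?? '', 10);
  return Number.isNaN(parsed) || parsed < 0 ? DEFAULT_RETRY_AFTER_SEC : parsed;
}
```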
@filmaj I'm happy to use the main branch if that's considered stable; I planned on upgrading from 5.15.0 anyway. Unless there's a specific commit I should look for, I'll just pick the latest. We try to release weekly, so this change should be live soon once I get to it. I will also check our retry behavior and the API usage in general. I haven't written our Slack integration, but I have a hunch about where the issues originate from:
Anyway, this combination of loading all channels, incl. archived, is clearly unfavorable in our case (see the sketch after this comment). I don't know how many channels that user has in their workspace, but it's 300 people, so depending on how long they've been using Slack, it may easily be >1000 I guess; we're fetching […].

Re the discussion about what to do when no retry header was given or could be parsed, both of these options would be fine IMO:
As a consumer, anything that allows me to detect this case is fine, meaning an error code I can rely on that won't change. I just wouldn't want to infer this case from an arbitrary error string like "Retry header did not contain a valid timeout". I'll report back once I've adjusted the settings. Thanks again so much for diving into this ❤️
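For reference, the filtered, paginated channel fetch discussed above might look like the sketch below (parameter values are illustrative assumptions, not the app's actual settings):

```ts
import { WebClient } from '@slack/web-api';

const client = new WebClient(process.env.SLACK_TOKEN);

// Fetch only active (non-archived) channels, page by page, instead of
// loading every channel including archived ones in one sweep.
async function listActiveChannels() {
  const channels: unknown[] = [];
  let cursor: string | undefined;
  do {
    const response = await client.conversations.list({
      exclude_archived: true, // the key filter change discussed above
      limit: 200,             // illustrative page size
      cursor,
    });
    channels.push(...(response.channels ?? []));
    cursor = response.response_metadata?.next_cursor || undefined;
  } while (cursor);
  return channels;
}
```

The SDK also ships a `paginate` helper on `WebClient` that handles the cursor loop automatically.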
Re: your particular usage of […]: my one concern about introducing a default value for a retry when no valid retry header is provided is that it could introduce a significant bottleneck for the request queue in the SDK (since a single queue is used to control outgoing requests). Using your situation as an example, consider a request queue that gets ~10 requests per second added to it. If one of the responses suddenly comes back with an invalid retry header and the SDK defaults to waiting some number of seconds before retrying (say, 30 seconds), then ~300 requests would pile up in the queue while waiting for the default retry timeout! I think that could negatively impact the perceived performance of your application. I will get the rest of the team to chime in on this thread to see what further improvements could be made to address this situation.
100% agreed that […]. I've adjusted the settings for […]. I wasn't aware the SDK queues requests! That's actually pretty neat; other SDKs I've seen simply provide types as their main value-add. With the default […]
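For anyone landing here later, the SDK's retry and queueing behaviour can also be tuned at construction time; a minimal sketch, assuming a workload where failing fast is preferable to a transparently paused queue:

```ts
import { WebClient, retryPolicies } from '@slack/web-api';

// Retry over roughly five minutes instead of the default policy (about
// thirty minutes), and surface rate-limited calls as errors rather than
// transparently pausing the shared request queue.
const client = new WebClient(process.env.SLACK_TOKEN, {
  retryConfig: retryPolicies.fiveRetriesInFiveMinutes,
  rejectRateLimitedCalls: true,
});
```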
Yes, I think we will aim to release a new version very soon. It has been about a month since the previous release, and we generally aim to release on a monthly cadence. I can tackle the release steps during my business hours (east coast of Canada) tomorrow. While I wouldn't count this issue as resolved with the release, at least we should get better errors the next time this occurs.
FYI @meichstedt I just published web-api 6.7.0 to npm.
Hi @filmaj, our updated version using 6.7.0 with adjusted filters went live a couple of days ago. We haven't seen any issues so far, and I expect this to be resolved. Greatly appreciate your involvement and immediate help on this! 🙌 Unless you want to keep this issue open for anything, it can be closed.
Hi @meichstedt, sounds good, I will close this issue, but if you do catch any more logs around responses coming back with no retry headers, please feel free to re-open and at-message me (this way I will be sure to get a notification).
Hi @filmaj, actually I completely missed this issue in our reporting; one single user has been running into this regularly, and since March 5 I've been getting this updated error message from 6.7.0: […]
The latest event is from ts 1646760189 (that's Mar 8, 2022 5:23:09 PM UTC). I'm not sure how exactly this user is using the integration, but I'm seeing many previous errors, so that user may be misusing our app and trying to send too many events. Still, maybe this helps you debug why the header is missing?
@filmaj looks like I'm not able to re-open the issue.
Hey @meichstedt, yes, I see the rate-limited requests around this time, coming from your existing App ID (assuming it is the same one you confirmed with me last time we spoke), from a single IP address, using the […].

I see about 200 requests coming in to the […].

Since these requests are relatively fresh (they happened about an hour ago), I am going to try to cross-reference the request ID with other logs in our systems to get a better idea of what is going on in other parts of the system.
Digging into the logs did not reveal clues. I have asked the backend team for assistance to see what we can do to address this, either by adding additional logging on the backend to get more awareness of the situation, or perhaps by applying some default / fallback retry-after value.

Sorry for the trouble, as this may take some time to resolve, but as we get updates and take steps to address this, I will keep you informed.
Thanks for the details @filmaj. To provide a bit of background: Bardeen is a no-code tool that allows users to integrate various services and automate workflows. E.g. a use case could be (not saying this one is enormously useful): […]
Now, depending on what the user does, this can trigger a lot of invocations on the right side of the automation, in this example the […]. I've reached out to that user but haven't heard back, and I don't know whether they appreciate my investigation ;-) Since the automation runs from the user's browser, there's only so much I can do to understand their use case without input from them. We aim to eventually throttle these invocations, but that will take some time; a sketch of the idea follows.
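A client-side throttle like the one alluded to could be as simple as a serialized queue with a minimum gap between sends (the gap value and helper are assumptions for illustration, not Bardeen's actual implementation):

```ts
// Minimal client-side throttle: serialize outgoing calls and enforce a
// minimum gap between them, so bursty automations cannot hammer the API.
const MIN_GAP_MS = 1_100; // ~1 req/sec; roughly in line with chat.postMessage limits

let chain: Promise<unknown> = Promise.resolve();

function throttled<T>(task: () => Promise<T>): Promise<T> {
  const run = chain.then(async () => {
    const result = await task();
    await new Promise((resolve) => setTimeout(resolve, MIN_GAP_MS));
    return result;
  });
  // Keep the chain alive even if a task rejects.
  chain = run.catch(() => undefined);
  return run;
}

// Example (hypothetical call site):
// throttled(() => client.chat.postMessage({ channel, text }));
```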
Hey there,
I've been running into 3 occurrences of this error:
Retry header did not contain a valid timeout
(from here) The client was using the outdated 5.15.0 web-api, so it didn't tell me the URL.
My main questions are: […]
Thanks in advance
Packages:
Select all that apply:
- [x] @slack/web-api
- [ ] @slack/rtm-api
- [ ] @slack/webhooks
- [ ] @slack/oauth
- [ ] @slack/socket-mode
Reproducible in:
The Slack SDK version
@slack/web-api: 5.15.0
Node.js runtime version
v15.14.0
OS info
Chrome/98.0.4758.80
Macintosh; Intel Mac OS X 10_15_7
Steps to reproduce:
(Share the commands to run, source code, and project settings)
Expected result:
I don't know, assume a default timeoutSec?
Actual result:
(Tell what actually happened with logs, screenshots)
Requirements
For general questions/issues about the Slack API platform or its server-side, please submit questions at https://my.slack.com/help/requests/new instead. 🙇
Please read the Contributing guidelines and Code of Conduct before creating this issue or pull request. By submitting, you are agreeing to those rules.