bigquery: RowIterator connection does not recover from "read: connection reset by peer" error
#11364
Comments
Hey @HurSungYun, thanks for the report. You mentioned the Storage API: are you using the Storage Read API acceleration with the
We already have some retry logic for both the BigQuery v2 API and the Storage API. The BigQuery v2 API (which includes the jobs and query APIs) is HTTP-based, so it doesn't need to keep a TCP connection open; a retry just calls the method again. The BQ Storage APIs are a bit different, as we keep gRPC connections open and try to reuse them as much as possible. I'll try to reproduce your scenario here and see what we can improve with regard to retries.
@alvarowolfx Thanks for the reply.
Yes. In my configuration,
I believe functions like (
Thank you again for investigating my issue.
The query text doesn't restrict storage acceleration. You can verify whether the row iterator is trying to use the read API by consulting https://pkg.go.dev/cloud.google.com/go/bigquery#RowIterator.IsAccelerated. If it's not being accelerated, this may be related to HTTP/2. If it is accelerated (which is likely here), then we may need to look deeper at ReadRows retries.
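A minimal Go sketch of the check suggested above. It assumes only what the linked documentation describes, namely that the row iterator exposes an IsAccelerated() bool method. The accelerationReporter interface, the readPath helper, the fakeIterator stand-in, and the returned labels are hypothetical names, introduced so the example compiles without the BigQuery dependency.

```go
package main

import "fmt"

// accelerationReporter captures the single method this sketch relies on.
// Assumption: *bigquery.RowIterator satisfies it via IsAccelerated() bool,
// as described in the linked package documentation.
type accelerationReporter interface {
	IsAccelerated() bool
}

// readPath returns a label (hypothetical, for logging only) naming the
// read path the iterator reports it is using.
func readPath(it accelerationReporter) string {
	if it.IsAccelerated() {
		return "storage-read-api"
	}
	return "jobs-http-api"
}

// fakeIterator is a stand-in for a real row iterator so the sketch runs
// without credentials or network access.
type fakeIterator struct{ accelerated bool }

func (f fakeIterator) IsAccelerated() bool { return f.accelerated }

func main() {
	fmt.Println("read path:", readPath(fakeIterator{accelerated: true}))
	fmt.Println("read path:", readPath(fakeIterator{accelerated: false}))
}
```

In real code, the iterator returned by the query would be passed to readPath in place of fakeIterator, and the label logged next to the connection reset error to tell the two code paths apart.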
@shollyman Thank you for your reply.
Thank you for letting me know.
Understood. I am going to verify whether the row iterator is accelerated using the
I'll let you know once I have the results. It may take about 1 week.
Client
BigQuery
Environment
Code and Dependencies
Expected behavior
When a "read: connection reset by peer" error occurs, subsequent calls to ri.Next(&row) should attempt to re-establish the TCP connection and retry the requests.
Actual behavior
It seems that the RowIterator does not re-establish TCP connections after encountering a "read: connection reset by peer" error. Instead, it continues retrying on the same socket.
The same ephemeral port is logged repeatedly:
(I might be wrong. Please let me know if I'm wrong.)
Additional context
This code is running on AWS Fargate, where intermediate routers or load balancers might close the TCP connection (e.g., due to idleness or bandwidth constraints) without notifying the client.
I believe the connection should be re-established in such cases.
It's practically impossible to eliminate "read: connection reset by peer" errors entirely in a production environment. It would be helpful if RowIterator could handle these scenarios by internally re-establishing the TCP connection.
I understand there are two APIs involved: the Job API and the Storage API. Handling only the Job API is acceptable for my use case.
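Until the iterator handles this internally, one application-level workaround is to retry the whole read when the error wraps ECONNRESET. The sketch below uses only the Go standard library. The helpers retryOnReset, isConnReset, and simulatedReset are hypothetical. It assumes the caller's operation rebuilds whatever it needs on each attempt, for example by re-running the query and creating a fresh iterator. It also assumes the error surfaced by the BigQuery client unwraps to syscall.ECONNRESET, which this thread does not confirm.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"syscall"
	"time"
)

// isConnReset reports whether err wraps ECONNRESET ("connection reset by peer").
func isConnReset(err error) bool {
	return errors.Is(err, syscall.ECONNRESET)
}

// simulatedReset builds an error shaped like a failed TCP read, for the demo only.
func simulatedReset() error {
	return &net.OpError{Op: "read", Net: "tcp", Err: os.NewSyscallError("read", syscall.ECONNRESET)}
}

// retryOnReset calls op up to maxAttempts times with exponential backoff.
// It retries only on connection resets; any other result is returned at once.
// The returned int is the number of attempts made.
func retryOnReset(maxAttempts int, backoff time.Duration, op func() error) (int, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = op()
		if err == nil || !isConnReset(err) {
			return attempt, err
		}
		if attempt < maxAttempts {
			time.Sleep(backoff)
			backoff *= 2
		}
	}
	return maxAttempts, err
}

func main() {
	calls := 0
	// Stand-in for "run the query and drain the iterator": fails twice, then succeeds.
	attempts, err := retryOnReset(5, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return simulatedReset()
		}
		return nil
	})
	fmt.Println("attempts:", attempts, "err:", err)
}
```

Retrying the whole operation rather than a single Next call sidesteps the question of whether an iterator is still usable after the error, at the cost of re-reading rows that were already consumed.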