Clear pooled connection after node health checks #557

merlimat · 2024-10-27T00:04:51Z

When Istio sits between the coordinator and the storage nodes, the coordinator should discard all pooled connections to a node after that node has been declared failed and has then recovered.

The problem is that, because of the Istio proxy, the gRPC client cannot detect that the TCP connection has failed. For example:

- Istio is still set up to talk to the old pod.
- The coordinator uses the pooled client and writes to a gRPC stream. No failure is reported, even though the channel is no longer valid.

Under normal conditions, gRPC detects the TCP failure and opens a new connection, but this does not happen with Istio in between.

Modifications

Forcibly discard all pooled gRPC clients for the target node once we have successfully reconnected to the new pod.
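
As a rough illustration of the idea (not the code in this PR), the sketch below shows a pool that caches one connection per target and drops it on request. All names here (`Conn`, `ClientPool`, `Clear`, `fakeConn`) are hypothetical, `Conn` merely stands in for a gRPC client connection, and the pool keeps a single connection per node for simplicity.

```go
package main

import "sync"

// Conn is a hypothetical stand-in for a gRPC client connection.
type Conn interface {
	Close() error
}

// ClientPool caches one connection per target address (an assumption made
// for this sketch; a real pool may hold several clients per node).
type ClientPool struct {
	mu    sync.Mutex
	dial  func(target string) (Conn, error)
	conns map[string]Conn
}

func NewClientPool(dial func(target string) (Conn, error)) *ClientPool {
	return &ClientPool{dial: dial, conns: make(map[string]Conn)}
}

// Get returns the pooled connection for target, dialing a new one if needed.
func (p *ClientPool) Get(target string) (Conn, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if c, ok := p.conns[target]; ok {
		return c, nil
	}
	c, err := p.dial(target)
	if err != nil {
		return nil, err
	}
	p.conns[target] = c
	return c, nil
}

// Clear closes and forgets the pooled connection for target, so that the
// next Get dials again instead of reusing a channel that may silently point
// at the old pod.
func (p *ClientPool) Clear(target string) error {
	p.mu.Lock()
	c, ok := p.conns[target]
	delete(p.conns, target)
	p.mu.Unlock()
	if !ok {
		return nil
	}
	return c.Close()
}

// fakeConn is a test double that records whether Close was called.
type fakeConn struct {
	target string
	closed bool
}

func (f *fakeConn) Close() error {
	f.closed = true
	return nil
}
```

In this sketch, the health-check path would call `Clear(target)` right after it has reconnected to the recovered node, so that later callers of `Get(target)` receive a fresh connection.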

merlimat requested review from mattisonchao, coderzc and RobertIndie as code owners October 27, 2024 00:04

Clear pooled connection after node health checks

d54d2b8

merlimat force-pushed the fix-reconnection branch from 83a42c0 to d54d2b8 Compare October 27, 2024 00:06
