If the connection is very busy, checkuplink often incorrectly recognizes the tunnel as dead and reconnects.
If the connection is heavily utilized, the latency of the client connections increases significantly.
Measured client ping RTT through the tunnel: 800-1500ms on average, with peaks up to 5000ms and occasional packet loss.
(LTE ping RTT from gluon without the tunnel increases to approx. 350ms; no packet loss, and it stays stable at that level)
Checkuplink often incorrectly recognizes the tunnel as dead and reconnects. (Tunnel is not dead, just high latency!)
My Idea:
Increase the timeouts. (helps, but not enough)
+
Repeat the tests. If 3/3 fail --> dead.
(+ some high.latency.mode option to enable/disable the extended tests)
Code might look like this.
I'm testing this at the moment, looks promising.
retry_wget() {
    local url="$1"
    local max_attempts=3
    local attempt=1
    local delay=1
    local ret=0
    while [ $attempt -le $max_attempts ]; do
        wget "$url" --timeout=10 -O/dev/null -q && return 0 || ret=$?
        logger -p warn -t checkuplink "wget attempt $attempt failed with code $ret, retrying in $delay seconds..."
        sleep $delay
        attempt=$((attempt + 1))
        delay=$((delay * 2))
    done
    return $ret
}
retry_batctl_ping() {
    local gwmac="$1"
    local max_attempts=3
    local attempt=1
    local delay=1
    while [ $attempt -le $max_attempts ]; do
        if batctl ping -c 7 -t 10 -i 1 "$gwmac" > /dev/null 2>&1; then
            return 0
        fi
        logger -p warn -t checkuplink "batctl ping attempt $attempt failed, retrying in $delay seconds..."
        sleep $delay
        attempt=$((attempt + 1))
        delay=$((delay * 2))
    done
    return 1
}
is_connected() {
    if retry_wget "http://[$(wg | grep fe80 | awk '{split($3,A,"/")};{print A[1]}')%$MESH_VPN_IFACE]/"; then
        GWMAC=$(batctl gwl | awk '/[*]/{print $2}')
        if retry_batctl_ping "$GWMAC"; then
            return 0
        fi
    fi
    return 1
}
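The two retry helpers above share the same loop, so one option is a generic wrapper. This is only a rough sketch: `retry_cmd`, `log_warn`, `attempts_for_mode` and the `HIGH_LATENCY_MODE` variable (standing in for the proposed high.latency.mode option) are hypothetical names, not part of checkuplink. Unlike the loops above, it does not sleep after the final failed attempt.

```shell
# Hypothetical helpers, not part of checkuplink.

# Log a warning via logger; fall back to stderr if logger is unavailable.
log_warn() {
    logger -p warn -t checkuplink "$1" 2>/dev/null || echo "checkuplink: $1" >&2
}

# Usage: retry_cmd <max_attempts> <initial_delay_seconds> <command> [args...]
# Runs the command until it succeeds or max_attempts is reached, doubling
# the delay between attempts. Returns the exit code of the last attempt.
retry_cmd() {
    local max_attempts="$1"
    local delay="$2"
    shift 2
    local attempt=1
    local ret=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        else
            ret=$?
        fi
        # Only wait if another attempt will follow.
        if [ "$attempt" -lt "$max_attempts" ]; then
            log_warn "attempt $attempt of '$1' failed with code $ret, retrying in $delay seconds..."
            sleep "$delay"
            delay=$((delay * 2))
        fi
        attempt=$((attempt + 1))
    done
    return "$ret"
}

# Assumption: HIGH_LATENCY_MODE=1 enables the extended tests (3 attempts),
# anything else keeps the single-shot behaviour.
attempts_for_mode() {
    if [ "${HIGH_LATENCY_MODE:-0}" = "1" ]; then
        echo 3
    else
        echo 1
    fi
}
```

With this, the wget check could be written as `retry_cmd "$(attempts_for_mode)" 1 wget "$url" --timeout=10 -O/dev/null -q`, and the batctl ping check in the same way.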
The check does not seem well suited to what you are trying to achieve.
Since the HTTP request is blocking and the pings at the L2 routing protocol level are not executed continuously (which would give a better assessment of the connection state on a lossy link), this can certainly fail.
If WireGuard does not provide information about bidirectional link health, my suggestion would be to implement a daemon which continuously monitors the link health.
You can do this by sending regular UDP packets with sequenced bodies (at fixed or adaptive intervals) in order to assess the link's properties in terms of loss over multiple intervals. With this you can model the anomaly conditions in a more detailed way. This can be done on the Ethernet layer within the vxlan tunnel with a responder on the other end.
Other indicators might be out-of-order delivery, packet checksums, ...
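To make the sequence-number idea concrete, here is a minimal sketch of how loss and out-of-order delivery could be derived from the probes received in one interval. The function name `probe_stats` and its calling convention are hypothetical. It assumes the sender numbers its probes 1..N, that duplicates do not occur, and that the receiver passes the sequence numbers in arrival order. Sending and receiving the UDP packets is left out.

```shell
# Hypothetical helper: summarise one measurement interval.
# Usage: probe_stats <sent_count> [<received_seq>...]
# Prints "<loss_percent> <out_of_order_count>".
probe_stats() {
    local sent="$1"
    shift
    local received=0
    local reordered=0
    local highest=0
    local seq
    # A non-positive sent count makes the loss ratio undefined.
    [ "$sent" -gt 0 ] || return 1
    for seq in "$@"; do
        received=$((received + 1))
        if [ "$seq" -lt "$highest" ]; then
            # Arrived after a probe with a higher sequence number.
            reordered=$((reordered + 1))
        else
            highest="$seq"
        fi
    done
    echo "$(( (sent - received) * 100 / sent )) $reordered"
}
```

For example, `probe_stats 10 1 2 3 5 4 6 7` reports 30% loss and one reordered probe.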
Examples would be:
Increasing requests on continuous 100% loss detection over short interval A
Decreasing requests on 0 % loss over short interval A
Setting different intervals / thresholds based on the uplink type (cellular / etc)
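The three examples above could be combined roughly like this. It is only a sketch: `next_interval`, the uplink type names and all bounds are made-up values for illustration. "Increasing requests" is read here as probing more often, i.e. shortening the interval.

```shell
# Hypothetical helper: pick the next probe interval in seconds.
# Usage: next_interval <current_interval> <loss_percent> [uplink_type]
next_interval() {
    local current="$1"
    local loss="$2"
    local uplink="${3:-wired}"
    # Assumed bounds; cellular uplinks get more relaxed limits.
    local min=1
    local max=30
    if [ "$uplink" = "cellular" ]; then
        min=2
        max=60
    fi
    local next="$current"
    if [ "$loss" -ge 100 ]; then
        # Total loss: probe twice as often to confirm quickly.
        next=$((current / 2))
    elif [ "$loss" -eq 0 ]; then
        # No loss: back off and probe half as often.
        next=$((current * 2))
    fi
    if [ "$next" -lt "$min" ]; then
        next="$min"
    fi
    if [ "$next" -gt "$max" ]; then
        next="$max"
    fi
    echo "$next"
}
```

Partial loss leaves the interval unchanged in this sketch; a real daemon would probably want extra thresholds there.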
You can also react in other ways, such as updating or implementing shapers when detecting continuous loss.
When implemented as a separate service (either interfaced by a regular unix-socket, ubus, status-file, you name it) you can still use the whole of your script.
As a second thought, the interface also has Rx packet counters you can base your anomaly assumption on. Granted, this does not replace a check of bidirectional connectivity (assuming this is what you are after, given the current implementation), but you can take it as one factor and adjust your other means of detection based on it.
None of this is meant as definitive implementation guidance, just my ideas on how I would tackle this.
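A rough sketch of the Rx counter idea, reading the standard Linux sysfs counter for the interface. The helper names `rx_packets` and `rx_progressed` are hypothetical, and the optional second argument exists only so the sysfs root can be swapped out for testing. Counter wrap-around is ignored here.

```shell
# Hypothetical helpers, not part of checkuplink.

# Usage: rx_packets <iface> [sysfs_net_root]
# Prints the received-packet counter of the interface.
rx_packets() {
    cat "${2:-/sys/class/net}/$1/statistics/rx_packets"
}

# Usage: rx_progressed <before> <after>
# Succeeds if the counter advanced between the two samples.
rx_progressed() {
    [ "$2" -gt "$1" ]
}

# Possible use inside a check (the interval length is an assumption):
#   before=$(rx_packets "$MESH_VPN_IFACE")
#   sleep 5
#   after=$(rx_packets "$MESH_VPN_IFACE")
#   rx_progressed "$before" "$after" && echo "still receiving packets"
```

An advancing counter only shows that something is arriving on the interface, so it works as a hint to relax or tighten the other checks rather than as proof of a working tunnel.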
I have a ZTE MF281 with LTE.