
One of the Testnet clients gets often marked as down in Grafana, despite performing work #3387

Open
michalinacienciala opened this issue Oct 27, 2022 · 1 comment

Comments


michalinacienciala commented Oct 27, 2022

Strange behavior was noticed on keep-client-3-0 (0x3FF855895EF4aC833c32Ab6A0d6C7fBfA137E26E) - its Grafana uptime graph has looked quite fragmented since 25 Oct:
[screenshot: Grafana uptime graph for keep-client-3-0]
Looking into the logs, I noticed that the client was restarted near the time of the first reported downtime (2022-10-25 ~02:50 CEST). The logs also show that the client was active during the periods Grafana reports as downtime. For example, Grafana shows the client as down on 2022-10-25 between 14:40 and 16:10, yet during that time the client was doing work (it was involved in the tBTC DKG started at 2022-10-25 15:15:01.182 CEST).

Uptime data for the client (taken from Grafana):
downloaded-logs-20220923-143058.csv

Further investigation of the issue is needed.

@michalinacienciala changed the title from "One of the Testnet clients gets often mark as down in Grafana, despite performing work" to "One of the Testnet clients gets often marked as down in Grafana, despite performing work" on Oct 27, 2022
@michalinacienciala

There is a problem with the discovery of this client:

{"address":"10.102.1.79", "level":"warn", "msg":"network address is not reachable", "networkPort":3307, "peer":"0x3FF855895EF4aC833c32Ab6A0d6C7fBfA137E26E", "ts":"2022-10-28T10:40:31.702658546Z"}

The log shows port 3307, while it should be 3919.
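As a quick sanity check (illustrative only, not the client's actual reachability code), the two port variants can be probed with a short Go program; 10.102.1.79 is a private address, so this only makes sense from inside the cluster network:

```go
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// Port from the warning above (3307) vs. the port the client
	// actually listens on (3919).
	targets := []string{"10.102.1.79:3307", "10.102.1.79:3919"}
	for _, target := range targets {
		conn, err := net.DialTimeout("tcp", target, 5*time.Second)
		if err != nil {
			fmt.Printf("%s: not reachable (%v)\n", target, err)
			continue
		}
		conn.Close()
		fmt.Printf("%s: reachable\n", target)
	}
}
```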

Querying the bootstrap node's diagnostics endpoint (curl bst-a01.test.keep.boar.network:9601/diagnostics) returns:

{
    "chain_address": "0x3FF855895EF4aC833c32Ab6A0d6C7fBfA137E26E",
    "multiaddrs":
    [
        "/ip4/127.0.0.1/tcp/3919",
        "/ip4/104.154.211.185/tcp/3307",
        "/ip4/10.102.1.79/tcp/3919"
    ],
    "network_id": "16Uiu2HAm8KJX32kr3eYUhDuzwTucSfAfspnjnXNf9veVhB12t6Vf"
}

This may be an issue with the diagnostics output from the bootstrap node.
To handle this correctly in the discovery, we need to implement keep-network/prometheus-sd#2.
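For illustration only (this is not necessarily what keep-network/prometheus-sd#2 will do), a discovery component could filter the advertised multiaddrs and skip loopback and private entries before building scrape targets. Below is a minimal sketch in Go; the struct fields mirror the JSON excerpt above, while the endpoint URL and the assumption that the response decodes into a single such object are simplifications:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"net/http"
	"strings"
)

// peerDiagnostics mirrors the excerpt above; the real /diagnostics response
// may wrap entries like this in a larger structure.
type peerDiagnostics struct {
	ChainAddress string   `json:"chain_address"`
	Multiaddrs   []string `json:"multiaddrs"`
	NetworkID    string   `json:"network_id"`
}

// dialableAddr returns the first /ip4/<host>/tcp/<port> entry whose IP is
// neither loopback nor private, i.e. an address an external scraper could dial.
func dialableAddr(multiaddrs []string) (string, bool) {
	for _, a := range multiaddrs {
		parts := strings.Split(a, "/") // e.g. ["", "ip4", "104.154.211.185", "tcp", "3307"]
		if len(parts) != 5 || parts[1] != "ip4" || parts[3] != "tcp" {
			continue
		}
		ip := net.ParseIP(parts[2])
		if ip == nil || ip.IsLoopback() || ip.IsPrivate() {
			continue
		}
		return net.JoinHostPort(parts[2], parts[4]), true
	}
	return "", false
}

func main() {
	resp, err := http.Get("http://bst-a01.test.keep.boar.network:9601/diagnostics")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var peer peerDiagnostics
	if err := json.NewDecoder(resp.Body).Decode(&peer); err != nil {
		panic(err)
	}

	if addr, ok := dialableAddr(peer.Multiaddrs); ok {
		fmt.Printf("%s -> %s\n", peer.ChainAddress, addr)
	} else {
		fmt.Printf("%s -> no publicly dialable address advertised\n", peer.ChainAddress)
	}
}
```

Note that with the diagnostics output above, such filtering would still select 104.154.211.185:3307 - the only public entry carries the wrong port - so the diagnostics output itself also needs to be fixed.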

@pdyraga pdyraga modified the milestones: v2.0.0-m3, v2.0.0-m4 Nov 21, 2022
@pdyraga pdyraga removed this from the v2.0.0-m4 milestone Dec 21, 2022