Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Silent panic - concurrent write to websocket #3699

Closed
2 tasks done
thevops opened this issue Aug 18, 2023 · 4 comments
Closed
2 tasks done

Silent panic - concurrent write to websocket #3699

thevops opened this issue Aug 18, 2023 · 4 comments
Assignees
Milestone

Comments

@thevops
Copy link
Contributor

thevops commented Aug 18, 2023

Hello,

I had a problem with my tBTC node. It failed silently but didn't stop completely. Peers/bootstraps connections worked, but Prometheus monitoring (https://monitoring.threshold.network/grafana/public-dashboards/1a09fa3a621c4837988b36f2d6ae6e24?orgId=0) lost him for some time (until I noticed it by viewing the above Prometheus).
I also noticed that endpoint /metrics worked after the failure, but /diagnostics did not (infinite loading).
In that case, the app should exit with an error code, then my system would start it again (I use containers).

I had to manually restart the node to be discovered by the monitoring to be qualified for rewards.

Logs

2023-08-17 03:21:21.933 {"level":"info","ts":"2023-08-17T01:21:21.933Z","logger":"keep-libp2p","caller":"libp2p/libp2p.go:242","msg":"number of connected peers: [144]"}
2023-08-17 03:20:21.934 {"level":"info","ts":"2023-08-17T01:20:21.933Z","logger":"keep-libp2p","caller":"libp2p/libp2p.go:242","msg":"number of connected peers: [144]"}
2023-08-17 03:19:21.934 {"level":"info","ts":"2023-08-17T01:19:21.934Z","logger":"keep-libp2p","caller":"libp2p/libp2p.go:242","msg":"number of connected peers: [144]"}
2023-08-17 03:18:21.933 {"level":"info","ts":"2023-08-17T01:18:21.933Z","logger":"keep-libp2p","caller":"libp2p/libp2p.go:242","msg":"number of connected peers: [144]"}
2023-08-17 03:17:33.277 	/usr/local/go/src/net/http/server.go:3071 +0x4db
2023-08-17 03:17:33.277 created by net/http.(*Server).Serve
2023-08-17 03:17:33.277 	/usr/local/go/src/net/http/server.go:1966 +0x5d7
2023-08-17 03:17:33.277 net/http.(*conn).serve(0xc00151b0e0, {0x1d6fb38, 0xc013641020})
2023-08-17 03:17:33.276 	/usr/local/go/src/net/http/server.go:2916 +0x43b
2023-08-17 03:17:33.276 net/http.serverHandler.ServeHTTP({0xc001849200?}, {0x1d6f010, 0xc0017162a0}, 0xc001865b00)
2023-08-17 03:17:33.276 	/usr/local/go/src/net/http/server.go:2462 +0x149
2023-08-17 03:17:33.276 net/http.(*ServeMux).ServeHTTP(0x0?, {0x1d6f010, 0xc0017162a0}, 0xc001865b00)
2023-08-17 03:17:33.276 	/usr/local/go/src/net/http/server.go:2084 +0x2f
2023-08-17 03:17:33.276 net/http.HandlerFunc.ServeHTTP(0x7f2ab35d92e8?, {0x1d6f010?, 0xc0017162a0?}, 0x410a65?)
2023-08-17 03:17:33.276 	/go/pkg/mod/github.com/keep-network/[email protected]/pkg/clientinfo/clientinfo.go:61 +0x39
2023-08-17 03:17:33.276 github.com/keep-network/keep-common/pkg/clientinfo.(*Registry).EnableServer.func2({0x1d6f010?, 0xc0017162a0}, 0xc243e206bf9a?)
2023-08-17 03:17:33.276 	/go/pkg/mod/github.com/keep-network/[email protected]/pkg/clientinfo/diagnostics.go:26 +0x24e
2023-08-17 03:17:33.276 github.com/keep-network/keep-common/pkg/clientinfo.(*Registry).exposeDiagnostics(0xc0139a8b00)
2023-08-17 03:17:33.276 	/go/src/github.com/keep-network/keep-core/pkg/clientinfo/diagnostics.go:135 +0x26
2023-08-17 03:17:33.276 github.com/keep-network/keep-core/pkg/clientinfo.(*Registry).RegisterBtcChainInfoSource.func1()
2023-08-17 03:17:33.276 	/go/src/github.com/keep-network/keep-core/pkg/bitcoin/electrum/electrum.go:259 +0x2e
2023-08-17 03:17:33.276 github.com/keep-network/keep-core/pkg/bitcoin/electrum.(*Connection).GetLatestBlockHeight(0x4?)
2023-08-17 03:17:33.276 	/go/src/github.com/keep-network/keep-core/pkg/bitcoin/electrum/electrum.go:763 +0x85
2023-08-17 03:17:33.276 github.com/keep-network/keep-core/pkg/bitcoin/electrum.requestWithRetry[...](0x18?, 0x14bd160?)
2023-08-17 03:17:33.276 	/go/pkg/mod/github.com/keep-network/[email protected]/pkg/wrappers/wrappers.go:76
2023-08-17 03:17:33.276 github.com/keep-network/keep-common/pkg/wrappers.DoWithDefaultRetry(...)
2023-08-17 03:17:33.276 	/go/pkg/mod/github.com/keep-network/[email protected]/pkg/wrappers/wrappers.go:34 +0xff
2023-08-17 03:17:33.276 github.com/keep-network/keep-common/pkg/wrappers.DoWithRetry({0x1d6fac8?, 0xc000048028?}, 0x3b9aca00, 0x0?, 0x4?, 0xc0131bd7a8)
2023-08-17 03:17:33.276 	/go/src/github.com/keep-network/keep-core/pkg/bitcoin/electrum/electrum.go:775 +0xfb
2023-08-17 03:17:33.276 github.com/keep-network/keep-core/pkg/bitcoin/electrum.requestWithRetry[...].func1()
2023-08-17 03:17:33.276 	/go/src/github.com/keep-network/keep-core/pkg/bitcoin/electrum/electrum.go:262 +0x36
2023-08-17 03:17:33.276 github.com/keep-network/keep-core/pkg/bitcoin/electrum.(*Connection).GetLatestBlockHeight.func1({0x1d6fb00?, 0xc001cb5aa0?}, 0xc001e336b0?)
2023-08-17 03:17:33.276 	/go/pkg/mod/github.com/keep-network/[email protected]/electrum/subscribe.go:31 +0x7b
2023-08-17 03:17:33.276 github.com/checksum0/go-electrum/electrum.(*Client).SubscribeHeaders(0xc013a16700, {0x1d6fb00, 0xc001cb5aa0})
2023-08-17 03:17:33.276 	/go/pkg/mod/github.com/keep-network/[email protected]/electrum/network.go:296 +0x1db
2023-08-17 03:17:33.276 github.com/checksum0/go-electrum/electrum.(*Client).request(0xc013a16700, {0x1d6fb00, 0xc001cb5aa0}, {0x19017d1, 0x1c}, {0x2afa898, 0x0, 0x0}, {0x142f0e0, 0xc001b9fd58})
2023-08-17 03:17:33.276 	/go/pkg/mod/github.com/keep-network/[email protected]/electrum/transport_ws.go:91 +0x16a
2023-08-17 03:17:33.276 github.com/checksum0/go-electrum/electrum.(*WebSocketTransport).SendMessage(0x16069e0?, {0xc0135129c0?, 0x459b6d?, 0xc001abc000?})
2023-08-17 03:17:33.276 	/go/pkg/mod/github.com/gorilla/[email protected]/conn.go:773 +0x152
2023-08-17 03:17:33.276 github.com/gorilla/websocket.(*Conn).WriteMessage(0x2ac6ba0?, 0xc0135129c0?, {0xc0135129c0, 0x40, 0x40})
2023-08-17 03:17:33.276 	/go/pkg/mod/github.com/gorilla/[email protected]/conn.go:520 +0x45
2023-08-17 03:17:33.276 github.com/gorilla/websocket.(*Conn).NextWriter(0xc0136bde40, 0x1)
2023-08-17 03:17:33.276 	/go/pkg/mod/github.com/gorilla/[email protected]/conn.go:480 +0x42
2023-08-17 03:17:33.276 github.com/gorilla/websocket.(*Conn).beginMessage(0xc0136bde40, 0xc0138aef90, 0x1)
2023-08-17 03:17:33.276 	/go/pkg/mod/github.com/gorilla/[email protected]/conn.go:731 +0x45
2023-08-17 03:17:33.276 github.com/gorilla/websocket.(*messageWriter).Close(0x0?)
2023-08-17 03:17:33.276 	/go/pkg/mod/github.com/gorilla/[email protected]/conn.go:617 +0x52b
2023-08-17 03:17:33.276 github.com/gorilla/websocket.(*messageWriter).flushFrame(0xc013526c90, 0x1, {0x0?, 0x0?, 0x0?})
2023-08-17 03:17:33.276 	/usr/local/go/src/runtime/panic.go:844 +0x258
2023-08-17 03:17:33.276 panic({0x1484a00, 0x1d5f640})
2023-08-17 03:17:33.276 	/usr/local/go/src/net/http/server.go:1825 +0xbf
2023-08-17 03:17:33.276 net/http.(*conn).serve.func1()
2023-08-17 03:17:33.276 goroutine 1178757 [running]:
2023-08-17 03:17:33.276 2023/08/17 01:17:33 http: panic serving 10.140.1.234:55572: concurrent write to websocket connection
2023-08-17 03:17:21.934 {"level":"info","ts":"2023-08-17T01:17:21.934Z","logger":"keep-libp2p","caller":"libp2p/libp2p.go:242","msg":"number of connected peers: [144]"}
2023-08-17 03:16:21.934 {"level":"info","ts":"2023-08-17T01:16:21.934Z","logger":"keep-libp2p","caller":"libp2p/libp2p.go:242","msg":"number of connected peers: [144]"}
2023-08-17 03:15:21.933 {"level":"info","ts":"2023-08-17T01:15:21.933Z","logger":"keep-libp2p","caller":"libp2p/libp2p.go:242","msg":"number of connected peers: [144]"}

Tasks

@pdyraga
Copy link
Member

pdyraga commented Aug 22, 2023

Probably related to checksum0/go-electrum#10.

@thevops
Copy link
Contributor Author

thevops commented Aug 28, 2023

Occurred once more 😞

2023/08/25 15:49:20 http: panic serving 10.121.1.207:35180: concurrent write to websocket connection
panic({0x1484a00, 0x1d5f640})
    /usr/local/go/src/runtime/panic.go:844 +0x258

@randyramig
Copy link

I'm seeing the exact same thing. And my uptime went from 99% to 86%. I've left my keep client in this state (because I'm now screwed for monthly rewards) if you need to debug.

@blitmore
Copy link

Node 18.118.184.107 is exhibiting the same behaviour - no 'diagnostics' response while 'metrics' does respond. Additionally, note it will lose rewards due to running v2.0.0-m4 and reporting it has no connectivity to the electrum node.

@dimpar dimpar added this to the v2.0.0-m5 milestone Sep 7, 2023
lukasz-zimnoch added a commit that referenced this issue Sep 14, 2023
Refs #3699

Depends on #3704

It happens that the keep-core client panics when interacting with the
`go-electrum` library. We noticed a "concurrent write to websocket
connection" error was thrown when calling `GetLatestBlockHeight()`
function. The stack trace leads to gorilla/websocket/WriteMessage which
is called from the `go-electrum` library. The latest block height call
was already wrapped with a read mutex, but we should also wrap it with
the write mutex to prevent such concurrent errors.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants
@dimpar @pdyraga @lukasz-zimnoch @randyramig @thevops @blitmore and others