We currently see nodes crash from too many threads when they receive many queries. Over 20% of the goroutines are blocked on mutexes in IAVL just to check whether we have the relevant version (2878 goroutines stuck here).
But these are queries that don't specify a height, so this contention is unnecessary in the first place. Furthermore, many of the queries themselves are blocked on IAVL reads, which significantly exacerbates the problem, leading to crashes and to all queries being processed too slowly.
We should make the IAVL version lookup on the query path lock-free. E.g. a CAS operation to fetch a "supported versions" value within IAVL, which we update with a CAS op on each new block or prune. Or maybe just a CAS op for getting the latest version.
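To make the proposal concrete, here is a minimal sketch of what a lock-free version check could look like. It is not IAVL's actual API: `VersionTracker`, `versionRange`, `Commit`, and `PruneBelow` are hypothetical names, and it assumes the retained versions form one contiguous range, which may not hold for every pruning strategy. Readers do a single atomic load; the writer publishes a new immutable snapshot with a CAS loop on new block or prune.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// versionRange is an immutable snapshot of the versions the store can serve.
// Assumption: retained versions are contiguous in [first, latest].
type versionRange struct {
	first  int64 // oldest version still retained after pruning
	latest int64 // most recently committed version (0 means none yet)
}

// VersionTracker is a hypothetical helper, not part of IAVL.
type VersionTracker struct {
	cur atomic.Pointer[versionRange]
}

func NewVersionTracker() *VersionTracker {
	t := &VersionTracker{}
	t.cur.Store(&versionRange{})
	return t
}

// Latest serves height-less queries with one atomic load and no mutex.
func (t *VersionTracker) Latest() int64 { return t.cur.Load().latest }

// Has reports whether version v is inside the currently published range.
func (t *VersionTracker) Has(v int64) bool {
	r := t.cur.Load()
	return r.latest > 0 && v >= r.first && v <= r.latest
}

// update copies the current snapshot, applies fn, and publishes the result
// with compare-and-swap, retrying if another writer got there first.
func (t *VersionTracker) update(fn func(versionRange) versionRange) {
	for {
		old := t.cur.Load()
		next := fn(*old)
		if t.cur.CompareAndSwap(old, &next) {
			return
		}
	}
}

// Commit would be called once a new block's version has been saved.
func (t *VersionTracker) Commit(v int64) {
	t.update(func(r versionRange) versionRange {
		if r.latest == 0 {
			r.first = v
		}
		if v > r.latest {
			r.latest = v
		}
		return r
	})
}

// PruneBelow would be called after versions older than v have been deleted.
func (t *VersionTracker) PruneBelow(v int64) {
	t.update(func(r versionRange) versionRange {
		if v > r.first {
			r.first = v
		}
		return r
	})
}

func main() {
	t := NewVersionTracker()
	for v := int64(1); v <= 10; v++ {
		t.Commit(v)
	}
	t.PruneBelow(4)
	fmt.Println(t.Latest(), t.Has(3), t.Has(4), t.Has(11)) // 10 false true false
}
```

For the "just the latest version" variant, a single `atomic.Int64` updated on commit would be enough. Either way, a reader could still race with a prune that removes the version right after the check, so the actual tree read would need to tolerate a missing version.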
Cosmos SDK Version
0.50
How to reproduce?
Run many slightly slow queries, e.g. cosmwasm queries. The node is then liable to crash from too many threads. If you profile via pprof to see where the threads are, you get graphs as above.
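For anyone who wants to see what such a pile-up looks like in a profile without a full node, the standalone sketch below parks many goroutines on one held mutex and prints the goroutine profile via `runtime/pprof`. It only imitates the contention pattern; `SpawnContended`, `GoroutineDump`, and `GoroutineCount` are hypothetical helpers, and nothing here touches IAVL or the SDK.

```go
package main

import (
	"fmt"
	"runtime/pprof"
	"strings"
	"sync"
	"time"
)

// GoroutineCount returns the number of goroutines in the goroutine profile.
func GoroutineCount() int {
	return pprof.Lookup("goroutine").Count()
}

// GoroutineDump returns the goroutine profile in pprof's aggregated text
// form (debug=1), where identical stacks are grouped with a count.
func GoroutineDump() string {
	var sb strings.Builder
	// WriteTo only fails if the writer fails; strings.Builder never does.
	_ = pprof.Lookup("goroutine").WriteTo(&sb, 1)
	return sb.String()
}

// SpawnContended starts n goroutines that all wait on one already-held
// mutex, imitating readers stuck behind a lock. Call the returned function
// to release the mutex and wait for them to finish.
func SpawnContended(n int) (release func()) {
	var mu sync.Mutex
	var wg sync.WaitGroup
	mu.Lock()
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			mu.Unlock()
		}()
	}
	return func() {
		mu.Unlock()
		wg.Wait()
	}
}

func main() {
	release := SpawnContended(100)
	time.Sleep(50 * time.Millisecond) // give the goroutines time to reach Lock
	fmt.Println("goroutines:", GoroutineCount())
	fmt.Print(GoroutineDump())
	release()
}
```

In the dump, the blocked goroutines collapse into one entry whose stack runs through the mutex lock call, which is the same shape we see on the IAVL version check, just at a much smaller scale.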
Is this in IAVL or the underlying DB? goleveldb is not well optimised for single-writer, multiple-reader workloads. Have you tried testing this with pebbledb?