Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: more precise mempool metrics #1158

Merged
merged 12 commits into from
Jan 3, 2024

Conversation

evan-forbes
Copy link
Member

Description

This PR introduces more precise mempool metrics to provide a clearer view in why txs are getting evicted or failing

Comment on lines 15 to 22
FailedPrecheck = "precheck"
FailedAdding = "adding"
FailedRecheck = "recheck"

EvictedTxIncomingFullMempool = "full-removed-incoming"
EvictedTxExistingFullMempool = "full-removed-existing"
EvictedTxExpiredBlocks = "expired-ttl-blocks"
EvictedTxExpiredTime = "expired-ttl-time"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by adding labels, we can more precisely track why txs are being rejected, failing, or being evicted

cmwaters
cmwaters previously approved these changes Dec 17, 2023
Copy link
Contributor

@cmwaters cmwaters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some minor nits. This could also be two different PRs

mempool/metrics.go Outdated Show resolved Hide resolved
mempool/metrics.go Outdated Show resolved Hide resolved
mempool/v1/mempool.go Outdated Show resolved Hide resolved
@@ -469,7 +481,10 @@ func (txmp *TxMempool) addNewTransaction(wtx *WrappedTx, checkTxRes *abci.Respon
"post_check_err", err,
)

txmp.metrics.FailedTxs.Add(1)
txmp.metrics.FailedTxs.With(mempool.FailedAdding).Add(1)
if err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also trigger of abci code is not zero. In this case we should print the raw logs and code as the reason

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@evan-forbes evan-forbes requested a review from cmwaters December 17, 2023 20:07
@evan-forbes evan-forbes enabled auto-merge (squash) December 17, 2023 20:07
cmwaters
cmwaters previously approved these changes Dec 17, 2023
Copy link
Contributor

@cmwaters cmwaters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@evan-forbes evan-forbes disabled auto-merge December 17, 2023 21:52
@evan-forbes
Copy link
Member Author

ran into an issue while testing this manually, which were fixed by the last two commits. apologies for the mistakes. this is confirmed to be working now tho

Copy link
Contributor

@cmwaters cmwaters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I was also thinking about this a little more and it might be also good to distinguish whether this came from another peer or from an RPC endpoint (i.e. directly from a client who does not have complete state). We can do this by checking the TxInfo in the CheckTx method if there is a sender ID or not (can't recall the exact names)

@@ -72,6 +72,7 @@ func PrometheusMetrics(namespace string, labelsAndValues ...string) *Metrics {
for i := 0; i < len(labelsAndValues); i += 2 {
labels = append(labels, labelsAndValues[i])
}
typedCounterLabels := append(append(make([]string, 0, len(labels)+1), labels...), TypeLabel)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just append the TypeLabel?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we probably could since I don't think we're writing to it later, but I've ran into too many difficult to find slice bugs in my life.

Since we're not overwriting an existing var, my fingers literally won't let me use append using an existing slice.

@evan-forbes
Copy link
Member Author

We can do this by checking the TxInfo in the CheckTx method if there is a sender ID or not (can't recall the exact names)

nice! good idea. the senderID should be 0, correct?

checkTxRes.Code,
checkTxRes.Codespace,
checkTxRes.Log,
wtx.HasPeer(0), // this checks if the peer id is local
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure this will work as intended, but if there is no peer id, then it looks like the value will be zero. I did not check tho. cc @cmwaters

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks logically sound to me. It's a new transaction so there can't be any other peers but the sender

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should also be tracing this for recheckTx

checkTxRes.Code,
checkTxRes.Codespace,
checkTxRes.Log,
wtx.HasPeer(0), // this checks if the peer id is local
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks logically sound to me. It's a new transaction so there can't be any other peers but the sender

checkTxRes.Code,
checkTxRes.Codespace,
checkTxRes.Log,
wtx.HasPeer(0), // this checks if the peer id is local
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should also be tracing this for recheckTx

@rootulp rootulp self-requested a review January 2, 2024 19:51
@evan-forbes evan-forbes merged commit 0d5f417 into v0.34.x-celestia Jan 3, 2024
17 checks passed
@evan-forbes evan-forbes deleted the evan/more-precise-metrics branch January 3, 2024 15:42
evan-forbes added a commit that referenced this pull request Jan 3, 2024
This PR introduces more precise mempool metrics to provide a clearer
view in why txs are getting evicted or failing
evan-forbes added a commit that referenced this pull request Jan 5, 2024
## Description

same as #1158 but targeted at main
@faddat faddat mentioned this pull request Feb 22, 2024
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants