Remove tetragon_msg_op_total metric
tetragon_msg_op_total was counting events per opcode in the ring buffer queue.
It wasn't particularly useful, as there are other metrics exposing similar
numbers:
* tetragon_bpf_missed_events_total counting missed events per opcode in BPF
* tetragon_observer_ringbuf_queue_events_received_total counting total events
  received in the ring buffer queue
* tetragon_events_total counting events per event type in grpc

If needed, we can add an opcode label in the future to the metrics counting
events in the observer:
* tetragon_observer_ringbuf_events_received_total
* tetragon_observer_ringbuf_queue_events_received_total
* tetragon_observer_ringbuf_queue_events_lost_total
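To make the option above concrete, here is a minimal, stdlib-only sketch of what an opcode-labeled counter could look like. It assumes nothing about Tetragon's actual code: `opcodeCounter` and its methods are hypothetical stand-ins for a Prometheus counter vector with one opcode label, and the opcode values are arbitrary.

```go
package main

import (
	"fmt"
	"sync"
)

// opcodeCounter is a hypothetical, stdlib-only stand-in for a Prometheus
// counter vector with a single opcode label: one monotonically increasing
// count per label value.
type opcodeCounter struct {
	mu     sync.Mutex
	counts map[int32]uint64
}

func newOpcodeCounter() *opcodeCounter {
	return &opcodeCounter{counts: make(map[int32]uint64)}
}

// Inc increments the count for a single opcode.
func (c *opcodeCounter) Inc(opcode int32) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[opcode]++
}

// Get returns the count for one opcode, or 0 if it was never seen.
func (c *opcodeCounter) Get(opcode int32) uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[opcode]
}

// Total sums over all opcodes, i.e. the value an unlabeled counter would report.
func (c *opcodeCounter) Total() uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	var sum uint64
	for _, n := range c.counts {
		sum += n
	}
	return sum
}

func main() {
	// Hypothetical labeled versions of a "received" and a "lost" counter.
	received := newOpcodeCounter()
	lost := newOpcodeCounter()

	for _, op := range []int32{5, 5, 7, 23} {
		received.Inc(op)
	}
	lost.Inc(7)

	fmt.Println("received[5]:", received.Get(5), "received total:", received.Total(), "lost[7]:", lost.Get(7))
}
```

Because the unlabeled total is just the sum over the label, adding such a label later would keep the totals the current metrics expose available (in PromQL, via `sum()` over the labeled series).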

We could also add a metric counting all events (not only missed ones) per opcode
in BPF. However, it's unclear whether it would be useful: the ring buffer and
the events queue shouldn't discriminate between event types, so total counts of
successful and missed events at each stage should be enough to troubleshoot
capacity issues. There is still a per-event-type counter at the last stage
(tetragon_events_total) for monitoring overall data volume.
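A hypothetical sketch of the troubleshooting argument above: given cumulative received and lost totals for the ring buffer and the events queue (in Tetragon these would come from the `tetragon_observer_ringbuf_*` counters), it computes a loss ratio per stage and names the stage dropping the largest share. The type and function names are made up for illustration, the numbers are invented, and the sketch assumes a stage's "received" count excludes its lost events.

```go
package main

import "fmt"

// stageStats holds cumulative counter values for one pipeline stage.
// Assumption: "received" counts only events that made it through the stage,
// so received+lost is everything offered to it.
type stageStats struct {
	name     string
	received uint64
	lost     uint64
}

// lossRatio returns the fraction of events offered to the stage that were lost.
func (s stageStats) lossRatio() float64 {
	offered := s.received + s.lost
	if offered == 0 {
		return 0
	}
	return float64(s.lost) / float64(offered)
}

// worstStage returns the stage with the highest loss ratio, i.e. the first
// place to look when troubleshooting capacity. ok is false for empty input.
func worstStage(stages []stageStats) (worst stageStats, ok bool) {
	if len(stages) == 0 {
		return stageStats{}, false
	}
	worst = stages[0]
	for _, s := range stages[1:] {
		if s.lossRatio() > worst.lossRatio() {
			worst = s
		}
	}
	return worst, true
}

func main() {
	// Illustrative numbers, not real Tetragon output.
	stages := []stageStats{
		{name: "ringbuf", received: 9900, lost: 100},
		{name: "queue", received: 9000, lost: 900},
	}
	for _, s := range stages {
		fmt.Printf("%-8s loss=%.2f%%\n", s.name, 100*s.lossRatio())
	}
	if w, ok := worstStage(stages); ok {
		fmt.Println("most loss at:", w.name)
	}
}
```

No opcode breakdown is needed to locate the bottleneck, which is the point the commit message makes.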

Signed-off-by: Anna Kapuscinska <[email protected]>
lambdanis committed Sep 1, 2024
1 parent ed28783 commit 7791130
Showing 4 changed files with 2 additions and 28 deletions.
2 changes: 2 additions & 0 deletions contrib/upgrade-notes/latest.md
@@ -53,3 +53,5 @@ tetragon:
 * `tetragon_ringbuf_perf_event_lost_total` -> `tetragon_observer_ringbuf_events_lost_total`
 * `tetragon_ringbuf_queue_received_total` -> `tetragon_observer_ringbuf_queue_events_received_total`
 * `tetragon_ringbuf_queue_lost_total` -> `tetragon_observer_ringbuf_queue_events_lost_total`
+* `tetragon_msg_op_total` metric is removed. `tetragon_observer_ringbuf_queue_events_received_total` or
+`tetragon_events_total` can be used as a replacement, depending on the use case.
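When updating dashboards or alerts for this upgrade, the renames listed in this hunk can be applied mechanically. A minimal sketch, assuming queries are plain PromQL strings: it covers only the three renames visible here (earlier entries of the list are not shown) and flags `tetragon_msg_op_total`, which has no one-to-one replacement. The name `migrateQuery` is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// renames lists only the metric renames visible in this upgrade-notes hunk;
// it is intentionally partial.
var renames = map[string]string{
	"tetragon_ringbuf_perf_event_lost_total": "tetragon_observer_ringbuf_events_lost_total",
	"tetragon_ringbuf_queue_received_total":  "tetragon_observer_ringbuf_queue_events_received_total",
	"tetragon_ringbuf_queue_lost_total":      "tetragon_observer_ringbuf_queue_events_lost_total",
}

// removed matches the metric deleted by this commit; \b keeps it from
// matching inside longer identifiers.
var removed = regexp.MustCompile(`\btetragon_msg_op_total\b`)

// migrateQuery rewrites renamed metrics in a query string and reports whether
// the query uses tetragon_msg_op_total, which needs a manual choice between
// the suggested replacement metrics.
func migrateQuery(q string) (string, bool) {
	for oldName, newName := range renames {
		re := regexp.MustCompile(`\b` + regexp.QuoteMeta(oldName) + `\b`)
		q = re.ReplaceAllLiteralString(q, newName)
	}
	return q, removed.MatchString(q)
}

func main() {
	q, manual := migrateQuery(`sum(rate(tetragon_ringbuf_queue_lost_total[5m]))`)
	fmt.Println(q, manual)

	_, manual = migrateQuery(`rate(tetragon_msg_op_total{msg_op="5"}[1m])`)
	fmt.Println("needs manual review:", manual)
}
```

No old name is a substring of any new name, so running the function twice on the same query leaves it unchanged.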
8 changes: 0 additions & 8 deletions docs/content/en/docs/reference/metrics.md

19 changes: 0 additions & 19 deletions pkg/metrics/opcodemetrics/opcodemetrics.go
@@ -13,13 +13,6 @@ import (
 )
 
 var (
-	MsgOpsCount = prometheus.NewCounterVec(prometheus.CounterOpts{
-		Namespace: consts.MetricsNamespace,
-		Name: "msg_op_total",
-		Help: "The total number of times we encounter a given message opcode. For internal use only.",
-		ConstLabels: nil,
-	}, []string{"msg_op"})
-
 	LatencyStats = prometheus.NewHistogramVec(prometheus.HistogramOpts{
 		Namespace: consts.MetricsNamespace,
 		Name: "handling_latency",
@@ -30,26 +23,14 @@ var (
 )
 
 func RegisterMetrics(group metrics.Group) {
-	group.MustRegister(MsgOpsCount)
 	group.MustRegister(LatencyStats)
 }
 
 func InitMetrics() {
 	// Initialize all metrics
 	for opcode := range ops.OpCodeStrings {
 		if opcode != ops.MSG_OP_UNDEF && opcode != ops.MSG_OP_TEST {
-			GetOpTotal(opcode).Add(0)
 			LatencyStats.WithLabelValues(fmt.Sprint(int32(opcode)))
 		}
 	}
 }
-
-// Get a new handle on a msgOpsCount metric for an OpCode
-func GetOpTotal(opcode ops.OpCode) prometheus.Counter {
-	return MsgOpsCount.WithLabelValues(fmt.Sprint(int32(opcode)))
-}
-
-// Increment an msgOpsCount for an OpCode
-func OpTotalInc(opcode ops.OpCode) {
-	GetOpTotal(opcode).Inc()
-}
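The `InitMetrics` loop in this file pre-creates one labeled series per known opcode, skipping `MSG_OP_UNDEF` and `MSG_OP_TEST`, so each series is exported before the first event arrives (before this commit, `GetOpTotal(opcode).Add(0)` did this for `msg_op_total`). The stdlib-only sketch below mimics that pattern; the opcode constants and table are hypothetical placeholders, not Tetragon's actual `ops.OpCodeStrings`.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical opcode table standing in for ops.OpCodeStrings; the values
// are chosen for illustration only.
const (
	opUndef  int32 = 0
	opExecve int32 = 5
	opExit   int32 = 7
	opTest   int32 = 254
)

var opCodeStrings = map[int32]string{
	opUndef:  "Undef",
	opExecve: "Execve",
	opExit:   "Exit",
	opTest:   "Test",
}

// initSeries pre-creates one zero-valued series per opcode, skipping the
// undefined and test opcodes. Keys mirror the removed code's label value:
// the opcode's integer value formatted with fmt.Sprint.
func initSeries() map[string]uint64 {
	series := make(map[string]uint64)
	for opcode := range opCodeStrings {
		if opcode != opUndef && opcode != opTest {
			series[fmt.Sprint(opcode)] = 0
		}
	}
	return series
}

func main() {
	series := initSeries()
	labels := make([]string, 0, len(series))
	for l := range series {
		labels = append(labels, l)
	}
	sort.Strings(labels) // map iteration order is random; sort for stable output
	for _, l := range labels {
		fmt.Printf("msg_op=%q value=%d\n", l, series[l])
	}
}
```

Pre-creating series this way means queries see an explicit zero rather than a missing series for opcodes that have not occurred yet; the undefined and test opcodes are presumably skipped because they are not expected in normal operation.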
1 change: 0 additions & 1 deletion pkg/observer/observer.go
@@ -122,7 +122,6 @@ func (k *Observer) receiveEvent(data []byte) {
 	}
 
 	op, events, err := HandlePerfData(data)
-	opcodemetrics.OpTotalInc(ops.OpCode(op))
 	if err != nil {
 		// Increment error metrics
 		errormetrics.ErrorTotalInc(errormetrics.HandlerError)