This repository has been archived by the owner on Oct 18, 2024. It is now read-only.

Building Availability SLO for Kafka Cluster Utilizing Strimzi-Canary Metrics #219

Open
OuesFa opened this issue Jun 22, 2023 · 0 comments

Comments

Contributor

OuesFa commented Jun 22, 2023

I am building a Service Level Objective (SLO) for the availability of our Kafka cluster, based on Strimzi-Canary metrics. The aim is to define two separate SLO resources: one for consumption and one for production.

For the production SLI (Service Level Indicator), the plan is to use strimzi_canary_records_produced as the count of total events and strimzi_canary_records_produced_failed as the count of failed events.
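
To make the intended ratio concrete, here is a minimal sketch of the production SLI computed from two samples of each counter taken at the start and end of a window. It assumes that strimzi_canary_records_produced counts every attempted send (including failed ones) and that neither counter reset during the window; the function name `availability_sli` and its parameters are hypothetical, not part of Strimzi-Canary.

```python
from typing import Optional


def availability_sli(
    total_start: float,
    total_end: float,
    failed_start: float,
    failed_end: float,
) -> Optional[float]:
    """Fraction of successful events in a window, from counter samples.

    Assumption: the "total" counter includes failed attempts, and
    neither counter was reset between the two samples.
    """
    total = total_end - total_start      # events attempted in the window
    failed = failed_end - failed_start   # events that failed in the window
    if total <= 0:
        # No canary traffic in the window: the ratio is undefined.
        return None
    return 1.0 - failed / total


# Example: 100 records attempted in the window, 5 of them failed.
sli = availability_sli(total_start=100, total_end=200,
                       failed_start=0, failed_end=5)
print(f"production SLI over the window: {sli:.4f}")
```

In a Prometheus-based setup the same ratio would normally be expressed over rate or increase expressions rather than raw samples, which also takes care of counter resets.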

For the consumption SLI, however, there doesn't seem to be a direct equivalent of the 'failed' events metric that exists for production. The closest metric I can find is consumer_error_total.

I would love to hear your thoughts on this approach and any suggestions on how to define the consumption SLO effectively. Are there more suitable metrics or methods that I should consider?
