@wally007 Judging by System Manager's network calls, it invokes the REST calls below to decide on these health alerts. We could map them to EMS events, or write a plugin/template that provides the same information.
-
It'd be great if it was added to EMS, but looking at the list you've provided, it doesn't seem to be the full list of what System Manager uses. The event from my previous issue, where I reported that Harvest does not include EMS alerts in its health reporting, doesn't seem to be among the events you've listed above.
-
Hi @cgrinds,

Harvest's EMS template lets you pick specific EMS events and alert on those. We shipped with 60 high value EMS events, but I understand that doesn't fit your need.
-> Correct. While it is great that one can cherry-pick each EMS event separately, we just want to know the overall cluster health status in Grafana and alert on it.

Sounds like you'd like an EMS based cluster health roll-up. We'll take a look at adding that. Looks straightforward to build something similar to System Manager. SM is asking for the last 20 or 40 EMS events that have a severity of alert|error|emergency...
-> This would be great.

Harvest can do something similar and publish a cluster_health or cluster_health_status. If there are zero events of severity alert|error|emergency the value would be 0, otherwise 1, 2, 3, etc. Does that capture your request?
-> Yes, this would be great. The ability to get alert|error|emergency separately would be nice, but it's okay if it's grouped into a single metric.

Thank you!
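For illustration, here is a minimal sketch of the roll-up proposed in this comment: count the events whose severity is alert, error, or emergency, publish 0 when there are none, and keep per-severity counts available. The event shape (a dict with a `severity` key), the function name `cluster_health`, and the sample event names are assumptions made for this example; this is not Harvest's implementation or ONTAP's actual schema.

```python
from collections import Counter

# Severities named in the thread; any other severity is ignored by the roll-up.
HEALTH_SEVERITIES = ("alert", "error", "emergency")


def cluster_health(events):
    """Return (total, per_severity) for health-relevant EMS events.

    `events` is an iterable of dicts with a "severity" key. This shape is a
    hypothetical stand-in, not the real Harvest or ONTAP data model.
    """
    counts = Counter(
        e["severity"] for e in events if e.get("severity") in HEALTH_SEVERITIES
    )
    # Always report all three severities so a missing one reads as 0.
    per_severity = {s: counts.get(s, 0) for s in HEALTH_SEVERITIES}
    return sum(per_severity.values()), per_severity


# Hypothetical sample: two health-relevant events and one that is ignored.
sample = [
    {"name": "hypothetical.event.a", "severity": "error"},
    {"name": "hypothetical.event.b", "severity": "notice"},
    {"name": "hypothetical.event.c", "severity": "alert"},
]
total, by_severity = cluster_health(sample)
print(total, by_severity)
```

The total corresponds to a single grouped metric (0 means healthy), while the per-severity dictionary covers the case where alert, error, and emergency are exposed separately.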
-
Hello,
Thank you for implementing the EMS collector, which we hope will fix the health monitoring issues we had a few months ago:
#885
After reading the documentation for the EMS collector, I would like clarification on how we can get the overall health of the cluster as far as the EMS collector is concerned.
From what I read, we would have to write, collect, and alert on 3k metrics if we wanted to replicate "Health" as it appears in System Manager. Is that correct?
Is there any way for Harvest to just give us an overall EMS health?
Or maybe change the behavior of the existing metric "cluster_new_status" to include EMS alerts?
Or introduce a new metric that would provide overall EMS health?
Thank you