The Problem
When NeonBee runs clustered, health checks are divided into node-specific and global health checks. If health information is requested via the HealthCheckHandler, node-specific checks are executed on every single node and the results are consolidated in the HealthCheckRegistry. The current implementation, however, executes global checks in the same way, because it does not differentiate between the two types of check. All results of a global check are compared in the consolidateResults(...) method of the HealthCheckRegistry, and only the first result is added to the list of consolidated checks (which is returned by the HealthCheckHandler).
This is a problem because, in a large cluster, global checks are executed on every single node in parallel. Depending on the type of check (e.g. a health request to an external service), this can cause high load on the external service.
Another issue with this implementation is that NeonBee nodes do not necessarily share the same configuration, so some nodes might not be able to perform the health check. In that case, the consolidateResults method has redundant data. Therefore, we need to make it clear to the user where this check is performed, so that the required configuration can be set up.
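To make the load concern concrete, here is a minimal sketch in plain Java. It does not use NeonBee or Vert.x APIs; the class GlobalCheckFanOut, the method runOnEveryNode, and the node count are hypothetical names chosen only for illustration. It models a global check that is invoked once per node and counts how often the external service would be called.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical model of the behavior described above, not NeonBee code.
class GlobalCheckFanOut {
    /** Runs the same global check once per node and collects every result. */
    public static List<Boolean> runOnEveryNode(int nodeCount, Supplier<Boolean> globalCheck) {
        List<Boolean> results = new ArrayList<>();
        for (int node = 0; node < nodeCount; node++) {
            results.add(globalCheck.get());
        }
        return results;
    }

    public static void main(String[] args) {
        // Counts the requests that would reach the external service.
        AtomicInteger externalCalls = new AtomicInteger();
        Supplier<Boolean> check = () -> {
            externalCalls.incrementAndGet();
            return true;
        };

        List<Boolean> results = runOnEveryNode(50, check);
        // prints "50 results, 50 external calls"
        System.out.println(results.size() + " results, " + externalCalls.get() + " external calls");
    }
}
```

In this model the number of external calls grows linearly with the cluster size, although only one of the results ends up in the consolidated list.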
Desired Solution
A better implementation would execute the global check only once. It would be sufficient to execute the check on the local node that invokes the health check handler. I think that, for now, we do not have to make it configurable on which node the check is executed, but this is something we could keep in mind for the future in case there is demand.
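One way to picture this proposal is a dispatch step that picks the target nodes based on the type of check. The following is a sketch under stated assumptions, not the NeonBee implementation: the class HealthCheckDispatch, the enum Scope, and the method targetNodes are hypothetical, and nodes are represented by plain strings.

```java
import java.util.List;

// Hypothetical sketch of the proposed dispatch rule, not NeonBee code.
class HealthCheckDispatch {
    enum Scope { NODE, GLOBAL }

    /**
     * Returns the nodes on which a check should run: only the local node for
     * a global check, every cluster node for a node-specific check.
     */
    public static List<String> targetNodes(Scope scope, String localNode, List<String> clusterNodes) {
        return scope == Scope.GLOBAL ? List.of(localNode) : List.copyOf(clusterNodes);
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("node-1", "node-2", "node-3");
        // The node handling the health request is assumed to be "node-2".
        System.out.println(targetNodes(Scope.GLOBAL, "node-2", nodes)); // prints [node-2]
        System.out.println(targetNodes(Scope.NODE, "node-2", nodes));   // prints [node-1, node-2, node-3]
    }
}
```

With a rule like this, the external service behind a global check receives one request per health query regardless of cluster size, and the user knows that the node serving the request is the one that needs the configuration.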
Alternative Solutions
No response
Additional Context
To give more detail about the current implementation, here is some log output which would be generated in HealthCheckRegistry.sendDataRequests(...) when logging the data object returned by the invoked data request of each HealthCheckVerticle. The example assumes three verticles in a cluster with a global check service.feature-flags.health.
Here, NeonBee would always report the status of the HealthCheckVerticle that registered first in the shared map. The other check results are discarded. Also, notice that the node which runs neonbee/_healthCheckVerticle-69fc018f-eaaf-499f-acce-82a0752dc919 is not set up to authenticate against the service. If this verticle registered first, this failing status would always be returned.
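The masking effect can be illustrated with a small stand-alone sketch. This is not the code of consolidateResults(...); the class FirstResultWins, the method consolidateGlobal, the verticle names, and the UP/DOWN status strings are all hypothetical. A LinkedHashMap stands in for the registration order in the shared map.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of "first registered result wins", not NeonBee code.
class FirstResultWins {
    /** Returns the status of the first entry in iteration order and ignores the rest. */
    public static Optional<String> consolidateGlobal(Map<String, String> statusByVerticle) {
        return statusByVerticle.values().stream().findFirst();
    }

    public static void main(String[] args) {
        // Insertion order models the order in which the verticles registered.
        Map<String, String> statuses = new LinkedHashMap<>();
        statuses.put("verticle-misconfigured", "DOWN"); // assumed: cannot authenticate
        statuses.put("verticle-b", "UP");
        statuses.put("verticle-c", "UP");

        // prints "DOWN", although two of the three nodes report "UP"
        System.out.println(consolidateGlobal(statuses).orElse("UNKNOWN"));
    }
}
```

In this model the reported status depends only on registration order, which matches the behavior described above: a single misconfigured node can determine the result for the whole cluster.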