Store availability data for hosts #1

mohierf · 2015-07-09T02:28:31Z

NOTE: some fixes still have to be made ... do not use on production servers!

The module handles host_check_result broks to compute and store availability data for all known hosts on a daily basis.

For every day, a document is stored in the availability collection with the following fields:

hostname/service
day (YYYY-MM-DD) and day_ts (timestamp representing day at 00:00)
first received check state and timestamp
last received check state and timestamp
period for 0 state (UP)
period for 1 state (DOWN)
period for 2 state (UNREACHABLE)
period for 3 state (UNKNOWN)
period for 4 state (UNCHECKED)
host has been in downtime : 0/1

The sum of the 5 stored periods is always 86400, the number of seconds in a day. Before the first received check, and after the last received check, the host is considered to be in an UNCHECKED period.
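As a rough illustration of the accounting described above, here is a minimal sketch in Python. The field names (`daily_0` ... `daily_4`, `first_check_state`, and so on) and the two helper functions are hypothetical and are not taken from the module's source; the sketch only shows how the five periods can be kept summing to 86400 by starting the whole day as UNCHECKED and moving seconds out of that bucket as checks arrive.

```python
SECONDS_PER_DAY = 86400
STATES = {0: "UP", 1: "DOWN", 2: "UNREACHABLE", 3: "UNKNOWN", 4: "UNCHECKED"}


def new_day_document(hostname, day, day_ts):
    """Build an empty daily document (hypothetical field names)."""
    doc = {
        "hostname": hostname,
        "day": day,          # YYYY-MM-DD
        "day_ts": day_ts,    # timestamp of the day at 00:00
        "first_check_state": None,
        "first_check_timestamp": None,
        "last_check_state": None,
        "last_check_timestamp": None,
        "is_downtime": 0,
    }
    for state in STATES:
        doc["daily_%d" % state] = 0
    # Until a check is received, the whole day counts as UNCHECKED.
    doc["daily_4"] = SECONDS_PER_DAY
    return doc


def record_check(doc, state, timestamp):
    """Account for a check result received at `timestamp` (epoch seconds)."""
    if doc["first_check_timestamp"] is None:
        doc["first_check_state"] = state
        doc["first_check_timestamp"] = timestamp
    else:
        # The time elapsed since the previous check is attributed to the
        # previous state and taken out of the UNCHECKED period.
        elapsed = timestamp - doc["last_check_timestamp"]
        doc["daily_%d" % doc["last_check_state"]] += elapsed
        doc["daily_4"] -= elapsed
    doc["last_check_state"] = state
    doc["last_check_timestamp"] = timestamp
    return doc


if __name__ == "__main__":
    midnight = 1436400000  # 2015-07-09 00:00:00 UTC
    doc = new_day_document("srv1", "2015-07-09", midnight)
    record_check(doc, 0, midnight + 60)
    record_check(doc, 1, midnight + 360)
    record_check(doc, 0, midnight + 660)
    print(doc["daily_0"], doc["daily_1"], doc["daily_4"])
```

With this scheme the time before the first check and after the last one simply stays in the UNCHECKED bucket, so the invariant holds at any moment of the day.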

The Shinken WebUI uses this data collection to display availability information for each host (see shinken-monitoring/mod-webui#260).

mohierf · 2015-07-09T17:00:11Z

NOTE: some fixes still have to be made ... do not use on production servers!

maethor · 2015-07-22T11:40:08Z

Just to be sure I understand correctly: this collection is updated every time mongo-logs gets a new log for the hostname/service. So "period for UNCHECKED" is initialized to 86400, and decremented when we increment the other values? Am I right?

So when we query the availability from the WebUI, we only compute percentages of 86400?

Is it computing availability for all services, or only for hosts?
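For reference, turning one day's stored periods into percentages of 86400 could look like the sketch below. The function name and the shape of the `periods` dictionary are assumptions made for illustration, not the WebUI's actual code.

```python
SECONDS_PER_DAY = 86400


def availability_percentages(periods):
    """Convert per-state durations (in seconds) into percentages of a day.

    `periods` maps a state name to the number of seconds spent in it
    (hypothetical structure).
    """
    return {
        state: round(100.0 * seconds / SECONDS_PER_DAY, 2)
        for state, seconds in periods.items()
    }


if __name__ == "__main__":
    day = {"UP": 43200, "DOWN": 21600, "UNCHECKED": 21600}
    print(availability_percentages(day))
```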

mohierf · 2015-07-22T11:49:28Z

You are right ... it is almost a real time information :-)

At the moment, I have only implemented host checks, but it will be really simple to do the same for all services.

I noticed some problems with this simple strategy :

you do not always get 100% of 86400 seconds, because the first and last checks of the day are not received at 00:00 and 24:00 ... so you lose a few seconds every day!
you cannot get availability information for periods shorter than a day

I have some ideas to cope with the first problem ... but I am not yet sure what the best strategy is ... to be discussed! @maethor

maethor · 2015-07-22T11:58:41Z

I plan to review the entire source code of your plugin (to remove some if len(list) > 0:, for example :D), so in a few hours I will be happy to bring you some suggestions on the strategy :)

Availability for small periods is quite hard. In fact, the best strategy to manage such things is the one used by perfdata databases. It consists of keeping precise information for the last few hours, and then aggregating the information more and more as time goes on. This is nice because we don't have to set any limit, and we are sure that the database size will not explode. On the other hand, it can complicate the implementation a lot.

But I think I already have an idea to do this… :)
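To make the perfdata-style approach concrete, here is a minimal sketch of a roll-up step, assuming availability is first recorded in hourly buckets. The `rollup` function, its parameters, and the bucket layout are hypothetical and only illustrate the general technique; they are not the idea alluded to above. Recent buckets are kept at fine resolution, while older ones are merged into one bucket per day.

```python
def rollup(buckets, now, keep_fine=6 * 3600, coarse=86400):
    """Merge old fine-grained buckets into coarse (daily) ones.

    `buckets` maps a bucket start timestamp to {state: seconds}.
    Buckets that started at most `keep_fine` seconds before `now` keep
    their resolution; older ones are summed into the bucket of the
    `coarse`-aligned period they belong to.
    """
    result = {}
    for start, periods in buckets.items():
        if now - start <= keep_fine:
            key = ("hour", start)
        else:
            key = ("day", start - (start % coarse))
        merged = result.setdefault(key, {})
        for state, seconds in periods.items():
            merged[state] = merged.get(state, 0) + seconds
    return result


if __name__ == "__main__":
    now = 2 * 86400 + 3 * 3600  # third day, 03:00
    fine = {
        0: {"UP": 3600},
        3600: {"UP": 1800, "DOWN": 1800},
        2 * 86400: {"UP": 3600},
    }
    print(rollup(fine, now))
```

Running such a step periodically bounds the number of stored documents per host, at the cost of losing sub-day detail for older data.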

mohierf · 2015-07-22T12:02:58Z

Feel free to restart from scratch ... I simply made a mock-up to validate an idea, which was to compute on the fly instead of parsing a big logs table in a database :-)

maethor · 2015-07-22T12:28:43Z

There is no need to restart from scratch. Your proof of concept is great :)

bittrance · 2016-08-24T15:24:22Z

What is the status of this feature? Building from latest, I see that there is still no service-based availability in my mongo log. I am somewhat keen on implementing this. @mohierf, @maethor: any ideas/thoughts you want to share?

mohierf · 2016-08-24T15:55:19Z

@bittrance : as far as I remember (it's been quite a long time ...), you should have information for the hosts and the services.

The module logs some information on start in the brokerd.log to inform about what it will manage. And you have some configuration parameters to include/exclude some services from the recording ... perhaps something to configure in your environment?

I left this issue open because @maethor had an idea for rewriting some part of the code.

bittrance · 2016-08-25T03:32:47Z

Indeed. Explicitly setting services_filter resolves the issue. The text in the module config file says "default is to consider only the services which business impact is > 4". However, since services_filter is commented out in the default config, https://github.com/shinken-monitoring/mod-mongo-logs/blob/master/module/module.py#L154 will actually leave filter_service_criticality unset, which means https://github.com/shinken-monitoring/mod-mongo-logs/blob/master/module/module.py#L373 will be bypassed. Which is right? Should the default be services_filter = getattr(mod_conf, 'services_filter', 'bi:>=4') or should the docs in the config file change?

mohierf · 2016-08-25T04:23:08Z

Because services_filter is commented out, it takes the default value defined in the source code and it is ... an empty string :(

You are right, we should change the doc in the configuration file !
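The behaviour discussed above comes down to how `getattr` falls back to its default when the attribute is missing. The sketch below is only an illustration: `ModuleConf`, `read_services_filter` and `parse_criticality` are hypothetical stand-ins, and the `bi:<operator><level>` parsing is an assumption based on the `'bi:>=4'` value quoted in this thread, not the module's actual parser.

```python
class ModuleConf(object):
    """Stand-in for the module configuration object (hypothetical)."""
    pass


def read_services_filter(mod_conf, default=''):
    # When services_filter is commented out in the config file, the
    # attribute does not exist and getattr returns `default`.
    return getattr(mod_conf, 'services_filter', default)


def parse_criticality(services_filter):
    """Return (operator, level) for a 'bi:<op><n>' filter, else None.

    The filter syntax handled here is assumed for illustration.
    """
    if not services_filter.startswith('bi:'):
        return None
    expr = services_filter[3:]
    for op in ('>=', '<=', '>', '<', '='):
        if expr.startswith(op):
            return op, int(expr[len(op):])
    return None


if __name__ == "__main__":
    conf = ModuleConf()
    # Empty-string default: no criticality filter is set.
    print(parse_criticality(read_services_filter(conf)))
    # Alternative default suggested in the thread.
    print(parse_criticality(read_services_filter(conf, 'bi:>=4')))
```

With an empty-string default, no criticality filter is derived, which matches the observation that the check is bypassed unless services_filter is set explicitly.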

mohierf added the enhancement label Jul 9, 2015

mohierf mentioned this issue Jul 9, 2015

Host availability tab in element view shinken-monitoring/mod-webui#260

Closed

mohierf added a commit that referenced this issue Jul 9, 2015

Store availability data for hosts (#1)

bbf7051

bittrance mentioned this issue Aug 25, 2016

services availability not shown bittrance/docker-shinken#4

Closed

mohierf added a commit that referenced this issue Aug 25, 2016

#1, #4 : error in configuration file comments

0776f21
