Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

History sync queries being too expensive #881

Open
deric opened this issue Jan 16, 2025 · 8 comments
Open

History sync queries being too expensive #881

deric opened this issue Jan 16, 2025 · 8 comments
Labels
area/schema bug Something isn't working

Comments

@deric
Copy link

deric commented Jan 16, 2025

I'm having issues with icingadb queries periodically overloading PostgreSQL backend. The query looks like this:

SELECT history.id, history.environment_id, history.endpoint_id, history.object_type, history.host_id, history.service_id, history.comment_history_id, history.downtime_history_id, history.flapping_history_id, history.notification_history_id, history.acknowledgement_history_id, history.state_history_id, history.event_type, history.event_time, history_host.id AS history_host_id, history_host.environment_id AS history_host_environment_id, history_host.name_checksum AS history_host_name_checksum, history_host.properties_checksum AS history_host_properties_checksum, history_host.name AS history_host_name, history_host.name_ci AS history_host_name_ci, history_host.display_name AS history_host_display_name, history_host.address AS history_host_address, history_host.address6 AS history_host_address6, history_host.address_bin AS history_host_address_bin, history_host.address6_bin AS history_host_address6_bin, history_host.checkcommand_name AS history_host_checkcommand_name, history_host.checkcommand_id AS history_host_checkcommand_id, history_host.max_check_attempts AS history_host_max_check_attempts, history_host.check_timeperiod_name AS history_host_check_timeperiod_name, history_host.check_timeperiod_id AS history_host_check_timeperiod_id, history_host.check_timeout AS history_host_check_timeout, history_host.check_interval AS history_host_check_interval, history_host.check_retry_interval AS history_host_check_retry_interval, history_host.active_checks_enabled AS history_host_active_checks_enabled, history_host.passive_checks_enabled AS history_host_passive_checks_enabled, history_host.event_handler_enabled AS history_host_event_handler_enabled, history_host.notifications_enabled AS history_host_notifications_enabled, history_host.flapping_enabled AS history_host_flapping_enabled, history_host.flapping_threshold_low AS history_host_flapping_threshold_low, history_host.flapping_threshold_high AS history_host_flapping_threshold_high, history_host.perfdata_enabled AS history_host_perfdata_enabled, history_host.eventcomman...

these queries generate over 250 GB in PostgreSQL temp because the result is probably too big to sort in memory (just 64 GB).

I guess I have to drop the history data... but my question is: does these queries really need to be so expensive?

Image

@oxzi oxzi added the bug Something isn't working label Jan 17, 2025
@oxzi
Copy link
Member

oxzi commented Jan 17, 2025

Thanks for this report. Having queries this expensive is definitely a bug.

But can you please provide us with some information to narrow this down?

  • How big is your history? How many rows are in the history table?
  • Do you know when this query is issued? I cannot immediately think of any code in Icinga DB itself that reads from the history, but only writes. Could this be triggered by the web module? If so, is it possible to find out under which URL this happens?

@deric
Copy link
Author

deric commented Jan 17, 2025

@oxzi PostgreSQL limits the maximum query size track_activity_query_size, so I was unable to get the full SQL. I'll try to increase that.

              table_name              | table_size | indexes_size | total_size 
--------------------------------------+------------+--------------+------------
 "public"."history"                   | 13 GB      | 22 GB        | 35 GB
 "public"."state_history"             | 19 GB      | 7223 MB      | 26 GB
 "public"."sla_history_state"         | 5271 MB    | 7845 MB      | 13 GB

history has around 60M rows.

select count(*) from "public"."history";
  count   
----------
 60302785

Could this be triggered by the web module?

Yes, it's possible. I wasn't able to narrow down the connected client yet.

It might be useful to introduce some partitioning to those tables.

@oxzi
Copy link
Member

oxzi commented Jan 17, 2025

Thanks for your fast reply.

@oxzi PostgreSQL limits the maximum query size track_activity_query_size, so I was unable to get the full SQL. I'll try to increase that.

That would be great, thanks.

history has around 60M rows.

Your history has become quite large. I don't want to put the blame on you, I just want to state the fact and that there are potentially performance issues linked to the size.

Unless you need all this date, you could enable the retention to shrink your history.

Out of curiosity and to get a better understanding of your and potentially similar setups: How old is the history? In which time interval were these 60M records created?

Could this be triggered by the web module?

Yes, it's possible. I wasn't able to narrow down the connected client yet.

@nilmerg: In the potential case this query was sent by Icinga DB Web, may this be related to the ticket number 56180?

It might be useful to introduce some partitioning to those tables.

Thanks. This is actually a good idea to look into.

@deric
Copy link
Author

deric commented Jan 17, 2025

Your history has become quite large. I don't want to put the blame on you, I just want to state the fact and that there are potentially performance issues linked to the size.

Yes, I'm definitely planning to shrink the data. Just thought this might help to optimize the performance.

Out of curiosity and to get a better understanding of your and potentially similar setups: How old is the history? In which time interval were these 60M records created?

If I'm parsing the timestamp correctly, the oldest data would be 2 years ago. We've tried to use icinga for tracking pings between hosts, but it was a bad idea.

The full query looks like this:

SELECT history.id, history.environment_id, history.endpoint_id, history.object_type, history.host_id, history.service_id, history.comment_history_id, history.downtime_history_id, history.flapping_history_id, history.notification_history_id, history.acknowledgement_history_id, history.state_history_id, history.event_type, history.event_time, history_host.id AS history_host_id, history_host.environment_id AS history_host_environment_id, history_host.name_checksum AS history_host_name_checksum, history_host.properties_checksum AS history_host_properties_checksum, history_host.name AS history_host_name, history_host.name_ci AS history_host_name_ci, history_host.display_name AS history_host_display_name, history_host.address AS history_host_address, history_host.address6 AS history_host_address6, history_host.address_bin AS history_host_address_bin, history_host.address6_bin AS history_host_address6_bin, history_host.checkcommand_name AS history_host_checkcommand_name, history_host.checkcommand_id AS history_host_checkcommand_id, history_host.max_check_attempts AS history_host_max_check_attempts, history_host.check_timeperiod_name AS history_host_check_timeperiod_name, history_host.check_timeperiod_id AS history_host_check_timeperiod_id, history_host.check_timeout AS history_host_check_timeout, history_host.check_interval AS history_host_check_interval, history_host.check_retry_interval AS history_host_check_retry_interval, history_host.active_checks_enabled AS history_host_active_checks_enabled, history_host.passive_checks_enabled AS history_host_passive_checks_enabled, history_host.event_handler_enabled AS history_host_event_handler_enabled, history_host.notifications_enabled AS history_host_notifications_enabled, history_host.flapping_enabled AS history_host_flapping_enabled, history_host.flapping_threshold_low AS history_host_flapping_threshold_low, history_host.flapping_threshold_high AS history_host_flapping_threshold_high, history_host.perfdata_enabled AS history_host_perfdata_enabled, history_host.eventcommand_name AS history_host_eventcommand_name, history_host.eventcommand_id AS history_host_eventcommand_id, history_host.is_volatile AS history_host_is_volatile, history_host.action_url_id AS history_host_action_url_id, history_host.notes_url_id AS history_host_notes_url_id, history_host.notes AS history_host_notes, history_host.icon_image_id AS history_host_icon_image_id, history_host.icon_image_alt AS history_host_icon_image_alt, history_host.zone_name AS history_host_zone_name, history_host.zone_id AS history_host_zone_id, history_host.command_endpoint_name AS history_host_command_endpoint_name, history_host.command_endpoint_id AS history_host_command_endpoint_id, history_host_state.host_id AS history_host_state_host_id, history_host_state.environment_id AS history_host_state_environment_id, history_host_state.state_type AS history_host_state_state_type, history_host_state.soft_state AS history_host_state_soft_state, history_host_state.hard_state AS history_host_state_hard_state, history_host_state.previous_soft_state AS history_host_state_previous_soft_state, history_host_state.previous_hard_state AS history_host_state_previous_hard_state, history_host_state.check_attempt AS history_host_state_check_attempt, history_host_state.severity AS history_host_state_severity, history_host_state.output AS history_host_state_output, history_host_state.long_output AS history_host_state_long_output, history_host_state.performance_data AS history_host_state_performance_data, history_host_state.normalized_performance_data AS history_host_state_normalized_performance_data, history_host_state.check_commandline AS history_host_state_check_commandline, history_host_state.is_problem AS history_host_state_is_problem, history_host_state.is_handled AS history_host_state_is_handled, history_host_state.is_reachable AS history_host_state_is_reachable, history_host_state.is_flapping AS history_host_state_is_flapping, history_host_state.is_overdue AS history_host_state_is_overdue, history_host_state.is_acknowledged AS history_host_state_is_acknowledged, history_host_state.acknowledgement_comment_id AS history_host_state_acknowledgement_comment_id, history_host_state.last_comment_id AS history_host_state_last_comment_id, history_host_state.in_downtime AS history_host_state_in_downtime, history_host_state.execution_time AS history_host_state_execution_time, history_host_state.latency AS history_host_state_latency, history_host_state.check_timeout AS history_host_state_check_timeout, history_host_state.check_source AS history_host_state_check_source, history_host_state.scheduling_source AS history_host_state_scheduling_source, history_host_state.last_update AS history_host_state_last_update, history_host_state.last_state_change AS history_host_state_last_state_change, history_host_state.next_check AS history_host_state_next_check, history_host_state.next_update AS history_host_state_next_update, history_service.id AS history_service_id, history_service.environment_id AS history_service_environment_id, history_service.name_checksum AS history_service_name_checksum, history_service.properties_checksum AS history_service_properties_checksum, history_service.host_id AS history_service_host_id, history_service.name AS history_service_name, history_service.name_ci AS history_service_name_ci, history_service.display_name AS history_service_display_name, history_service.checkcommand_name AS history_service_checkcommand_name, history_service.checkcommand_id AS history_service_checkcommand_id, history_service.max_check_attempts AS history_service_max_check_attempts, history_service.check_timeperiod_name AS history_service_check_timeperiod_name, history_service.check_timeperiod_id AS history_service_check_timeperiod_id, history_service.check_timeout AS history_service_check_timeout, history_service.check_interval AS history_service_check_interval, history_service.check_retry_interval AS history_service_check_retry_interval, history_service.active_checks_enabled AS history_service_active_checks_enabled, history_service.passive_checks_enabled AS history_service_passive_checks_enabled, history_service.event_handler_enabled AS history_service_event_handler_enabled, history_service.notifications_enabled AS history_service_notifications_enabled, history_service.flapping_enabled AS history_service_flapping_enabled, history_service.flapping_threshold_low AS history_service_flapping_threshold_low, history_service.flapping_threshold_high AS history_service_flapping_threshold_high, history_service.perfdata_enabled AS history_service_perfdata_enabled, history_service.eventcommand_name AS history_service_eventcommand_name, history_service.eventcommand_id AS history_service_eventcommand_id, history_service.is_volatile AS history_service_is_volatile, history_service.action_url_id AS history_service_action_url_id, history_service.notes_url_id AS history_service_notes_url_id, history_service.notes AS history_service_notes, history_service.icon_image_id AS history_service_icon_image_id, history_service.icon_image_alt AS history_service_icon_image_alt, history_service.zone_name AS history_service_zone_name, history_service.zone_id AS history_service_zone_id, history_service.command_endpoint_name AS history_service_command_endpoint_name, history_service.command_endpoint_id AS history_service_command_endpoint_id, history_service_state.service_id AS history_service_state_service_id, history_service_state.environment_id AS history_service_state_environment_id, history_service_state.state_type AS history_service_state_state_type, history_service_state.soft_state AS history_service_state_soft_state, history_service_state.hard_state AS history_service_state_hard_state, history_service_state.previous_soft_state AS history_service_state_previous_soft_state, history_service_state.previous_hard_state AS history_service_state_previous_hard_state, history_service_state.check_attempt AS history_service_state_check_attempt, history_service_state.severity AS history_service_state_severity, history_service_state.output AS history_service_state_output, history_service_state.long_output AS history_service_state_long_output, history_service_state.performance_data AS history_service_state_performance_data, history_service_state.normalized_performance_data AS history_service_state_normalized_performance_data, history_service_state.check_commandline AS history_service_state_check_commandline, history_service_state.is_problem AS history_service_state_is_problem, history_service_state.is_handled AS history_service_state_is_handled, history_service_state.is_reachable AS history_service_state_is_reachable, history_service_state.is_flapping AS history_service_state_is_flapping, history_service_state.is_overdue AS history_service_state_is_overdue, history_service_state.is_acknowledged AS history_service_state_is_acknowledged, history_service_state.acknowledgement_comment_id AS history_service_state_acknowledgement_comment_id, history_service_state.last_comment_id AS history_service_state_last_comment_id, history_service_state.in_downtime AS history_service_state_in_downtime, history_service_state.execution_time AS history_service_state_execution_time, history_service_state.latency AS history_service_state_latency, history_service_state.check_timeout AS history_service_state_check_timeout, history_service_state.check_source AS history_service_state_check_source, history_service_state.scheduling_source AS history_service_state_scheduling_source, history_service_state.last_update AS history_service_state_last_update, history_service_state.last_state_change AS history_service_state_last_state_change, history_service_state.next_check AS history_service_state_next_check, history_service_state.next_update AS history_service_state_next_update, history_comment.comment_id AS history_comment_comment_id, history_comment.environment_id AS history_comment_environment_id, history_comment.endpoint_id AS history_comment_endpoint_id, history_comment.object_type AS history_comment_object_type, history_comment.host_id AS history_comment_host_id, history_comment.service_id AS history_comment_service_id, history_comment.entry_time AS history_comment_entry_time, history_comment.author AS history_comment_author, history_comment.removed_by AS history_comment_removed_by, history_comment.comment AS history_comment_comment, history_comment.entry_type AS history_comment_entry_type, history_comment.is_persistent AS history_comment_is_persistent, history_comment.is_sticky AS history_comment_is_sticky, history_comment.expire_time AS history_comment_expire_time, history_comment.remove_time AS history_comment_remove_time, history_comment.has_been_removed AS history_comment_has_been_removed, history_downtime.downtime_id AS history_downtime_downtime_id, history_downtime.environment_id AS history_downtime_environment_id, history_downtime.endpoint_id AS history_downtime_endpoint_id, history_downtime.triggered_by_id AS history_downtime_triggered_by_id, history_downtime.parent_id AS history_downtime_parent_id, history_downtime.object_type AS history_downtime_object_type, history_downtime.host_id AS history_downtime_host_id, history_downtime.service_id AS history_downtime_service_id, history_downtime.entry_time AS history_downtime_entry_time, history_downtime.author AS history_downtime_author, history_downtime.cancelled_by AS history_downtime_cancelled_by, history_downtime.comment AS history_downtime_comment, history_downtime.is_flexible AS history_downtime_is_flexible, history_downtime.flexible_duration AS history_downtime_flexible_duration, history_downtime.scheduled_start_time AS history_downtime_scheduled_start_time, history_downtime.scheduled_end_time AS history_downtime_scheduled_end_time, history_downtime.start_time AS history_downtime_start_time, history_downtime.end_time AS history_downtime_end_time, history_downtime.scheduled_by AS history_downtime_scheduled_by, history_downtime.has_been_cancelled AS history_downtime_has_been_cancelled, history_downtime.trigger_time AS history_downtime_trigger_time, history_downtime.cancel_time AS history_downtime_cancel_time, history_flapping.id AS history_flapping_id, history_flapping.environment_id AS history_flapping_environment_id, history_flapping.endpoint_id AS history_flapping_endpoint_id, history_flapping.object_type AS history_flapping_object_type, history_flapping.host_id AS history_flapping_host_id, history_flapping.service_id AS history_flapping_service_id, history_flapping.start_time AS history_flapping_start_time, history_flapping.end_time AS history_flapping_end_time, history_flapping.percent_state_change_start AS history_flapping_percent_state_change_start, history_flapping.percent_state_change_end AS history_flapping_percent_state_change_end, history_flapping.flapping_threshold_low AS history_flapping_flapping_threshold_low, history_flapping.flapping_threshold_high AS history_flapping_flapping_threshold_high, history_notification.id AS history_notification_id, history_notification.environment_id AS history_notification_environment_id, history_notification.endpoint_id AS history_notification_endpoint_id, history_notification.object_type AS history_notification_object_type, history_notification.host_id AS history_notification_host_id, history_notification.service_id AS history_notification_service_id, history_notification.notification_id AS history_notification_notification_id, history_notification.type AS history_notification_type, history_notification.send_time AS history_notification_send_time, history_notification.state AS history_notification_state, history_notification.previous_hard_state AS history_notification_previous_hard_state, history_notification.author AS history_notification_author, history_notification.text AS history_notification_text, history_notification.users_notified AS history_notification_users_notified, history_acknowledgement.id AS history_acknowledgement_id, history_acknowledgement.environment_id AS history_acknowledgement_environment_id, history_acknowledgement.endpoint_id AS history_acknowledgement_endpoint_id, history_acknowledgement.object_type AS history_acknowledgement_object_type, history_acknowledgement.host_id AS history_acknowledgement_host_id, history_acknowledgement.service_id AS history_acknowledgement_service_id, history_acknowledgement.set_time AS history_acknowledgement_set_time, history_acknowledgement.clear_time AS history_acknowledgement_clear_time, history_acknowledgement.author AS history_acknowledgement_author, history_acknowledgement.cleared_by AS history_acknowledgement_cleared_by, history_acknowledgement.comment AS history_acknowledgement_comment, history_acknowledgement.expire_time AS history_acknowledgement_expire_time, history_acknowledgement.is_sticky AS history_acknowledgement_is_sticky, history_acknowledgement.is_persistent AS history_acknowledgement_is_persistent, history_state.id AS history_state_id, history_state.environment_id AS history_state_environment_id, history_state.endpoint_id AS history_state_endpoint_id, history_state.object_type AS history_state_object_type, history_state.host_id AS history_state_host_id, history_state.service_id AS history_state_service_id, history_state.event_time AS history_state_event_time, history_state.state_type AS history_state_state_type, history_state.soft_state AS history_state_soft_state, history_state.hard_state AS history_state_hard_state, history_state.check_attempt AS history_state_check_attempt, history_state.previous_soft_state AS history_state_previous_soft_state, history_state.previous_hard_state AS history_state_previous_hard_state, history_state.output AS history_state_output, history_state.long_output AS history_state_long_output, history_state.max_check_attempts AS history_state_max_check_attempts, history_state.check_source AS history_state_check_source, history_state.scheduling_source AS history_state_scheduling_source FROM history LEFT JOIN host history_host ON history_host.id = history.host_id LEFT JOIN host_state history_host_state ON history_host_state.host_id = history_host.id LEFT JOIN service history_service ON history_service.id = history.service_id LEFT JOIN service_state history_service_state ON history_service_state.service_id = history_service.id LEFT JOIN comment_history history_comment ON history_comment.comment_id = history.comment_history_id LEFT JOIN downtime_history history_downtime ON history_downtime.downtime_id = history.downtime_history_id LEFT JOIN flapping_history history_flapping ON history_flapping.id = history.flapping_history_id LEFT JOIN notification_history history_notification ON history_notification.id = history.notification_history_id LEFT JOIN acknowledgement_history history_acknowledgement ON history_acknowledgement.id = history.acknowledgement_history_id LEFT JOIN state_history history_state ON history_state.id = history.state_history_id WHERE (history.event_time <= $1) AND ((history_host.id IS NOT NULL) OR (history_service.id IS NOT NULL)) ORDER BY history_service_state.last_state_change desc LIMIT 11

The query is probably generated by IcingaDB web:

icingadb/history?view=detailed&limit=10&sort=service.state.last_state_change desc

@oxzi
Copy link
Member

oxzi commented Jan 17, 2025

Out of curiosity and to get a better understanding of your and potentially similar setups: How old is the history? In which time interval were these 60M records created?

If I'm parsing the timestamp correctly, the oldest data would be 2 years ago. We've tried to use icinga for tracking pings between hosts, but it was a bad idea.

Thanks for the insights. Depending on the amount of pings, some time series database might be the better option, I guess.

Thus, cleaning up the old data should at least result in a smaller history and potentially faster queries.

The query is probably generated by IcingaDB web:

icingadb/history?view=detailed&limit=10&sort=service.state.last_state_change desc

Indeed. Appending &format=sql to the URL shows a query similar to the posted one.

After taking a look at PostgreSQL's query plan via an EXPLAIN, the following was generated:

 Limit  (cost=2966.30..2966.33 rows=11 width=6827)
   ->  Sort  (cost=2966.30..3015.20 rows=19560 width=6827)
         Sort Key: history_service_state.last_state_change DESC
         ->  Hash Left Join  (cost=1248.20..2530.17 rows=19560 width=6827)
               Hash Cond: ((history.state_history_id)::bytea = (history_state.id)::bytea)
               ->  Hash Left Join  (cost=139.84..1370.46 rows=19560 width=6458)
                     Hash Cond: ((history.acknowledgement_history_id)::bytea = (history_acknowledgement.id)::bytea)
                     ->  Hash Left Join  (cost=138.28..1317.53 rows=19560 width=6166)
                           Hash Cond: ((history.notification_history_id)::bytea = (history_notification.id)::bytea)
                           ->  Hash Left Join  (cost=64.45..1192.34 rows=19560 width=5927)
                                 Hash Cond: ((history.flapping_history_id)::bytea = (history_flapping.id)::bytea)
                                 ->  Hash Left Join  (cost=46.80..1123.35 rows=19560 width=5715)
                                       Hash Cond: ((history.downtime_history_id)::bytea = (history_downtime.downtime_id)::bytea)
                                       ->  Hash Left Join  (cost=44.30..1069.49 rows=19560 width=4803)
                                             Hash Cond: ((history.comment_history_id)::bytea = (history_comment.comment_id)::bytea)
                                             ->  Hash Left Join  (cost=28.90..1002.74 rows=19560 width=4503)
                                                   Hash Cond: ((history_service.id)::bytea = (history_service_state.service_id)::bytea)
                                                   ->  Hash Left Join  (cost=9.74..930.22 rows=19560 width=3688)
                                                         Hash Cond: ((history.service_id)::bytea = (history_service.id)::bytea)
                                                         Filter: ((history_host.id IS NOT NULL) OR (history_service.id IS NOT NULL))
                                                         ->  Hash Left Join  (cost=2.31..870.37 rows=19560 width=3150)
                                                               Hash Cond: ((history_host.id)::bytea = (history_host_state.host_id)::bytea)
                                                               ->  Hash Left Join  (cost=1.16..786.78 rows=19560 width=2646)
                                                                     Hash Cond: ((history.host_id)::bytea = (history_host.id)::bytea)
                                                                     ->  Seq Scan on history  (cost=0.00..714.50 rows=19560 width=286)
                                                                           Filter: ((event_time)::bigint <= '1737121795000'::bigint)
                                                                     ->  Hash  (cost=1.07..1.07 rows=7 width=2360)
                                                                           ->  Seq Scan on host history_host  (cost=0.00..1.07 rows=7 width=2360)
                                                               ->  Hash  (cost=1.07..1.07 rows=7 width=504)
                                                                     ->  Seq Scan on host_state history_host_state  (cost=0.00..1.07 rows=7 width=504)
                                                         ->  Hash  (cost=6.08..6.08 rows=108 width=538)
                                                               ->  Seq Scan on service history_service  (cost=0.00..6.08 rows=108 width=538)
                                                   ->  Hash  (cost=17.96..17.96 rows=96 width=815)
                                                         ->  Seq Scan on service_state history_service_state  (cost=0.00..17.96 rows=96 width=815)
                                             ->  Hash  (cost=12.40..12.40 rows=240 width=300)
                                                   ->  Seq Scan on comment_history history_comment  (cost=0.00..12.40 rows=240 width=300)
                                       ->  Hash  (cost=2.22..2.22 rows=22 width=912)
                                             ->  Seq Scan on downtime_history history_downtime  (cost=0.00..2.22 rows=22 width=912)
                                 ->  Hash  (cost=13.40..13.40 rows=340 width=212)
                                       ->  Seq Scan on flapping_history history_flapping  (cost=0.00..13.40 rows=340 width=212)
                           ->  Hash  (cost=58.37..58.37 rows=1237 width=239)
                                 ->  Seq Scan on notification_history history_notification  (cost=0.00..58.37 rows=1237 width=239)
                     ->  Hash  (cost=1.25..1.25 rows=25 width=292)
                           ->  Seq Scan on acknowledgement_history history_acknowledgement  (cost=0.00..1.25 rows=25 width=292)
               ->  Hash  (cost=881.49..881.49 rows=18149 width=369)
                     ->  Seq Scan on state_history history_state  (cost=0.00..881.49 rows=18149 width=369)

For reference, without using your custom sort key - /icingadb/history?view=detailed&limit=10&format=sql -, a somewhat more optimized query plan was built:

 Limit  (cost=1.90..14.81 rows=11 width=6827)
   ->  Nested Loop Left Join  (cost=1.90..22954.61 rows=19560 width=6827)
         ->  Nested Loop Left Join  (cost=1.61..13601.00 rows=19560 width=6458)
               ->  Nested Loop Left Join  (cost=1.46..13107.37 rows=19560 width=6166)
                     ->  Nested Loop Left Join  (cost=1.18..12216.46 rows=19560 width=5927)
                           ->  Nested Loop Left Join  (cost=1.03..8945.04 rows=19560 width=5715)
                                 ->  Nested Loop Left Join  (cost=0.88..8451.55 rows=19560 width=4803)
                                       ->  Nested Loop Left Join  (cost=0.74..5229.03 rows=19560 width=4503)
                                             ->  Nested Loop Left Join  (cost=0.58..4649.27 rows=19560 width=3688)
                                                   ->  Nested Loop Left Join  (cost=0.44..4178.09 rows=19560 width=3184)
                                                         Join Filter: ((history_host.id)::bytea = (history.host_id)::bytea)
                                                         Filter: ((history_host.id IS NOT NULL) OR (history_service.id IS NOT NULL))
                                                         ->  Nested Loop Left Join  (cost=0.44..2263.22 rows=19560 width=824)
                                                               ->  Index Scan Backward using idx_history_event_time_event_type on history  (cost=0.29..1746.29 rows=19560 width=286)
                                                                     Index Cond: ((event_time)::bigint <= '1737121931000'::bigint)
                                                               ->  Memoize  (cost=0.15..0.17 rows=1 width=538)
                                                                     Cache Key: history.service_id
                                                                     Cache Mode: logical
                                                                     ->  Index Scan using pk_service on service history_service  (cost=0.14..0.16 rows=1 width=538)
                                                                           Index Cond: ((id)::bytea = (history.service_id)::bytea)
                                                         ->  Materialize  (cost=0.00..1.10 rows=7 width=2360)
                                                               ->  Seq Scan on host history_host  (cost=0.00..1.07 rows=7 width=2360)
                                                   ->  Memoize  (cost=0.14..1.87 rows=1 width=504)
                                                         Cache Key: history_host.id
                                                         Cache Mode: logical
                                                         ->  Index Scan using idx_host_state_host_id on host_state history_host_state  (cost=0.13..1.86 rows=1 width=504)
                                                               Index Cond: ((host_id)::bytea = (history_host.id)::bytea)
                                             ->  Memoize  (cost=0.15..0.87 rows=1 width=815)
                                                   Cache Key: history_service.id
                                                   Cache Mode: logical
                                                   ->  Index Scan using idx_service_state_service_id on service_state history_service_state  (cost=0.14..0.86 rows=1 width=815)
                                                         Index Cond: ((service_id)::bytea = (history_service.id)::bytea)
                                       ->  Index Scan using pk_comment_history on comment_history history_comment  (cost=0.14..0.16 rows=1 width=300)
                                             Index Cond: ((comment_id)::bytea = (history.comment_history_id)::bytea)
                                 ->  Memoize  (cost=0.15..0.17 rows=1 width=912)
                                       Cache Key: history.downtime_history_id
                                       Cache Mode: logical
                                       ->  Index Scan using pk_downtime_history on downtime_history history_downtime  (cost=0.14..0.16 rows=1 width=912)
                                             Index Cond: ((downtime_id)::bytea = (history.downtime_history_id)::bytea)
                           ->  Index Scan using pk_flapping_history on flapping_history history_flapping  (cost=0.15..0.17 rows=1 width=212)
                                 Index Cond: ((id)::bytea = (history.flapping_history_id)::bytea)
                     ->  Memoize  (cost=0.29..0.32 rows=1 width=239)
                           Cache Key: history.notification_history_id
                           Cache Mode: logical
                           ->  Index Scan using pk_notification_history on notification_history history_notification  (cost=0.28..0.31 rows=1 width=239)
                                 Index Cond: ((id)::bytea = (history.notification_history_id)::bytea)
               ->  Memoize  (cost=0.15..0.17 rows=1 width=292)
                     Cache Key: history.acknowledgement_history_id
                     Cache Mode: logical
                     ->  Index Scan using pk_acknowledgement_history on acknowledgement_history history_acknowledgement  (cost=0.14..0.16 rows=1 width=292)
                           Index Cond: ((id)::bytea = (history.acknowledgement_history_id)::bytea)
         ->  Index Scan using pk_state_history on state_history history_state  (cost=0.29..0.48 rows=1 width=369)
               Index Cond: ((id)::bytea = (history.state_history_id)::bytea)

The cost shrank from cost=2966.30..2966.33 to cost=1.90..14.81.

To be sure, I compared the two queries and the only difference was the ORDER BY. Thus, an index may help, but unfortunately there is already a straightforward index for the field itself.

CREATE INDEX idx_service_state_last_state_change ON service_state(last_state_change);

@lippserd
Copy link
Member

@deric Could you please test the performance of the following query and share the query plan?

SELECT * FROM history
INNER JOIN host_state history_host_state ON history_host_state.host_id = history.host_id
INNER JOIN service history_service ON history_service.id = history.service_id
INNER JOIN service_state history_service_state ON history_service_state.service_id = history.service_id
LEFT JOIN comment_history history_comment ON history_comment.comment_id = history.comment_history_id
LEFT JOIN downtime_history history_downtime ON history_downtime.downtime_id = history.downtime_history_id
LEFT JOIN flapping_history history_flapping ON history_flapping.id = history.flapping_history_id
LEFT JOIN notification_history history_notification ON history_notification.id = history.notification_history_id
LEFT JOIN acknowledgement_history history_acknowledgement ON history_acknowledgement.id = history.acknowledgement_history_id
LEFT JOIN state_history history_state ON history_state.id = history.state_history_id
WHERE history.id IN (
  SELECT history.id FROM history
  INNER JOIN host history_host ON history_host.id = history.host_id
  INNER JOIN service_state history_service_state ON history_service_state.service_id = history.service_id
  WHERE (history.event_time <= 1735686000000)
  ORDER BY history_service_state.last_state_change DESC LIMIT 26
);

@lippserd
Copy link
Member

And maybe fix the timestamp I've used 😆

@deric
Copy link
Author

deric commented Jan 17, 2025

@lippserd it would produce following query plan:

 Nested Loop Left Join  (cost=200919.74..3242886.88 rows=10 width=5550) (actual time=4790.756..12811.623 rows=26 loops=1)
   ->  Nested Loop Left Join  (cost=200919.18..3242879.66 rows=10 width=4399) (actual time=4790.747..12806.814 rows=26 loops=1)
         ->  Hash Semi Join  (cost=200919.04..3242877.93 rows=10 width=4206) (actual time=4790.734..12806.713 rows=26 loops=1)
               Hash Cond: ((history.id)::bytea = ("ANY_subquery".id)::bytea)
               ->  Gather  (cost=200779.19..3218342.44 rows=9293455 width=4206) (actual time=1597.321..10965.357 rows=3237159 loops=1)
                     Workers Planned: 2
                     Workers Launched: 2
                     ->  Parallel Hash Left Join  (cost=199779.19..2287996.94 rows=3872273 width=4206) (actual time=1572.231..9420.434 rows=1079053 loops=3)
                           Hash Cond: ((history.notification_history_id)::bytea = (history_notification.id)::bytea)
                           ->  Hash Left Join  (cost=40231.44..2118284.48 rows=3872273 width=4010) (actual time=18.023..7339.819 rows=1079053 loops=3)
                                 Hash Cond: ((history.flapping_history_id)::bytea = (history_flapping.id)::bytea)
                                 ->  Hash Left Join  (cost=40231.41..2108119.73 rows=3872273 width=3798) (actual time=18.004..7006.083 rows=1079053 loops=3)
                                       Hash Cond: ((history.downtime_history_id)::bytea = (history_downtime.downtime_id)::bytea)
                                       ->  Hash Left Join  (cost=39066.49..2096790.10 rows=3872273 width=3031) (actual time=17.859..6674.973 rows=1079053 loops=3)
                                             Hash Cond: ((history.comment_history_id)::bytea = (history_comment.comment_id)::bytea)
                                             ->  Parallel Hash Join  (cost=39066.46..2086625.35 rows=3872273 width=2731) (actual time=17.835..6388.696 rows=1079053 loops=3)
                                                   Hash Cond: ((history_service.id)::bytea = (history_service_state.service_id)::bytea)
                                                   ->  Hash Join  (cost=15480.24..2052868.31 rows=3872273 width=1400) (actual time=6.821..5877.865 rows=1079053 loops=3)
                                                         Hash Cond: ((history.host_id)::bytea = (history_host_state.host_id)::bytea)
                                                         ->  Parallel Hash Join  (cost=14995.61..2042078.50 rows=3872273 width=775) (actual time=6.318..5462.833 rows=1079053 loops=3)
                                                               Hash Cond: ((history.service_id)::bytea = (history_service.id)::bytea)
                                                               ->  Parallel Seq Scan on history  (cost=0.00..1999676.65 rows=10438088 width=280) (actual time=0.028..4583.359 rows=3367325 loops=3)
                                                               ->  Parallel Hash  (cost=14734.25..14734.25 rows=8042 width=495) (actual time=6.190..6.191 rows=6615 loops=3)
                                                                     Buckets: 32768  Batches: 1  Memory Usage: 7424kB
                                                                     ->  Parallel Seq Scan on service history_service  (cost=0.00..14734.25 rows=8042 width=495) (actual time=0.003..2.776 rows=6615 loops=3)
                                                         ->  Hash  (cost=456.74..456.74 rows=858 width=625) (actual time=0.484..0.484 rows=858 loops=3)
                                                               Buckets: 1024  Batches: 1  Memory Usage: 493kB
                                                               ->  Seq Scan on host_state history_host_state  (cost=0.00..456.74 rows=858 width=625) (actual time=0.017..0.263 rows=858 loops=3)
                                                   ->  Parallel Hash  (cost=23319.14..23319.14 rows=8218 width=1331) (actual time=10.920..10.921 rows=6615 loops=3)
                                                         Buckets: 32768  Batches: 1  Memory Usage: 13344kB
                                                         ->  Parallel Index Scan using idx_service_state_service_id on service_state history_service_state  (cost=0.41..23319.14 rows=8218 width=1331) (actual time=0.020..6.156 rows=6615 loops=3)
                                             ->  Hash  (cost=0.00..0.00 rows=1 width=300) (actual time=0.009..0.010 rows=0 loops=3)
                                                   Buckets: 1024  Batches: 1  Memory Usage: 8kB
                                                   ->  Seq Scan on comment_history history_comment  (cost=0.00..0.00 rows=1 width=300) (actual time=0.009..0.009 rows=0 loops=3)
                                       ->  Hash  (cost=1161.11..1161.11 rows=117 width=767) (actual time=0.130..0.131 rows=91 loops=3)
                                             Buckets: 1024  Batches: 1  Memory Usage: 33kB
                                             ->  Index Scan using pk_downtime_history on downtime_history history_downtime  (cost=0.39..1161.11 rows=117 width=767) (actual time=0.014..0.106 rows=91 loops=3)
                                 ->  Hash  (cost=0.00..0.00 rows=1 width=212) (actual time=0.007..0.007 rows=0 loops=3)
                                       Buckets: 1024  Batches: 1  Memory Usage: 8kB
                                       ->  Seq Scan on flapping_history history_flapping  (cost=0.00..0.00 rows=1 width=212) (actual time=0.006..0.006 rows=0 loops=3)
                           ->  Parallel Hash  (cost=121614.40..121614.40 rows=1167180 width=196) (actual time=1544.154..1544.154 rows=816987 loops=3)
                                 Buckets: 4194304  Batches: 1  Memory Usage: 611040kB
                                 ->  Parallel Seq Scan on notification_history history_notification  (cost=0.00..121614.40 rows=1167180 width=196) (actual time=0.033..175.841 rows=816987 loops=3)
               ->  Hash  (cost=139.01..139.01 rows=26 width=21) (actual time=1523.861..1523.864 rows=26 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 10kB
                     ->  Subquery Scan on "ANY_subquery"  (cost=1.10..139.01 rows=26 width=21) (actual time=1523.282..1523.853 rows=26 loops=1)
                           ->  Limit  (cost=1.10..138.23 rows=26 width=29) (actual time=1523.279..1523.848 rows=26 loops=1)
                                 ->  Nested Loop  (cost=1.10..48759664.84 rows=9244951 width=29) (actual time=40.353..40.920 rows=26 loops=1)
                                       ->  Index Scan Backward using idx_service_state_last_state_change on service_state history_service_state_1  (cost=0.41..59136.44 rows=19724 width=29) (actual time=0.019..0.020 rows=1 loops=1)
                                       ->  Nested Loop  (cost=0.69..2444.95 rows=805 width=42) (actual time=40.327..40.891 rows=26 loops=1)
                                             ->  Seq Scan on host history_host  (cost=0.00..62.15 rows=805 width=21) (actual time=0.006..0.157 rows=723 loops=1)
                                             ->  Index Scan using idx_history_host_service_id on history history_1  (cost=0.69..2.93 rows=1 width=63) (actual time=0.055..0.056 rows=0 loops=723)
                                                   Index Cond: (((host_id)::bytea = (history_host.id)::bytea) AND ((service_id)::bytea = (history_service_state_1.service_id)::bytea) AND ((event_time)::bigint <= '1737127076458'::bigint))
         ->  Index Scan using pk_acknowledgement_history on acknowledgement_history history_acknowledgement  (cost=0.14..0.17 rows=1 width=193) (actual time=0.002..0.002 rows=0 loops=26)
               Index Cond: ((id)::bytea = (history.acknowledgement_history_id)::bytea)
   ->  Index Scan using pk_state_history on state_history history_state  (cost=0.56..0.72 rows=1 width=1151) (actual time=0.182..0.182 rows=1 loops=26)
         Index Cond: ((id)::bytea = (history.state_history_id)::bytea)
 Planning Time: 10.302 ms
 JIT:
   Functions: 181
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 18.113 ms, Inlining 175.125 ms, Optimization 2028.390 ms, Emission 2281.283 ms, Total 4502.911 ms
 Execution Time: 12816.787 ms

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/schema bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants