postgres: add index for task_run_file_input(input_file_id)
Deleting untracked file_inputs from the database takes a very long time.
The table has a multicolumn b-tree index on both its columns.
The input_file_id column is the second part of that index, so a lookup
on it alone still requires the full index to be scanned[^1]:
  Constraints on columns to the right of these columns are checked in
  the index, so they save visits to the table proper, but they do not
  reduce the portion of the index that has to be scanned.

Create an index for the input_file_id column.
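
As an illustration of the rule quoted above, the following sketch contrasts a lookup on the leading index column with a lookup on the second column only. The table name trfi_example and the column name task_run_id are assumptions made for the example; only input_file_id is named in this commit.

```sql
-- Hypothetical stand-in for task_run_file_input; the real definition may differ.
CREATE TABLE trfi_example (
    task_run_id   integer NOT NULL,  -- assumed name of the first column
    input_file_id integer NOT NULL,
    PRIMARY KEY (task_run_id, input_file_id)  -- multicolumn b-tree index
);

-- Constrains the leading column, so only the matching range of the
-- index has to be scanned.
SELECT 1 FROM trfi_example WHERE task_run_id = 42;

-- Constrains only the second column. Per the documentation quoted above,
-- the condition is checked in the index but does not reduce the portion
-- of the index that has to be scanned.
SELECT 1 FROM trfi_example WHERE input_file_id = 42;
```

A separate index whose leading column is input_file_id lets the second query use a narrow index range as well.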

Comparison of the query plans for:
  EXPLAIN SELECT id FROM input_file WHERE NOT EXISTS (SELECT 1 FROM task_run_file_input AS trfi WHERE input_file.id = trfi.input_file_id);

Without index:
 Gather  (cost=2102870.71..2331530.97 rows=51672 width=4)
   Workers Planned: 2
   ->  Parallel Hash Anti Join  (cost=2101870.71..2325363.77 rows=21530 width=4)
         Hash Cond: (input_file.id = trfi.input_file_id)
         ->  Parallel Index Only Scan using input_file_pkey on input_file  (cost=0.42..1127.35 rows=28633 width=4)
         ->  Parallel Hash  (cost=1170539.13..1170539.13 rows=56766813 width=4)
               ->  Parallel Seq Scan on task_run_file_input trfi  (cost=0.00..1170539.13 rows=56766813 width=4)

With index:
 Gather  (cost=1000.99..24457.81 rows=51672 width=4) (actual time=194.390..206.568 rows=0 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Nested Loop Anti Join  (cost=0.99..18290.61 rows=21530 width=4) (actual time=153.282..153.283 rows=0 loops=3)
         ->  Parallel Index Only Scan using input_file_pkey on input_file  (cost=0.42..1127.35 rows=28633 width=4) (actual time=0.035..13.810 rows=22907 loops=3)
               Heap Fetches: 19485
         ->  Index Only Scan using task_run_file_input_input_file_id_idx on task_run_file_input trfi  (cost=0.57..157.21 rows=7992 width=4) (actual time=0.006..0.006 rows=1 loops=68720)
               Index Cond: (input_file_id = input_file.id)
               Heap Fetches: 1352
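
The second plan includes actual times, which plain EXPLAIN does not report, so it was presumably produced with EXPLAIN ANALYZE. A sketch of how the comparison could be rerun, and how the cleanup itself could be timed without removing rows, follows. The DELETE statement is an assumption about what the cleanup looks like; it is not taken from the repository.

```sql
-- Plan with real timings for the query compared above.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM input_file
WHERE NOT EXISTS (
    SELECT 1 FROM task_run_file_input AS trfi
    WHERE input_file.id = trfi.input_file_id
);

-- Hypothetical cleanup statement. EXPLAIN ANALYZE executes the statement
-- for real, so the transaction is rolled back to keep the rows.
BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
DELETE FROM input_file
WHERE NOT EXISTS (
    SELECT 1 FROM task_run_file_input AS trfi
    WHERE input_file.id = trfi.input_file_id
);
ROLLBACK;
```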

[^1]: https://www.postgresql.org/docs/current/indexes-multicolumn.html
fho committed Oct 2, 2024
1 parent 228adc4 commit f1a809a
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions pkg/storage/postgres/migrations/5.sql
@@ -0,0 +1 @@
CREATE INDEX CONCURRENTLY idx_task_run_file_input_input_file_id on task_run_file_input(input_file_id);
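
CREATE INDEX CONCURRENTLY cannot be executed inside a transaction block, and a failed concurrent build leaves behind an invalid index that has to be dropped before retrying. A minimal sketch of a check that could be run after the migration, using the system catalogs; it is not part of this commit:

```sql
-- Lists every index on task_run_file_input together with its validity flags.
-- indisvalid = false indicates a concurrent build that did not complete.
SELECT c.relname AS index_name,
       i.indisvalid,
       i.indisready
FROM pg_index AS i
JOIN pg_class AS c ON c.oid = i.indexrelid
WHERE i.indrelid = 'task_run_file_input'::regclass;
```

If the new index shows up with indisvalid = false, it can be removed with DROP INDEX and the migration rerun.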
