-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MODINVSTOR-1277: Delete 6 unused instance database indexes #1101
base: master
Are you sure you want to change the base?
Conversation
…array support for identifiers https://folio-org.atlassian.net/browse/MODINVSTOR-1277 * Delete 6 unused instance database indexes: * instance_identifiers_idx_gin * instance_identifiers_idx_ft * instance_title_idx * instance_contributors_idx * instance_indextitle_idx * instance_publication_idx * Drop CQL array support for identifiers, delete identifiers array unit tests * Delete PgUtil.getWithOptimizedSql because the title index has been deleted * Delete populateRmbInternalIndex.sql because it runs for mod-inventory-storage <= 19.1.0
Quality Gate passedIssues Measures |
"tOps": "ADD", | ||
"tOps": "DELETE", | ||
"caseSensitive": false, | ||
"removeAccents": true | ||
}, | ||
{ | ||
"fieldName": "title", | ||
"tOps": "ADD", | ||
"tOps": "DELETE", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we keep these 2 indexes? The title index is used by FQM and the indexTitle index probably should be, so removing them would hurt FQM's performance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give an example query where PostgreSQL actually uses the title
or indexTitle
index? I doubt that real world queries from mod-lists really need these db indexes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A list made with the query instance.title == 'abc'
would use that index. And I can't remember for sure, but I believe instance.title starts_with 'abc'
would, too (I can't remember for sure on that one, though... We still might be waiting on the changes in FQM that would make that the case)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's worth noting that when an index is available, we go out of our way to define the corresponding FQM field in a way that will use that index wherever possible. So even if the index is "left"(lower(diku_mod_inventory_storage.f_unaccent(jsonb ->> 'title')), 600)
or something like that, FQM can use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such kind of queries should better go to mod-search because mod-search has the instances, holdings and items combined (de-normalized) which allows better search and better sorting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mweaver-ebsco
In general if you have a known list of instances you are interested in you do further queries using a unique identifier like id or hrid, not title.
I don't see a real world use case for the queries instance.title == 'abc' and instance.title == 'abc*'. Can you give a real world use case?
And why can't you use mod-search that we have introduced to replace non-transactional mod-inventory-storage queries like these title searches?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @julianladisch, seems that @mweaver-ebsco has just doubts but not strong objections. So, I'm approving the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do have strong objections (sorry, I missed the follow-up questions). Please do not delete those 2 indexes. I'd love for mod-fqm-manager to use mod-search instead of relying on these indexes in the DB, but we're not there yet (that change will take significant architectural changes in FQM), so for now, these are definitely still being used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general if you have a known list of instances you are interested in you do further queries using a unique identifier like id or hrid, not title.
FQM exists to cover uses like this where you're providing query conditions and not specific IDs (it can handle specific IDs, too, if needed). Title is one of the many fields users can search by. If they have those IDs handy, they should 100% use them, but we can't force them to use them.
I don't see a real world use case for the queries instance.title == 'abc' and instance.title == 'abc*'. Can you give a real world use case?
Maybe a user needs to export a list of loans, joined with the associated detailed inventory data, where the corresponding instance titles containing 'abc' or something similar ¯\_(ツ)_/¯ Maybe they are doing some work with various translations of some multi-volume work, where they are dealing with a large number of items with very similar names, and they need the circulation data alongside the corresponding item and instance data. Perhaps someone made a mistake and accidentally changed the title on 500 instances to "abc" somehow and they need to find and fix those quickly. With the instance title index in place, it could be fixed in minutes using Bulk Edit (which uses FQM). Without it, it could take hours.
And why can't you use mod-search that we have introduced to replace non-transactional mod-inventory-storage queries like these title searches?
Believe me, I really really really want to do exactly that. It's waaaaay up near the top of my own wishlist for mod-fqm-manager features. Unfortunately, that will require major architectural changes in mod-fqm-manager and a fairly significant amount of work, so we haven't been able to do it yet. It's not just a matter of switching to mod-search; it's introducing the ability to join search results from mod-search with query results from the DB. It is not a trivial change.
non-transactional mod-inventory-storage queries like these title searches?
I'm not sure I understand what you mean here. A query for instance.title = 'abc'
in mod-fqm-manager is entirely transactional, in the sense that it happens within a transaction, and it depends pretty heavily on this. It only becomes non-transactional if you interact with it in a non-transactional way (for example, mod-lists does interact with it non-transactionally, but this is by design). If mod-fqm-manager starts using mod-search to handle parts of its queries, it makes things significantly less transactional, since it will need to execute different parts of the query on different platforms, and then join those together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
title
and indexTitle
have b-tree indexes using the default UTF8 locale. Therefore they only support exact match, they don't support right truncation, as explained on https://www.postgresql.org/docs/current/indexes-types.html#INDEXES-TYPES-BTREE :
Exact match using the title
b-tree index:
folio=# select jsonb->>'title' from instance where ("left"(lower(f_unaccent(jsonb ->> 'title'::text)), 600)) LIKE 'abc-team : a';
?column?
--------------
ABC-Team : A
(1 row)
Time: 0.776 ms
Right truncation uses a full table scan:
folio=# select jsonb->>'title' from instance where ("left"(lower(f_unaccent(jsonb ->> 'title'::text)), 600)) LIKE 'abc-team : a%';
?column?
--------------
ABC-Team : A
(1 row)
Time: 1821.425 ms (00:01.821)
starts_with 'abc'
is exactly the same as LIKE 'abc%'
and therefore also needs a full table scan.
Please provide a real world use case where an exact title match is needed. I don't see why one would search for the exact title 'abc', if the exact title is known one can search using the id or hrid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update NEWS by moving it's updates into not-released block.
Quality Gate passedIssues Measures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please keep the 2 indexes that are actively being used by mod-fqm-manager, as mentioned in my previous comment (adding a separate comment because github won't let me hit "request changes" without a new comment)
Purpose
Delete the 6 biggest database indexes that are never used.
Approach
Changes Checklist
Related Issues
https://folio-org.atlassian.net/browse/MODINVSTOR-1277