
Index limitations with the annotation and document_uri tables #8498

Closed
indigobravo opened this issue Feb 7, 2024 · 3 comments
@indigobravo (Member)
Description

While restoring a recent backup of the production h database, we encountered a number of errors relating to index limitations on the annotation and document_uri tables. As a result, the indexes on the restored tables do not match production.

References

@marcospri (Member) commented Mar 5, 2024

ix__document_uri_uri_normalized

The original issue was not about this index but about the unique index that combined several columns.

For that, the fix was to use md5 in the index.

That also fixed the relatively common unique-index exception in production, because the combination of claimant + uri + type + content_type is more likely to go over the size limit than uri alone.
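A sketch of what such an index might look like (the index name and exact column expressions here are assumptions for illustration; the real definition in h may differ):

```sql
-- Hypothetical sketch: hash the potentially long columns so the
-- combined index row stays well under the btree row-size limit
-- (md5 output is always 32 characters).
CREATE UNIQUE INDEX uq__document_uri ON public.document_uri
    (md5(claimant_normalized), md5(uri_normalized), type, content_type);
```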

After that we are still getting an issue creating the index on document_uri while restoring because some values are very large there.

This index is created with:

```sql
CREATE INDEX ix__document_uri_uri_normalized ON public.document_uri USING btree (uri_normalized);
```

which currently fails with:

```
ERROR:  index row size 2776 exceeds btree version 4 maximum 2704 for index "ix__document_uri_uri_normalized"
DETAIL:  Index row references tuple (950228,3) in relation "document_uri".
HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing
```
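Following the HINT, a functional index over an md5 hash of the value would be a sketch along these lines (note that queries must then filter on `md5(uri_normalized) = md5('...')` for the planner to use the index):

```sql
-- Hypothetical sketch, per the error HINT: index the md5 hash of the
-- value instead of the value itself, giving a fixed-size index row.
CREATE INDEX ix__document_uri_uri_normalized
    ON public.document_uri USING btree (md5(uri_normalized));
```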

We can find the longest URIs with this query (ids taken from running a much slower query first).
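The linked query is not reproduced above, but a sketch of one way to surface the largest values (column names assumed from the error output):

```sql
-- Hypothetical sketch: order rows by the byte length of uri_normalized
-- to surface the values that overflow the btree row-size limit.
SELECT id, octet_length(uri_normalized) AS bytes
FROM document_uri
ORDER BY bytes DESC
LIMIT 20;
```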

To fix this we need to:

  • Limit the max length of normalized_uri at the API/service level so no new rows make their way to the DB.

  • Change this index to use md5, like the uniqueness one.

  • Or fix the problematic rows (how?) and keep the current index.
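For the third option, a sketch of a cleanup (the 2704-byte limit comes from the error above; the truncation length and whether truncating uri_normalized is acceptable at all are open questions, not a recommendation):

```sql
-- Hypothetical cleanup: truncate oversized values so the existing
-- btree index can be rebuilt. 2704 bytes is the limit from the error.
UPDATE document_uri
SET uri_normalized = left(uri_normalized, 2000)
WHERE octet_length(uri_normalized) > 2704;
```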

@marcospri (Member)
ix__annotation_tags

This index also fails to be created, due to rows with tags that are too long.

For tags, we already validate the length at the API level.

We have to fix the existing tags in the DB.

Finding long tags in the DB

We removed the longest tags, leaving the DB in a state where the index can be created again:

https://hypothes-is.slack.com/archives/C4K6M7P5E/p1707842794675309
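The query behind that link is not reproduced here, but a sketch of one way to find annotations with oversized tags, assuming tags is a text[] column:

```sql
-- Hypothetical sketch: unnest the tags array and report annotations
-- whose longest single tag exceeds some threshold.
SELECT a.id, max(octet_length(t)) AS longest_tag
FROM annotation AS a, unnest(a.tags) AS t
GROUP BY a.id
HAVING max(octet_length(t)) > 2000
ORDER BY longest_tag DESC;
```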

Sample query to remove the offending tags:

```sql
UPDATE annotation SET tags = '{}' WHERE id IN (..);
```

This should work on the next restore.

@indigobravo (Member, Author)
I can confirm database restores now complete without error. Thanks for the work on this.
