
Index limitations with the annotation and document_uri tables #8498

Closed
indigobravo opened this issue Feb 7, 2024 · 3 comments
@indigobravo (Member)
Description

While restoring a recent backup of the production h database, we encountered a number of errors relating to index limitations on the annotation and document_uri tables. As a result, the indexes on the restored tables do not match production.

References

@marcospri (Member) commented Mar 5, 2024

ix__document_uri_uri_normalized

The original issue was not about this index but about the unique index that combined several columns.

For that, the fix was to use md5 in the index.

That also fixed the relatively common unique-index exception in production, because the combination of claimant + uri + type + content_type is more likely to go over the size limit than uri alone.
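A sketch of what such an index might look like (the index name and exact column expressions here are assumptions for illustration; the real definition in h may differ):

```sql
-- Hypothetical sketch: hash the potentially long columns so the
-- combined index row stays well under the btree row-size limit
-- (md5 output is always 32 characters).
CREATE UNIQUE INDEX uq__document_uri ON public.document_uri
    (md5(claimant_normalized), md5(uri_normalized), type, content_type);
```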

After that we are still getting an issue creating the index on document_uri while restoring because some values are very large there.

This index is created with:

```sql
CREATE INDEX ix__document_uri_uri_normalized ON public.document_uri USING btree (uri_normalized);
```

which currently fails with:

```
ERROR:  index row size 2776 exceeds btree version 4 maximum 2704 for index "ix__document_uri_uri_normalized"
DETAIL:  Index row references tuple (950228,3) in relation "document_uri".
HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing
```
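Following the HINT, a functional index over an md5 hash of the value would be a sketch along these lines (note that queries must then filter on `md5(uri_normalized) = md5('...')` for the planner to use the index):

```sql
-- Hypothetical sketch, per the error HINT: index the md5 hash of the
-- value instead of the value itself, giving a fixed-size index row.
CREATE INDEX ix__document_uri_uri_normalized
    ON public.document_uri USING btree (md5(uri_normalized));
```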

We can find the longest URIs with this query (ids taken from running a much slower query first).
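The linked query is not reproduced above, but a sketch of one way to surface the largest values (column names assumed from the error output):

```sql
-- Hypothetical sketch: order rows by the byte length of uri_normalized
-- to surface the values that overflow the btree row-size limit.
SELECT id, octet_length(uri_normalized) AS bytes
FROM document_uri
ORDER BY bytes DESC
LIMIT 20;
```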

To fix this we need to:

  • Limit the max length of normalized_uri at the API/service level so no new rows make their way to the DB.

  • Change this index to use md5, like the uniqueness one.

  • Or fix the problematic rows (how?) and keep the current index.
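For the third option, a sketch of a cleanup (the 2704-byte limit comes from the error above; the truncation length and whether truncating uri_normalized is acceptable at all are open questions, not a recommendation):

```sql
-- Hypothetical cleanup: truncate oversized values so the existing
-- btree index can be rebuilt. 2704 bytes is the limit from the error.
UPDATE document_uri
SET uri_normalized = left(uri_normalized, 2000)
WHERE octet_length(uri_normalized) > 2704;
```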

@marcospri (Member)
ix__annotation_tags

This index also fails to be created, due to rows with tags that are too long.

For tags, we already validate the length at the API level.

We have to fix the existing tags in the DB.

Finding long tags in the DB

We removed the longest tags, leaving the DB in a state where the index can be created again:

https://hypothes-is.slack.com/archives/C4K6M7P5E/p1707842794675309
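The query behind that link is not reproduced here, but a sketch of one way to find annotations with oversized tags, assuming tags is a text[] column:

```sql
-- Hypothetical sketch: unnest the tags array and report annotations
-- whose longest single tag exceeds some threshold.
SELECT a.id, max(octet_length(t)) AS longest_tag
FROM annotation AS a, unnest(a.tags) AS t
GROUP BY a.id
HAVING max(octet_length(t)) > 2000
ORDER BY longest_tag DESC;
```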

Sample query to remove the offending tags:

```sql
UPDATE annotation SET tags = '{}' WHERE id IN (..);
```

This should work on the next restore.

@indigobravo (Member, Author)
I can confirm database restores now complete without error. Thanks for the work on this.
