Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Verbatim PFB handover fails when using a common prefix #6843

Open
nadove-ucsc opened this issue Jan 23, 2025 · 7 comments
Open

Verbatim PFB handover fails when using a common prefix #6843

nadove-ucsc opened this issue Jan 23, 2025 · 7 comments
Assignees
Labels
- [priority] Medium manifests [subject] Generation and contents of manifests orange [process] Done by the Azul team

Comments

@nadove-ucsc
Copy link
Contributor

nadove-ucsc commented Jan 23, 2025

This was observed when testing #6815 due to a situation like this:

Image

Activity 65f821a0-c4c2-4f1b-83f1-292dfe24f719 is found in a primary bundle 1f47f146-6d32-adbd-bd87-93002479d031 (A) and also in a replica bundle (B). Its generated file (e47f0944-0bdb-422f-a251-3fab0c19a029) is also found in (A), and additionally in a separate replica bundle (C). Only (B) was indexed due to the use of common prefix 65/1. The resulting manifest contained a dangling reference to the generated file.

[edit @hannes-ucsc] … causing Terra to reject the manifest

@nadove-ucsc nadove-ucsc added the orange [process] Done by the Azul team label Jan 23, 2025
@nadove-ucsc
Copy link
Contributor Author

Here is an example of an import that failed for this reason

Image

@nadove-ucsc nadove-ucsc added the spike:1 [process] Spike estimate of one point label Feb 11, 2025
@nadove-ucsc
Copy link
Contributor Author

Assignee to consider whether there's any risk that this error might occur in anvilprod.

@nadove-ucsc
Copy link
Contributor Author

nadove-ucsc commented Feb 12, 2025

This will break anvilprod when filtering by non-dataset fields. As an example, consider a manifest with a filter to select files smaller than 1 gigabyte, and a primary bundle with the following structure:

Image

The manifest will include file 1 and every entity upstream from it, but will omit file 2. The sequencing activity contains foreign key references to both files via its generated_file_id column. File 2 will then become a dangling relation that will cause Terra to reject the manifest.

@nadove-ucsc
Copy link
Contributor Author

nadove-ucsc commented Feb 12, 2025

Other observations:

The foreign key anvil_assayactivity.antibody_id references the anvil_antibody table, but since we don't index contributions from that table, its replicas only occur as orphans (from replica bundles). If a manifest is created with filters that exclude orphans, dangling reference errors will occur for each anvil_assayactivity replica with a non-null antibody_id column. However, the anvil_assayactivity table is empty for all snapshots currently indexed on anvilprod, so this should be unobservable.

anvil_project presents a similar edge case (being a table described by the schema that we only index as orphans), but there are no foreign keys that reference that table, so it can't cause the dangling reference error.

@hannes-ucsc
Copy link
Member

I moved these latter two reproduction to a separate issue (#6898).

@hannes-ucsc
Copy link
Member

hannes-ucsc commented Feb 12, 2025

I hope that this can be solved by making the inclusion of relations conditional upon whether files from a source with a common prefix are included in the manifest (in addition to any other conditions that #6898 might add). In order to demonstrate a handover with relations in a lower deployment, one would have to select only small datasets, under the assumption that there is a 1:1 relationship between datasets and sources and that the sources for those datasets will have been indexed without a common prefix.

@hannes-ucsc hannes-ucsc added the manifests [subject] Generation and contents of manifests label Feb 12, 2025
@achave11-ucsc
Copy link
Member

Blocker landed.

@nadove-ucsc nadove-ucsc added - [priority] Medium and removed spike:1 [process] Spike estimate of one point labels Feb 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
- [priority] Medium manifests [subject] Generation and contents of manifests orange [process] Done by the Azul team
Projects
None yet
Development

No branches or pull requests

3 participants