Update existing file paths in manifests at generation to conform to new convention #1467

Merged
merged 40 commits into from
Aug 30, 2024

Conversation

GiaJordan
Contributor

@GiaJordan GiaJordan commented Aug 16, 2024

This PR changes file-based manifest generation so that it also updates the paths of existing files, accounting for the change in schematic's file path convention. It also adds more helpful error messages for cases where file path validation cannot be completed because the fileview needs to be updated.
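
For illustration only, here is a minimal sketch of the kind of path update described above. It is not the code in this PR: the function name, the row layout, the example paths, and the synthetic entity IDs are all assumptions. It assumes the existing manifest rows carry "Filename" and "entityId" values and that a mapping from entity ID to the current path (with the project name included) is available.

```python
from typing import Dict, List


def update_manifest_filenames(
    manifest_rows: List[Dict[str, str]],
    dataset_files: Dict[str, str],
) -> List[Dict[str, str]]:
    """Return manifest rows whose Filename values follow the current paths.

    manifest_rows: existing rows, each with "Filename" and "entityId" keys.
    dataset_files: entityId -> current path, in dataset order (hypothetical input).
    """
    updated = []
    seen = set()
    for row in manifest_rows:
        entity_id = row["entityId"]
        new_row = dict(row)  # copy so the caller's rows are not mutated
        if entity_id in dataset_files:
            # Overwrite the old-style path with the path in the new convention.
            new_row["Filename"] = dataset_files[entity_id]
        updated.append(new_row)
        seen.add(entity_id)
    # Append files that were added to the dataset after the manifest was made.
    for entity_id, path in dataset_files.items():
        if entity_id not in seen:
            updated.append({"Filename": path, "entityId": entity_id})
    return updated


# Made-up example data: two existing rows with old-style paths, one new file.
existing = [
    {"Filename": "dataset/a.txt", "entityId": "syn1", "Component": "Patient"},
    {"Filename": "dataset/b.txt", "entityId": "syn2", "Component": "Patient"},
]
current = {
    "syn1": "project/dataset/a.txt",
    "syn2": "project/dataset/b.txt",
    "syn3": "project/dataset/c.txt",
}
result = update_manifest_filenames(existing, current)
```

In this sketch, existing rows keep their position and their other columns, and newly added files are appended at the end.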

Tests:

test_get_manfiest_with_files has been updated to check file paths for file-based manifests. It ensures that the order of the files and the paths themselves are as expected when a manifest is generated after new files have been added to the dataset and the existing manifest has paths that need updating.

test_submit_filebased_manifest has been added as a complement to test_submit_metadata_manifest. It tests actual submission to Synapse in addition to the functionality of the method, to ensure that manifests with the new paths can be submitted.

test_fill_in_entity_id_filename has been updated to match how one of the utilized methods now returns values.

test_view_query_exception has been added to ensure that the appropriate exceptions are raised when the fileview can't be queried.
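
As a rough sketch of the error handling that test_view_query_exception is described as covering, the snippet below wraps a failing query and re-raises it with a more actionable message. The exception class, the function names, the view ID, and the message text are all hypothetical and are not taken from schematic.

```python
class FileviewQueryError(Exception):
    """Hypothetical error raised when the fileview cannot be queried."""


def query_fileview(run_query, view_id: str):
    """Run a fileview query and re-raise failures with an actionable message."""
    try:
        return run_query(view_id)
    except Exception as exc:
        raise FileviewQueryError(
            f"Could not query fileview {view_id}. The fileview may need to be "
            "updated before file paths can be validated."
        ) from exc


def failing_query(view_id: str):
    # Stand-in for a backend call that fails; the error text is invented.
    raise ValueError("query failed")


message = None
cause = None
try:
    query_fileview(failing_query, "syn000")
except FileviewQueryError as err:
    message = str(err)     # the helpful, user-facing message
    cause = err.__cause__  # the original low-level error is preserved
```

Chaining with `from exc` keeps the original error available for debugging while the top-level message tells the user what to do next.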

@GiaJordan GiaJordan marked this pull request as draft August 16, 2024 16:59
@GiaJordan GiaJordan changed the title update existing file paths Update existing file paths in manifests at generation to conform to new convention Aug 16, 2024
@GiaJordan GiaJordan marked this pull request as ready for review August 21, 2024 17:26
@GiaJordan GiaJordan requested a review from linglp August 21, 2024 17:26
@linglp
Contributor

linglp commented Aug 22, 2024

@GiaJordan Thanks for the fix. If my understanding is correct, even without "path" being enabled in the asset view, when users generate a file-based manifest, "Filename" column is now being updated too to include project name in the path? If my understanding is correct, should we add a message and let users know that we are adding "project name" to the path and they could now enable "path" column by themselves in synapse too? But this might not be super important since DCA users can't see most of the schematic log.

@GiaJordan
Contributor Author

GiaJordan commented Aug 22, 2024

@GiaJordan Thanks for the fix. If my understanding is correct, even without "path" being enabled in the asset view, when users generate a file-based manifest, "Filename" column is now being updated too to include project name in the path?

That's correct.

should we add a message and let users know that we are adding "project name" to the path and they could now enable "path" column by themselves in synapse too?

From my perspective, it's more important to communicate this kind of thing to the users directly, and to include it in the release notes, especially since most users will be using the DCA. Based on my presentation/recommendations and our discussion with @AmyHeiser during the team meeting earlier this week, this is the current plan. That message also exists in the documentation for the new validation rule.

@GiaJordan
Contributor Author

@thomasyu888 @BryanFauble @linglp
This PR is now ready for y'all to review again

Collaborator

@BryanFauble BryanFauble left a comment

I'll go ahead and approve the changes as they are now. I noted a few items, but it should not affect any logic you've written.


sonarcloud bot commented Aug 29, 2024

Member

@thomasyu888 thomasyu888 left a comment

LGTM! Thanks for all your hard work on this! I think someone on fair should take a last pass on this.

@thomasyu888
Member

@linglp, @GiaJordan, I cloned this branch and ran:

docker build -t test -f schematic_api/Dockerfile .

Then I ran the schematic profiler:

INFO: [2024-08-29 20:40:34] manifest-validate - Monitoring manifest validation
INFO: [2024-08-29 20:40:34] httpx - HTTP Request: POST http://localhost:3001/v1/model/validate?schema_url=https%3A%2F%2Fraw.githubusercontent.com%2FSage-Bionetworks%2Fschematic%2Fdevelop%2Ftests%2Fdata%2Fexample.model.jsonld&data_type=Patient&restrict_rules=True "HTTP/1.1 200 OK"
INFO: [2024-08-29 20:40:35] httpx - HTTP Request: POST http://localhost:3001/v1/model/validate?schema_url=https%3A%2F%2Fraw.githubusercontent.com%2FSage-Bionetworks%2Fschematic%2Fdevelop%2Ftests%2Fdata%2Fexample.model.jsonld&data_type=Patient "HTTP/1.1 200 OK"
INFO: [2024-08-29 20:41:13] httpx - HTTP Request: POST http://localhost:3001/v1/model/validate?schema_url=https%3A%2F%2Fraw.githubusercontent.com%2Fncihtan%2Fdata-models%2Fmain%2FHTAN.model.jsonld&data_type=Biospecimen "HTTP/1.1 200 OK"
INFO: [2024-08-29 20:41:13] manifest-submit - Monitoring manifest submission
INFO: [2024-08-29 20:41:58] httpx - HTTP Request: POST http://localhost:3001/v1/model/submit?schema_url=https%3A%2F%2Fraw.githubusercontent.com%2FSage-Bionetworks%2Fschematic%2Fdevelop%2Ftests%2Fdata%2Fexample.model.jsonld&dataset_id=syn51376664&asset_view=syn51376649&restrict_rules=True&use_schema_label=True&data_model_labels=class_label&table_manipulation=replace&manifest_record_type=table_and_file "HTTP/1.1 200 OK"
INFO: [2024-08-29 20:42:10] httpx - HTTP Request: POST http://localhost:3001/v1/model/submit?schema_url=https%3A%2F%2Fraw.githubusercontent.com%2FSage-Bionetworks%2Fschematic%2Fdevelop%2Ftests%2Fdata%2Fexample.model.jsonld&dataset_id=syn51376664&asset_view=syn51376649&restrict_rules=True&use_schema_label=True&data_model_labels=class_label&table_manipulation=replace&manifest_record_type=file_only "HTTP/1.1 200 OK"
INFO: [2024-08-29 20:42:22] httpx - HTTP Request: POST http://localhost:3001/v1/model/submit?schema_url=https%3A%2F%2Fraw.githubusercontent.com%2FSage-Bionetworks%2Fdata_flow_config%2Fmain%2FHTAN%2Fdataflow_component.csv&dataset_id=syn51376664&asset_view=syn51376649&restrict_rules=True&use_schema_label=True&data_model_labels=class_label&table_manipulation=replace&manifest_record_type=file_only "HTTP/1.1 200 OK"
INFO: [2024-08-29 20:42:24] manifest-storage - Monitoring storage endpoints
INFO: [2024-08-29 20:42:30] httpx - HTTP Request: GET http://localhost:3001/v1/storage/assets/tables?asset_view=syn23643253&return_type=json "HTTP/1.1 200 OK"
INFO: [2024-08-29 20:42:36] httpx - HTTP Request: GET http://localhost:3001/v1/storage/project/datasets?asset_view=syn23643253&project_id=syn26251192 "HTTP/1.1 200 OK"
INFO: [2024-08-29 20:43:03] httpx - HTTP Request: GET http://localhost:3001/v1/storage/project/datasets?asset_view=syn20446927&project_id=syn32596076 "HTTP/1.1 200 OK"
INFO: [2024-08-29 20:43:03] manifest-generator - Monitoring manifest generation
INFO: [2024-08-29 20:43:18] httpx - HTTP Request: GET http://localhost:3001/v1/manifest/generate?schema_url=https%3A%2F%2Fraw.githubusercontent.com%2FSage-Bionetworks%2Fschematic%2Fdevelop%2Ftests%2Fdata%2Fexample.model.jsonld&title=example&data_type=Patient "HTTP/1.1 200 OK"
INFO: [2024-08-29 20:43:30] httpx - HTTP Request: GET http://localhost:3001/v1/manifest/generate?schema_url=https%3A%2F%2Fraw.githubusercontent.com%2FSage-Bionetworks%2Fschematic%2Fdevelop%2Ftests%2Fdata%2Fexample.model.jsonld&title=example&data_type=Patient&output=excel "HTTP/1.1 200 OK"
INFO: [2024-08-29 20:43:52] httpx - HTTP Request: GET http://localhost:3001/v1/manifest/generate?schema_url=https%3A%2F%2Fraw.githubusercontent.com%2FSage-Bionetworks%2Fschematic%2Fdevelop%2Ftests%2Fdata%2Fexample.model.jsonld&title=example&data_type=Patient&output=excel&dataset_id=syn51078367&asset_view=syn23643253 "HTTP/1.1 200 OK"
INFO: [2024-08-29 20:44:36] httpx - HTTP Request: GET http://localhost:3001/v1/manifest/generate?schema_url=https%3A%2F%2Fraw.githubusercontent.com%2Fncihtan%2Fdata-models%2Fmain%2FHTAN.model.jsonld&title=example&data_type=Patient "HTTP/1.1 200 OK"
INFO: [2024-08-29 20:44:36] manifest-storage - Monitoring storage endpoints
INFO: [2024-08-29 20:44:42] httpx - HTTP Request: GET http://localhost:3001/v1/storage/assets/tables?asset_view=syn23643253&return_type=json "HTTP/1.1 200 OK"
INFO: [2024-08-29 20:44:48] httpx - HTTP Request: GET http://localhost:3001/v1/storage/project/datasets?asset_view=syn23643253&project_id=syn26251192 "HTTP/1.1 200 OK"
INFO: [2024-08-29 20:45:14] httpx - HTTP Request: GET http://localhost:3001/v1/storage/project/datasets?asset_view=syn20446927&project_id=syn32596076 "HTTP/1.1 200 OK"
INFO: [2024-08-29 20:45:14] manifest-generator - Monitoring manifest generation
INFO: [2024-08-29 20:45:26] httpx - HTTP Request: GET http://localhost:3001/v1/manifest/generate?schema_url=https%3A%2F%2Fraw.githubusercontent.com%2FSage-Bionetworks%2Fschematic%2Fdevelop%2Ftests%2Fdata%2Fexample.model.jsonld&title=example&data_type=Patient "HTTP/1.1 200 OK"
INFO: [2024-08-29 20:45:38] httpx - HTTP Request: GET http://localhost:3001/v1/manifest/generate?schema_url=https%3A%2F%2Fraw.githubusercontent.com%2FSage-Bionetworks%2Fschematic%2Fdevelop%2Ftests%2Fdata%2Fexample.model.jsonld&title=example&data_type=Patient&output=excel "HTTP/1.1 200 OK"
INFO: [2024-08-29 20:46:01] httpx - HTTP Request: GET http://localhost:3001/v1/manifest/generate?schema_url=https%3A%2F%2Fraw.githubusercontent.com%2FSage-Bionetworks%2Fschematic%2Fdevelop%2Ftests%2Fdata%2Fexample.model.jsonld&title=example&data_type=Patient&output=excel&dataset_id=syn51078367&asset_view=syn23643253 "HTTP/1.1 200 OK"
INFO: [2024-08-29 20:46:45] httpx - HTTP Request: GET http://localhost:3001/v1/manifest/generate?schema_url=https%3A%2F%2Fraw.githubusercontent.com%2Fncihtan%2Fdata-models%2Fmain%2FHTAN.model.jsonld&title=example&data_type=Patient "HTTP/1.1 200 OK"
INFO: [2024-08-29 20:46:45] manifest-submit - Monitoring manifest submission
INFO: [2024-08-29 20:47:29] httpx - HTTP Request: POST http://localhost:3001/v1/model/submit?schema_url=https%3A%2F%2Fraw.githubusercontent.com%2FSage-Bionetworks%2Fschematic%2Fdevelop%2Ftests%2Fdata%2Fexample.model.jsonld&dataset_id=syn51376664&asset_view=syn51376649&restrict_rules=True&use_schema_label=True&data_model_labels=class_label&table_manipulation=replace&manifest_record_type=table_and_file "HTTP/1.1 200 OK"
INFO: [2024-08-29 20:47:42] httpx - HTTP Request: POST http://localhost:3001/v1/model/submit?schema_url=https%3A%2F%2Fraw.githubusercontent.com%2FSage-Bionetworks%2Fschematic%2Fdevelop%2Ftests%2Fdata%2Fexample.model.jsonld&dataset_id=syn51376664&asset_view=syn51376649&restrict_rules=True&use_schema_label=True&data_model_labels=class_label&table_manipulation=replace&manifest_record_type=file_only "HTTP/1.1 200 OK"
INFO: [2024-08-29 20:47:55] httpx - HTTP Request: POST http://localhost:3001/v1/model/submit?schema_url=https%3A%2F%2Fraw.githubusercontent.com%2FSage-Bionetworks%2Fdata_flow_config%2Fmain%2FHTAN%2Fdataflow_component.csv&dataset_id=syn51376664&asset_view=syn51376649&restrict_rules=True&use_schema_label=True&data_model_labels=class_label&table_manipulation=replace&manifest_record_type=file_only "HTTP/1.1 200 OK"
INFO: [2024-08-29 20:47:57] manifest-validate - Monitoring manifest validation
INFO: [2024-08-29 20:47:57] httpx - HTTP Request: POST http://localhost:3001/v1/model/validate?schema_url=https%3A%2F%2Fraw.githubusercontent.com%2FSage-Bionetworks%2Fschematic%2Fdevelop%2Ftests%2Fdata%2Fexample.model.jsonld&data_type=Patient&restrict_rules=True "HTTP/1.1 200 OK"
INFO: [2024-08-29 20:47:58] httpx - HTTP Request: POST http://localhost:3001/v1/model/validate?schema_url=https%3A%2F%2Fraw.githubusercontent.com%2FSage-Bionetworks%2Fschematic%2Fdevelop%2Ftests%2Fdata%2Fexample.model.jsonld&data_type=Patient "HTTP/1.1 200 OK"
INFO: [2024-08-29 20:48:34] httpx - HTTP Request: POST http://localhost:3001/v1/model/validate?schema_url=https%3A%2F%2Fraw.githubusercontent.com%2Fncihtan%2Fdata-models%2Fmain%2FHTAN.model.jsonld&data_type=Biospecimen "HTTP/1.1 200 OK"
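
The request URLs in the log above are percent-encoded, which makes the parameters hard to read. As a small aid, the sketch below decodes one of the logged validate requests using only the Python standard library. The variable names are illustrative, and the URL is copied from the log.

```python
from urllib.parse import parse_qs, urlsplit

# One of the validate requests from the log, split across lines for readability.
logged_url = (
    "http://localhost:3001/v1/model/validate"
    "?schema_url=https%3A%2F%2Fraw.githubusercontent.com%2FSage-Bionetworks"
    "%2Fschematic%2Fdevelop%2Ftests%2Fdata%2Fexample.model.jsonld"
    "&data_type=Patient&restrict_rules=True"
)

parts = urlsplit(logged_url)
# parse_qs returns a list per key; each key appears once here, so take the first.
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

for key, value in params.items():
    print(f"{key} = {value}")
```

The same approach works for the submit and generate requests, which carry additional parameters such as dataset_id and asset_view.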

@linglp
Contributor

linglp commented Aug 30, 2024

@thomasyu888 thank you so much! This is amazing to see.

Contributor

@linglp linglp left a comment

Thanks for your hard work on this, Gianna. It looks good to me, and I am happy to see that the new test has been moved to the integration folder as well.

@GiaJordan
Contributor Author

@linglp @thomasyu888 @BryanFauble thank you all so much for your reviews, the comments you left were very helpful and I appreciate the time you took to give them

@GiaJordan GiaJordan merged commit 12b63e3 into develop Aug 30, 2024
7 checks passed
@GiaJordan GiaJordan deleted the develop-filepath2-manifest-gen-FDS-2278 branch August 30, 2024 16:17
4 participants