Refactor kinship to allow other code to initiate import directly #730

bbimber · 2024-02-28T15:03:21Z

This PR contains two main categories of changes. The motivation was to separate the process of computing the data for EHR genetics (e.g. kinship and inbreeding) from actually importing the data into the EHR. Overall, the idea is that PRIMe-seq (which already has a mirrored copy of pedigree data) will execute the default EHR GeneticsCalculation pipeline. It will farm the computation out to our cluster, which that server is already configured to do. When complete, this pipeline already saves the results as TSV files. I wrote a separate ETL, which will be defined in ONPRC modules, that copies the resulting TSVs to a location visible to PRIMe and then pings PRIMe via a new server-side action. That action causes PRIMe to take those TSVs and call EHRService.standaloneProcessKinshipAndInbreeding to actually import them.

The changes within EHR itself are primarily to refactor the portions of the code that import the TSVs to be static, allow them to be called separately, and expose this through EHRService. These changes are basically a refactor without touching much within the code itself.
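To illustrate the shape of that refactor, here is a minimal sketch of import logic written as a static method that needs no pipeline job or task instance, so a pipeline task and a service method could both delegate to it. Every name in it (KinshipTsvSketch, KinshipRow, parseKinship) is hypothetical and not taken from the EHR code, and the three-column, header-less TSV layout (Id, Id2, coefficient) is an assumption for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these class and method names are illustrative
// and do not come from the EHR codebase.
final class KinshipTsvSketch
{
    /** One parsed row: a pair of animal IDs and their kinship coefficient. */
    static final class KinshipRow
    {
        final String id;
        final String id2;
        final double coefficient;

        KinshipRow(String id, String id2, double coefficient)
        {
            this.id = id;
            this.id2 = id2;
            this.coefficient = coefficient;
        }
    }

    private KinshipTsvSketch() {}

    /**
     * Static entry point with no dependency on instance state.
     * Assumes each non-empty line holds exactly three tab-separated
     * fields (Id, Id2, coefficient) and that there is no header row.
     */
    static List<KinshipRow> parseKinship(List<String> lines)
    {
        List<KinshipRow> rows = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++)
        {
            String line = lines.get(i);
            if (line.trim().isEmpty())
                continue; // skip empty lines

            String[] fields = line.split("\t", -1);
            if (fields.length != 3)
                throw new IllegalArgumentException(
                    "Line " + (i + 1) + ": expected 3 tab-separated fields, found " + fields.length);

            // Double.parseDouble throws NumberFormatException on a malformed coefficient
            rows.add(new KinshipRow(fields[0].trim(), fields[1].trim(),
                Double.parseDouble(fields[2].trim())));
        }
        return rows;
    }

    public static void main(String[] args)
    {
        List<KinshipRow> rows = parseKinship(List.of("A1\tA2\t0.25", "A1\tA3\t0.125"));
        for (KinshipRow row : rows)
            System.out.println(row.id + " / " + row.id2 + " -> " + row.coefficient);
    }
}
```

Because the method is static and fails fast on malformed input, a caller outside the pipeline can invoke it directly on files produced elsewhere, which is the property the refactor is after.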

When I got into the weeds of the R code, I noticed a number of other things that seemed worth cleaning up. These are not directly related to the importing of data, but should be broadly useful:

- I did some general style improvements in the R code and removed some ancient, strange patterns. This resulted in more code being touched than strictly needed, but most changes are minor and make the script a little more standard.
- I think I understand the intent behind "options(error = dump.frames)", but I don't think it was giving the result it should. In my hands, not only was this not writing anything to a file, the script also did not die on errors. I'd argue it's a lot more useful if it dies when there's an error (such as not having kinship2 installed), rather than continuing onward and accumulating strange downstream errors. I tested this pretty thoroughly locally, and I think R's default error handling (i.e. don't specify anything with options()) does as good a job as anything here. Here is an example (which requires a login) showing how the existing 23.7 code ignores R errors: https://prime-seq.ohsu.edu/pipeline-status/Internal/PMR/details.view?rowId=433466.

This was originally opened for 23.7 under #650. This PR backs out some of the changes originally part of that PR. Notably, this is not attempting to address the WNPRC issue around hybrid animals, nor does it try to perform sanity checking of minimum expected kinship coefficients in R.

bbimber · 2024-02-28T23:27:48Z

@labkey-martyp: this is a simplification of the earlier PR. There's a JHU postgres failure, but it does not seem related to the genetics calculations.

labkey-martyp · 2024-03-04T11:53:48Z

Thanks @bbimber, let me test this out on data from the centers. Otherwise it looks good.

bbimber · 2024-03-04T14:53:48Z

Thanks @labkey-martyp. As I posted on the support thread, please keep in mind that you can just run populateKinship standalone, replacing the TSV loading with Rlabkey's labkey.selectRows. That should make it quick to point at any accessible center's server (even a production server) for non-invasive testing on real data.

labkey-martyp

Ok I tested this out across centers. Looks good with just a couple requests. Check them out, let me know if you have any follow up on them.

ehr/src/org/labkey/ehr/pipeline/GeneticCalculationsImportTask.java

ehr/resources/pipeline/kinship/populateKinship.r

bbimber · 2024-03-11T13:43:17Z

Thanks @labkey-martyp. The last commit addresses those 2 comments.

labkey-martyp

Looks good. Thanks Ben.

bbimber added 10 commits February 28, 2024 06:52

- 0af2f0a: Refactor genetics pipeline to allow standalone import of TSVs that were calculated separately
- b64fab3: Improve R scripts: make R scripts exit immediately on error; bugfix to expected kinship/validation
- 233c7a6: Add action to allow external server to trigger re-import of externally computed kinship data
- bf5c98d: Correct typos
- aed23f7: Minor cleanup
- e8cd2a0: Refactor kinship script to further reduce memory; refactor kinship script to optionally merge species where hybrids are present and process together
- 8518391: Avoid another fringe kinship case
- 335b231: Add setting to control whether kinship import is allowed during business hours
- 2cbf599: Back out some changes not directly related to remote kinship execution
- 16b8c2f: Minor script cleanup

bbimber requested a review from labkey-martyp February 28, 2024 23:26

labkey-martyp requested changes Mar 11, 2024

- 7f2c987: Code review

bbimber force-pushed the 23.11_fb_kinshiprefactor branch from f5ed491 to 7f2c987 Compare March 12, 2024 00:32

labkey-martyp approved these changes Mar 12, 2024

bbimber merged commit adf9436 into release23.11-SNAPSHOT Mar 12, 2024
4 of 5 checks passed

bbimber deleted the 23.11_fb_kinshiprefactor branch March 12, 2024 02:19

bbimber mentioned this pull request Mar 15, 2024

Use PreparedStatement instead of looping Table.insert #739

Merged
