Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor kinship to allow other code to initiate import directly #730

Merged
merged 11 commits into from
Mar 12, 2024

Conversation

bbimber
Copy link
Collaborator

@bbimber bbimber commented Feb 28, 2024

This PR contains two main categories of changes. The motivation of these changes was to separate the process of computing the data for EHR genetics (e.g. kinship and inbreeding) from actually importing the data into the EHR. In total, the idea is that PRIMe-seq (which already has a mirrored copy of pedigree data), will execute the default EHR GeneticsCalculation pipeline. It will farm the computation to our cluster, which that server is already configured to do. When complete, this pipeline already saves the results as TSV files. I wrote a separate ETL that will be defined in ONPRC modules, which copies the resulting TSVs to a location visible to PRIMe, and then pings PRIMe via a new server-side action to cause PRIMe to take those TSVs and call EHRService.standaloneProcessKinshipAndInbreeding to actually import them.

The changes within EHR itself are primarily to refactor the portions of the code that import the TSVs be to static, allow it to be called separately, and expose this through EHRService. These changes are basically a refactor without touching much within the code itself.

When I got into the weeds of the R code, I noticed a number of other things that seemed worth cleaning up. These are not directly related to the importing of data, but should be broadly useful:

  • I did some general style improvements in the R code, and removed some ancient strange patterns. This resulted in more code being touched than strictly needed, but most changes are minor and make the script a little more standard.

  • I think I understand the intent behind "options(error = dump.frames)", but I dont think it was giving the result it should. In my hands, not only was this not writing anything to a file, the script did not die on errors. I'd argue it's a lot more useful if it dies when there's an error (such as not having kinship2 installed), rather than continue onward after errors and accumulate strange downstream errors. I tested this pretty thoroughly locally and I think R's default error handling (i.e. dont specify anything with options()) does as good a job as anything here. Here is an example (which requires a login), showing how the existing 23.7 code ignores R errors: https://prime-seq.ohsu.edu/pipeline-status/Internal/PMR/details.view?rowId=433466.

This was originally opened for 23.7 under #650. This PR backs out some of the changes originally part of that PR. Notably, this is not attempting to address the WNPRC issue around hybrid animals, nor does it try to perform sanity checking of minimum expected kinship coefficients in R.

@bbimber
Copy link
Collaborator Author

bbimber commented Feb 28, 2024

@labkey-martyp: this is a simplification of the earlier PR. There's a JHU postgres failure, but it does not seem related to the genetics calculations.

@labkey-martyp
Copy link
Contributor

Thanks @bbimber let me test this out on data from the centers, otherwise it looks good.

@bbimber
Copy link
Collaborator Author

bbimber commented Mar 4, 2024

Thanks @labkey-martyp. Like I posted on the support thread, please keep in mind that you can just run populateKinship standalone, where you replace the TSV loading with RLabkey/SelectRows. That should make it quick to point to any accessible center's server (even a production server) to non-invasive testing on real data.

Copy link
Contributor

@labkey-martyp labkey-martyp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I tested this out across centers. Looks good with just a couple requests. Check them out, let me know if you have any follow up on them.

ehr/resources/pipeline/kinship/populateKinship.r Outdated Show resolved Hide resolved
@bbimber
Copy link
Collaborator Author

bbimber commented Mar 11, 2024

Thanks @labkey-martyp. The last commit addresses those 2 comments.

Copy link
Contributor

@labkey-martyp labkey-martyp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks Ben.

@bbimber bbimber merged commit adf9436 into release23.11-SNAPSHOT Mar 12, 2024
4 of 5 checks passed
@bbimber bbimber deleted the 23.11_fb_kinshiprefactor branch March 12, 2024 02:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants