Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

22.7 fb kinship comparison #503

Open
wants to merge 1 commit into
base: release22.7-SNAPSHOT
Choose a base branch
from

Conversation

dnicolalde
Copy link
Collaborator

Rationale

Back in 2019, changes done by ONPRC made our kinship calculations have errors.

I spend sometime fixing the problem and adding a method to compare the previous records with the newly generated kinship coefficients. It will not erase the previous records if it detect changes. As you can imagine this will likely decrease the performance therefore I was experimenting with a library called parallel and giving half of the cores to do the calculation.

This script can be further improve to discard values that are the same and only add new values to the table. In cases where there are changes to coefficients already calculated the pipeline job can send an email letting people know how many records are going to change, and adding a flag to the UI that will allow the pipeline to overwrite the values if the changes are needed. This will be cases where new familial relationships are discover due to genetic testing or finding legacy pedigree records.

Related Pull Requests

Changes

…e into memory. Calculates and compares with new records
@dnicolalde dnicolalde changed the base branch from develop to release22.7-SNAPSHOT January 17, 2023 22:26
@bbimber
Copy link
Collaborator

bbimber commented Jan 18, 2023

@dnicolalde: just curious, do you know specifically what was changed in 2019 and why it was changed? I dont see commits of consequence then. In 2017 there was a change to make it calculate per species.

The idea of validation seems reasonable, especially what's happened here, but I think the java side @labkey-martyp is proposing makes a lot of sense. It also seems like if one is going to add true row-by-row checking, then the place might be in the java pipeline code, at import time. That'd be more efficient than within R.

If Marty's testing is right, I think the change to no longer infer and split by family might be a very prudent change. Subsetting by inferred family, especially within a population where that could be hard, strikes me as a very easy source of difficult to detect errors (i.e. two animals that are actually somewhat related being assigned to different families).

@dnicolalde
Copy link
Collaborator Author

I believe what happen is that ONPRC added species to the calculation. If that data point was not added the system did not infer relationship between animals of different species. In the version we were using prior the species was deduced based on the animal id. We had to add species to the supplemental_pedigree table and populate that data to the best of our ability, since that table holds information of founding animals, which sometimes had different id structures. Here is the commit:

WNPRC-EHR-Services/wnprc-modules@1e8a747

@bbimber
Copy link
Collaborator

bbimber commented Jan 18, 2023

@dnicolalde: just curious, which species are interbred? I'm a little surprised there are examples of that.

Edit: nevermind, i get it. Yes, I can understand how this would be a problem.

@bbimber
Copy link
Collaborator

bbimber commented Sep 5, 2023

@dnicolalde, can you please see what I'm proposing over here: #650. The rationale is not cross-species, but I'm proposing some refactors on the kinship R scripts. I'd like to think through the cross-species issue and get a general purpose solution to that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants