22.7 fb kinship comparison #503

dnicolalde · 2023-01-17T22:24:54Z

Rationale

Back in 2019, changes done by ONPRC made our kinship calculations have errors.

I spend sometime fixing the problem and adding a method to compare the previous records with the newly generated kinship coefficients. It will not erase the previous records if it detect changes. As you can imagine this will likely decrease the performance therefore I was experimenting with a library called parallel and giving half of the cores to do the calculation.

This script can be further improve to discard values that are the same and only add new values to the table. In cases where there are changes to coefficients already calculated the pipeline job can send an email letting people know how many records are going to change, and adding a flag to the UI that will allow the pipeline to overwrite the values if the changes are needed. This will be cases where new familial relationships are discover due to genetic testing or finding legacy pedigree records.

Related Pull Requests

Changes

…e into memory. Calculates and compares with new records

bbimber · 2023-01-18T00:15:09Z

@dnicolalde: just curious, do you know specifically what was changed in 2019 and why it was changed? I dont see commits of consequence then. In 2017 there was a change to make it calculate per species.

The idea of validation seems reasonable, especially what's happened here, but I think the java side @labkey-martyp is proposing makes a lot of sense. It also seems like if one is going to add true row-by-row checking, then the place might be in the java pipeline code, at import time. That'd be more efficient than within R.

If Marty's testing is right, I think the change to no longer infer and split by family might be a very prudent change. Subsetting by inferred family, especially within a population where that could be hard, strikes me as a very easy source of difficult to detect errors (i.e. two animals that are actually somewhat related being assigned to different families).

dnicolalde · 2023-01-18T18:50:29Z

I believe what happen is that ONPRC added species to the calculation. If that data point was not added the system did not infer relationship between animals of different species. In the version we were using prior the species was deduced based on the animal id. We had to add species to the supplemental_pedigree table and populate that data to the best of our ability, since that table holds information of founding animals, which sometimes had different id structures. Here is the commit:

WNPRC-EHR-Services/wnprc-modules@1e8a747

bbimber · 2023-01-18T18:52:45Z

@dnicolalde: just curious, which species are interbred? I'm a little surprised there are examples of that.

Edit: nevermind, i get it. Yes, I can understand how this would be a problem.

bbimber · 2023-09-05T17:08:29Z

@dnicolalde, can you please see what I'm proposing over here: #650. The rationale is not cross-species, but I'm proposing some refactors on the kinship R scripts. I'd like to think through the cross-species issue and get a general purpose solution to that.

Changes to populate kinship, the R script first read the current tabl…

69aa6fd

…e into memory. Calculates and compares with new records

dnicolalde requested review from keith-labkey and labkey-martyp January 17, 2023 22:25

dnicolalde changed the base branch from develop to release22.7-SNAPSHOT January 17, 2023 22:26

bbimber mentioned this pull request Sep 5, 2023

Refactor genetics calculations to allow import to be triggered separately #650

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

22.7 fb kinship comparison #503

22.7 fb kinship comparison #503

dnicolalde commented Jan 17, 2023

bbimber commented Jan 18, 2023

dnicolalde commented Jan 18, 2023

bbimber commented Jan 18, 2023 •

edited

Loading

bbimber commented Sep 5, 2023

22.7 fb kinship comparison #503

Are you sure you want to change the base?

22.7 fb kinship comparison #503

Conversation

dnicolalde commented Jan 17, 2023

Rationale

Related Pull Requests

Changes

bbimber commented Jan 18, 2023

dnicolalde commented Jan 18, 2023

bbimber commented Jan 18, 2023 • edited Loading

bbimber commented Sep 5, 2023

bbimber commented Jan 18, 2023 •

edited

Loading