Add sample sets to the python branch LD prototype
This change adds sample set functionality by extending the data structure that tracks the samples under each node. This differs from the two-site statistics, where we first obtain every sample under every node and then intersect with the sample sets. Since we're doing a branch update algorithm, we want to update the branches without re-intersecting each node's samples with the sample sets every time we add or remove a branch. That would be very expensive, because adding or removing a branch in the modified tree requires iterating over every branch in the fixed (fully materialized) tree.

We also update the summary functions to be compatible with the existing site statistics code, so we now have unbiased estimators for pi2, Dz, and D2. We'll test these for site statistics when we implement the C versions.

These changes also include a correctness fix for the orthogonal "McVean" prototype, which allows us to compute LD for samples that do not have an MRCA. All tests now agree between the prototype and the proposed branch algorithm, but I've still excluded the slower tests.
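A minimal sketch of the idea of keeping samples split by sample set at each node, so that adding or removing an edge never requires an intersection with the sample sets. It assumes a tree stored as a parent map with -1 as the null node; the class `SampleSetTracker` and its methods are hypothetical and are not the prototype's actual code.

```python
from collections import defaultdict

NULL = -1  # assumed null-node sentinel (tskit uses -1 for "no parent")


class SampleSetTracker:
    """Hypothetical sketch: per-node samples, pre-split by sample set.

    below[u][k] holds the samples from sample set k that lie under node u.
    Edge insertions and removals push the child's per-set samples up the
    path to the root, so no intersection with the sample sets is needed.
    """

    def __init__(self, sample_sets):
        self.sample_sets = [set(s) for s in sample_sets]
        self.parent = {}
        self.below = defaultdict(lambda: [set() for _ in self.sample_sets])
        # Each sample starts out "under" itself, in every set it belongs to.
        for k, ss in enumerate(self.sample_sets):
            for u in ss:
                self.below[u][k].add(u)

    def _path_to_root(self, u):
        # Yield u and all of its ancestors.
        while u != NULL:
            yield u
            u = self.parent.get(u, NULL)

    def insert_edge(self, parent, child):
        assert self.parent.get(child, NULL) == NULL, "child must be a root"
        self.parent[child] = parent
        child_sets = self.below[child]
        for a in self._path_to_root(parent):
            for k, s in enumerate(child_sets):
                self.below[a][k] |= s

    def remove_edge(self, parent, child):
        assert self.parent.get(child, NULL) == parent
        child_sets = self.below[child]
        # Samples under different children are disjoint in a tree,
        # so subtracting the child's samples is exact.
        for a in self._path_to_root(parent):
            for k, s in enumerate(child_sets):
                self.below[a][k] -= s
        del self.parent[child]

    def count(self, node, k):
        # Number of samples from sample set k under node.
        return len(self.below[node][k])


# Usage: sample sets {0, 1} and {2}; build ((0,1)4, 2)5, then drop edge 4->1.
tracker = SampleSetTracker([[0, 1], [2]])
for p, c in [(4, 0), (4, 1), (5, 4), (5, 2)]:
    tracker.insert_edge(parent=p, child=c)
tracker.remove_edge(parent=4, child=1)
```

Storing plain counts instead of Python sets would be enough for the summary functions and would make each update cost proportional to tree depth times the number of sample sets; sets are used here only to make the bookkeeping easy to inspect.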