Add sample sets to the python branch LD prototype
This change incorporates sample set functionality by extending the
data structure that tracks the samples under each node. This differs
from the two-site statistics, where we first obtain every sample under
every node and then intersect with the sample sets. Since this is a
branch update algorithm, we want to update the branches without
intersecting our sets with the sample sets every time we add or remove
a branch. That intersection would be very expensive, because we iterate
over every branch in a fixed (fully materialized) tree whenever we add
or remove a branch from the modified tree.
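
As a rough illustration of the idea (not the code in this commit), the
sketch below keeps one set of samples per node for each sample set and
touches only the ancestors of a branch when that branch is added or
removed. The class name `SampleSetTracker`, its methods, and the
parent-array tree representation are all hypothetical and chosen for
brevity.

```python
class SampleSetTracker:
    """Hypothetical sketch: per-node, per-sample-set tracking of samples."""

    def __init__(self, num_nodes, sample_sets):
        # parent[u] is the parent of node u, or -1 if u is currently a root.
        self.parent = [-1] * num_nodes
        # below[k][u] holds the samples of sample set k found at or under u.
        self.below = [[set() for _ in range(num_nodes)] for _ in sample_sets]
        for k, samples in enumerate(sample_sets):
            for s in samples:
                self.below[k][s].add(s)

    def _ancestors(self, u):
        # Yield u and every node above it, up to the root.
        while u != -1:
            yield u
            u = self.parent[u]

    def add_branch(self, parent, child):
        # Push the child's samples up the ancestor path, per sample set.
        self.parent[child] = parent
        for below_k in self.below:
            for u in self._ancestors(parent):
                below_k[u] |= below_k[child]

    def remove_branch(self, parent, child):
        # Take the child's samples back out of every ancestor, per sample set.
        for below_k in self.below:
            for u in self._ancestors(parent):
                below_k[u] -= below_k[child]
        self.parent[child] = -1


# Samples are nodes 0-3; nodes 4-6 are internal. Two sample sets.
tracker = SampleSetTracker(7, [[0, 1], [2, 3]])
for p, c in [(4, 0), (4, 1), (5, 2), (6, 4), (6, 5), (5, 3)]:
    tracker.add_branch(p, c)
root_before = [set(tracker.below[k][6]) for k in range(2)]
tracker.remove_branch(6, 4)
root_after = [set(tracker.below[k][6]) for k in range(2)]
```

Because membership in each sample set is fixed once at the leaves, no
intersection with the sample sets is needed during updates; each update
only walks the path from the affected branch to the root.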

In doing this, we also update the summary functions to be compatible
with the existing site statistics code, so now we have unbiased
estimators for pi2, Dz, and D2. We'll worry about testing these in sites
when we implement the C versions.

These changes also include a correctness fix for the orthogonal "McVean"
prototype. The fix allows us to compute LD for samples that do not have
MRCAs.

All tests now agree between the prototype and the proposed branch
algorithm, but I've still excluded the slower tests.
lkirk authored and mergify[bot] committed May 7, 2024
1 parent 972308e commit b1d7c4d
Showing 1 changed file with 296 additions and 118 deletions.