Relatedness matrix-vector product #2980
Codecov Report

```
@@            Coverage Diff             @@
##             main    #2980      +/-   ##
==========================================
+ Coverage   89.73%   89.78%   +0.05%
==========================================
  Files          29       29
  Lines       31600    31879     +279
  Branches     6122     6170      +48
==========================================
+ Hits        28355    28624     +269
- Misses       1853     1859       +6
- Partials     1392     1396       +4
```
Is it worth doing a numba version of this to see how it performs before we get into the C implementation?
I figured out why the following expression is GRM * w when For the uncentered GRM, where To see this, substitute the below into the above and change the order of summation, because of the definition of Was this kind of derivation what you had first in mind? I was only able to do this after knowing the results.
My feeling is "no" because we've already basically got the C implementation in #2710. However, FYI:
Exactly! (Well, you've written it out in more detail than I had, which is helpful!)
This is a quick numba version for trait simulation. I removed some debugging features and weight updates that aren't needed for it. For a 1 Megabase region simulated on 10k people, it takes 0.4s.
I added a slightly more detailed version on Overleaf. Check it out if it sounds useful.
Hah, love it. Let's see - what do we compare the numba version to? Maybe the best thing to do is to make a scaling plot - say, for a 1e8-long sequence, runtime against number of samples - to verify it's scaling not much worse than linear, and to extrapolate to what sample sizes are doable?
I realized there is a simpler algorithm - see |
Here's the overview of what's common to either algorithm. Recall that what we want to compute is: given a vector of weights, for each sample
And they differ in that:
The "cancellation" strategy is similar to what is done for pair coalescence counts and for some of the tsdate internals, which seem to work well enough with large tree sequences; but we should probably do some systematic testing.
This looks great! I'm all for the new cancellation-based algorithm. We do this sort of thing in loads of places, and it's fine, right? I've not grokked the full details of how all this is working, but the shape of the algorithm looks just right. I think part of what's getting in my way is the "stack". In my mind a "stack" is the list of nodes that you traverse on the way back to root, or the list of functions on the way back to "main". Can we use a different name for this, if this is not a good analogy? It would really help me and maybe other people too.
I'll be working on this next week. Thanks for the timing plots! They look great, but the new one should be even better. Do we know how many mat-vecs we need in practice (order of magnitude)? Extrapolating from the plots above, maybe a 1e8 chromosome will take <= 5s with that many samples?
I'm thinking this will get the basics in and then we can follow up with PRs to do:
(force-pushed from efa0924 to b6cb26e)
I've just rebased this on top of #1623, so I can remove the centring from the testing.
Can you rebase this please @petrelharp? I'll try and do my bit tomorrow or Friday.
(force-pushed from b3f8468 to 39c4c14)
Done!
(force-pushed from 5cbe67f to bb5b603)
I got this more-working than it was @petrelharp. There are a few issues:
I've added this work to the next release milestone. Hoping to get a release out in a week or two; if that is too ambitious for this, let me know.
I was a bit worried about how to efficiently deal with clearing out state between windows, but there's a nice way. Naively, we could start at each sample and traverse up the tree to the root to compute the state for that sample. That'd be O(N log N). However, if we clear the state of each node we pass through on the way, that lets us know "we've already been here and everywhere above this"; so we just have to traverse up until we find a node whose state is already cleared.

But I think I'll do that in another PR?
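A minimal sketch of that "clear as you go" idea (illustrative names, not the actual tskit internals): starting from each sample we walk toward the root, clearing per-node state; we stop as soon as we hit an already-cleared node, since everything above it was handled by an earlier traversal, so the total work is linear in the number of nodes rather than O(N log N).

```python
NULL = -1  # sentinel for "no parent", as in tskit

def clear_state(samples, parent, state, cleared):
    # Walk up from each sample, zeroing state; stop at the first node
    # already cleared, because its ancestors were cleared earlier.
    for u in samples:
        while u != NULL and not cleared[u]:
            state[u] = 0.0
            cleared[u] = True
            u = parent[u]
```

Each node is cleared at most once across all traversals, which is what gives the amortized linear bound.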
I'll fix up the windows issues here and push an update.
Very nice! I've made a few perf-oriented comments. Some are easy and should be done, others less so (making the _z function inline) so feel free to ignore that. There's a few iffy restrict pointers declared, which should be removed as well.
Happy to merge and log issues for required follow ups after that?
`c/tskit/trees.c` (outdated):

```c
}

static void
tsk_matvec_calculator_add_z(const tsk_matvec_calculator_t *self, tsk_id_t u)
```
Could you restructure this so that any pointers needed are passed in as const restrict parameters? You can then mark it as a static inline void function. I think it could be a bottleneck otherwise.
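A hedged sketch of the pattern being suggested (names and layout are assumptions, not tskit's actual code): the hot per-node work takes the arrays it needs directly as const/restrict parameters rather than reaching through self, and is marked static inline so the compiler can fold it into the caller.

```c
#include <assert.h>
#include <stddef.h>

typedef int tsk_id_t; /* stand-in for tskit's node ID type */

/* Hypothetical restructuring: the arrays are restrict-qualified
 * parameters, promising the compiler they don't alias, and the
 * function is static inline so calls in the hot loop disappear. */
static inline void
add_z(double *restrict out, const double *restrict v, size_t num_weights,
    tsk_id_t u)
{
    size_t j;
    const double *row = v + (size_t) u * num_weights;

    for (j = 0; j < num_weights; j++) {
        out[j] += row[j];
    }
}
```

The restrict promise is only safe if such a function stays a leaf: if it called back into code that touches the same arrays through other pointers, the aliasing guarantee would be violated.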
I've had a go at this, but I am more or less guessing what you mean - have a look?
`c/tskit/trees.c` (outdated):

```c
static void
tsk_matvec_calculator_remove_edge(tsk_matvec_calculator_t *self, tsk_id_t p, tsk_id_t c)
{
    tsk_id_t *restrict parent = self->parent;
```
Hmm - I'm iffy about the use of restrict here, as parent is accessed in functions that are called within the scope of the restrict declaration. I'd remove it to be on the safe side.
`c/tskit/trees.c` (outdated):

```c
static void
tsk_matvec_calculator_insert_edge(tsk_matvec_calculator_t *self, tsk_id_t p, tsk_id_t c)
{
    tsk_id_t *restrict parent = self->parent;
```
Similarly, not sure about that restrict.
I've made those changes - I think? If they look good, go ahead and merge!
(force-pushed from 9e5becf to 885732f)
I see two remaining lines not hit by tests; I'll get those in the follow-up PR with windows.
(force-pushed from 649cba9 to 39174d2)
NEVER MIND, I've just put the code for windows in this PR - it changes the logic a bit (we don't need the virtual samples!) - but in a separate commit so you can have a separate look if you like.
Okay - this is ready!
This is beautiful @petrelharp, such an elegant algorithm!
I spotted a few minor things, just needs a tiny bit more testing on the CPython side. It's ready for a squash and merge then. (Multiple commits are fine, I think it's probably good to squash out the intermediate "let's get this working" stuff)
```c
    return 0;
}

static inline void
```
Nice - that's exactly what I had in mind.
```c
// do this: self->v[c] -= sign * self->v[p];
p_row = GET_2D_ROW(v, num_weights, p);
c_row = GET_2D_ROW(v, num_weights, c);
for (j = 0; j < num_weights; j++) {
```
Just an observation here that if num_weights is O(100) these loops could be usefully vectorised using AVX instructions and so on.
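As a hedged illustration of that observation (a sketch under assumed names, not tskit code): with restrict-qualified row pointers and a simple counted loop, compilers can typically auto-vectorise this per-weight update at `-O3` without hand-written AVX intrinsics.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative per-weight update matching the shape of the c_row/p_row
 * loop above; restrict tells the compiler the two rows don't overlap,
 * which is usually the condition needed for SSE/AVX auto-vectorisation. */
static void
row_update(double *restrict c_row, const double *restrict p_row, double sign,
    size_t num_weights)
{
    size_t j;

    for (j = 0; j < num_weights; j++) {
        c_row[j] -= sign * p_row[j];
    }
}
```

Explicit intrinsics would only be worth it if profiling showed the compiler failing to vectorise this on its own.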
```c
}
if (!PyArg_ParseTupleAndKeywords(args, kwds, "OO|sii", kwlist, &weights, &windows,
        &mode, &span_normalise, &centre)) {
    goto out;
```
Looks like we're not doing bad parameter testing in test_lowlevel.py. It doesn't have to be comprehensive, but we do need to make sure the error path is covered (it's a common and hard-to-track-down source of segfaults).
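To illustrate the kind of coverage being asked for (the function and argument names here are stand-ins, not the real tskit low-level API): the point is that every malformed argument should surface as a Python exception rather than reaching the C code and segfaulting.

```python
# Stand-in for argument validation at the C/Python boundary; in tskit
# the real checks happen in PyArg_ParseTupleAndKeywords and the C code.
def parse_matvec_args(weights, mode="branch"):
    if not weights or not all(isinstance(row, list) for row in weights):
        raise ValueError("weights must be a 2D (list-of-lists) array")
    if mode not in ("site", "branch", "node"):
        raise ValueError("unknown mode")
    return weights, mode
```

An error-path test then feeds each bad input in turn and asserts that the call raises, which is exactly the coverage that catches refcounting and cleanup bugs on the `goto out` path.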
Ah - I got confused about that.
`python/tests/test_lowlevel.py` (outdated):

```python
    weights=np.ones(bad_weight_shape), mode=mode, **params
)

@pytest.mark.skip(reason="Haven't implemented windows.")
```
This is implemented now, right?
Hah I knew I wrote tests for those!
(force-pushed from ae25b78 to 93db1d4)
All done!
Looks like your error checking tests are overly specific.
Whoops, not quite.
Looks like error strings changed across python versions.
(commits: add centre option; windows!!! closes tskit-dev#2996)
(force-pushed from 93db1d4 to 75ba0c3)
Here's an implementation of the "relatedness matrix-vector" product operation. (WIP; currently just in python). It is a simplification of the code in #2710.
This could be generalized to the following: given a set of sample weights $w$ and a summary function $f(\cdot)$, for a node $n$ in tree $T$ let $w_T(n)$ be the sum of the weights of all samples below $n$, and then for each sample $u$ compute
$$S(u) = \sum_T \ell_T \sum_{n \ge_T u} (t_{p(n)} - t_n) f(w_T(n)),$$
i.e., the sum over all nodes above the sample in all trees of the summary function for the node multiplied by the area of the edge above the node.
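For concreteness, here is a naive single-tree reference implementation of $S(u)$ as defined above, with the tree-span factor $\ell_T$ omitted (i.e., one tree of span 1); the array names are illustrative, and `w_T` is assumed precomputed as the per-node subtree weight sum.

```python
def S(u, parent, time, w_T, f, NULL=-1):
    # Sum, over every node n at or above sample u, the branch length
    # above n, (t_{p(n)} - t_n), times f of the subtree weight sum w_T(n).
    total = 0.0
    n = u
    while parent[n] != NULL:
        total += (time[parent[n]] - time[n]) * f(w_T[n])
        n = parent[n]
    return total
```

This is the quadratic brute-force version the efficient edge-by-edge algorithm is checked against; it walks root-ward from each sample independently.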
Besides the left/right child/sib arrays, it just needs to keep track of three (num nodes)-length vectors as it iterates over trees: the sum of the weights below the node (`w`); the last position the node's contribution was flushed at (`x`); and the total contribution to the output for all samples below the node (`stack`). Every time we add or remove an edge we need to move all the contributions on the path above the edge to the children.