Hi, I'm doing some visualizations of data from your older "Inferring whole-genome histories" paper (and hoping to run the same code on the "A unified genealogy" data when it's publicly released). There are a bunch of times in the treeseq data which look something like 4.999999999999343. Am I safe to assume that means 5? Or is it intentionally "five minus epsilon", close to 5 but not quite there? I'm guessing all the 4.999... stuff is just floating point math being its occasionally irritating self, but I figured I'd double check. Thanks.
Hi @clawsoon - I suspect that (a) you are using an older version of tsinfer, where most times are integers (in more recent versions nodes are placed at frequencies of 0..1, apart from the 2 oldest nodes at times 2 and 3, which I hope to change soon) and (b) these are "synthetic ancestors" created by the path compression process (you can check by looking to see if they have the flag tsinfer.NODE_IS_PC_ANCESTOR).

Synthetic ancestors are intentionally placed at a time X-minus-epsilon: you can't place these at time (say) 5, because their parent is at time 5, and children must always be strictly younger than their parents. So it's not just floating point error, but a deliberate choice. The time value 5 (or 5/N in more recent tsinfer versions) simply means that 5 samples possess the focal allele on which the ancestor is based. Since synthetic ancestors don't have a focal allele, there's no obvious time to assign to them, so they just get placed a little below their known parent node.

However, I should emphasise that the absolute node times in a tree sequence output by tsinfer are fairly meaningless (although their relative order is, of course, important). So you could swap all the values of 4.999999999999343 for 5 and increase all the nodes at time 5 and above by 1, and the tree sequence should be just as meaningful. If you want to use the node times in any meaningful way, then you should run …
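For anyone who wants to try the two ideas above (checking the flag, and relabelling times without changing their order), here is a minimal sketch in plain Python. It makes some assumptions: `PC_ANCESTOR_FLAG` is a stand-in for tsinfer.NODE_IS_PC_ANCESTOR and its numeric value is assumed, the helper names `pc_ancestor_ids` and `rank_times` are made up for illustration, and the toy flags and times are invented. It relabels by rank, which is a variant of the "swap for 5 and shift everything above by 1" suggestion rather than exactly the same thing, but it keeps the relative order intact in the same way.

```python
# Stand-in for tsinfer.NODE_IS_PC_ANCESTOR so the sketch runs without tsinfer
# installed. The numeric value is an assumption; with tsinfer available, use
# the library constant instead.
PC_ANCESTOR_FLAG = 1 << 16


def pc_ancestor_ids(node_flags, pc_flag=PC_ANCESTOR_FLAG):
    """Return the IDs of nodes whose flags include the path-compression bit."""
    return [u for u, f in enumerate(node_flags) if int(f) & pc_flag]


def rank_times(node_times):
    """Replace each time by its rank among the distinct times (0 = youngest).

    Distinct times stay distinct and keep their order, so a child that was
    strictly younger than its parent is still strictly younger afterwards.
    """
    rank = {t: i for i, t in enumerate(sorted(set(node_times)))}
    return [rank[t] for t in node_times]


# Toy data (invented): two samples at time 0, an ordinary ancestor at time 5,
# a path-compression ancestor just below it, and an older ancestor at time 6.
flags = [1, 1, 0, PC_ANCESTOR_FLAG, 0]
times = [0.0, 0.0, 5.0, 4.999999999999343, 6.0]

pc_ids = pc_ancestor_ids(flags)   # nodes carrying the path-compression flag
new_times = rank_times(times)     # order-preserving integer relabelling

print(pc_ids)
print(new_times)
```

With a real tree sequence, the flags and times would presumably come from the node table (for example `ts.tables.nodes.flags` and `ts.tables.nodes.time` in tskit) rather than from hand-written lists.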