split_ultimate_ancestor doesn't always change the root when its children change #993

duncanMR · 2025-01-23T12:52:18Z

Post-processing is supposed to remove the virtual-root-like ancestor (0) and then split the ultimate ancestor (1) into multiple roots whenever its children change. However, it seems that this process misses some cases. For example:

import msprime
import tsinfer
import numpy as np

sim_ts = msprime.sim_ancestry(
    100,
    sequence_length=1e6,
    random_seed=2,
    recombination_rate=1e-8,
    population_size=1e4,
)
sim_mts = msprime.sim_mutations(sim_ts, rate=1.29e-8, random_seed=2)
sample_data = tsinfer.SampleData.from_tree_sequence(sim_mts)
base_ts = tsinfer.infer(sample_data, post_process=False)
ts = tsinfer.post_process(base_ts)

roots = []
edges_right = ts.edges_right
edges_parent = ts.edges_parent
problem_trees = []
for tree in ts.trees():
    if len(tree.roots) == 1: #skip first and last trees
        root = tree.root
        roots.append(root)
        right = edges_right[edges_parent == root]
        if not np.all(right == right[0]): #some children change at different positions
            problem_trees.append(tree.index)

print(f"{len(problem_trees)} root changes are missed out of {len(roots)} roots") #116 out 1244

Here is one of the problematic trees:

tree = ts.at_index(351)
print(f'Root: {tree.root}, children = {tree.children(tree.root)}')
#Root: 1744, children = (1683, 1151)
tree.prev()
print(f'Root: {tree.root}, children = {tree.children(tree.root)}')
#Root: 1744, children = (1683, 1027, 1335)

The calculation of root breaks in split_ultimate_ancestor seems to be correct in this example:

edges_left = base_ts.edges_left
edges_right = base_ts.edges_right
edges_parent = base_ts.edges_parent
left_children_changes = np.sort(edges_left[edges_parent == 1])
right_children_changes = np.sort(edges_right[edges_parent == 1])
children_changes = set(left_children_changes).union(right_children_changes)

I modified post_process to return the root_breaks, and they match the children_changes I calculated above exactly. So I think the problem might be with the step to modify the tables to add new roots, but I'm not sure where the bug is yet.

The text was updated successfully, but these errors were encountered:

benjeffery · 2025-01-23T13:46:49Z

Nice find - are the problematic trees on GeL as it would be good to extract some test cases for the test suite.

duncanMR · 2025-01-23T14:11:42Z

I just found #850 and tskit-dev/tsdate#452 which discuss this. I'm confused, since aren't all the local root nodes created by splitting up the ultimate ancestor (1) whenever its children change? I don't understand why we would expect to have local roots with children that change over multiple trees after post-processing. @hyanwong, could you please clarify?

duncanMR · 2025-01-23T15:44:29Z

I had made a silly error here: I forgot that simplification will remove the ultimate root nodes where there is a younger MRCA, and there is obviously no guarantee that these new roots will have the same children over multiple trees. Even when the ultimate root node is present in the simplified tree, the edges with that node as parent will not always share the same left and right coordinates, because simplification will also merge adjacent edges with the same parent and child. Since it's not a bug, I'm closing the issue.

duncanMR closed this as completed Jan 23, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

split_ultimate_ancestor doesn't always change the root when its children change #993

split_ultimate_ancestor doesn't always change the root when its children change #993

duncanMR commented Jan 23, 2025

benjeffery commented Jan 23, 2025

duncanMR commented Jan 23, 2025

duncanMR commented Jan 23, 2025

split_ultimate_ancestor doesn't always change the root when its children change #993

split_ultimate_ancestor doesn't always change the root when its children change #993

Comments

duncanMR commented Jan 23, 2025

benjeffery commented Jan 23, 2025

duncanMR commented Jan 23, 2025

duncanMR commented Jan 23, 2025