
Trailing comma on P lines after pruning with odgi prune #549

Open
sivico26 opened this issue Jan 18, 2024 · 5 comments

Comments

@sivico26

sivico26 commented Jan 18, 2024

Dear odgi team,

While working on #548 (see that issue for context), I could not load my graph with my tools after pruning. After digging a bit, I discovered that some of the pruned graph's P lines have trailing commas, which goes against the GFA convention. To be clear, I mean something like this:

P    taxa.chr1    5+,12+,32+, ... ,72+,89+,    *    ## The ',' after the 89+ should not be there.
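For reference, a well-formed P-line segment list (one or more `<name><orientation>` entries separated by commas, with nothing after the last one) can be checked with a strict pattern. This is a minimal sketch assuming GFA 1.0 and segment names that contain no commas or whitespace; it is a quick validator, not a full GFA parser.

```python
import re

# One or more "<segment name><+|->" entries separated by commas,
# with no leading or trailing comma. Assumes names contain no commas/whitespace.
SEGMENT_LIST = re.compile(r"[^,\s]+[+-](?:,[^,\s]+[+-])*")

def is_valid_segment_list(field: str) -> bool:
    """Return True if `field` is a well-formed P-line segment list."""
    return SEGMENT_LIST.fullmatch(field) is not None

print(is_valid_segment_list("5+,12+,32+,72+,89+"))   # True
print(is_valid_segment_list("5+,12+,32+,72+,89+,"))  # False: trailing comma
```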

You can check this yourself by taking the P lines of the graph I shared with @subwaystation and running the following commands:

grep "^P" og_opt_transfer.gfa > og_finito_paths.txt
sed -E 's|[0-9]+[-\+]\,||g' og_finito_paths.txt > path_issues_og.txt

If you check the contents of path_issues_og.txt, you will find the following:

P       PvirN.Pvir_N_chr4       875723430+      *
P       PvirN.Pvir_N_chr5       874282853+      *
P       PvirN.Pvir_N_chr6               *
P       PvirN.Pvir_N_chr7               *
P       PvirN.Pvir_N_chr8       128166406+      *
P       PvirN.Pvir_N_chr9       875435902+      *

The third column should have the last node of the path, but you can see that the affected paths do not because those had the trailing comma and thus also matched the sed expression.
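As an alternative to the grep/sed check, a short script can list the affected paths directly and, as a stopgap until the cause in odgi prune is found, strip the trailing comma so downstream tools can load the graph. This is a minimal sketch assuming GFA 1.0 with tab-separated P lines (segment list in the third field); the script name `strip_commas.py` is hypothetical, and the script only patches the symptom.

```python
import sys

def strip_trailing_commas(lines):
    """Return (fixed_lines, affected_paths) for an iterable of GFA lines.

    A P line is "affected" if its segment list (third field) ends with a comma;
    the trailing comma(s) are removed and every other line is left untouched.
    """
    fixed, affected = [], []
    for line in lines:
        if line.startswith("P\t"):
            newline = "\n" if line.endswith("\n") else ""
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 3 and fields[2].endswith(","):
                affected.append(fields[1])
                fields[2] = fields[2].rstrip(",")
                line = "\t".join(fields) + newline
        fixed.append(line)
    return fixed, affected

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python strip_commas.py in.gfa > out.gfa
    with open(sys.argv[1]) as fh:
        fixed, affected = strip_trailing_commas(fh)
    sys.stdout.writelines(fixed)
    for name in affected:
        print(f"trailing comma removed from path {name}", file=sys.stderr)
```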

I thought this was caused by pruning the last nodes of a path from the graph: those steps then have to be removed from the P lines, and perhaps the function in odgi sort that does this does not remove the trailing comma. However, in this case, I removed nodes using odgi prune -TEc 1, which should only remove nodes that no path crosses. Hence, odgi prune should not have touched the P lines in the first place.

This may need more digging, but I would love to hear your thoughts.
Cheers,

@subwaystation
Member

The pruning is indeed the problem. @AndreaGuarracino Any ideas how this can happen? Some empty steps are left over?
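To make the "empty steps" hypothesis concrete: if a writer joins a path's steps with commas and a step at the end of the path has been emptied rather than removed, the output shows exactly the symptom reported here. The snippet below is only an illustration under that assumption; `format_segment_list` is hypothetical and does not reproduce odgi's code.

```python
# Hypothetical serializer, NOT odgi's actual GFA writer: it joins every step
# of a path with commas, including any empty placeholder steps.
def format_segment_list(steps):
    return ",".join(steps)

# An empty entry at the end of the step list yields a trailing comma...
print(format_segment_list(["5+", "12+", "89+", ""]))  # 5+,12+,89+,
# ...while an empty entry in the middle would show up as ",,".
print(format_segment_list(["5+", "", "89+"]))         # 5+,,89+
```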

@sivico26
Author

I just wanted to add that the pruning is also incomplete. When I try to prune this graph to remove nodes with 0 path coverage, with the commands mentioned before, the resulting graph still has nodes with no path going through them. Why would odgi prune do an incomplete job?

I have been busy lately, so I have not opened a separate issue yet, but I can do so and explain it in more detail if you want.
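To quantify the incomplete pruning independently of odgi, one can count how many P-line steps visit each segment and list the segments visited by none. This is a minimal sketch assuming paths are stored only as GFA 1.0 P lines (W lines are ignored); the script name `zero_cov.py` is hypothetical.

```python
import sys
from collections import Counter

def zero_coverage_nodes(lines):
    """Return segment names that are not visited by any P line."""
    segments = []
    coverage = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            segments.append(fields[1])
        elif fields[0] == "P" and len(fields) >= 3:
            for step in fields[2].split(","):
                if step:  # skip the empty entry produced by a trailing comma
                    coverage[step[:-1]] += 1  # drop the +/- orientation
    return [name for name in segments if coverage[name] == 0]

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python zero_cov.py pruned.gfa
    with open(sys.argv[1]) as fh:
        empty = zero_coverage_nodes(fh)
    print(f"{len(empty)} node(s) with zero path coverage")
    for name in empty:
        print(name)
```

If this list is non-empty for a graph produced by `odgi prune -TEc 1`, it gives concrete node IDs to attach to a bug report.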

@AndreaGuarracino
Member

I've never used odgi prune much. I can fix these issues, but I need a few reproducible examples. @sivico26, can you "cut" the GFA into smaller pieces that trigger the bugs with particular odgi prune command lines?

@sivico26
Author

sivico26 commented Feb 2, 2024

@AndreaGuarracino, honestly, I have as little clue as you do about how to cut it appropriately... I guess one way would be to compute the path coverages, take the list of nodes with coverage 0, pick one at random, extract a big enough subgraph around it, and then pray (I mean, test) whether the problem reproduces.

Can you think of other approaches?
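The "extract a subgraph around a zero-coverage node" step from the approach above can be sketched as a breadth-first search over L lines up to a fixed number of hops. This is a minimal sketch under stated assumptions: link orientation is ignored, the start node would be picked at random (for example with `random.choice`) from the zero-coverage nodes, and trimming P lines down to the kept segments is not handled. odgi's own `odgi extract` subcommand may already do this natively; check its `--help` for the available context options.

```python
from collections import defaultdict, deque

def neighborhood(lines, start, radius):
    """Return the segment names within `radius` L-line hops of `start`.

    Link orientation is ignored: the graph is treated as undirected, which is
    enough for choosing which segments to keep in a test subgraph.
    """
    adjacency = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "L" and len(fields) >= 5:
            a, b = fields[1], fields[3]
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == radius:
            continue  # do not expand beyond the requested radius
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return seen
```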

@AndreaGuarracino
Member

I'm lazier than you xD. I thought you could divide the graph in two (half of the paths in graph 1 and half of the paths in graph 2), take the part that still triggers the problem, divide it in two again, and so on, until you have a fairly small graph. Or use your approach.
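The halving strategy is essentially a binary search over the set of paths (a simplified form of delta debugging). A minimal sketch, assuming a hypothetical user-supplied predicate `reproduces(paths)` that builds a graph containing only those paths, runs the failing odgi prune command on it, and returns True if the trailing comma or leftover zero-coverage nodes appear:

```python
from typing import Callable, List

def bisect_paths(paths: List[str],
                 reproduces: Callable[[List[str]], bool],
                 min_size: int = 1) -> List[str]:
    """Repeatedly keep whichever half of the paths still reproduces the bug.

    `reproduces` is a hypothetical, user-supplied check. `min_size` must be
    at least 1. Returns the smallest path subset found that still reproduces.
    """
    current = list(paths)
    while len(current) > min_size:
        mid = len(current) // 2
        first, second = current[:mid], current[mid:]
        if reproduces(first):
            current = first
        elif reproduces(second):
            current = second
        else:
            break  # the bug needs paths from both halves; stop here
    return current
```

If the search stops early, the bug depends on paths from both halves at that point; switching to the node-neighborhood approach on the remaining subset is one option.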
