Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

odgi extract somehow keeps the node which needs to be removed #577

Open
yeeus opened this issue Jun 19, 2024 · 2 comments
Open

odgi extract somehow keeps the node which needs to be removed #577

yeeus opened this issue Jun 19, 2024 · 2 comments

Comments

@yeeus
Copy link

yeeus commented Jun 19, 2024

I also met this problem using odgi extract like #572 , odgi somehow keeps the node which needs to be removed in my bed file

I have a node 1435085 in the raw graph which is too long and so simple with the depth only two:

$ odgi position -i HLA_H_to_J.gfa -g 1435087 -I
#target.graph.pos       target.path.pos dist.to.ref     strand.vs.ref
1435087,0,+     HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700,167958,+      0       +
1435087,0,+     NA21487_hap2_chr6#0#NA21487_hap2_chr6#0[15186305-58426289]:14495585-14822138,167964,+   0       +

$ odgi depth -i HLA_H_to_J.gfa -g 1435087
#node.id        depth   depth.uniq
1435087 2       2

which needs to be removed in my case, so I made a bed file containing the intervals with length longer than 1k and depth lower than 5:

$ grep -e  '^HG00096_hap1_chr6\|NA21487_hap2_chr6' HLA_H_to_J.gfa.depth_low_5_1k.bed | less
HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700       56842   58458
HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700       85426   87672
HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700       113506  114634
HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700       126078  127190
HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700       128886  131659
HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700       139423  141156
HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700       149926  153037
HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700       161924  168157
HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700       322952  324609
NA21487_hap2_chr6#0#NA21487_hap2_chr6#0[15186305-58426289]:14495585-14822138    56832   58448
NA21487_hap2_chr6#0#NA21487_hap2_chr6#0[15186305-58426289]:14495585-14822138    113497  114624
NA21487_hap2_chr6#0#NA21487_hap2_chr6#0[15186305-58426289]:14495585-14822138    126068  127180
NA21487_hap2_chr6#0#NA21487_hap2_chr6#0[15186305-58426289]:14495585-14822138    128876  131667
NA21487_hap2_chr6#0#NA21487_hap2_chr6#0[15186305-58426289]:14495585-14822138    139431  141164
NA21487_hap2_chr6#0#NA21487_hap2_chr6#0[15186305-58426289]:14495585-14822138    149934  153048
NA21487_hap2_chr6#0#NA21487_hap2_chr6#0[15186305-58426289]:14495585-14822138    161935  168163
NA21487_hap2_chr6#0#NA21487_hap2_chr6#0[15186305-58426289]:14495585-14822138    322992  326051

but when I extracted with the following command, I got:

$ odgi extract -i HLA_H_to_J.gfa -t12 -P -I -b HLA_H_to_J.gfa.depth_low_5_1k.bed -R HLA_H_to_J.gfa.path_to_fully_retain.txt -o HLA_H_to_J.gfa.remove_depth_low_5_1k.og
$ odgi position -i HLA_H_to_J.gfa.remove_depth_low_5_1k.og -g 1435085 -I
#target.graph.pos       target.path.pos dist.to.ref     strand.vs.ref
1435085,0,+     HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700:114634-326513,47290,+ 0       +
1435085,0,+     HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700:127190-326513,34734,+ 0       +
1435085,0,+     HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700:153037-326513,8887,+  0       +
1435085,0,+     HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700:141156-326513,20768,+ 0       +
1435085,0,+     HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700:131659-326513,30265,+ 0       +
1435085,0,+     HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700:87672-326513,74252,+  0       +
1435085,0,+     HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700:0-322952,161924,+     0       +
1435085,0,+     HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700:58458-326513,103466,+ 0       +
1435085,0,+     NA21487_hap2_chr6#0#NA21487_hap2_chr6#0[15186305-58426289]:14495585-14822138:127180-326553,34755,+      0       +
1435085,0,+     NA21487_hap2_chr6#0#NA21487_hap2_chr6#0[15186305-58426289]:14495585-14822138:141164-326553,20771,+      0       +
1435085,0,+     NA21487_hap2_chr6#0#NA21487_hap2_chr6#0[15186305-58426289]:14495585-14822138:114624-326553,47311,+      0       +
1435085,0,+     NA21487_hap2_chr6#0#NA21487_hap2_chr6#0[15186305-58426289]:14495585-14822138:153048-326553,8887,+       0       +
1435085,0,+     NA21487_hap2_chr6#0#NA21487_hap2_chr6#0[15186305-58426289]:14495585-14822138:131667-326553,30268,+      0       +
1435085,0,+     NA21487_hap2_chr6#0#NA21487_hap2_chr6#0[15186305-58426289]:14495585-14822138:0-322992,161935,+  0       +
1435085,0,+     NA21487_hap2_chr6#0#NA21487_hap2_chr6#0[15186305-58426289]:14495585-14822138:58448-326553,103487,+      0       +

$ odgi depth -i HLA_H_to_J.gfa.remove_depth_low_5_1k.og -g 1435085
#node.id        depth   depth.uniq
1435085 15      15

it seems the node was not be removed...

And I have two questions:

  1. Why the node wasn't be removed?
  2. What's the meaning of depth.uniq in the output of odgi depth, which seems always equals with depth?

Best wishes!

@zhangyixing3
Copy link

zhangyixing3 commented Jun 20, 2024

For the second question, depth.uniq is not equals with depth, One path may contain a node multiple times for PGGB gfa, in minigraph cactus gfa ,depth.uniq is equals with depth. My own understanding.

@yeeus
Copy link
Author

yeeus commented Jun 20, 2024

OK, fine. The graph I used was generated by minigraph cactus, so depth.uniq always equals with depth. And maybe dear developers (@ekg @subwaystation) can provide another output representing the depth for only one haplotype. That means in the above case:

#node.id        depth   depth.uniq
1435085 15      15

will be

#node.id        depth   depth.uniq     *something*
1435085 15      15    *2*(only in HG00096_hap1 and NA21487_hap2)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants