GCSA on smoothxg GFAs #5
Comments
Yes, that makes sense. I think you're probably running across a lot of bubbles during the kmer generation. This is the basic flaw of the GCSA2 indexing strategy, at least as it's currently implemented. (We might simplify things for ourselves by just indexing the actual paths directly rather than the graph and its implied recombinations.) It's worth trying to get this to work though. Usually, by decreasing the graph complexity (with pruning) and/or reducing the GCSA2 index kmer size, you can build the index. I think you may need to use I would also just try to index with a much smaller kmer size for the GCSA2 index. For instance:
This would result in a |
This looks like the "small graph with many paths" scenario in the wiki. The |
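The point above about bubbles during kmer generation can be illustrated with a toy sketch. It is not vg's or GCSA2's actual algorithm; it only counts node walks in a hypothetical chain of two-allele bubbles (one base per node) to show how the number of k-length walks grows with k, and why pruning or a smaller kmer size helps. All names here (`build_bubble_chain`, `count_walks`) are made up for illustration.

```python
def build_bubble_chain(num_bubbles):
    """Toy graph: anchor_0 -> {alt_a, alt_b} -> anchor_1 -> ... -> anchor_n.

    Every node stands for a single base, so a walk over k nodes is a k-mer.
    This is an illustrative assumption, not a real smoothxg graph.
    """
    edges = {}
    for i in range(num_bubbles):
        alts = [("alt", i, 0), ("alt", i, 1)]
        edges[("anchor", i)] = alts
        for alt in alts:
            edges[alt] = [("anchor", i + 1)]
    edges[("anchor", num_bubbles)] = []  # last anchor has no successors
    return edges


def count_walks(edges, k):
    """Count walks that visit exactly k nodes, starting from any node."""
    # counts[v] = number of walks with the current node count that start at v
    counts = {v: 1 for v in edges}
    for _ in range(k - 1):
        counts = {v: sum(counts[w] for w in edges[v]) for v in edges}
    return sum(counts.values())


if __name__ == "__main__":
    graph = build_bubble_chain(40)
    for k in (4, 8, 16, 32):
        print(f"k={k}: {count_walks(graph, k)} walks")
```

In this toy graph each additional bubble spanned by a walk doubles the number of walks, so the count grows exponentially with k. A linear graph of the same length would have at most one walk per start position.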
Dear Erik,
I am trying to GCSA index a graph produced by edyeet -> seqwish -> smoothxg -> vg view -> vg prune -r. The input is 3 small genomes (<100 Mb), each with around 40 "contigs". The graph has 190k segments and 240k edges.
Running vg index to generate the GCSA index results in very large tmp files (>2 TB) and practically does not finish.
I am not sure where to start digging at the moment. I am now trying to index the seqwish output directly.
F