Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix 47 pleafletfinder #69
Fix 47 pleafletfinder #69
Changes from 7 commits
cbe2af2
6acc133
ebe668f
df3b72c
140616b
5d1d946
ab81b90
bb65a59
56f39ac
146dcaa
0ad2998
d5813ac
063b76a
4765fd9
594b19e
4ddc64f
b7d99f7
8cfcb19
042e5bc
df64155
ce5d471
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional notes to add:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment here that this does not respect PBC; possible TODO: work with
capped_distances()
insteadThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by PBC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://en.wikipedia.org/wiki/Periodic_boundary_conditions
like Pacman geometry. Particles are simulated within a finite box of size [Lx, Ly, Lz], Particles near x=0 and x=Lx are actually very close to each other.
So the leaflet is actually an infinite sheet and has no "edge". Without considering periodic boundaries you'll identify the leaflets, but you'll miss some of the edges in the graph representation, which might be important for some analysis
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I see. May I suggest the following. Have a working implementation without taking into account PBCs and without breaking the trajectory into multiple blocks. As soon as that is done and this PR can be accepted, merge it so that we do not miss the release this is intended for.
Next version of the implementation will take into account PBCs and also try to see how to parallelize over trajectory blocks.
@orbeckst , @richardjgowers do you agree with that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am ok with an initial version that does not do PBC if
Breaking trajectory into blocks can also wait in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iirc networkx is wholly Python
This function in MDAnalysis:
https://github.com/MDAnalysis/mdanalysis/blob/develop/package/MDAnalysis/lib/_cutil.pyx#L322
Is a c++ function which does the same, pass it node and edge indices and it will return a list of the indices in each subgraph
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand your comment correctly, you are suggesting to change from netwrokx to the function MDAnalysis implements, right?
When I started the implementation of the Leaflet Finder for my paper I used the class that exists in MDAnalysis (link). There networkx is being used. In addition, I was very cautious when I built my version to use highly used python packages for things I did not want to reimplement.
Did you reimplement a Connected component calculator or did that pre-exist Leaflet Finder?
Also, I am extremely bad with abbreviations and I have to google them all the time. Can you expand iirc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, iirc = if I remember correctly.
Yeah I think the MDAnalysis function will be faster. It was added after the original leaflet finder class.
If you want to keep this the same for a closer comparison to the original class that's fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@richardjgowers did you benchmark
find_fragments()
against networkx?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Also, we should update the serial LeafletFinder with capped_distances and find_fragments...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we use networkx for the initial version because this is what was described in the paper.
We can then open an new issue to optimize parallel leafletfinder and add PBC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why
n_blocks
– it's not used at the moment. Omit?(Unless you immediately want to parallelize over blocks, too)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parallelization happens in a single frame.
blocks
specify the number of partitioning the data over.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will remove it and change the way I used it in the code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand this correctly, you are partitioning a (M, M) matrix A into sub-blocks. A_{ij} will represent a contact.
Because we evaluate the atomgroup with itself, this is symmetric, A_ij = A_ji. Couldn't you save on the order of M^2/2 evaluations by only taking the diagonal and lower triangle blocks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me give it a try and see what is happening with the tests. This is an exact transport from the code I used for the paper. I will fix it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you have a look at optimizing this?
If it should be optimized, add it to the #72 issue.
For this PR we can put the code as it was for the paper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment that this is currently ignore: no parallelization over trajectory blocks. Or are you already working on adding the parallelization over frames, too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
COuld be removed if
n_blocks
not used.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is being used to produce the correct number of tasks. Because it is optional, if it is not set we set it manually based on the number of jobs. Now that I think about it
n_jobs
is not really needed since I parallelize based on the number of blocks.