-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Checkpointing (quasi-Newton solver) #693
Conversation
code compiles; does nothing
checkpoint interface complete additional states need to be saved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll check more later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is all my feedback.
src/Utils/hiopOptions.cpp
Outdated
|
||
constexpr char msgcf[] = "Path to checkpoint file. If present character '#' will be replaced " | ||
"with the iteration number."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation only replaces the first occurrence of '#'.
I thought about it some more over lunch, and I think You can get the IMHO, this is an improvement because if HiOp is part of a complex program, the outer program may want to encapsulate the HiOp part of the checkpoint in some group that's potentially several levels down from the DataStore root group. It also leaves the door open for their to be multiple instances of HiOp active in the running program that all need to be checkpointed. The current design can only support checkpointing of one HiOp instance in a given DataStore. |
I take it that |
{ | ||
using IndType = sidre::IndexType; | ||
const IndType size = vec->get_local_size(); | ||
sidre::View* dest = nlp_group->createViewAndAllocate(name, sidre::DOUBLE_ID, size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In LiDO, we often use the same DataStore for the whole run. If we were to call this multiple times, I think it will fail because the views would already exist.
Right now this method always makes a new view. It would be better if it checked first if the view already exists. If it already exists in the Group, use that one. If it doesn't already exist, create a new view.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
last commit does that. note that an exception will be thrown if the view exists and has a different number of elements than HiOp's state variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add minor comments
|
||
} | ||
|
||
void hiopAlgFilterIPMQuasiNewton::checkpointing_stuff() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style Guidelines.
There should be spaces before and after each operator, e.g. line 1675, 1655,...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style Guidelines. There should be spaces before and after each operator, e.g. line 1675, 1680,...
can you be more specific about the guideline?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean space before and after operator "==".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our CMake build that uses BLT auto formats everything using clang-format (and a configuration file) with a make style
. Also make check
verifies that the code matches the clang-format style configuration, so a PR can't be merged without complying with the style.
The load/save methods offer the most flexibility for LiDO and its clients. I haven't double checked that we have the right HiOp object handle to call them. |
LiDO for sure instantiates the algorithm class somewhere in there. That "algorithm" object should be used. For example, attach it to the interface. Will provide an example in this PR. |
I'll try to look at it early this week. It's PA season.
…On Sun, Sep 22, 2024, 5:14 PM Cosmin G Petra ***@***.***> wrote:
@cnpetra <https://github.com/cnpetra> requested your review on: #693
<#693> Checkpointing (quasi-Newton
solver).
—
Reply to this email directly, view it on GitHub
<#693 (comment)>, or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACM3KAWWTXVWUBXXPWCVWLZX5MNFAVCNFSM6AAAAABNI3FQ5WVHI2DSMVQWIX3LMV45UABCJFZXG5LFIV3GK3TUJZXXI2LGNFRWC5DJN5XDWMJUGM2TKOJTGQ2DGMI>
.
You are receiving this because your review was requested.Message ID:
***@***.***>
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some minor comments.
Adds API for checkpointing and support for restart via user options.
closes #686