Debugging Octo-Tiger on Grace Hopper #496
@dmarce1 I tried to compile the new branch and I get the following error
cc @G-071 and @JiakunYan
The code hangs here
Patrick-
This narrows it down a bit: it looks like the hang is between entry into the
main loop and the point where the distributed part of the solver kicks in. I
may need to add some more debugging output to figure out exactly where it is,
though. If so, I'll push something today.
Thanks
Dominic
…On Thu, Sep 19, 2024, 11:20 Patrick Diehl ***@***.***> wrote:
The code hangs here
New Omega = 9.687093e-01
t=21 END : DWD step (file: /users/diehlpk/compile/octotiger/src/node_server_actions_3.cpp, line: 396, function: execute_solver) (2.939855e+00 s elapsed)
TS 16:: t: 2.423595e+02, dt: 1.655967e-03, time_elapsed: 3.066656e+00, rotational_time: 2.347759e+02, x: 1.492980e+00, y: -4.062294e+00, z: -3.587781e-01, a: 3.727446e+00, ur: 2.053696e-06, ul: 2.037199e-06, vr: 6.826120e-01, vl: 6.800910e-01, dim: 0, ngrids: 8393, leafs: 7344, amr_boundaries: 5960
t=21 BEGIN: regrid (file: /users/diehlpk/compile/octotiger/src/node_server_actions_3.cpp, line: 458, function: execute_solver)
-----------------------------------------------
t=21 BEGIN: check for refinement (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 251, function: regrid)
t=21 END : check for refinement (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 251, function: regrid) (3.230400e-02 s elapsed)
t=21 BEGIN: regrid (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 259, function: regrid)
t=21 BEGIN: gather (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 261, function: regrid)
(rebalancing 8489 nodes with 7428 leaves)
t=21 END : gather (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 261, function: regrid) (4.129000e-03 s elapsed)
t=21 BEGIN: scatter (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 266, function: regrid)
t=21 END : scatter (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 266, function: regrid) (1.547000e-02 s elapsed)
t=21 BEGIN: form tree connections (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 271, function: regrid)
(6072 amr boundaries)
t=21 END : form tree connections (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 271, function: regrid) (1.114730e-01 s elapsed)
t=21 BEGIN: solve gravity (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 276, function: regrid)
t=21 BEGIN: (root node) computing FMM (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 329, function: solve_gravity)
t=22 END : (root node) computing FMM (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 329, function: solve_gravity) (3.010200e-02 s elapsed)
t=22 END : solve gravity (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 276, function: regrid) (7.064400e-02 s elapsed)
t=22 END : regrid (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 259, function: regrid) (2.020480e-01 s elapsed)
t=22 END : regrid (file: /users/diehlpk/compile/octotiger/src/node_server_actions_3.cpp, line: 458, function: execute_solver) (2.345370e-01 s elapsed)
t=22 END : main execution loop iteration (file: /users/diehlpk/compile/octotiger/src/node_server_actions_3.cpp, line: 363, function: execute_solver) (3.301262e+00 s elapsed)
t=22 BEGIN: main execution loop iteration (file: /users/diehlpk/compile/octotiger/src/node_server_actions_3.cpp, line: 363, function: execute_solver)
t=22 BEGIN: DWD step (file: /users/diehlpk/compile/octotiger/src/node_server_actions_3.cpp, line: 396, function: execute_solver)
Here is the new output
Is this running without SILO output enabled? I think the problem may be SILO-related. If it is being run with SILO output, can you please re-run it with disable_output=on?
I think I have the bug narrowed down to diagnostics(); it is likely in this section of code in node_server_actions_2.cpp. I have added some more debug output, which will hopefully let us narrow it down further.
EDIT: My bet is that this is in all_hydro_bounds. If so, verbose debugging output may not narrow it down any further than which kind of boundary exchange is involved. There are three kinds: a) the restrict step, which updates refined cells from their children, b) the decomp step, which exchanges ghost cells between grids on the same level, and c) the AMR step, which interpolates ghost cells at AMR boundaries.
diagnostics_t node_server::diagnostics(const diagnostics_t &diags) {
I just pushed a branch called verbose_debug. To enable the debugging output, set --verbose=1; to disable it, set --verbose=0. I've attached an example of the output. It prints comments at the beginning and end of functions, along with the start time and the time elapsed during the execution of the function. When a comment contains something like "(from root)", the code is within a function that executes for each node, and only the root node is emitting output.
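For readers unfamiliar with this style of tracing, BEGIN/END output with elapsed time can be produced by a small RAII guard. The following is a minimal sketch only and is not the code in the verbose_debug branch: the class scoped_trace, the macro TRACE_SCOPE, and the function example_step are all hypothetical names, and the t= start-time prefix seen in the real output is omitted for brevity.

```cpp
#include <chrono>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical RAII tracer: prints BEGIN on construction and END with the
// elapsed time on destruction. Illustrative only, not Octo-Tiger's code.
class scoped_trace {
    using clock = std::chrono::steady_clock;

public:
    scoped_trace(std::ostream& out, bool verbose, std::string label,
                 const char* file, int line, const char* function)
      : out_(out), verbose_(verbose), label_(std::move(label)),
        file_(file), line_(line), function_(function), start_(clock::now()) {
        if (verbose_) {
            out_ << "BEGIN: " << label_ << location() << '\n';
            out_.flush();  // flush so the line is visible even if the code hangs later
        }
    }

    ~scoped_trace() {
        if (verbose_) {
            const std::chrono::duration<double> elapsed = clock::now() - start_;
            out_ << "END  : " << label_ << location()
                 << " (" << elapsed.count() << " s elapsed)\n";
            out_.flush();
        }
    }

    scoped_trace(const scoped_trace&) = delete;
    scoped_trace& operator=(const scoped_trace&) = delete;

private:
    // Formats the "(file: ..., line: ..., function: ...)" suffix.
    std::string location() const {
        std::ostringstream s;
        s << " (file: " << file_ << ", line: " << line_
          << ", function: " << function_ << ")";
        return s.str();
    }

    std::ostream& out_;
    bool verbose_;
    std::string label_;
    const char* file_;
    int line_;
    const char* function_;
    clock::time_point start_;
};

// Hypothetical convenience macro; allows one traced scope per block.
#define TRACE_SCOPE(out, verbose, label) \
    scoped_trace trace_scope_guard_((out), (verbose), (label), __FILE__, __LINE__, __func__)

// Stand-in for a traced function such as a solver step.
void example_step(std::ostream& out, bool verbose) {
    TRACE_SCOPE(out, verbose, "example step");
    // ... the actual work would go here ...
}
```

Calling example_step(std::cout, true) prints one BEGIN line and one END line; with verbose set to false it prints nothing. Flushing after each line matters when chasing a hang, because otherwise the last BEGIN can sit in a buffer and never reach the log.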