
Libhio file/node on Lustre hangs #53

Open
gshipman opened this issue May 20, 2021 · 7 comments

Comments

@gshipman
Contributor

Hi folks,

I'm trying to use libhio on a Lustre file system and it is hanging here (see stack trace below).
Interestingly, it does not hang when using file/node on GPFS.

0x00002aaab5a9e9d3 in __pwrite64_nocancel () from /lib64/libpthread.so.0

(gdb) bt
#0  0x00002aaab5a9e9d3 in __pwrite64_nocancel () from /lib64/libpthread.so.0
#1  0x00002aaab1dd3ebd in hioi_file_write (file=0xa, ptr=0x7fffffe1c4a0, count=1921280) at hio_internal.c:622
#2  0x00002aaab1ddae2c in builtin_posix_module_element_io_internal (posix_module=0xa, element=0x7fffffe1c4a0, 
    offset=1921280, iovec=0x2aaab5a9e9d3 <__pwrite64_nocancel+10>, count=-20878584, reading=128)
    at builtin-posix_component.c:2123
#3  0x00002aaab1dda7d6 in builtin_posix_module_process_reqs (dataset=0xa, reqs=0x7fffffe1c4a0, req_count=1921280)
    at builtin-posix_component.c:2233
#4  0x00002aaab1de3312 in hio_element_write_strided_nb (element=0xa, request=0x7fffffe1c4a0, offset=1921280, 
    reserved0=46912680618451, ptr=0x3a668ffec16b08, count=46912496271488, size=1921280, stride=0)
    at api/element_write.c:151
#5  0x00002aaab1de31cc in hio_element_write_strided (element=0xa, offset=140737486374048, reserved0=1921280, 
    ptr=0x2aaab5a9e9d3 <__pwrite64_nocancel+10>, count=16438317289663240, size=46912496271488, stride=69)
    at api/element_write.c:99
#6  0x00002aaab1de3191 in hio_element_write (element=0xa, offset=140737486374048, reserved0=1921280, 
    ptr=0x2aaab5a9e9d3 <__pwrite64_nocancel+10>, count=16438317289663240, size=46912496271488)
    at api/element_write.c:85
#7  0x000000000196e912 in hioc_writeat2 (unit=10, serial=-1981280, data=0x1d5100, 
    offset0=0x2aaab5a9e9d3 <__pwrite64_nocancel+10>, buf_bytes=1921280) at hio.c:752
#8  0x000000000196b875 in hio_module::my_hio_file_write2 (
    unit=<error reading variable: Cannot access memory at address 0xa>, serial=.FALSE., 
    numdata=<error reading variable: Cannot access memory at address 0x1d5100>, pos0=3779645987002334536, 
    data_c1=<error reading variable: Cannot access memory at address 0x3a668ffec16b48>, 
    data_i32=<error reading variable: Cannot access memory at address 0x2aaaaaad00c0>, 
    data_i64=<error reading variable: Location address is not set.>, 
    data_r32=<error reading variable: Location address is not set.>, 
    data_r64=<error reading variable: value requires 1921280 bytes, which is more than max-value-size>)
    at module_hio.f90:255
@hjelmn
Collaborator

hjelmn commented May 21, 2021

Something is not right with that trace. Look at the count and size. Maybe turn on logging and see what the write size is there?

We tested heavily on Trinitite's Lustre file system.

@gshipman
Contributor Author

gshipman commented May 24, 2021

@hjelmn, what do I do to turn on logging? Does it require a recompile?
Also, I noted that I can run successfully using NFS: on two nodes with 32 PPN I get two data files as expected. If I take the exact same run setup (copy the entire directory over to Lustre), I get the hang. I see the hang on both CTS and Trinity when using Lustre.

@hjelmn
Collaborator

hjelmn commented May 24, 2021

Setting enable_tracing to true will produce a trace directory containing per-process I/O traces. You set it on the hio context.
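
For reference, a minimal sketch of both ways to turn this on. The HIO_* environment variables are the route confirmed further down in this thread; the hio_config_set_value() call and the (hio_object_t) cast are my assumptions about the config API and should be checked against hio.h:

#include <stdlib.h>
#include <hio.h>

/* Environment route: the HIO_ variables typically need to be in the environment
 * (e.g. exported in the job script) before the context is created so libhio
 * picks them up at init time. */
static void set_hio_trace_env (void)
{
  setenv ("HIO_verbose", "20", 1);
  setenv ("HIO_enable_tracing", "1", 1);
}

/* API route: set the same variables on an existing context object.
 * Assumed name/signature -- verify against hio.h. */
static void set_hio_trace_config (hio_context_t ctx)
{
  (void) hio_config_set_value ((hio_object_t) ctx, "verbose", "20");
  (void) hio_config_set_value ((hio_object_t) ctx, "enable_tracing", "true");
}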

@gshipman
Contributor Author

Okay, I've got verbosity and tracing working. Looks like the environment variables work for this:

HIO_verbose=20  
HIO_enable_tracing=1

The short of it: unfortunately I'm not getting any additional information on Lustre. I do get full tracing on GPFS and NFS.
I'm wondering now if I should try disabling group locking on Lustre.

The long of it: here is what I do get from my Lustre runs on a CTS-1 system.
Nothing appears in trace/posix/trace.*, although the files exist for each rank.

In stdout I do get some verbose info; it hangs here:

Using libhio with context Cx60j dataset restart::0 with file mode = file_per_node, block size = 16777216, data_root = /path (DataWarp? no), uri = Cx60j-dmp000000
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] posix:dataset_open: successfully created posix dataset restart:0 on data root /path. open time 28983 usec. file mode: 1
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] posix: created element 0x5f057a0 (identifier data) for dataset restart
 writing chunk mixed cell values to save   4.28E+01 Mbytes
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] posix: finished write. bytes transferred: 1921216, time: 12877 usec

Here is the full tracing prior to this:

Creating PIO file: Cx60j-dmp000000   cycle =         0
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] posix: created module for data root /path. using umask 7
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] posix: created module for data root /path. using umask 7
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] Opening dataset restart::0 with flags 0x242 with backend module 0x58c8770
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] Setting value for dataset_file_mode to file_per_node from API
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] Setting enumeration value to 1
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] Setting value for dataset_posix_directory_mode to single from API
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] Setting enumeration value to 1
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] posix: unlinking existing dataset restart::0
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] Opening dataset restart::0 with flags 0x242 with backend module 0x5ece700
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] filesystem query: path: /path, type: 1, flags: 0x7, block size: 1048576 block count: 9881195790 blocks free: 7304476361 stripe count: 2 stripe max count: 240 stripe unit: 65536 stripe size: 0 stripe max size: 4294967296
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] posix: successfully created dataset directories /path/Cx60j-dmp000000
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] Loading dataset manifest on restart:0
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] compatibility version of manifest: 3.0
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] found manifest configuration key dataset_filesystem_type
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] found manifest configuration key dataset_expected_size
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] found manifest configuration key dataset_data_root
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] found manifest configuration key dataset_buffer_size
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] found manifest configuration key dataset_uri
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] found manifest configuration key dataset_file_mode
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] Setting enumeration value to 1
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] found manifest configuration key dataset_posix_directory_mode
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] Setting enumeration value to 1
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] found manifest configuration key dataset_block_size
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] found manifest configuration key lock_mode
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] Setting enumeration value to 1
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] found manifest configuration key posix_stripe_exclusivity
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] found manifest configuration key stripe_count
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] found manifest configuration key stripe_size
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] found manifest configuration key raid_level
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] found manifest configuration key posix_file_api
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] Setting enumeration value to 2
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] found manifest configuration key dataset_use_bzip
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] found manifest performance key bytes_read
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] found manifest performance key bytes_written
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] found manifest performance key flush_time
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] found manifest performance key flush_count
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] found manifest performance key write_time_usec
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] found manifest performance key read_time_usec
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] found manifest performance key read_count
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] found manifest performance key write_count
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] found manifest performance key close_time
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] found manifest performance key aggregate_bytes_read
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] found manifest performance key aggregate_flush_time
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] Loading dataset manifest on restart:0
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] compatibility version of manifest: 3.0
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] found manifest configuration key dataset_filesystem_type
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] found manifest configuration key dataset_expected_size
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] found manifest configuration key dataset_data_root
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] found manifest performance key aggregate_flush_count
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] found manifest performance key aggregate_bytes_written
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] found manifest performance key aggregate_write_time_usec
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] found manifest performance key aggregate_read_time_usec
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] found manifest performance key aggregate_read_count
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] found manifest performance key aggregate_write_count
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] found manifest performance key aggregate_close_time
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] found manifest configuration key dataset_buffer_size
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] found manifest configuration key dataset_uri
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] found manifest configuration key dataset_file_mode
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] Setting enumeration value to 1
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] found manifest configuration key dataset_posix_directory_mode
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] Setting enumeration value to 1
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] found manifest configuration key dataset_block_size
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] found manifest configuration key lock_mode
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] Setting enumeration value to 1
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] found manifest configuration key posix_stripe_exclusivity
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] found manifest configuration key stripe_count
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] found manifest configuration key stripe_size
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] found manifest configuration key raid_level
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] found manifest configuration key posix_file_api
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] Setting enumeration value to 2
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] found manifest configuration key dataset_use_bzip
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] found manifest performance key bytes_read
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] found manifest performance key bytes_written
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] found manifest performance key flush_time
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] found manifest performance key flush_count
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] found manifest performance key write_time_usec
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] found manifest performance key read_time_usec
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] found manifest performance key read_count
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] found manifest performance key write_count
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] found manifest performance key close_time
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] found manifest performance key aggregate_bytes_read
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] found manifest performance key aggregate_flush_time
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] found manifest performance key aggregate_flush_count
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] found manifest performance key aggregate_bytes_written
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] found manifest performance key aggregate_write_time_usec
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] found manifest performance key aggregate_read_time_usec
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] found manifest performance key aggregate_read_count
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] found manifest performance key aggregate_write_count
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] found manifest performance key aggregate_close_time
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] posix:dataset_open: successfully created posix dataset restart:0 on data root /path. open time 29006 usec. file mode: 1
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] posix: created element 0x58cbe10 (identifier data) for dataset restart
Using libhio with context Cx60j dataset restart::0 with file mode = file_per_node, block size = 16777216, data_root = /path (DataWarp? no), uri = Cx60j-dmp000000
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] posix:dataset_open: successfully created posix dataset restart:0 on data root /path. open time 28983 usec. file mode: 1
2021-05-24 19:46:15 [hio:20 sn048:1 Cx60j] posix: created element 0x5f057a0 (identifier data) for dataset restart
 writing chunk mixed cell values to save   4.28E+01 Mbytes
2021-05-24 19:46:15 [hio:20 sn048:0 Cx60j] posix: finished write. bytes transferred: 1921216, time: 12877 usec

@hjelmn
Collaborator

hjelmn commented May 25, 2021

Yeah, I would try disabling group locking. It is a buggy Lustre feature; I was able to hard-lock Trinitite nodes with it at times.
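
For context, a Lustre group lock is taken from user space with an ioctl on the open file descriptor. This is a generic illustration of the mechanism being disabled here, not libhio's actual code, and it assumes the usual Lustre user-space header (its install path varies by release):

#include <sys/ioctl.h>
#include <lustre/lustre_user.h>   /* defines LL_IOC_GROUP_LOCK / LL_IOC_GROUP_UNLOCK */

/* Take and release a Lustre group lock on an already-open fd. gid is an
 * arbitrary non-zero id shared by all cooperating processes; while the group
 * lock is held, normal per-extent lock arbitration on the file is bypassed. */
static int lustre_group_lock (int fd, int gid)
{
  return ioctl (fd, LL_IOC_GROUP_LOCK, gid);
}

static int lustre_group_unlock (int fd, int gid)
{
  return ioctl (fd, LL_IOC_GROUP_UNLOCK, gid);
}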

@gshipman
Contributor Author

Okay, I think I can modify this:

fs_attr->fs_flags |= HIO_FS_SUPPORTS_STRIPING | HIO_FS_SUPPORTS_RAID | HIO_FS_SUPPORTS_BLOCK_LOCKING;

Just remove HIO_FS_SUPPORTS_BLOCK_LOCKING.
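
If I read the code right, the one-line change would look something like this (just a sketch, assuming nothing else in the posix backend keys off that flag):

/* Stop advertising block (group) locking for this filesystem so the posix
 * backend will not try to take Lustre group locks on the dataset files. */
fs_attr->fs_flags |= HIO_FS_SUPPORTS_STRIPING | HIO_FS_SUPPORTS_RAID;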

@hjelmn
Collaborator

hjelmn commented May 25, 2021

Yeah, that should do it. Let me know if that makes a difference. I don't remember whether I tested that on CTS-1 or not, but it should have been.
